So I followed those instructions to clear the ECC; they're wrong, it just cleared the BIOS settings. Apparently the log within Insight Diagnostics is what it was complaining about. I tested first with the original HP RAM: ECC 100% pass. I put the new RAM in: 100% pass! No ECC errors! I ran it a couple of times and each time the results were fine. I did a complete test and everything passed. I looked in the log where the previous ECC error was and found "Malformed NVRAM detected. Device: HP Smart Array Controller. Slot 0 Property name: World Wide ID". So I've gone from a RAM problem to an array controller RAM problem?!
I've searched for that error and there are a couple of sites mentioning clearing the NVRAM, which is apparently the same procedure I mentioned before: switching 6 to on and removing the motherboard battery. This didn't work; I tried it a couple of times and still got the same error message.
I also read this could be another seating issue, so I tried re-seating it.
The server also decided to reset during the day today, which was great!
If it's saying the Smart Array Controller, is that not pointing to the RAID controller memory?
Well, I'm assuming so; this error wasn't there until I resolved the RAM ECC problem.
In the array diagnostics there are no errors, in Insight no errors, just that "Malformed NVRAM detected. Device: HP Smart Array Controller. Slot 0 Property name: World Wide ID" in the log, and no error lights are lit.
I'm not 100% sure what tools are available to diagnose and test further. Maybe it's worth a call to HP support.
No warranty or support package left on this server, so I'm assuming HP won't help unless I cough up some cash?
Yes, you may end up needing to replace the RAID controller if that is in fact the fault. How old is it? Some servers can get really out of whack after 6 or so years and end up with rather complicated, difficult-to-track errors. I had some really old DL380 servers with dodgy drive backplanes, and that was not fun to diagnose or repair. That was not in a school though, but rather a separate business that was using them for testing.
Interesting, we have (or did have) exactly the same problem with the exact same model of server. We never really got 100% to the bottom of it, though it does seem to have stopped after all the WD HDs that shipped with the server (and were replaced over and over again with other WD disks after many issues) were replaced with Seagate ones.
Hasn't happened in quite a while now.
So apparently the server isn't as old as I thought. I haven't been working here too long, and it turns out the server was only installed in 2009, so we do still have warranty on it. (Of course all the really important documents that came with the server were misplaced.)
So I've spent over three hours today talking to HP support... repeating everything on here over and over. The best suggestion has to be upgrading to a different Smart Array firmware meant for a different model. Great suggestion, HP! He did suggest something I haven't tried, which was a "power drain", in which the server is unplugged and the power button is held for 20 seconds. This sounds like a bit of rubbish to me, but it's worth a try. I'll try it tomorrow evening.
Hopefully they will sort it; I've found a few people on the net mentioning ML350s with the same issues.
I'll look tomorrow at the make of the drives; I'm not sure off the top of my head.
Update, if anyone is interested!
After trying the power drain on Friday, I got a call this morning saying both hardware failure lights were on red again, so the site manager restarted the server. I spoke to HP, who told me they think it is either a UPS problem or the power supply, and if it's neither of them they will replace the mainboard, which will also replace the Smart Array.
Kinda getting the sense they don't know what the problem is either!
Thanks for all the help everyone.
So HP finally gave in and let an engineer come out. Apparently the NVRAM problem is nothing; the power supply has had issues. I sent them the part number and revision and was told that it has problems. So they replaced the power supply and backplane. So far so good, no restarts. They have told me that if there are any further issues they will replace the motherboard.
HP support can be good! You just sort of have to play them a bit! Obviously with the more complex stuff they tend to hang around on it, but otherwise they tend to be pretty top notch when it comes to replacing failed/faulty parts. The only time I had trouble was with the core switch, but the replacement parts for it were around 4k!
Originally Posted by beany1
I have the same problem: my server shuts down every few weeks, and when I read the error report, I get a blue screen error. The problem is we only had this server installed about a year ago. I'm really worried that this may be a serious problem.
What steps should I take to try to resolve this issue?
Probably best to start a new thread for that, aaqib.
That's a different problem from mine, as I never had any blue screen errors.
Start a new thread and give details of the blue screen errors etc.