I am at my wits' end with a server that has defeated my efforts to diagnose the following symptoms.
This all started about two months ago. Nothing changed at the time, so I can't blame a config change.
OS is Windows Server 2003 R2 32-bit (SP2) on a Dell PowerEdge 2850.
Mirrored OS drives (2x 73GB 15K RPM) and 4 RAID5 disks (300GB 10K RPM), all on a PERC4e.
6GB RAM (using BIOS SPARE BANK ENABLED)
When backing up the server using Backup Exec 10D, the server just stops responding and eventually restarts. I removed the agent and reinstalled it (both manually and by push install) with no change. If I attempt to copy large files from other network locations, it stops responding. When attempting to install the current patch (malicious software patch), it stops as well. It would seem to be an OS issue, but alongside these symptoms we have had several hardware-related anomalies:
Two failed drives (fixed using REBUILD in Dell OpenManage). It was the same disk both times.
A PROC_INIT error that was fixed by a reboot.
A recent blue screen with RAM PARITY CHECK / MEMORY PARITY ERROR. Aside from the repeated disk failure, these all look like random hardware failures.
And the server kept complaining about a PCIe Riser problem (EB113) that Dell fixed with a reseat of the Riser.
Things we have done thus far:
Updated BIOS to current
Updated Disk FW to current
Updated PERC FW to current
Ran a RAM memory test
Ran all diagnostics using Dell OpenManage Diags
Performed a Repair on Windows
The server boots fine and runs great, serving SQL 2005 and a web app on JBoss that delivers digitized newspapers that are fairly large (upwards of 25-30 MB each). It's only when you try to stress the disks that things go awry. After the Windows Repair we tried to install Service Pack 2, and not only does it cause the server to become unresponsive, it also attempts to extract its files to a drive other than the %system% drive.
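
Since the hang only shows up under disk load, one way to pin it down might be to reproduce that load in a controlled way, one volume at a time. Below is a minimal Python sketch of that idea, not a tested diagnostic: the function name `stress_write`, the target path, and the sizes are all hypothetical, and it assumes Python is available on the box. It writes a file in fixed-size chunks, forces each chunk to disk, and logs progress, so that after a hang the last logged line shows roughly how far the write got.

```python
import os
import time


def stress_write(path, total_mb=1024, chunk_mb=4, log=print):
    """Write total_mb of random data to path in chunk_mb pieces.

    Each chunk is flushed and fsync'ed so the data really reaches the
    disk subsystem instead of sitting in the OS cache. Returns the
    number of megabytes written.
    """
    mb = 1024 * 1024
    chunk = os.urandom(chunk_mb * mb)  # reuse one random buffer for every chunk
    written = 0
    start = time.time()
    with open(path, "wb") as f:
        while written < total_mb:
            step = min(chunk_mb, total_mb - written)
            f.write(chunk[: step * mb])
            f.flush()
            os.fsync(f.fileno())  # push the chunk through to the controller
            written += step
            log("%.1fs: %d MB written" % (time.time() - start, written))
    return written


if __name__ == "__main__":
    # Hypothetical target: point this at the volume under suspicion.
    stress_write("stress_test.bin", total_mb=256, chunk_mb=4)
```

Running it once against the mirrored OS volume and once against the RAID5 volume would show whether the hang follows one array or the controller as a whole.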
Where's Dr. House when you need him? Any ideas on where to start?