Hardware Thread, Raid fails after drive swap in Technical; ProLiant ML150 G2 – Windows 2003 standard
Raid Controller adaptec 2610sa with 4 160gb drives in raid 5 config
23rd September 2008, 04:10 AM #1
Raid fails after drive swap
HP Storage Manager Agent reported a failed drive. I checked the server and saw a constant amber light on the drive (the first one in the array, at the bottom). Then the server shut down. I pulled the drive, replaced it with a new 250 GB one and restarted the server. During startup no RAID was found; it then said that the BIOS was not installed and went into a network boot.
I shut down the server, put the old drive back in (I had put the suspect drive into another HP server and it seemed to work fine)
I restarted the server again. This time the RAID was found, but “drive is missing or degraded” came up on the screen. The restart got as far as “Windows is starting” and then came up with an error:
“Lsass.exe – system error security accounts manager initialization failed because of the following: directory services cannot start. Error status 0x00002e1. Please click OK to shut down this system and reboot into directory services restore mode, check the event log for more detailed information”
Another normal restart: the above error still came up, but there was another window in the background saying “active directory is rebuilding indices”, and it seemed to stay on this.
Another restart but this time into directory services restore mode. The server is now running in this mode. The light on the suspect drive is now a constant green, the other drives are flashing green.
The HP Storage Manager shows the RAID controller and says all drives in the array are ‘optimal’, but it says that the logical drive is degraded.
While in this restore mode I thought I would hot swap the suspect drive and allow the raid to rebuild. As soon as I pulled the first drive in the array the server went into a blue screen and shut down. Again put the suspect drive back in and again the server is running in directory services mode.
Why can’t I just swap the first drive in the array? Is active directory broken?
Any help would be great.
23rd September 2008, 10:20 AM #2
Eeek, I suspect that the initial drive failed to write or read some data properly. When you changed the drive over while the server was off, the controller got confused trying to read the volume information off the disks and reset the error counts on the system. When you put the old drive back in, it was able to detect the array properly again and initialized it, but it had lost the error count information and marked the bad drive as good. As the drive was marked as good and the data was not consistent, the controller attempted a rebuild of the affected areas, unfortunately using the busted drive as the source drive.
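To illustrate why rebuilding from a bad source drive is so damaging, here is a minimal sketch of RAID 5 style XOR parity on a single stripe. It is not how the Adaptec firmware is actually implemented; the block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a 4-drive set: three data blocks plus one parity block.
# (Hypothetical contents, chosen only for illustration.)
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Healthy rebuild: drive 0 is missing, so its block is recomputed
# from the two surviving data blocks and the parity block.
rebuilt = xor_blocks([data[1], data[2], parity])

# Bad case: drive 0 returns stale bytes but is trusted as "good",
# so parity is recomputed from the stale block.
stale = b"A?AA"
bad_parity = xor_blocks([stale, data[1], data[2]])

# Any later rebuild of drive 0 now reproduces the stale bytes,
# because the parity itself has been made to agree with them.
rebuilt_after = xor_blocks([data[1], data[2], bad_parity])

print(rebuilt == data[0])        # original block recovered
print(rebuilt_after == data[0])  # original block is gone
```

Once parity has been regenerated from the bad block, nothing left on the array records what the original data was, which is why backups or a second domain controller become the fallback.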
Best procedure in that failed drive situation is to replace it while the system is still on if it is hot swap capable.
You may be able to recover the system so long as the data corruption has not spread too far, but I would be looking at backups for AD or, better yet, another domain controller if one is available.
You may need to let it rebuild with the failed drive and hope that it does not cook all of the data. Alternatively, you should be able to boot from just the good disks, without the replacement drive installed, then add the replacement drive once the system is up and rebuild the disk set.
Last edited by SYNACK; 23rd September 2008 at 10:50 AM.
23rd September 2008, 10:34 AM #3
Another good thing to have done, if you're going with a RAID 5 setup for an OS volume, would have been to have a hot spare, so that if the controller detects a failed drive it can rebuild onto the spare.
That's if reseating the troubled drive while the system is on didn't rectify the original problem. A lot of the time that's all that is required.
23rd September 2008, 10:46 AM #4
Agree with Toledo - hot spares are well worth having.
One thing that could have happened: if you did not have Background Consistency Check enabled on the controller, the array parity could have slowly become slightly corrupt over time, and now that a drive has actually failed it is too corrupt to rebuild the array properly.
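As a rough picture of what a consistency check does, the sketch below walks each stripe and flags any where the data and parity blocks no longer XOR to zero. This is only an illustration of the idea, not the controller's actual routine; `make_stripe` and `find_inconsistent_stripes` are hypothetical helper names.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(*data_blocks):
    """Build a stripe: the data blocks followed by their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def find_inconsistent_stripes(stripes):
    """Return the indexes of stripes whose blocks do not XOR to all zeros."""
    bad = []
    for index, blocks in enumerate(stripes):
        # In a consistent stripe, data XOR parity cancels out to zero bytes.
        if any(xor_blocks(blocks)):
            bad.append(index)
    return bad

stripes = [
    make_stripe(b"ab", b"cd", b"ef"),
    make_stripe(b"gh", b"ij", b"kl"),
]
stripes[1][0] = b"gX"  # simulate a silently corrupted block on one drive

bad_stripes = find_inconsistent_stripes(stripes)
print(bad_stripes)
```

Note that a check like this can tell that a stripe is inconsistent but not which block in it is wrong, so catching problems early, while every drive is still readable, matters.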
23rd September 2008, 09:10 PM #5
Update - I now have a little less hair.
Promoted the backup domain controller and demoted the other. The server with the active directory error has had AD removed and restored.
The old PDC still has the RAID error. Management software still says the logical drive is degraded. I have tried replacing the suspect drive with a new one. As soon as I pull the suspect disk the server crashes. With the new drive in place the server will not boot, saying no RAID installed.
Put the old suspect drive back in and the server starts normally. The suspect drive has a constant green light on. The other drives have flashing green lights. I put an additional new drive in (so that now makes 5 disks) and created a hot spare out of it. I rebooted the server, which started normally.
I was hoping the RAID would now use the hot spare, but it doesn’t.
I think that the constant green light on the suspect drive means the disk is online but inactive. I could initialize the drive but will that delete the data in the whole array or just that disk?
The suspect drive is the first in the array. Does the controller write boot info to this drive? That’s the only reason I can think of why I can’t hot swap.
Can I add another disk and make that part of the array? Will this delete data in the whole array? It will be 5 disks in the array including the suspect disk. When the array has finished rebuilding could this latest disk be removed and placed in the suspect drive slot?
I am new to RAID so go easy.
23rd September 2008, 11:39 PM #6
Originally Posted by ozydave
The constant green light usually indicates that the drive is fully online, and the blinking indicates activity/rebuilding.
The array data for the volume set is written to the beginning of each drive in the set or at least it should be but I have never had much luck with the adaptec cards in comparison to other brands.
You can add the new disk to the array if the controller supports expansion but probably not while it is still rebuilding. This will also leave you with a 6 disk set and still a possible failed drive that would need to be replaced.
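For a sense of what replacing or adding a disk does to capacity, here is a small sketch of the standard RAID 5 rule of thumb: one drive's worth of space goes to parity, and every member is limited to the size of the smallest drive. The function name is made up, and real controllers reserve a little extra space for metadata, so treat the numbers as approximate.

```python
def raid5_usable_gb(drive_sizes_gb):
    """Approximate usable capacity of a RAID 5 set, in GB."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    # Each member contributes only as much as the smallest drive,
    # and one member's worth of space holds parity.
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)

original = raid5_usable_gb([160, 160, 160, 160])        # the four 160 GB drives
with_250 = raid5_usable_gb([250, 160, 160, 160])        # one swapped for a 250 GB drive
expanded = raid5_usable_gb([160, 160, 160, 160, 250])   # a fifth drive added

print(original, with_250, expanded)
```

So the 250 GB replacement is fine as a member, but the extra space beyond 160 GB would simply go unused.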
Your only options appear to be to let it finish rebuilding and then hot swap the drive out, or to attempt to boot with only the good disks in (but it is not a good plan to turn it off while it is rebuilding).
When you finally get this sorted I would recommend upgrading the RAID controller's firmware to the latest version to hopefully make it a little more reliable and, as the others have said, grabbing another disk as a hot spare.
As the drives are SATA, you should be able to remove the suspect drive after the rebuild is complete and install it in a standard computer to run a manufacturer diagnostic on it, with something like SeaTools (Seagate), to see whether the drive or the controller is to blame.
Googling the controller does not bring up many nice comments about its abilities under load, so you may want to look at replacing it outright with a higher-model Adaptec controller if the server is under much load. If you stick with Adaptec there is a good chance that the RAID disk set will be transportable and it will simply be a case of plugging the drives into the new controller.