6th January 2010, 02:29 PM #1
RAID 5 parity problem
Hi, I was hoping for some advice & clarification on the RAID 5 arrays in our two servers.
One server had a failed drive in its RAID 5 array (3 x SCSI U320 drives). It ran OK on the remaining two drives, and I managed to source a replacement drive and rebuild the array without a hitch; parity verify now completes with no errors. (The network manager had knowingly left this failed drive in the array for over two years and never done anything about it.) I am hoping this server is sorted now.
The other server doesn't have a clear-cut drive issue - just before Xmas I ran a parity check on the RAID 5 array and it failed after 5 minutes.
I checked the status of the drives in Intel Storage Console: one of the drives has 6 Grown errors, so I fear there may well be an issue with this disk. I have a spare hard disk for this server, but am I right in thinking that if I replace the suspected bad disk, the array will not rebuild properly because of the parity check failure?
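To make the worry concrete, here is a minimal sketch of how RAID 5 rebuilds a missing block, assuming plain XOR parity over one stripe of a three-drive array. The block contents and names are made up for illustration, and this says nothing about how the Intel controller itself reacts to a failed verify.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: two data blocks plus the parity block computed from them.
d0 = b"HOME"
d1 = b"DATA"
parity = xor_blocks(d0, d1)

# Consistent stripe: if the drive holding d1 is replaced, d1 is
# recovered exactly from the two surviving blocks.
rebuilt_ok = xor_blocks(d0, parity)
assert rebuilt_ok == d1

# Inconsistent stripe: one bit of the surviving parity block is wrong.
# The rebuild still "succeeds", but the reconstructed block is silently wrong.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
rebuilt_bad = xor_blocks(d0, bad_parity)
assert rebuilt_bad != d1
```

The point of the sketch is that a rebuild can only be as good as the blocks left on the surviving drives. If the inconsistent block happens to sit on the drive being replaced, the rebuild would regenerate it correctly; if it sits on a surviving drive, the error is copied into the rebuilt data.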
Last edited by nielpeel; 6th January 2010 at 03:00 PM.
6th January 2010, 03:15 PM #2
Do you have an option anywhere to rebuild the parity information? (This may only be available through the offline options in the BIOS-type setup screen, depending on the controller.)
Failing that you could try setting up the spare drive as a hot spare and seeing if it would fail over nicely and rebuild (unlikely).
The safest bet would probably be a full backup of all the data (which you should do anyway) then swap out the drives. If it fails you would need to do a restore and possibly recreate the RAID volume to recreate accurate parity information.
6th January 2010, 03:40 PM #3
I've been unable to find an option to repair the parity data of the array. It's an Intel SRCZCR controller. Maybe I should check in the Intel RAID BIOS, which I can do tomorrow if I can restart the server.
Unfortunately the Intel documentation is not very good on RAID parity checking and recovery...
I'd like to try and replace the suspected bad drive, but I'm sure it will fail to rebuild - any ideas what would happen at that point (apart from total disaster, heh-heh)? I could add the spare drive as a hot spare as you suggest and try to force a failover, but it's the worry of not knowing what will happen that stops me from doing this at the moment.
The last resort is your final option. I have good tape backups and would be confident in performing this task; I would just have to persuade the rest of the team of this!!
If not, then I guess we just leave the server alone until it does actually fail, or it gets replaced! However I am determined to have both servers as failure-proof as possible, so I will try my best.
Last edited by nielpeel; 6th January 2010 at 03:42 PM.
6th January 2010, 06:15 PM #4
I normally test my tape backups by restoring a few random folders of data to a single folder on the server - is there a better way to test the integrity of the backups?
NTbackup is used here, verify is on.
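One possible way to go a step beyond spot-checking by eye is to restore a folder to a scratch location and compare checksums against the live copy. The following is a rough sketch only: `compare_trees` and `sha256_of` are hypothetical helper names, not part of NTBackup, and any file that has changed since the backup ran will naturally show up as different.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(live_root: Path, restored_root: Path) -> dict:
    """List files missing from the restore and files whose content differs."""
    missing, different = [], []
    for live_file in sorted(p for p in live_root.rglob("*") if p.is_file()):
        relative = live_file.relative_to(live_root)
        restored_file = restored_root / relative
        if not restored_file.is_file():
            missing.append(relative.as_posix())
        elif sha256_of(live_file) != sha256_of(restored_file):
            different.append(relative.as_posix())
    return {"missing": missing, "different": different}
```

Usage would be something like `compare_trees(Path(r"D:\Home"), Path(r"E:\RestoreTest\Home"))`, where both paths are placeholders. It only checks file contents, so permissions would still need a separate look.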
6th January 2010, 08:24 PM #5
I'd copy it to a .wim on a network share using imagex (WinPE boot disk / usb or netbooting from WDS) as an extra backup, replace the dodgy drive, wipe the array and rebuild it with the known good disks, then restore the .wim file. It'll be an evening or weekend job though.
Then again, I have a virtualisation setup that I can also restore the .wim to if it turns out N+1 disks are actually not that hot.
6th January 2010, 09:26 PM #6
Either way, I would check that a full backup is viable and present before doing anything. Given the controller, I think you are right in assuming that a drive swap would fail outright, and the forced failover would only work if the controller supports it and rebuilds the drive's parity during the operation.
I think the only guaranteed way forward would be to do a full wipe and rebuild with the known good drives. You could use WIMs, which would probably be quicker, but be cautious as this way would lose any non-default NTFS file ACLs.
Good luck with your hunting in the BIOS section. As a thought, those Intel controllers sometimes come with a utility CD that can offer more options for the RAID subsystem, but I would be careful: from personal experience, not all the features of such CDs are trustworthy or safe, and they are just as likely to eat your data as save it.
9th January 2010, 06:42 PM #7
Update - I added the extra drive as a hot spare and ran a surface check from the Intel Storage Console, but the drive failed this test spectacularly!! (That's eBay for ya.)
So it looks like I'll be going down the route of tearing down the array and recreating it, then restoring the server from backup.
If we have some cash in the budget, it might be the right time to upgrade the disks to 146 GB each, so we can give pupils more than 100 MB home folders!
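For a rough idea of what bigger disks would buy: RAID 5 gives up one drive's worth of space to parity, so usable space is (n - 1) x drive size. A quick sketch, assuming the array stays at three drives; the pupil count of 800 is a made-up figure purely for illustration, and the function names are hypothetical.

```python
def raid5_usable_gb(drive_count: int, drive_gb: float) -> float:
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_gb


def quota_mb_per_user(usable_gb: float, users: int, reserved_gb: float = 0.0) -> float:
    """Even split of the remaining space, using decimal units (1 GB = 1000 MB)."""
    return (usable_gb - reserved_gb) * 1000 / users


usable = raid5_usable_gb(3, 146)          # 292 GB usable from 3 x 146 GB
per_pupil = quota_mb_per_user(usable, 800)  # 800 pupils is an assumed figure
```

Real headroom would be lower once the OS, shared areas and free-space margin are taken out, which is what the `reserved_gb` parameter is for.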