Server RAID failure
What a day. The server at work suffered a power outage, which left two drives in a four-drive RAID 6 array failed. I've put in two new drives and rebuilt one - even with two drives down the server was still working.
At this point we have three drives working. I had to get home, and the tech didn't feel confident kicking off the second rebuild, so I got him to switch it all off.
So tomorrow, do I get the last drive rebuilt, then run a backup, or do I do a backup before rebuilding the last drive?
Also - how do you back up domain controllers? At my main school we have two DCs but don't back them up; at this school we only have one server doing it all, and the DC is the Hyper-V host.
Do I create a VM DC and then Veeam that? So if we had to rebuild the Hyper-V host, we could launch the DC VM to recreate a physical one?
Worryingly, the backup drive I used with Veeam Free to back up the VMs is buggered. It was a Buffalo external 1 TB drive; the unit won't turn on, and when I took the HDD out, it showed as not initialised! How safe should I feel with three working drives??
Two of four drives failed due to a 'power event'
It seems prudent to me to consider that the two surviving drives may have sustained damage and are now at increased likelihood of failing at any time.
You only have three drives online. If those two drives were to fail, you will be down until you have completed a full bare-metal restore.
Which takes longer, the backup or the rebuild? If the backup can complete on a busy server during the day in less than a couple of hours, do the backup first. Otherwise, assuming rebuilding a disk takes less than a working day, get your RAID back to four disks first, then run a backup.
The worst that could happen is the two drives fail before the rebuild/backup has completed and you can only recover from yesterday's backup. But if that happens, it would have happened whichever path you chose.
Keep your management in the loop. Your infrastructure is hanging together by a thread, and they need to be aware that despite your best efforts you are one unlucky break from being down for a couple of days. Don't be afraid to be seen to ask for help either.
Hmm, I missed the bit about your Veeam backup disk being dead. If your backup disk is dead, keep the infrastructure down until you've got a new one. Then bring it up and do whatever takes the least amount of time first.
Originally Posted by psydii
If you do the rebuild first, you might want to consider taking a copy of critical files while that is going on (MIS data, finance data, HR data, student coursework).
Worst case looks to me like a four-figure data recovery bill if two of those disks fail before you've got either the two new ones rebuilt or a backup completed.
Hi, thanks for the reply.
I have kept the head informed - it's a first school we support - and he is aware that we could lose everything.
Veeam backups of the storage server and mail server take about 2 hours; getting the 4th disk back takes about 2 and a half. The network is not currently up. I'm thinking do the final rebuild, then we are back to optimal running?
If a disk died now, would we go back to having just the two working disks?
Assuming the third disk has completed rebuilding into the array, yes - RAID 6 tolerates two failed disks, so with three of four online you can survive one more failure, but with no redundancy left after that. Given the recent history of the system, if one of the disks fails there is nothing but luck stopping the second one going too.
Originally Posted by mattianuk
How big are the disks?
How big are the VMs?
What is your connectivity?
How many client machines during the day?
It completed - I did it through the controller BIOS, and it said it was fine.
The disks are 500 GB.
VMs - the largest is the storage VM at 140 GB; Exchange is next at 80 GB.
The OS is on a different array; it's just the data array with issues, and it definitely has three working drives out of four currently.
There are around 20 clients, unless the laptop trolleys are used. But for now I've left the server off overnight and told staff not to use the network tomorrow - being a small first school, that is not the end of the world.
Backups would be to a local USB hard drive.
Also, luckily, SIMS and CMS are hosted at county. The disks are less than a year old - in fact only in use since September - so when doing the rebuild there should be no reason for them to die. I know that doesn't mean they won't!!
USB 2 or 3?
Originally Posted by mattianuk
The storage VM backup with Veeam takes 50 mins - tried it earlier, but it hung at 99%. Exchange takes 50mb.
I take it that as the OS, which is also the DC, is on a different array, the OS should be fine?
OK, I was just double-checking your figures with a back-of-the-envelope calculation. Right - with such a small set of data I would lean towards getting a backup taken onto a reliable disk as your next step, then adding the disk.
Originally Posted by mattianuk
It might be worth opening a ticket with Veeam about the hung backup ASAP in case the problem happens again during the backup. Open it by saying (assuming I've got this right): "The disk holding my Veeam backups has failed, my Hyper-V host RAID is degraded and running with two suspect disks, I'm trying to take a backup to a new destination disk and it hung at 99% - help!"
Keep the site down until you've got your backup and rebuilt the disk. The two disks currently in the server that were part of the original four-disk RAID are the ones I would be worried about - expect them to fail soon. Also, the older the 'new' disks are, the greater the chance they have been damaged by accident while in storage... so without being sure of their provenance I would be quite nervous until I had a backup onto a known-good disk.
Cheers, sounds like good advice.
The only thing I'm confused about is the disks in the server and the risk of failure. The original four were brand new in September - they are not old. I bought two brand new ones today. Can you explain the risk of failure in the other two, when my belief is that the damage to the current two is from a power failure this morning, perhaps corrupting data, rather than damaged or old disks?
An update: I checked yesterday's backup - the one that hung at 99% - and it had all the files, so I went straight ahead and rebuilt the array. All working better now... Veeam is still hanging, but I have ordered a new NAS for Monday for backups and will sort something then. For now we have backups of everything, including emails, and the array is back and working.
Also, I've plugged the server into the battery backup side of the UPS and not the surge-only side. Really should have checked that when we installed the server... but I assumed it was already in there!!! Never again!
Time to breathe again!
For DCs - a second DC is good, but not foolproof. Get a bare-metal backup solution in place for it too - Windows Server Backup can do this to a USB disk if you want to do it cheaply. I'm planning on having an iSCSI target on a decent Synology NAS for all of my server backups in a remote location for DR.
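For the cheap route, the Windows Server Backup job can also be kicked off from the command line with wbadmin. A rough sketch only - the E: drive letter for the USB disk is my assumption, so change it to suit:

```shell
:: Bare-metal backup of the server to a USB disk (E: assumed).
:: -allCritical includes all volumes needed for bare-metal recovery,
:: plus the system state - which covers the AD database on a DC.
:: -quiet runs without prompting, so it's suitable for a scheduled task.
wbadmin start backup -backupTarget:E: -allCritical -quiet
```

You could drop that into a nightly scheduled task and never have to touch the GUI.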
Yeah, going to look at doing a Windows backup, but if it's a Hyper-V host (one of our schools is part of a failover cluster), will it try and back up the VMs as well? Or can you tell it not to?
I don't run Hyper-V myself, but Windows Server Backup has a number of selectable options for the backup job. You could just exclude the VM storage location too.
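If the GUI options don't give enough control, I believe wbadmin can take -include/-exclude path lists on newer versions of Windows Server - worth confirming with `wbadmin start backup /?` on your box first. A sketch, where C:\VMs is a made-up path standing in for wherever your VHDs actually live:

```shell
:: Back up the C: volume but skip the folder holding the VM disks.
:: C:\VMs\* is a hypothetical path - substitute your VM storage location.
:: E: is assumed to be the backup destination disk.
wbadmin start backup -backupTarget:E: -include:C: -exclude:C:\VMs\* -quiet
```

That way the host OS (and the DC role with it) gets backed up without dragging 140 GB of VHDs along every night - the VMs themselves are already covered by Veeam.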