  1. #1

    Join Date
    Apr 2012
    Posts
    413
    Thank Post
    39
    Thanked 35 Times in 28 Posts
    Rep Power
    15

    Server raid failure

    What a day. The server at work suffered a power outage, which left 2 drives in a 4-drive RAID 6 marked as failed. I have put in 2 new drives and rebuilt one, though even with only 2 drives the server was working.

    At this point, we have three drives working - I had to get home, and the tech didn't feel confident setting the second one off... So I got him to switch it all off.

    So tomorrow, do I get the last drive rebuilt, then run a backup, or do I do a backup before rebuilding the last drive?

    Also - how do you back up domain controllers? At my main school we have 2 DCs and don't back them up, but at this school a single server does it all: the DC is also the Hyper-V host.

    Do I create a DC as a VM and then back that up with Veeam? That way we could rebuild a Hyper-V host and launch the DC VM to create a physical one?

    Worryingly, the backup drive I used with Veeam free to back up the VMs is buggered. It was a Buffalo external 1 TB drive; the unit won't turn on, and when I took the HDD out, it showed as not initialised! How safe should I feel with 3 working drives??

  2. #2

    Join Date
    Jul 2006
    Location
    London
    Posts
    1,241
    Thank Post
    110
    Thanked 242 Times in 193 Posts
    Blog Entries
    1
    Rep Power
    74
    Two of four drives failed due to a 'power event'

    It seems prudent to me to assume that the two surviving drives may have sustained damage and are now at increased risk of failing at any time.

    You only have three drives online. If those two drives were to fail, you would be down until you had completed a full bare-metal restore.

    Which takes longer, the backup or the rebuild? If the backup can complete on a busy server during the day in less than a couple of hours, do the backup first. Otherwise, assuming rebuilding a disk takes less than a working day, get your RAID back to four disks first, then run a backup.

    The worst that could happen is that the two drives fail before the rebuild/backup has completed and you can only recover from yesterday's backup. But if that happens, it would have happened whichever path you chose.
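
    The rule of thumb above can be written down as a small sketch. The thresholds are assumptions, not hard figures: "a couple of hours" is taken as 2 and "a working day" as 8, and the final fallback (pick whichever job finishes sooner) is an addition of this sketch rather than part of the advice.

```python
def choose_first_step(backup_hours: float,
                      rebuild_hours: float,
                      backup_limit_hours: float = 2.0,   # assumed "a couple of hours"
                      working_day_hours: float = 8.0) -> str:  # assumed working day
    """Return "backup" or "rebuild" for the job to run first."""
    # A quick backup gets a safe copy off the degraded array soonest.
    if backup_hours < backup_limit_hours:
        return "backup"
    # Otherwise restore redundancy first, as long as the rebuild fits in a day.
    if rebuild_hours < working_day_hours:
        return "rebuild"
    # Assumption: neither condition holds, so take the shorter job first.
    return "backup" if backup_hours <= rebuild_hours else "rebuild"
```

    For example, `choose_first_step(backup_hours=1.5, rebuild_hours=6)` returns `"backup"`.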

    Keep your management in the loop. Your infrastructure is hanging together by a thread, and they need to be aware that, despite your best efforts, you are one unlucky break from being down for a couple of days. Don't be afraid to be seen to ask for help either.

  3. #3

    Join Date
    Jul 2006
    Location
    London
    Posts
    1,241
    Thank Post
    110
    Thanked 242 Times in 193 Posts
    Blog Entries
    1
    Rep Power
    74
    Hmm, I missed the bit about your Veeam backup disk being dead. If your backup disk is dead, keep the infrastructure down until you've got a new one. Then bring it up and do whatever takes the least amount of time first.

    If that is the rebuild, you might want to consider taking a copy of critical files while it is going on (MIS data, finance data, HR data, student coursework).

    Worst case looks to me like a four figure data recovery bill if two of those disks fail before you've got either the two new ones built, or a backup completed.
    Last edited by psydii; 9th January 2014 at 08:11 PM.

  4. #4

    Join Date
    Apr 2012
    Posts
    413
    Thank Post
    39
    Thanked 35 Times in 28 Posts
    Rep Power
    15
    Hi, thanks for the reply.

    I have kept the head informed. It's a first school we support, and he is aware that we could lose everything.

    The Veeam backup of the storage server and mail server takes about 2 hrs; getting the 4th disk back takes about 2 and a half. The network is not currently up. I'm thinking do the final rebuild, then we are back to optimal running?

    If a disk died now, would we go back to having just the two working disks?

  5. #5

    Join Date
    Jul 2006
    Location
    London
    Posts
    1,241
    Thank Post
    110
    Thanked 242 Times in 193 Posts
    Blog Entries
    1
    Rep Power
    74
    Yes, assuming the third disk has completed rebuilding into the array. But given the recent history of the system, if one of the disks fails, there is nothing but luck stopping the second one from going too.
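
    The arithmetic behind that answer, as a minimal sketch: RAID 6 keeps two parity blocks per stripe, so an array stays up with up to two members missing. The function name and return shape are made up for illustration and do not come from any controller tool.

```python
RAID6_PARITY_DRIVES = 2  # RAID 6 survives the loss of any two member drives


def raid6_status(total_drives: int, online_drives: int) -> dict:
    """Report whether a RAID 6 array is up and how many more failures it can take."""
    if total_drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    if not 0 <= online_drives <= total_drives:
        raise ValueError("online_drives must be between 0 and total_drives")
    failed = total_drives - online_drives
    return {
        "array_up": failed <= RAID6_PARITY_DRIVES,
        "further_failures_tolerated": max(RAID6_PARITY_DRIVES - failed, 0),
    }
```

    With 3 of 4 drives online the array is up and can take one more failure; with 2 of 4 it is still up but has no redundancy left.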

    How big are the disks?
    How big are the VMs?

    What is your connectivity?

    How many client machines during the day?

  6. #6

    Join Date
    Apr 2012
    Posts
    413
    Thank Post
    39
    Thanked 35 Times in 28 Posts
    Rep Power
    15
    It completed. I did it through the controller BIOS, and it said it was fine.
    Disks are 500 GB.
    VMs: the largest is the storage VM at 140 GB. Exchange is next at 80 GB.

    The OS is on a different array; it's just the data array with issues. It definitely has 3 working drives out of 4 currently.

    There are around 20 clients, unless laptop trolleys are used. But at the moment I have left the server off overnight and have told staff not to use the network tomorrow; being a small first school, that is not the end of the world.

    Backups would be to a local USB HDD.

  7. #7

    Join Date
    Apr 2012
    Posts
    413
    Thank Post
    39
    Thanked 35 Times in 28 Posts
    Rep Power
    15
    Also, luckily, SIMS and CMS are hosted at county. The disks are also less than a year old; in fact, they have only been in use since September, so when doing the rebuild there should be no reason for them to die. I know that doesn't mean they won't!!
    Last edited by mattianuk; 9th January 2014 at 09:05 PM.

  8. #8

    Join Date
    Jul 2006
    Location
    London
    Posts
    1,241
    Thank Post
    110
    Thanked 242 Times in 193 Posts
    Blog Entries
    1
    Rep Power
    74
    USB 2 or 3?

  9. #9

    Join Date
    Apr 2012
    Posts
    413
    Thank Post
    39
    Thanked 35 Times in 28 Posts
    Rep Power
    15
    USB 2

    The storage VM's backup with Veeam takes 50 mins; I tried it earlier, but it hung on 99%. Exchange takes 50mb.

    I take it that, as the OS (which is also the DC) is on a different array, the OS should be fine?

  10. #10

    Join Date
    Jul 2006
    Location
    London
    Posts
    1,241
    Thank Post
    110
    Thanked 242 Times in 193 Posts
    Blog Entries
    1
    Rep Power
    74
    OK. I was just double-checking your figures with a back-of-the-envelope calculation. Right, with such a small set of data I would lean towards getting a backup taken onto a reliable disk as your next step. Then add the disk.
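
    A back-of-the-envelope check of that kind might look like the sketch below. The 140 GB and 80 GB VM sizes are the ones quoted earlier in the thread; the 30 MB/s sustained USB 2 throughput is purely an assumption, and compression or deduplication by the backup tool would change the result.

```python
def transfer_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy size_gb at a sustained rate, taking 1 GB as 1000 MB."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


ASSUMED_USB2_MB_PER_S = 30.0  # assumption: real-world USB 2 rate, not a measured value

vm_sizes_gb = {"storage": 140, "exchange": 80}  # sizes quoted in the thread
total_gb = sum(vm_sizes_gb.values())

estimate = transfer_hours(total_gb, ASSUMED_USB2_MB_PER_S)
print(f"{total_gb} GB over USB 2: about {estimate:.1f} hours")
```

    Under those assumptions the estimate comes out at roughly two hours, which is in line with the backup time reported earlier.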

    It might be worth opening a ticket with Veeam about the hung backup ASAP in case the problem happens again during the backup. Open it by saying (assuming I've got this right) "My disk holding the Veeam backups has failed, my Hyper-V host RAID is degraded and running with two suspect disks, I'm trying to take a backup to a new destination disk and it hung at 99%, help!"

    Keep the site down until you've got your backup and built the disk. The two disks currently in the server that were part of the original 4-disk RAID are the ones I would be worried about. Expect them to fail soon. Also, the older the 'new' disks are, the greater the chance they have been damaged by accident while in storage... So, without being sure of their provenance, I would be quite nervous until I had a backup on a known good disk.

  12. #11

    Join Date
    Apr 2012
    Posts
    413
    Thank Post
    39
    Thanked 35 Times in 28 Posts
    Rep Power
    15
    Cheers, sounds like good advice.

    The only thing I'm confused about is the disks in the server and their risk of failure. The original 4 were brand new in September. They are not old. I bought 2 brand new ones today. Can you explain the risk of failure in the other two, when my belief is that the damage to the current two came from a power failure this morning, perhaps corrupting data, rather than from damaged or old disks?

  13. #12

    Join Date
    Apr 2012
    Posts
    413
    Thank Post
    39
    Thanked 35 Times in 28 Posts
    Rep Power
    15
    An update: I checked the backup I tried yesterday that hung on 99%, and it had all the files, so I went straight ahead and rebuilt the array. All working better now... Veeam is still hanging, but I have ordered a new NAS for Monday for backups and will sort something then. For now we have backups of everything, including emails, and the array is back and working.

    Also plugged the server into the battery backup slot on the UPS and not the surge only side. Really should have checked that when we installed the server... But assumed it was already in there!!! Never again!

    Time to breathe again!

  14. #13

    3s-gtech's Avatar
    Join Date
    Mar 2009
    Location
    Wales
    Posts
    2,697
    Thank Post
    143
    Thanked 542 Times in 486 Posts
    Rep Power
    148
    For DCs - a second DC is good, but not foolproof. Get a bare-metal backup solution in place for it too - Windows Server Backup can do this to a USB disk if you want to do it cheaply. I'm planning on having an iSCSI target on a decent Synology NAS for all of my server backups in a remote location for DR.

  16. #14

    Join Date
    Apr 2012
    Posts
    413
    Thank Post
    39
    Thanked 35 Times in 28 Posts
    Rep Power
    15
    Yeah, going to look at doing a Windows backup, but if it's a Hyper-V host (one of our schools is part of a failover cluster), will it try to back up the VMs as well? Or can you tell it not to?

  17. #15

    3s-gtech's Avatar
    Join Date
    Mar 2009
    Location
    Wales
    Posts
    2,697
    Thank Post
    143
    Thanked 542 Times in 486 Posts
    Rep Power
    148
    Don't run Hyper-V myself, but Windows Server Backup has a number of selectable options for the backup job. You could just exclude the VM storage location too.
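
    As a rough sketch of that idea, the snippet below only assembles a `wbadmin start backup` command line; it does not run anything. The drive letter and VM folder in the usage note are hypothetical, and whether an exclusion is needed at all depends on which volumes the job selects, so check `wbadmin start backup /?` on the server before relying on it.

```python
from typing import List, Sequence


def wbadmin_command(backup_target: str,
                    exclude_paths: Sequence[str] = ()) -> List[str]:
    """Build (but do not execute) a wbadmin bare-metal backup command."""
    cmd = [
        "wbadmin", "start", "backup",
        f"-backupTarget:{backup_target}",  # e.g. the USB disk's drive letter
        "-allCritical",                    # include the volumes needed for bare-metal recovery
        "-quiet",                          # do not prompt for confirmation
    ]
    if exclude_paths:
        # wbadmin takes a comma-delimited list of items to leave out of the backup.
        cmd.append("-exclude:" + ",".join(exclude_paths))
    return cmd
```

    For instance, `wbadmin_command("E:", ["D:\\Hyper-V"])` builds a command that targets a hypothetical USB disk on `E:` and leaves out a hypothetical VM folder on `D:`.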
