Hardware Thread, Raid 1 Restore in Technical
  1. #1

    Join Date
    Jun 2009
    Location
    Birmingham
    Posts
    600
    Thank Post
    92
    Thanked 72 Times in 64 Posts
    Rep Power
    24

    Raid 1 Restore

    Hi,

    Our Admin Server is looked after by the local authority people and is under warranty from the approved supplier. It showed a degraded hard drive on Thursday, so we got the hardware people out to look at it. They brought a fresh hard drive with them, thinking that was the problem, but when they arrived they found that the array wouldn't rebuild, so they took the server back to the workshop to look at, leaving us with no Admin Server.

    It has now been returned, and all the data entered between the point the RAID showed as degraded and the point they took it away has been lost (2 days' worth). The excuse being given is that it was the primary hard drive that failed, so it's tough luck, and do we have a backup from which to restore? That will only give us back 1 day's work, as the backup wasn't run again before the server was taken away.

    Correct me if my thinking is wrong, but the idea of a mirrored RAID is that it doesn't matter which hard drive fails: the other one still has all the info and the array can be restored from it. Do you think the idiots have cloned the failed hard drive?

    Sorry for the long post

    Rich

  2. #2

    bossman's Avatar
    Join Date
    Nov 2005
    Location
    England
    Posts
    3,905
    Thank Post
    1,186
    Thanked 1,057 Times in 749 Posts
    Rep Power
    328
    @Tricky_Dicky:

    I would go for that, knowing that LAs make an awful lot of mistakes considering they (supposedly) have the required knowledge and staff.
    You are correct in thinking that whichever drive in the RAID 1 mirrored pair fails, the other will allow you to rebuild onto a new drive, as long as the new drive is of a similar spec.
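
    To illustrate the point about a mirrored pair, here is a minimal toy sketch in Python. It is not how any particular RAID controller works; the `MirroredPair` class and its method names are made up for illustration. It only shows the principle: writes go to every healthy member, the array keeps working while degraded, and a rebuild copies from the survivor to the replacement.

```python
class MirroredPair:
    """Toy RAID 1 model (hypothetical, for illustration only)."""

    def __init__(self, size):
        self.disks = [bytearray(size), bytearray(size)]
        self.healthy = [True, True]

    def write(self, offset, data):
        # A degraded array keeps accepting writes on the surviving member.
        for i, disk in enumerate(self.disks):
            if self.healthy[i]:
                disk[offset:offset + len(data)] = data

    def fail(self, index):
        self.healthy[index] = False

    def rebuild(self, index):
        # Copy FROM the survivor TO the replacement, never the other way round.
        survivor = 1 - index
        if not self.healthy[survivor]:
            raise RuntimeError("no healthy member left to rebuild from")
        self.disks[index] = bytearray(self.disks[survivor])
        self.healthy[index] = True

    def read(self, offset, length):
        for i, disk in enumerate(self.disks):
            if self.healthy[i]:
                return bytes(disk[offset:offset + length])
        raise RuntimeError("array failed")


array = MirroredPair(16)
array.write(0, b"thursday")
array.fail(0)                 # the "primary" drive dies
array.write(8, b"friday!!")   # work carries on while degraded
array.rebuild(0)              # new drive fitted, copied from the survivor
print(array.read(0, 16))      # b'thursdayfriday!!'
```

    In this model it makes no difference whether member 0 or member 1 is the one that fails; the data written while degraded survives either way.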

  3. #3
    azrael78's Avatar
    Join Date
    Sep 2007
    Location
    Devon
    Posts
    383
    Thank Post
    47
    Thanked 37 Times in 33 Posts
    Rep Power
    20

    Rich,

    Sorry to hear about this event, however it does sound as if they are correct to some degree.

    We recently had a RAID-1 server fail. Now, what failed for us was Disk 0:2 - which contained the Database. This however was mirrored to Disk 0:3. Due to the damage on Disk 0:2 - this meant that we couldn't access or even attempt a rebuild of the Database.

    What also happened about the same time, was Disk 0:0 deciding it had given up too - but rather than it being a RAID failure - it just decided to have a spat.

    Rebooting the server didn't help - once we rebooted, we couldn't get back in.

    Short version of it was this:

    We had lost Disk 0:0 due to it having a spat attack.
    We had lost Disk 0:2 and Disk 0:3 due to RAID failure.

    While both arrays are Mirrored, it didn't help us.

    We had to blank all drives and replace Disk 0:2 with a standby.
    Once we had done this, we reinitialised all of the disks, rebuilt the RAID and restored the OS to 0:0, as it's RAID-1, it mirrored this to 0:1.

    The database was eventually restored from backup (made the day before) to Disk 0:2, this later mirrored to 0:3.

    If you only had 2 disks that were RAID-1 then in theory at least - you should have been able to keep going from the 1 disk that hadn't failed, but while that's great in theory - in practice it sometimes just doesn't work that way.

    If one of your disks was also on the way out, then it's possible the mirror may not even have happened as it was meant to.

    I don't think - from what you describe, that you could have done more except perhaps back up the server prior to it being taken - but with a faulty RAID that's always risky.

    It may be worth looking at your disaster recovery scenarios - our disaster with our SIMS server made us look at ours somewhat sharpish.

    When you say 'Admin Server' - I assume it's the server that teaching staff use, it perhaps houses your MIS as well?

    If so - what kind of backup strategy do you have?
    Is it 1 tape every day - full backup?
    Do you have data you absolutely-positively cannot be without? (Obviously you do, but what specifically - sometimes you can get away with simply copying this data during the online day using an intelligent backup that supports VSS).

    I hope this helps somewhat

    Az

  4. #4

    Michael's Avatar
    Join Date
    Dec 2005
    Location
    Birmingham
    Posts
    9,262
    Thank Post
    242
    Thanked 1,568 Times in 1,250 Posts
    Rep Power
    340
    In my experience it shouldn't matter which hard disk has failed in a RAID1 array. It seems really odd you've lost two days' worth of work like that. The only possible explanation would be that they cloned the failed drive, but that should be impossible (in theory).

  5. #5

    mattx's Avatar
    Join Date
    Jan 2007
    Posts
    9,240
    Thank Post
    1,058
    Thanked 1,068 Times in 625 Posts
    Rep Power
    740
    Quote Originally Posted by Michael View Post
    In my experience it shouldn't matter which hard disk has failed in a RAID1 array. It seems really odd you've lost two days' worth of work like that. The only possible explanation would be that they cloned the failed drive, but that should be impossible (in theory).
    Agree. I've worked on huge Citrix farms in the past with racks full of 50 to 70 servers. All mirrored - had quite a few HDs fail and it was a simple case of pulling out the duff one and replacing it with the new one; as each server had a good RAID card in it, the array rebuilt itself. [ IBM ]
    The only problem I had with a RAID system was with a very old IBM 720 running LAN SERVER - it was a RAID 5, and I got an alert on my screen saying there was an error with one of the disks [ this was way back in my old job ], so off I went with a new drive to the relevant office in the country. Arrived, pulled out the disk that had the problem, shoved in the new one, it started to rebuild, then whilst that was rebuilding another drive failed......Clocked up around 30 hours of overtime sorting that one out !!
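
    Why a second failure during a RAID 5 rebuild is so painful can be sketched with XOR parity. This is a simplified illustration only: real RAID 5 rotates the parity block across the members, and the helper names `xor_blocks` and `recover` are invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block (four members).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]


def recover(stripe, lost):
    """Rebuild the single lost block from all the remaining ones."""
    if len(lost) != 1:
        raise RuntimeError("more than one member lost: restore from backup")
    survivors = [block for i, block in enumerate(stripe) if i not in lost]
    return xor_blocks(survivors)


rebuilt = recover(stripe, lost={1})
print(rebuilt)  # b'BBBB'
```

    With one member missing, the XOR of the rest gives the lost block back. With two missing, as in `recover(stripe, lost={1, 2})`, there is nothing left to solve for, which is where the backups come in.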

  6. #6

    AngryTechnician's Avatar
    Join Date
    Oct 2008
    Posts
    3,730
    Thank Post
    698
    Thanked 1,212 Times in 761 Posts
    Rep Power
    394
    My view is that if your support guys can't get every bit of data back that existed before the RAID controller detected a failure, you either have a rubbish RAID controller or rubbish support guys. It honestly could be either.

    With regard to data needed to maintain the RAID array being written to the disk (as azrael78 talks about with the loss of his database), this simply shouldn't happen. The RAID controller should not be storing anything required to rebuild the array on a drive that is part of the array. If it is, this is the 'rubbish RAID controller' scenario.

  7. #7

    Michael's Avatar
    Join Date
    Dec 2005
    Location
    Birmingham
    Posts
    9,262
    Thank Post
    242
    Thanked 1,568 Times in 1,250 Posts
    Rep Power
    340
    then whilst that was re-building another drive failed
    It is annoying when that happens, but it's the same with RAID1 and RAID5. As for RAID0, well you're buggered (that's a technical term)

    Back on topic, I suppose what needs to be asked is why installing a new spare didn't begin rebuilding the array? Hard disk compatibility shouldn't be an issue these days and a duff RAID controller could be one explanation.

  8. #8
    t4ll1f3r's Avatar
    Join Date
    Jun 2007
    Posts
    46
    Thank Post
    1
    Thanked 9 Times in 8 Posts
    Rep Power
    16
    More likely they cloned the new drive over the good drive then had to recover the failed drive.
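
    A rough sketch of the kind of check that avoids copying in the wrong direction: pick the rebuild source by which member was updated most recently, not by which slot it sits in. Linux software RAID records an event count on each member for this sort of comparison; what the controller in this server does is not known, so the `Member` type, the `events` field and the numbers below are purely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    events: int          # hypothetical counter bumped on every array update
    blank: bool = False  # True for a factory-fresh replacement


def pick_rebuild_source(members):
    """Choose the member holding the most recent data as the copy source."""
    candidates = [m for m in members if not m.blank]
    if not candidates:
        raise RuntimeError("no member holds any data")
    return max(candidates, key=lambda m: m.events)


members = [
    Member("failed-primary", events=1041),    # stopped updating when it degraded
    Member("surviving-mirror", events=1187),  # kept taking writes afterwards
    Member("replacement", events=0, blank=True),
]

source = pick_rebuild_source(members)
print(source.name)  # surviving-mirror
```

    Cloning from the stale member or from the blank replacement instead would throw away everything written after the array degraded, which matches the two lost days described in the first post.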

  9. #9
    azrael78's Avatar
    Join Date
    Sep 2007
    Location
    Devon
    Posts
    383
    Thank Post
    47
    Thanked 37 Times in 33 Posts
    Rep Power
    20

    Quote Originally Posted by AngryTechnician View Post
    With regard to data needed to maintain the RAID array being written to the disk (as azrael78 talks about with the loss of his database), this simply shouldn't happen. The RAID controller should not be storing anything required to rebuild the array on a drive that is part of the array. If it is, this is the 'rubbish RAID controller' scenario.
    Well to be fair - the RAID controller was an 'extra' on the mobo and someone (not me) thought they'd make good use of it.

    Oddly enough, if it were me, I'd probably have made use of it anyway - just wouldn't be too fond of RAID-1 on a critical server (like SIMS). Give me RAID-5 or 10 (all of our other big servers are R5).

    Az

  10. #10

    mattx's Avatar
    Join Date
    Jan 2007
    Posts
    9,240
    Thank Post
    1,058
    Thanked 1,068 Times in 625 Posts
    Rep Power
    740
    Quote Originally Posted by Michael View Post
    It is annoying when that happens, but it's the same with RAID1 and RAID5. As for RAID0, well you're buggered (that's a technical term)
    Indeed, hence the 30-odd hours of OT !! [ Backups helped - but after recovering almost 70% of the 10gig DAT tape, it decided it did not like a file which was called lpt.doc and fell over..... ]
    All because of the reference to the LPT ports.....
    Still, after I got rid of that file and started again it went through.

    More likely they cloned the new drive over the good drive then had to recover the failed drive.
    Would a RAID controller let that happen ? I think it's down to the sheer incompetence of the people who were supposed to be fixing it. [ My opinion blah blah blah ]

  11. #11

    Join Date
    Jun 2009
    Location
    Birmingham
    Posts
    600
    Thank Post
    92
    Thanked 72 Times in 64 Posts
    Rep Power
    24
    Thanks for all the replies, some really interesting reading.
    Update: Our LA have been excellent in helping to restore to the last backup that was taken remotely, so fair play to them and I really can't fault them in this instance. I think the problem here lies with the company I purchased the server off and their support people.
    Quote Originally Posted by mattx View Post
    Would a RAID controller let that happen ? I think it's down to the sheer incompetence of the people who were supposed to be fixing it. [ My opinion blah blah blah ]
    That's what I'm thinking.

    Quote Originally Posted by Michael View Post
    Back on topic, I suppose what needs to be asked is why installing a new spare didn't begin rebuilding the array? Hard disk compatibility shouldn't be an issue these days and a duff RAID controller could be one explanation.
    The server is 3 years old, so would it make a difference then? As long as they brought along the correct sized hard drive in the first place it should have just worked.

    Quote Originally Posted by azrael78 View Post
    Rich,
    It may be worth looking at your disaster recovery scenarios - our disaster with our SIMS server made us look at ours somewhat sharpish.
    When you say 'Admin Server' - I assume it's the server that teaching staff use, it perhaps houses your MIS as well?
    If so - what kind of backup strategy do you have?
    Is it 1 tape every day - full backup?
    Do you have data you absolutely-positively cannot be without? (Obviously you do, but what specifically - sometimes you can get away with simply copying this data during the online day using an intelligent backup that supports VSS).
    Good point, well made. In hindsight I would have taken another backup before letting the service company touch it. It houses CMIS and all the office type stuff, so very important.
    A remote backup is taken every night by the LA which in this instance worked really well in restoring the data.

    Seeing as CMIS is a database system, is there some way of running backups during the day while people are still in the system? For example, to another server on site or something like that?

    Cheers for the help so far people.
    Rich
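
    On the question of backing up a database while people are still using it, the general idea is an "online" backup that takes a consistent copy without closing the live database. The thread doesn't say what database engine CMIS runs on, so the sketch below uses SQLite from Python's standard library purely as a stand-in; the table and rows are made up, and a real setup would use whatever online backup tool the actual database or a VSS-aware backup product provides.

```python
import sqlite3

# Stand-in for the live database that staff are working in.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE marks (pupil TEXT, score INTEGER)")
live.execute("INSERT INTO marks VALUES ('example pupil', 7)")
live.commit()

# Stand-in for a copy held elsewhere (in practice a file on another server).
copy = sqlite3.connect(":memory:")
live.backup(copy)  # online copy; the live connection stays usable throughout

# Work continues on the live database after the backup was taken.
live.execute("INSERT INTO marks VALUES ('later entry', 9)")
live.commit()

print(copy.execute("SELECT COUNT(*) FROM marks").fetchone()[0])  # 1
print(live.execute("SELECT COUNT(*) FROM marks").fetchone()[0])  # 2
```

    The copy holds everything up to the moment it was taken, so running something like this a few times during the day would shrink the window of lost work from a full day to a few hours.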
