  1. #1

    Faulty Server HDD's

    Morning all,

    Wasn't sure where to put this, so hopefully it's in the right place.

    We've had an ongoing issue with our server room. Four years ago we put in a Viglen system (before my time) and for the first year or so everything ran fine. After this initial "honeymoon period", a couple of the servers started to develop faulty hard drives. In total, over the next couple of years we went through 25-30 server hard drives in one server alone.

    12 months ago, we replaced Viglen with a vanilla Windows environment and for 12 months everything has been running fine. However, as with Viglen, after this honeymoon period ended, we have had 1 server start to experience faulty hard drives and we've just had Hitachi replace both of our SANs as they too have developed faulty drives.

    This, to me at least, is too much of a coincidence and has to be caused by something other than faulty hardware.

    Does anyone know of a company who would come out and do a "Server Room Evaluation" or something along those lines? Essentially we are looking for someone to come out to us and see if there is something we've missed which might be causing HDDs to fail on a regular basis (low level static maybe? I don't know).

    The room already has its own air conditioning and its own power source, but we still have the same problems time and time again.

    Any advice anyone can provide would be gratefully received.

  2. #2
    Mr.Ben
    Are all of the servers on a UPS?

    What version of RAID are you using?

  3. #3

    All servers are on a UPS.....

    Versions of RAID we're using are RAID 0 and RAID 5.

    There's been no error with the RAID though, it's the drives themselves that have developed the fault.

  4. #4
    DMcCoy
    How do you know they are faulty? What symptoms or errors are there? What is the humidity level like?

  5. #5

    We haven't checked humidity levels as yet; this is something I'm going to look at shortly, though.

    With regard to how we know they are faulty, same way most would, I guess. Servers start to beep, showing a faulty drive. The SANs display a warning light, and when you go into the console it tells us that the drive has developed a fault. Of the two new SANs that Hitachi have sent us, one has developed an issue after just 2 weeks!

    As an addition to this, our FROG box has also chucked out a few hard drives over the 2 years we've had it in.

  6. #6
    accura2000
    I used to get similar problems. Admittedly it normally happened when a member of staff turned the aircon off in the server "cupboard", but the heat issues relating to the HD failures only affected the Hitachi and Seagate drives. That said, the Seagate drives I found to be the least reliable.

    I have now changed all the server drives to Western Digital Caviar Black drives and haven't had any fall over since. That was well over a year ago, with no errors, no RAID issues, no strange noises, nothing...

  7. #7
    Butuz
    Assuming you can rule out power (do you have sufficient UPSs?) and heat (do you have decent air con?) issues, you may be looking at something far more strange such as humidity (this should be solved by air con) or, even worse, some kind of magnetic interference??

    Also, what make and model of hard drives do you use?

    The sheer quantity of hard drive failures is definitely cause for concern. I can count the number of server HDD failures we've had here over the last 8 years on one hand *touches wood*, and I have normal hands by the way, not freakish hands.



    Butuz

  8. #8
    zag
    Try SSD drives?

    Not had one fail yet.

  9. #9

    SYNACK
    With regard to brands, it depends on the batch I guess; we have had so many WD drives die but the Seagates have kept on going for years.

    As to the server room, I would check the humidity and the temperature stability: if it fluctuates a large amount regularly, it will kill the drives quicker. I'd also look at the vibration from the surroundings and from equipment transferred through the mounting. Lots of mechanical vibration will also end in toasted drives.
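
    If you can get a cheap logger into the room, checking temperature stability could look something like the rough sketch below. The function name and the readings are made up for illustration; swap in whatever your logger actually exports. The same function would work for humidity readings.

```python
from statistics import mean


def summarise_readings(readings):
    """Return (min, max, mean, largest jump between consecutive readings)."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    # Absolute change from each reading to the next one.
    jumps = [abs(b - a) for a, b in zip(readings, readings[1:])]
    return min(readings), max(readings), mean(readings), max(jumps)


# Hypothetical hourly temperatures in degrees C, invented for this example.
hourly_temps_c = [20.0, 20.5, 19.5, 24.0, 27.5, 22.0, 20.0]
low, high, average, biggest_jump = summarise_readings(hourly_temps_c)
print(f"min {low} C, max {high} C, mean {average:.1f} C, "
      f"biggest hourly jump {biggest_jump} C")
```

    A steady room should show a small gap between min and max and small hourly jumps; what counts as "too much" is for the drive and server manufacturers' specs to say, not this script.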

  10. #10

    We do have the room air conditioned and, as far as I can tell, it is on 24 hours a day keeping the room at a constant 20 degrees C (that's what it is currently).

    I am certainly going to check humidity in there, but I'm not really sure how I can check for magnetic interference (which at the moment is where our thoughts are as to what is causing the problem).

    With regard to makes of hard drives, I don't actually know. All our servers are under warranty with their respective manufacturers. Hitachi SANs I would guess use Hitachi drives. Our backup server has been using Seagate Barracuda drives by the looks of it. The FROG box, I've no idea, as FROG know when the drive goes before we do and they just turn up and replace it.

  11. #11
    Butuz
    The air con should automatically dehumidify the air so unless you've got something seriously wrong with the building fabric, humidity shouldn't be an issue for you.

    As for magnetic interference - I have no idea how you would check for that. I would assume it would be a specialised job, and therefore very expensive to find out.

    Hmm, the less I say about Hitachi the better. Hitachi = IBM. When I was a wee boy we used to refer to IBM Deskstar hard drives as IBM Deathstars because it seemed as if every one ever made failed within a year. I had 2 in RAID 0 at that time. Big mistake. Spent more time reinstalling Windows than actually doing anything productive :/

    Butuz

  12. #12

    I would have thought you would need something fairly big and fairly nearby to cause problems with magnetic interference. Are there any large transformers/substations nearby? Or any other large industrial-type stuff? Lift machinery, large electric motors? TV/radio transmitters? The disk itself will be inside a metal box, inside another metal box, so it would take a lot for interference to get through. If it were to affect the electronics on the drive, I would have thought you would be getting other failures/random crashes as well. Does anything else go wrong in that room?

    Seeing how there are problems with several servers, it would point to something in the room rather than specific hardware.

    How well does the air circulate around the room? Are there any hot-spots near the servers/HDs? Does the air flow in the room work with or against the servers' own cooling mechanism?

    Some other thoughts:
    * Does anyone else have access to the room (cleaners/caretakers etc)?
    * How are the servers situated within the room (in a rack, on the floor, on tables?)
    * Is there any problem with dust building up in the room/inside the servers?
    * Is the room carpeted? What is the floor made out of?
    * What is the power supply like in that room? Do you get any spikes/brownouts?
    * Have you tried wiping the drives and reusing them? Or is it a permanent failure? Can you access the data on another computer/server?
    * How many UPSs and servers do you have? What capacity are the UPSs?
    * Does the room have any problems with damp? Does it feel excessively dry or humid?
    * Do the failures happen at regular intervals or after any other events?

    Can you describe the room a bit more, maybe show some pictures? Perhaps the area around the room as well.
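
    On the "regular intervals" question, if you keep a note of the date each drive gets swapped, something like this rough sketch would show the gaps between failures. The dates and the function name are invented for illustration, not taken from anyone's real log.

```python
from datetime import date


def failure_intervals(failure_dates):
    """Return the gaps in days between consecutive failures, oldest first."""
    ordered = sorted(failure_dates)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical failure dates, invented for this example.
failures = [date(2011, 3, 14), date(2011, 1, 10),
            date(2011, 2, 7), date(2011, 4, 11)]
gaps = failure_intervals(failures)
print(gaps)
```

    If the gaps cluster around the same number of days, it may be worth looking at what else happens on that cycle (scheduled jobs, cleaning, heating timers and so on).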

  13. #13

    Thanks again to everyone who's responded. We've spoken to a couple of companies now who offer a service whereby they'll come out and do some analysis of our server room. Unfortunately the school is baulking at the price (you genuinely can't win: they don't want downtime, but they don't want to spend any money to work out what's wrong!).

    Just to follow on from Chris_Cook's response, please see attached some images of our server room.

    With regard to the questions asked in Chris' response....

    - Nobody else has access to the server room. We have a locked door, with a metal roller shutter over the front which is also locked at night. As far as we know, we are the only people with keys.
    - Servers (as can be seen in the photos) are rack mounted. However before the rack was in, the servers were on a desk and we had exactly the same problems.
    - Don't think there is a problem with dust. When we removed one of the faulty SANs I checked the back of it and there was no dust build-up at all. The room itself is fine; it's pretty clean as server rooms go.
    - The room isn't carpeted; the floor is a vinyl-type covering.
    - The room has its own dedicated power supply running into it. When we were a Viglen school this is something they said might be causing the problem, so we had the dedicated power source put in. As far as I know we don't suffer from spikes/brownouts, but we need to do some research on this.
    - We have tried wiping the drives and reusing them; they always show as faulty and won't work. It is a permanent failure and we have been unable to access the data on another computer/server.
    - All our servers run through the UPS that we have. We had a full network rebuild last year and the company who did the majority of the work was responsible for the UPS, and therefore it should be more than sufficient for our needs.
    - No problems with damp in the room (although it does have a plastic sewage pipe running through the corner which burst over the summer!).
    - The failures aren't very predictable; there are no events that occur that could trigger them (that we know of).

    Thanks again
    Attached Images

  14. #14
    jsnetman
    Those servers look remarkably like my old servers, one of which is retired now and one still going as a print server (Intel chassis?). We had disk problems for a while that indicated faulty disks; it turned out to be a circuit board behind the HD housing. I forget what it is called, but it's some sort of drive/error reporting circuit board. Could be that, but that would not explain why it was happening on the previous server, unless they both had the same fault. Did you test all those faulty disks on another machine?

  15. #15
    IanT
    25-30 server hard drives, that's a lot of drives to go through!
