Windows Server 2000/2003 Thread, ABSOLUTE DISASTER - File Server running with 40% CPU for interrupts in Technical; Christ I hate taking a holiday sometimes. First week off since Xmas, looking forward to an easy morning going over ...
  1. #1

    sonofsanta's Avatar
    Join Date
    Dec 2009
    Location
    Lincolnshire, UK
    Posts
    4,943
    Thank Post
    862
    Thanked 1,442 Times in 991 Posts
    Blog Entries
    47
    Rep Power
    616

    ABSOLUTE DISASTER - File Server running with 40% CPU for interrupts

    Christ I hate taking a holiday sometimes.

    First week off since Xmas, looking forward to an easy morning going over e-mails and easing myself back in, and it turns out that at the end of the holiday, one of the file servers had a HDD go funny. Easy enough, it's in RAID 5, not a disaster.

    Except now, while it tries to rebuild it (I assume)*, it is absolutely gorging on CPU. Running Process Explorer shows around 35% on Interrupts, with no clue as to what is sending all the IRQs. Students (who have their profiles housed on this server) have basically got a 50/50 chance of getting on, and loading up IE (for example) takes around 5 minutes.

    Something is very wrong, and at the moment, I'm completely lost and a little bit scared. We have backups so worst case scenario, I have to stay behind tonight and re-install Server 2k3, but I'd really rather not.

    This is a HP ProLiant G5; the other HPs are using a Smart Array E200i controller on the system board and are rebuilding fine (I swapped a HDD out from a less vital server, so I can see that one rebuilding fine. With the faulty hard drive, oddly...)

    Any clues at all? Any way of telling where the IRQs are coming from? Anyone had this before? Right now I'm grateful for any advice at all, just ask me questions and I'll give you what answers I can. As U2 could have sung, Monday bloody Monday... *sigh*

    *can't actually tell if it's rebuilding because server is running so slow, it won't load the HP diagnostic malarkey to tell me.

  2. #2
    p858snake's Avatar
    Join Date
    Dec 2008
    Location
    Queensland
    Posts
    1,490
    Thank Post
    37
    Thanked 175 Times in 151 Posts
    Blog Entries
    2
    Rep Power
    51
    rebuild it outside of the operating system....

    In my limited knowledge of RAID I would be surprised if you can do it in the OS since data is being written and changed on the drives.

  3. Thanks to p858snake from:

    sonofsanta (9th June 2010)

  4. #3

    sonofsanta's Avatar
    Blimey - should've come here to moan quicker!

    Just writing this post has apparently dropped the interrupts down to about 5% CPU making the server actually usable. Although looking at the time, this could be because break is about to start so the server is being hammered a lot less...

    Still, would love to know where all the interrupts are coming from. Some days being the boss is just no fun.

    EDIT: obviously spoke too soon. Back up to 40-50% now -_-

  5. #4

    sonofsanta's Avatar
    Quote Originally Posted by p858snake View Post
    rebuild it outside of the operating system....

    In my limited knowledge of RAID I would be surprised if you can do it in the OS since data is being written and changed on the drives.
    Would love to try but as it stands, 50% of students getting logged on is still better than the 0% if it's offline. May try this over lunch still, they can live without flash games for one day.

    Fun stats: on my BDC, which is also rebuilding: 6 reads/sec, 8 writes/sec. On the file server with problems: 35 reads/sec and 349 writes/sec. So yes - OS rebuild is not ideal.

  6. #5

    Join Date
    Sep 2009
    Posts
    23
    Thank Post
    2
    Thanked 2 Times in 2 Posts
    Rep Power
    10
    That RAID 5 array is using a lot of CPU time because of all the calculations for the parity data. RAID 5 can hit the CPU a fair bit with a lot of sustained I/O writes, and a rebuild will even more so. Unfortunately, this kind of performance hit is unavoidable while rebuilding the array. As has been suggested, an offline rebuild is the only other option, but perhaps the greater of two evils if you need your server online...
    The only way to truly minimise the overhead from RAID 5 is to spend lots of money on a card with full hardware processing of the parity data. Most, if not all, integrated RAID controllers will tax the CPU for this duty.
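For anyone wondering what those parity calculations actually involve: RAID 5 parity is a byte-wise XOR across the blocks of a stripe, and a rebuild XORs the surviving blocks to regenerate the lost one. A minimal Python sketch of the idea follows. The block contents and the helper names `xor_blocks` and `rebuild_missing` are made up for illustration, and this is not how any particular controller or driver implements it.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block of a stripe from all the others.

    Because parity = d0 ^ d1 ^ ... ^ dn, XOR-ing every surviving block
    (data and parity alike) yields the block that was lost.
    """
    return xor_blocks(surviving_blocks)


# Illustrative stripe (made-up bytes): three data blocks plus one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] has failed and rebuild its block.
recovered = rebuild_missing([data[0], data[2], parity])
```

During a rebuild that XOR has to be run over every stripe on the array, on top of whatever reads and writes the users are generating, which is why the work adds up.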
    Last edited by CtrlAltDel; 7th June 2010 at 12:03 PM.

  7. Thanks to CtrlAltDel from:

    sonofsanta (9th June 2010)

  8. #6

    sonofsanta's Avatar
    After much fruitless and painful mucking about, I've just unplugged the network cable over lunch and the interrupts have plummeted. So I shall leave it churning away for now (as no-one could log on anyway, so no loss) and hopefully, once it's rebuilt, it will all be happy again.

    Incidentally, to any passing mods who may read this - as I was posting all of this on a server with the hardened security etc. - when the tabbed forums won't load because JS is disabled, the message recommends switching to a no-tabs layout. Unfortunately the dropdown used to change layout relies on an onChange JavaScript event to do anything, which won't tend to work when JS is disabled. It may just need a submit button adding for the form inside a noscript tag down there.
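A rough sketch of the fallback described above, written as a small Python function that renders the markup so it can be checked. The form action `/set-layout`, the field name `layout`, and the layout values are all hypothetical; the forum's real form is not shown in this thread.

```python
from html import escape


def layout_switcher_html(layouts, action="/set-layout", field="layout"):
    """Render a layout drop-down that still works with JavaScript disabled.

    `action` and `field` are hypothetical names used for illustration only.
    """
    options = "\n".join(
        f'    <option value="{escape(value, quote=True)}">{escape(label)}</option>'
        for value, label in layouts
    )
    return (
        f'<form method="post" action="{escape(action, quote=True)}">\n'
        # With JS enabled, changing the selection submits the form as before.
        f'  <select name="{escape(field, quote=True)}" onchange="this.form.submit()">\n'
        f"{options}\n"
        "  </select>\n"
        # Browsers only render <noscript> content when scripting is off,
        # so the button appears exactly when the onchange handler cannot run.
        "  <noscript>\n"
        '    <input type="submit" value="Change layout">\n'
        "  </noscript>\n"
        "</form>"
    )


markup = layout_switcher_html([("tabs", "Tabbed"), ("no-tabs", "No tabs")])
```

With scripting on, nothing changes for the user; with it off, the submit button is the only way the selection reaches the server.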

    I might actually go have a cup of tea and some biscuits now, and try and have that quiet catch up on emails I was hoping for...

  9. #7


    Join Date
    Mar 2009
    Location
    Leeds
    Posts
    6,580
    Thank Post
    228
    Thanked 854 Times in 733 Posts
    Rep Power
    295
    It possibly depends on which G5 RAID card it is (some have a small cache, some much more) and how the cache is set. I think the default is biased towards read; it might be worth using the HP tools and biasing it towards write for the time being.

  10. #8

    sonofsanta's Avatar
    So by unplugging the network cable, the interrupts dropped low enough to allow the RAID array to rebuild happily, took it a while but it got there.

    Having come back in this morning though, whilst the RAID array is happy, the interrupts are still at a constant 40% and causing no end of problems.

    So a quick question this one: does anyone know of a good tool for Server 2k3 that will help me trace the source of the interrupts? Process Monitor isn't telling me much about them at all, sadly. Once I know that I can start moving forward with this again.
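One low-tech way to at least put numbers on the problem is to log the built-in interrupt and DPC time counters with `typeperf` and average them. This will not name the device or driver responsible; it only shows how much time is going on interrupts versus DPCs, and when. The sketch below assumes the CSV is copied to a machine with Python on it, and the function names are made up for illustration.

```python
import csv
import io


def parse_typeperf_csv(text):
    """Parse typeperf CSV output into (counter_names, rows).

    Each row is (timestamp, [values]); cells typeperf could not sample
    (blank or non-numeric) become None.
    """
    records = [r for r in csv.reader(io.StringIO(text)) if r]
    if not records:
        return [], []
    header, body = records[0], records[1:]
    counters = header[1:]  # column 0 is the timestamp column
    rows = []
    for record in body:
        if len(record) != len(header):
            continue  # skip trailing status messages and malformed lines
        values = []
        for cell in record[1:]:
            try:
                values.append(float(cell))
            except ValueError:
                values.append(None)
        rows.append((record[0], values))
    return counters, rows


def average_per_counter(counters, rows):
    """Average each counter over the samples that have a value."""
    averages = {}
    for index, name in enumerate(counters):
        samples = [vals[index] for _, vals in rows if vals[index] is not None]
        averages[name] = sum(samples) / len(samples) if samples else None
    return averages
```

A capture to feed it could be taken on the server with something like `typeperf "\Processor(_Total)\% Interrupt Time" "\Processor(_Total)\% DPC Time" -si 1 -sc 30 -o interrupts.csv`, then read the file and pass its text to `parse_typeperf_csv`.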

  11. #9


    Is it worth running a malware scanner on it?

  12. Thanks to sted from:

    sonofsanta (9th June 2010)

  13. #10
    p858snake's Avatar
    It would have been caused by the RAID controller trying to repair your broken RAID whilst the network was still trying to use the resources of the server.

  14. #11

    sonofsanta's Avatar
    Quote Originally Posted by p858snake View Post
    It would have been caused by the RAID controller trying to repair your broken RAID whilst the network was still trying to use the resources of the server.
    I figured that was why the interrupts dropped while the array was rebuilding, but the array is now rebuilt and the interrupts are still riding high. KrView tells me that it's ntoskrnl, and it definitely seems to be disk activity causing the problems (file system runtime library and disk cache functions come top of the zoom results). I still don't understand why disk activity on a file server is such a problem, though. Replicating data across now so I can down the server and have a proper mess around with it, disable hardware etc. and do more to actually pin the problem down... still not a fun day today.

  15. #12
    ind1ekid's Avatar
    Join Date
    Jul 2008
    Location
    Nottinghamshire
    Posts
    82
    Thank Post
    6
    Thanked 16 Times in 13 Posts
    Rep Power
    15
    I had a single SAS drive die in a RAID 5 on a ProLiant 380 G5 not too long back. The spare kicked in straight away, and once I got a new drive in to replace the dead one it rebuilt the RAID quickly and quietly in the OS. P400 RAID card. Lucky me, eh?

  16. #13

    It's the first time I've heard of this behaviour carrying on after the RAID has successfully rebuilt.
    Processor usage that high could indicate a problem with bus mastering/DMA transfers from an item of hardware. Why it would decide to start that after a disk failure I don't know.
    As you say, having time to disable devices and update drivers will be the key.
    Have you had time to fully power down the server and turn the power off for a few seconds at the plug? I've had experience of hardware becoming stuck in a loop due to a glitch in the driver and this has been a good fix on many an occasion.

  17. #14

    sonofsanta's Avatar
    Quote Originally Posted by CtrlAltDel View Post
    As you say, having time to disable devices and update drivers will be the key.
    Yeah, I did... didn't work out, as I'll get to in a moment...
    Quote Originally Posted by CtrlAltDel View Post
    Have you had time to fully power down the server and turn the power off for a few seconds at the plug? I've had experience of hardware becoming stuck in a loop due to a glitch in the driver and this has been a good fix on many an occasion.
    This is my usual fix for most problems - shutdown, pull the power cables, wait for 30 seconds and click the power button to discharge the power fully and then try, but it didn't have any effect.

    As for disabling devices - at break time yesterday I thought I would take the opportunity to disable the new SCSI card that had gone in for the new tape drive, in case it had stemmed from that. Disabled in Device Manager, yes please to the restart, goes through the motions and theeeeeeeeeen it reboots just as it is about to show the login screen. At which point everything was royally, erm, "made love to" and it forced my hand with regards to breaking out the Server 2k3 discs and starting over.

    So the net result is that it's been playing beautifully all day, my creeping apology has been made at staff briefing, and now I may never know what was actually the problem as all the hardware and all the drivers are back in place. There was surprisingly little information on the web about the issue though, or at least little that helped, which leads me to hope it was such a bizarre situation and so completely abnormal that I should be safe from it ever occurring again. If it does, though, I'm reaching straight for the install discs.

    I've handed thanks out in this thread to people who chimed in; sometimes knowing this place is here as a fallback is all that keeps me going. Cheers!
