Wireless Networks thread in Technical: [SOLVED] Entire Network Down - 100% Network Utilization - Please Help!
  1. #1
    link470

    [SOLVED] Entire Network Down - 100% Network Utilization - Please Help!

    Alright guys. As you can imagine, the network went down and I'm hoping to get this resolved soon. Here's what happened.

    It's 12:30 P.M. [Do YOU know what your network's doing?]. Everything is running great: 3 main switches in the server room, all 48-port managed switches [Dell PowerConnect 3348s], and all is well. Everything functions normally, all servers are online, all desktops are happily happenin'. I go out to get a shipment of 50 new machines that has arrived and start piling them outside my office. Next thing I know, I have lots of requests saying the entire network is down and nobody can access anything. I quickly head back to the server room wondering if a UPS went down, if a server restarted, if a switch turned off, anything. But what do I see? Absolutely nothing out of the ordinary. Everything is functioning great.

    But wait...no it's not. I can't get on the internet. I can't ping ANY computers, I can't remote desktop into the servers, the RDC sessions I DO have open with servers all fail, and everything is extremely slow. So I call the school board office. They head over with their handy $18,000 Fluke meter, plug it into one of our switches, and it quickly throws back at us 100% network utilization. The guy from the board office goes WHOA!!! I've never EVER seen it that high before.

    So we try swapping out the first switch in the stack, suspecting it may be bad. We put in a 3448, Dell's next model of the 48-port 10/100 PowerConnect switch, and take out the 3348. We use patch cables to link them together in a chain and see if that works.

    In the end, the switch switch [lol] did nothing. I still can't ping any machine in the school or get out of the network. I checked our main router, and it's functioning normally. I restarted the servers and they all appear to be functioning normally. So I think to myself, what would cause 100% network utilization? I noticed that ONE ping got through: only 1 out of the 4 pings got a reply. So I knew the infrastructure itself was probably OK, but I suspected something was looping back.

    So off I go around the network, documenting every single wall jack and port in the school [took 7 hours] and checking EVERY switch we have for any kind of loop that might be possible, like a jack plugged into a switch and another port on the same switch plugged into another jack. I also shut down every machine and every network printer I came across, so that essentially nothing on the network would be broadcasting [except powered NICs, but with the machine off there's less chance of anything malicious running on the machine itself]. Nothing I could see in any of the labs was set up like that. All the jacks had either a direct connection to a PC or a connection to a switch that contained only other connections to PCs, not back to a wall jack.
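    The loop hunt described above boils down to cycle detection on the cabling map. A minimal Python sketch, assuming each documented cable run is recorded as a pair of device names (all names here are hypothetical), flags every cable that joins two devices already connected by another path:

```python
from typing import Iterable


def find_loop_links(links: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every cable that closes a layer-2 loop.

    Each link is a (device_a, device_b) pair taken from the cabling survey,
    with a wall-jack run collapsed into the two switches it connects.
    Uses union-find: a link whose ends are already in the same group
    creates a second path between them, i.e. a loop.
    """
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        # Path halving keeps lookups short as groups grow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    loops = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            loops.append((a, b))  # already connected: this cable makes a loop
        else:
            parent[root_a] = root_b  # merge the two groups
    return loops
```

    For example, `find_loop_links([("switch1", "switch2"), ("switch1", "switch4"), ("switch1", "switch4")])` returns `[("switch1", "switch4")]`. A patch cable between two ports of the same switch shows up as a pair like `("lab", "lab")`.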

    So here I am guys. It's Friday night and school's out for the weekend, so I've got 2 days to go in and work some overtime. Anyone have any suggestions of what to try next? Thanks a ton, you guys. I really appreciate this community and hope we can get this resolved!

    Network Notes:
    -Windows NT-based network running Windows Server 2003 servers and 250 Windows XP Professional client stations.
    -6 Windows Server 2003 servers total
    -3 main switches in the server room, 7 others around the school. All checked thoroughly and restarted.

    Take care and have a good weekend. We have multiple IPs at the school, and since I can't access the internet from inside because of the extreme lag, if I go in I'll plug my laptop into the main external switch connected to the modem and grab an IP so I can check Edugeek.

    Thanks guys!
    Last edited by link470; 11th July 2008 at 03:29 AM.

  2. #2

    SYNACK
    First thing I would do is chuck Wireshark (open source) on a laptop and set it up to sniff network packets. This will tell you what kind of packets are flooding the network, and then you will have a better idea of how to deal with them.

    It could be a faulty NIC in one of the servers, or even in one of the powered-off PCs, as their NICs remain live. The good news is that if it is a faulty NIC, there is a good chance its gibberish broadcasts will contain its MAC address, which may help you narrow it down.
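    To see which MAC address is doing the shouting, the capture can be tallied by source. A sketch in Python, assuming the capture has been exported as tab-separated source and destination MAC columns (for example with `tshark -r capture.pcap -T fields -e eth.src -e eth.dst`, where the file name is a placeholder):

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def top_broadcasters(lines, limit=5):
    """Rank source MACs by how many broadcast frames they sent.

    Each line is expected to be "src_mac<TAB>dst_mac"; anything else
    (blank lines, malformed rows) is skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue
        src, dst = (field.lower() for field in fields)
        if dst == BROADCAST:
            counts[src] += 1
    return counts.most_common(limit)
```

    One dominant MAC tends to point at a single chattering NIC. With a loop, ordinary broadcasts get copied round and round, so several sources can look inflated at once.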

    If it is just spewing rubbish, then I would separate the switches and do a Wireshark packet sniff on each one individually to see if you can track down which switch contains the offending device.

    This should help you narrow it down and make the task easier. If your switches are managed, you may be able to grab a port utilization reading off them to see if one port is generating lots of traffic. You could also check for transmission errors on each port, as a faulty NIC or device may generate errors when it garbles a packet too badly.
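    Reading the port counters twice and comparing them gives a per-port utilization figure. A sketch, assuming you can read inbound octet and error counters for each port (on switches with SNMP enabled these are typically IF-MIB `ifInOctets` and `ifInErrors`); the port names and numbers in the test are made up:

```python
def busiest_ports(before, after, interval_s, link_bps=100_000_000):
    """Rank ports by inbound utilization between two counter snapshots.

    before/after map port name -> (in_octets, in_errors), read interval_s
    seconds apart. All ports are assumed to run at link_bps (100 Mbps by
    default). Counter wrap is ignored for brevity.
    """
    report = []
    for port, (octets_0, errors_0) in before.items():
        octets_1, errors_1 = after[port]
        bits_per_s = (octets_1 - octets_0) * 8 / interval_s
        utilization = 100 * bits_per_s / link_bps
        report.append((port, round(utilization, 1), errors_1 - errors_0))
    # Busiest port first; new errors break ties.
    report.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return report
```

    A port near 100% with a quiet neighbour is the one to trace; a port with climbing error counts but little traffic fits the garbled-packet case.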

    Hope this helps. Good Luck with your hunt.

  3. 2 Thanks to SYNACK:

    bizzel (10th May 2008), link470 (10th May 2008)

  4. #3
    link470
    Quote Originally Posted by SYNACK
    This should help you narrow it down and make the task easier. If your switches are managed, you may be able to grab a port utilization reading off them to see if one port is generating lots of traffic. You could also check for transmission errors on each port, as a faulty NIC or device may generate errors when it garbles a packet too badly.

    Hope this helps. Good Luck with your hunt.
    Thank you very much! I'll try that tomorrow. Much appreciated!

    Anyone have any other ideas to add to the list?

  5. #4

    GrumbleDook
    Honestly ...

    Start from the simplest point and work outwards. Get some monitoring software on your laptop if you can't borrow the Fluke again. Something like Snoop is good, but I would also look at using The Dude to help monitor devices as they come online.

    All network hardware, desktops and servers turned off.

    Turn on your core switch(es) ... plug in your laptop. Send a few pings to it and your router. Turn on your DCs ... and leave Snoop running to see what traffic there is.

    Then start up each server ... monitor for 5 minutes between each one.

    Now that your servers are up, I would remove the uplinks to each edge switch before turning them all on. Plug in one uplink at a time and monitor. Some people may prefer to have all the desktops / devices turned on at the same time, as you can check both devices and network hardware together; others prefer to go slowly so they have a benchmark of what 'normal' traffic looks like.

    Again, some prefer to test one edge switch and then unplug it to test another ... others prefer to leave the tested ones connected.

    As you slowly start everything up you will see whatever is causing the problem jump in. The above is just a logical way of narrowing down the issue, but the Fluke should have been able to tell you which devices the traffic was originating from or where it was headed. Snoop will also do this for you ... it could save you some time.
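    The staged start-up above can be written down as a simple loop. A sketch in Python, where `measure` is a hypothetical stand-in for "plug these uplinks in, wait five minutes and read Snoop or the Fluke", and the threshold is whatever you decide counts as abnormal:

```python
from typing import Callable


def staged_bring_up(uplinks: list[str],
                    measure: Callable[[list[str]], float],
                    threshold: float):
    """Connect edge-switch uplinks one at a time and flag any that push
    utilization over the threshold.

    measure(connected) returns the % utilization seen with exactly those
    uplinks plugged in. Returns (baseline, uplinks kept connected, suspects).
    """
    connected: list[str] = []
    baseline = measure(connected)  # core switches and servers only
    suspects = []
    for uplink in uplinks:
        reading = measure(connected + [uplink])
        if reading > threshold:
            suspects.append((uplink, reading))  # leave this one unplugged
        else:
            connected.append(uplink)  # tested and quiet: keep it connected
    return baseline, connected, suspects
```

    This follows the "leave the tested ones connected" variant; a suspect uplink stays unplugged so it does not swamp the readings for the switches tested after it.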

    Things to check ... Spanning tree ... make sure it is on. If you have some bright spark that has plugged in a loopback then this can cause problems ... I did visit one school where students (after reading up on network design and a teacher mentioning this) decided to loop over 100 ports. Not fun.

    STP is a good way to stop the problem if this is the case, but it does not help you find exactly where the loop is ... systematic checks such as the above will.
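    To make the spanning tree point concrete: STP elects a root switch and keeps only one path from it to every other switch, blocking the redundant links. A heavily simplified Python sketch, assuming equal link costs and ignoring port IDs, timers and BPDUs (real IEEE 802.1D/802.1w handles all of these), shows which links would end up with a blocked end. The switch names and priorities are hypothetical:

```python
from collections import deque


def blocked_links(bridges: dict[str, int], links: list[tuple[str, str]]) -> list[int]:
    """Return the indices of links that a (very) simplified STP would block.

    bridges maps switch name -> bridge priority; the lowest wins root, with
    the name standing in for the MAC-address tie-break. Every link is assumed
    to have equal cost, so the active tree is a breadth-first tree from root.
    """
    root = min(bridges, key=lambda name: (bridges[name], name))
    adjacency: dict[str, list[tuple[str, int]]] = {name: [] for name in bridges}
    for index, (a, b) in enumerate(links):
        adjacency[a].append((b, index))
        adjacency[b].append((a, index))

    visited = {root}
    forwarding: set[int] = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        for neighbour, index in adjacency[switch]:
            if neighbour not in visited:
                visited.add(neighbour)
                forwarding.add(index)  # this link becomes the neighbour's path to root
                queue.append(neighbour)
    # Anything not on the tree is a redundant path and gets blocked.
    return [i for i in range(len(links)) if i not in forwarding]
```

    With switch1 given the lowest priority and two cables between switch1 and switch4, only the second of those cables comes back as blocked, so the network stays loop-free without anyone having to find the cable first.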

    Other causes ... virus attacks, failing NICs broadcasting like hell, switches needing firmware / OS upgrades.

    HTH

  6. Thanks to GrumbleDook from:

    link470 (10th May 2008)

  7. #5
    link470
    Quote Originally Posted by GrumbleDook
    Things to check ... Spanning tree ... make sure it is on. If you have some bright spark that has plugged in a loopback then this can cause problems ... I did visit one school where students (after reading up on network design and a teacher mentioning this) decided to loop over 100 ports. Not fun.
    lol, awesome. Sounds good, thanks for the advice! I'll add that to the to-do list.

  8. #6
    stratisphere
    Might be looking in the wrong area, but I've found this has caused me problems before (not quite like yours, but close).

    Your core switches, if they are managed... check no bugger is using their IP. (We once had someone manually set their IP, it clashed with one of our switches, and that switch just threw a paddy and practically died.)
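    A clash like that can be spotted from ARP traffic: the same IP answering from two different MAC addresses. A sketch, assuming you already have (IP, MAC) pairs, for instance the sender fields of ARP replies in a capture; the addresses in the test are invented:

```python
from collections import defaultdict


def ip_conflicts(observations):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC.

    observations is an iterable of (ip, mac) pairs. MACs are lower-cased
    so the same address in different letter case is not counted twice.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

    A legitimate NIC or hardware swap can also show two MACs for a while, so treat a hit as a lead rather than proof.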

  9. #7

    FN-GM
    Can you ping two machines that have static ip addresses?

    I mean go to one server and see if you can ping another.
    Last edited by FN-GM; 11th May 2008 at 01:00 AM.

  10. #8
    link470
    Quote Originally Posted by FN-Greatermanchester
    Can you ping two machines that have static ip addresses?

    I mean go to one server and see if you can ping another.
    I couldn't originally.

    BUT...

    Thank you all for your replies. Much appreciated! I ended up thinking over what you all said, took a laptop into work, ran Wireshark, and found a TON of packets, like, in the 100,000 range almost instantly. I ended up separating our switch stacks and isolated it to one switch, and that switch was looped into another switch...twice. Everything is back up and running after disconnecting just one of those cables.

    What's strange is that I think it's been like that for quite a while and nothing ever happened before. I may be wrong, but does this sound possible? As of now the entire network is up and running again, and I thank you all so much for your support and quick suggestions and replies. I'm just chillin' at home now, very happy, but still wondering if there could have been a delay and the broadcast storm didn't catch on till later.

    The setup was that the main switch [switch 1 of the 3 switches connected together via gigabit uplinks] was plugged into a spare 4th switch down below, which the previous tech had added because there weren't enough places to plug things in [the patch panel had more ports than the switches in that room could support]. Only 4 things were plugged into it: 2 were from patch panel locations connecting wall jacks around the school, and 2 were the redundant connections to switch 1. After I removed 1 of those, everything worked again.

    Any ideas if it's possible for there to be a delay, so that it didn't really get to this point until now? Any ideas what triggered it to become a problem so suddenly?

    Either way, it's up and running. Thanks a ton! I love this place.

  11. #9

    SYNACK
    Quote Originally Posted by link470
    Any ideas if it's possible for there to be a delay, so that it didn't really get to this point until now? Any ideas what triggered it to become a problem so suddenly?
    It can take a while for enough broadcasts to build up to cause a problem; if your network is well segmented and has a low amount of broadcast traffic, it could take some time for the system to drown.

    This kind of thing happened on one of my networks because the school accepted the supplier's offer to install the switch themselves for free (arghh). They managed to replace the existing switch, which was linked by two trunked 1 Gb ports, but also managed to wipe the configuration of the main switch (hard reset). I had a rather purposeful chat with both the school in question and the so-called 'professional' suppliers about that one.

    If one of your switches did have spanning tree on previously, but some event reset the switch to its defaults, this could have occurred.

    Good to hear that you got it solved.

  12. #10

    Ha
    We had a similar issue on Thursday of last week: the core switch was locked solid, but pings were getting through sometimes.
    We have fibre links back to the core from all the other stacks, so it was quick to isolate the area.
    After examining the affected area, we traced the issue to a wall port that had nothing connected to it. After further investigation we found that RATS had chewed the cables and caused them to short.
    This freaked out the teacher, and pest controllers were called in.

  13. #11

    If one of our delightful pupils manages to create a loop on one of the small 4-port switches in a room, it can take hours to disrupt the entire system.

  14. #12
    link470
    Quote Originally Posted by modcoms
    After further investigation we found that RATS had chewed the cables and caused them to short.
    This freaked out the teacher, and pest controllers were called in.
    LOL. That's the kind of thing you don't want to laugh at while you're trying to find it, since it's actually quite annoying, but once you do, that's a totally awesome story to keep lol.

  15. #13

    garethedmondson
    When this happened to us the other day, we went to our central backbone switches and did the following:

    1. Unplugged each network cable from the backbones one at a time. About 7 cables in, the backbone switches calmed down. So we figured it was number 7, which linked to another switch (the Design Tech building).

    2. The Design Tech switch was still going like the clappers, so we went over to that building.

    3. In that building we took each cable out one at a time. About 14 into the 24 ports, the switch calmed down, so we traced 14 back to one of the rooms.

    4. In that room we discovered a network cable doubled back on itself. We pulled it out and plugged things back in gradually.

    This process of starting from the inside and working out seemed to work for us: narrowing down the buildings and ruling out other switches.
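    The backbone-outwards search in those steps is a loop over switches with a nested loop over ports. A sketch in Python, where the port map is hypothetical and `calms_when_pulled` stands in for unplugging a cable and watching the lights or a monitor:

```python
from typing import Callable, Optional

# switch name -> [(port number, downstream switch name, or None for a room/jack)]
PortMap = dict[str, list[tuple[int, Optional[str]]]]


def trace_storm(start: str, ports: PortMap,
                calms_when_pulled: Callable[[str, int], bool]) -> list[tuple[str, int]]:
    """Follow the storm outwards from the backbone one cable at a time.

    Returns the (switch, port) trail ending at the last cable whose removal
    quietened things down, or an empty list if no single cable helped.
    """
    trail: list[tuple[str, int]] = []
    visited = {start}
    current = start
    while True:
        for port, downstream in ports.get(current, []):
            if calms_when_pulled(current, port):
                trail.append((current, port))
                if downstream is None or downstream in visited:
                    return trail  # leads to a room (or a switch already searched): inspect by hand
                visited.add(downstream)
                current = downstream  # another switch: repeat the search there
                break
        else:
            return trail  # no cable on this switch helped; stop here
```

    With a backbone whose port 7 feeds a Design Tech switch, and port 14 on that switch feeding the looped room, the trail comes back as `[("backbone", 7), ("design_tech", 14)]`.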

    I may have missed some stuff out - cannot remember exactly what we did, but it was along these lines.

    GJE

  16. #14


    Might be worth reading up on Spanning Tree Gareth, might have helped you there.

  17. #15

    garethedmondson
    Quote Originally Posted by kmount
    Might be worth reading up on Spanning Tree Gareth, might have helped you there.
    Hi Kim,

    We've been told by the LEA that Spanning Tree has to be turned off because it affects the Cisco switches that are used to connect us to the broadband network.

    I shall find out more - and report the facts and reasons. I cannot remember what they said.

    Regards

    Gareth


