17th July 2012, 02:36 PM #1
Network Issue - What to do now?
Having done a search and read through various threads, I was wondering if anyone could add any more advice to help me track down my problem.
Around 10am today, for no apparent reason, our network went down. By "down" I mean the following behavior:
-Clients unable to access network resources
-Clients unable to browse to internet
-Switches showing a lot of activity (a mixture of entry-level ProCurves and a few random unmanaged ones)
-Unable to ping from one static machine to another
-Fibre convertors showing lots of activity
At first I thought this must be either a loopback or a virus, as I've seen both have similar effects on a network. I've taken each server/host down (we are virtualised on VMware) and rebooted it, and also taken down all the switches in the building. I thought it might be our Ruckus wireless controller; however, this is now back up and appears OK.
We are back up and stable(ish) now... no idea why or how, and I'm in the process of troubleshooting further.
Sophos is showing no viruses
arp -a resolves as expected
I can now ping internal resources with no packet loss eg; server to workstation, workstation to server etc
I can traceroute to google.com successfully
What I am noticing is that I get constant timeouts when pinging our default gateway (an LA Cisco 2950), from any area of the network. Would it be reasonable to assume this could cause the issue, given that, to our knowledge, nothing has changed on the network?
I'll be doing another walk around later today to make sure there are no visible loopbacks, damaged cabling, etc. Any other advice or ideas much appreciated.
Thanks for reading
17th July 2012, 03:16 PM #2
The gateway may have locked you out after seeing too much garbage;
probably worth getting the LA to health-check it, and possibly reset it, from their end.
17th July 2012, 03:19 PM #3
Thanks for the reply!
I have indeed logged a call on the Capita helpdesk as I know a couple of other schools within my LA had their 2950s replaced.
Hopefully someone will come back to me later today.
17th July 2012, 03:30 PM #4
Could be a broadcast storm caused by a loopback; try checking for switches patched back into themselves.
17th July 2012, 03:38 PM #5
Good call. I've been obsessed with checking the classrooms; all cabs are under lock and key, so nothing should have changed, but it's always worth a look!
There are some great threads about this sort of stuff already. The strange thing is that everything seems stable other than my continual ping to the default gateway... scary, as other than taking down individual cabs I haven't done anything else.
Thanks for all the input thus far guys
17th July 2012, 03:47 PM #6
I presume that if you disconnect or switch off the Cisco 2950, the problems go away? Other than not being able to ping it, of course.
17th July 2012, 03:52 PM #7
When I disconnected it earlier, that seemed to resolve the issue, although that is by no means conclusive. I will be restarting it again later.
The switch is currently back up and running but as it is managed by the LA I am waiting for them to come back to me.
I'm just installing the trial of ProCurve Manager, which will hopefully let me look at some of the edge switches in more detail. Wireshark hasn't turned up anything yet, but then I'm a first-time user, so I'm reading up on its usage to make sure I'm not making a newbie mistake!
17th July 2012, 03:57 PM #8
Any logging on the Cisco or edge switches?
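If the Cisco and the edge switches can be pointed at a syslog server, their messages can be collected in one place and lined up with the outage times. A minimal sketch of a UDP syslog listener, assuming the devices send plain BSD-style (RFC 3164) messages to UDP port 514 on the machine running it; binding port 514 normally needs admin/root rights, and the function names here are illustrative, not part of any tool mentioned in the thread:

```python
import socketserver
from datetime import datetime

# Syslog severity names, indexed by numeric severity (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_pri(message: str):
    """Split '<PRI>text' into (facility, severity, text).

    PRI = facility * 8 + severity. Lines without a PRI field come back
    unchanged as (None, None, message).
    """
    end = message.find(">")
    if message.startswith("<") and 1 < end <= 4 and message[1:end].isdigit():
        pri = int(message[1:end])
        return pri // 8, pri % 8, message[end + 1:]
    return None, None, message


class SyslogHandler(socketserver.BaseRequestHandler):
    """Print each received datagram with a local timestamp and the sender's IP."""

    def handle(self):
        data = self.request[0]  # for UDP servers, request is (bytes, socket)
        text = data.decode("utf-8", errors="replace").strip()
        _facility, severity, body = parse_pri(text)
        label = SEVERITIES[severity] if severity is not None else "?"
        print(f"{datetime.now():%Y-%m-%d %H:%M:%S} {self.client_address[0]} [{label}] {body}",
              flush=True)


def run_server(port: int = 514) -> None:
    """Listen on all interfaces until interrupted (Ctrl+C)."""
    with socketserver.UDPServer(("0.0.0.0", port), SyslogHandler) as server:
        server.serve_forever()
```

Call `run_server()` from an elevated prompt and set each device's logging host to that machine's IP address.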
17th July 2012, 04:01 PM #9
What I would do is this:
Do all your servers plug into a main switch?
If they do, disconnect all connections into that switch except the servers and reboot the switch. This will show whether it's the main switch or a server. Check to see if the servers can connect.
Now connect the switch back up, disconnect all the network feeds to other parts of the school, and check that the other switches near the server switch still work. You may have to reboot them. Look at the flashing lights: do they go from a blink to a twinkle?
Check each other cabinet in the school to see if it is flashing or twinkling. If it is twinkling, connect that arm of the network and check it's still working.
My guess is that one arm has a loop. It's the end of the year; kids and teachers are leaving, and stopping the network working is fun for some people.
Once you have it down to one cabinet, disconnect all the switches and then connect them one at a time until you get the flashing back.
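The elimination steps above boil down to a linear search: put links back one at a time and stop as soon as the storm returns. A minimal sketch, where `connect` and `storm_detected` are hypothetical callbacks standing in for the manual work (patching a cable back in, then watching the LEDs or a broadcast counter):

```python
from typing import Callable, Iterable, List, Optional, Tuple


def find_looping_link(links: Iterable[str],
                      connect: Callable[[str], None],
                      storm_detected: Callable[[], bool]) -> Tuple[Optional[str], List[str]]:
    """Reconnect links in order; return (suspect link, links connected so far).

    The suspect is the first link after whose reconnection a storm is seen,
    or None if every link went back in without trouble.
    """
    connected: List[str] = []
    for link in links:
        connect(link)            # e.g. patch this switch's uplink back in
        connected.append(link)
        if storm_detected():     # e.g. LEDs go from blink to solid twinkle
            return link, connected
    return None, connected
```

The suspect is only the link that completed the loop; the other end of the loop may be a switch that was reconnected earlier.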
Don't forget to enable spanning tree, or rapid spanning tree if they are newer switches.
17th July 2012, 04:06 PM #10
Once this is done
Capture packets from the network with Wireshark for about 15 minutes and then analyse them with the free Capsa. This will tell you whether you are getting broadcast storms, duplicate IP addresses or other network problems.
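Capsa does this analysis for you, but the two checks mentioned can also be sketched by hand. A minimal sketch, assuming the Wireshark capture was saved in the classic libpcap (.pcap) format rather than pcapng and contains untagged Ethernet frames: it counts broadcast frames per capture second (a storm shows up as very large counts) and lists IP addresses that ARP packets associate with more than one MAC (a duplicate IP). What count you treat as a storm is your call; nothing here reproduces Capsa's own logic:

```python
import struct
from collections import Counter, defaultdict
from typing import Dict, Set, Tuple

BROADCAST = b"\xff" * 6
# Classic pcap magic numbers as stored on disk (microsecond and nanosecond variants).
MAGIC = {b"\xd4\xc3\xb2\xa1": "<", b"\xa1\xb2\xc3\xd4": ">",
         b"\x4d\x3c\xb2\xa1": "<", b"\xa1\xb2\x3c\x4d": ">"}


def fmt_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def analyse_pcap(data: bytes) -> Tuple[Counter, Dict[str, Set[str]]]:
    """Return (broadcast frames per capture second, {ip: MACs} for IPs seen with >1 MAC)."""
    endian = MAGIC.get(data[:4])
    if endian is None:
        raise ValueError("not a classic .pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet captures (link type 1) are handled")

    broadcasts_per_sec: Counter = Counter()
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len

        if frame[:6] == BROADCAST:
            broadcasts_per_sec[ts_sec] += 1
        # ARP (EtherType 0x0806): sender MAC at bytes 22-27, sender IP at 28-31.
        if len(frame) >= 42 and frame[12:14] == b"\x08\x06":
            sender_ip = ".".join(str(b) for b in frame[28:32])
            if sender_ip != "0.0.0.0":  # ARP probes carry no sender IP yet
                macs_by_ip[sender_ip].add(fmt_mac(frame[22:28]))

    duplicates = {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}
    return broadcasts_per_sec, duplicates
```

For example, with a hypothetical `capture.pcap`, read the file in binary mode, pass the bytes to `analyse_pcap`, and call `most_common(5)` on the first result to see the busiest seconds.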
You can also get an app for some mobiles that pretends to be the gateway, and that will kill your network, so it might be worth turning off the Wi-Fi while doing tests to eliminate it as the cause.
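One way to spot a device impersonating the gateway is to look for a single MAC address listed against several IPs in a client's ARP table. A minimal sketch that parses `arp -a` output, assuming the macOS/Linux format (`? (10.0.0.1) at 0:1a:2b:3c:4d:5e on en0 ...`) or the Windows format (`10.0.0.1   00-1a-2b-3c-4d-5e   dynamic`). A shared MAC is only a hint: a router or proxy ARP can legitimately answer for several addresses.

```python
import re
from collections import defaultdict
from typing import Dict, List

# An IPv4 address followed later on the same line by a MAC with ':' or '-' separators.
ARP_LINE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3}).*?\b([0-9a-fA-F]{1,2}(?:[:-][0-9a-fA-F]{1,2}){5})\b")


def normalise_mac(mac: str) -> str:
    """'0:1a:2B:3c:4d:5e' or '00-1a-2b-3c-4d-5e' -> '00:1a:2b:3c:4d:5e'."""
    return ":".join(part.zfill(2).lower() for part in re.split(r"[:-]", mac))


def shared_macs(arp_output: str) -> Dict[str, List[str]]:
    """Return {mac: sorted IPs} for unicast MACs that appear against more than one IP."""
    ips_by_mac = defaultdict(set)
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if not match:
            continue
        mac = normalise_mac(match.group(2))
        if int(mac[:2], 16) & 1:  # broadcast/multicast MACs legitimately repeat
            continue
        ips_by_mac[mac].add(match.group(1))
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}
```

On a client, `shared_macs(subprocess.run(["arp", "-a"], capture_output=True, text=True).stdout)` runs the check against the live table; any entry that includes the gateway's IP deserves a closer look.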
You might also want to check that the DHCP server is still giving out addresses, as the service might have stopped for some reason.
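A quick check is to look at the addresses clients actually hold: Windows and macOS clients that get no DHCP reply fall back to a self-assigned 169.254.x.x address. A minimal sketch that labels a set of client addresses you have gathered yourself; the host names and the `10.0.0.0/22` scope used in the test are hypothetical:

```python
import ipaddress
from typing import Dict

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def classify_clients(addresses: Dict[str, str], scope: str) -> Dict[str, str]:
    """Label each client's IPv4 address relative to the expected DHCP scope.

    'self-assigned' -> 169.254.x.x, i.e. the client gave up waiting for DHCP
    'off-scope'     -> outside the expected subnet (a static address, or a
                       lease from some other DHCP server)
    'ok'            -> inside the expected scope
    """
    net = ipaddress.ip_network(scope)
    result = {}
    for host, addr in addresses.items():
        ip = ipaddress.ip_address(addr)
        if ip in LINK_LOCAL:
            result[host] = "self-assigned"
        elif ip not in net:
            result[host] = "off-scope"
        else:
            result[host] = "ok"
    return result
```

Several "self-assigned" results at once point at the DHCP service rather than at individual machines.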
17th July 2012, 04:08 PM #11
Are you having problems with your electrics? Do you have UPSes on the switches as well as the servers? Power problems can cause the switches to become confused, and a reboot may fix this.
17th July 2012, 04:08 PM #12
Thanks for all the advice, guys; I'm overwhelmed by the response.
Ricki, my infrastructure runs on 3 hosts (VMware). I've taken each of the servers down individually as well as restarting the hosts.
The core switch is in a central location, with all other switches connected via Cat5 or fibre convertors, and I have restarted all of them.
I'll wait until school's out and start investigating further. Some great advice here.
17th July 2012, 05:24 PM #13
What's behind the 2950? It is a pure layer 2 device, not a router.
17th July 2012, 05:36 PM #14
The 2950 is connected directly to a Linksys switch (unmanaged, and being replaced during the summer break). Other than that, our edge switches consist of some newer ProCurve models, and then it's a bit mix and match (Netgear, Linksys, etc.).
I've just got back from checking each cabinet: definitely no loopbacks.
I checked all the classrooms and every other point I know of, and couldn't find any sign of a loopback on the network.
There is still a lot of activity showing on the switches around the building. I've turned logging on and am in the process of setting up ProCurve Manager.
As it stands, the network appears stable. I've been through the logs on the hosts, and there is/was no excessive traffic from the NICs/hosts/servers that I can see.
Are there any other tools I could consider using? I've left a continual ping with timestamps running from my iMac to the core switch. The LA have informed me someone will be onsite at some point tomorrow to check out the 2950.
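A continual ping with timestamps is easy to script so that drop-outs can later be matched against switch logs. A minimal sketch, assuming a macOS/Linux `ping` that accepts `-c 1` (Windows uses `-n 1` instead); treating 3 consecutive misses as an outage is an arbitrary choice:

```python
import subprocess
import time
from datetime import datetime
from typing import List, Tuple

Sample = Tuple[datetime, bool]  # (time the probe finished, reply received?)


def ping_once(host: str, timeout_s: float = 2.0) -> bool:
    """Send one echo request via the system ping; True if ping exited successfully."""
    try:
        result = subprocess.run(["ping", "-c", "1", host],
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:  # run() kills the ping process first
        return False
    return result.returncode == 0


def monitor(host: str, count: int, interval_s: float = 1.0) -> List[Sample]:
    """Probe `host` `count` times, printing a timestamped line for each attempt."""
    samples: List[Sample] = []
    for _ in range(count):
        ok = ping_once(host)
        now = datetime.now()
        print(f"{now:%Y-%m-%d %H:%M:%S} {host} {'reply' if ok else 'TIMEOUT'}", flush=True)
        samples.append((now, ok))
        time.sleep(interval_s)
    return samples


def outages(samples: List[Sample], min_misses: int = 3) -> List[Tuple[datetime, datetime]]:
    """Return (first miss, last miss) for each run of >= min_misses consecutive timeouts."""
    windows: List[Tuple[datetime, datetime]] = []
    run: List[datetime] = []
    for when, ok in samples + [(datetime.max, True)]:  # sentinel closes a trailing run
        if not ok:
            run.append(when)
            continue
        if len(run) >= min_misses:
            windows.append((run[0], run[-1]))
        run = []
    return windows
```

For example, `outages(monitor("10.0.0.1", count=3600))` gives roughly an hour of probes and a list of outage windows; the address is a placeholder for your gateway or core switch.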
17th July 2012, 07:12 PM #15
Sorry, I meant what's on the outside of the 2950, as it is a layer 2 switch with no routing functions.