What a fun morning... For about 30 minutes, for no reason that I can discern, my school network went partially down.
Some rooms were completely without network access and in others it was working just fine. I even had one lab where some kids could log on, but their neighbors could not.
All our servers were up and had no errors. I could print from our print server to some places, but not others.
The best part is that after about half an hour, without me doing anything that I thought would fix the issue, it all started playing nice again.
I'm thinking a faulty switch, but the randomness of the locations with issues versus the mostly unaffected has me a trifle nonplussed.
I guess I'm just thinking out loud and looking for some ideas.
That was my first thought too. A faulty switch is also a possibility. How about a broadcast storm?
Hmm - just some first thoughts:
Network loop somewhere?
Do you use spanning tree? Maybe there was a change in the topology and the network had to re-converge.
Device plugged into the network with a dodgy network card? Maybe a printer or laptop that is rarely on.
We get occasional stutters for, as far as I can tell, no good reason, and I tend not to worry about them unless they recur in a shortish time frame.
You're lucky. Today our whole network was totally down: the backbone fibre switch packed in. What a nightmare! There was no internet and no connection to the servers. No SIMS & Pars, and it also brought down the biometric system.
The phone never stopped ringing!
The only people who had network & internet access were us.
Sounds like a loopback to me. I have experienced the same type of problems.
I'm not aware of any changes that would occur in the midmorning that would cause these issues. No new hardware or software and no recent system or configuration changes. I had to replace a couple of faulty switches a couple of weeks ago, but otherwise it's been quiet.
I'm aware of the concept of a broadcast storm and I thought I knew what a loopback address is, but I'm not bright enough to discern how a loopback would occur and cause these types of symptoms.
Loopback - one possibility, especially in schools:
A kid unplugs the cable from the back of the machine and then plugs it into another network socket. Now you have a cable going from one network socket straight into another one, which loops the switch back onto itself. Maybe after 30 minutes they realised that what they had done had caused a system halt and unplugged it.
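To make the loop idea concrete, here is a minimal Python sketch that treats every cable as a link between two devices and reports the first cable that closes a loop. The device names and the `find_loop_cable` helper are made up for illustration, and this is only a model of the cabling on paper; on a live network you would rely on spanning tree or the switch's own loop protection rather than a script.

```python
def find_loop_cable(cables):
    """Return the first cable (a, b) that closes a loop, or None.

    `cables` is a list of (device_a, device_b) pairs. Uses a simple
    union-find: if both ends of a cable are already connected by
    earlier cables, adding it creates a loop.
    """
    parent = {}

    def root(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in cables:
        ra, rb = root(a), root(b)
        if ra == rb:
            return (a, b)
        parent[ra] = rb
    return None


# Hypothetical layout: two wall sockets in a lab run back to the same switch.
cables = [
    ("core", "lab-switch"),
    ("lab-switch", "socket-12"),
    ("lab-switch", "socket-13"),
    ("socket-12", "socket-13"),  # the patch lead plugged from socket to socket
]
print(find_loop_cable(cables))
```

With the last cable in place, both sockets are already reachable from the lab switch, so that cable is the one reported.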
We had a dicky NIC the other week that took out an entire VLAN and our phone system and connection - it certainly foxed the phone provider - as we had to keep getting our BT lines reset to clear the problem until we tracked it down. We thought it must be a power problem, as we didn't think a NIC would take out the phone system as well - so it took us 2 days to diagnose it.
This exact fault occurred for me in one of our IT suites. It turned out to be a faulty cooling fan on the switches (Cisco) in the cab, causing the switch to gradually overheat under load and reboot.
This could be seen on the console we had connected to it. We purchased some new fans and all was well again (and still is!)
It's now happened twice more in the last three school days. I'm trying to remember, but it may have occurred around the same time each day during a homeroom period. A disquieting coincidence that might indicate malfeasance, if not for the easy solution to the problem.
I find that resetting our first set of switches after our servers instantly fixes the problem, leading me to hope that the solution is simply to replace one of the 4 switches in that room.
Any other suggestions?
If you haven't already done so, have a look at the management interface on the switches to see if anything looks funny. We had one switch with a faulty port which had errors on one packet in 5 or so and fell over at a few thousand, so it took a little while to actually happen - a quick look at the management for the ports identified it.
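As a rough illustration of what "looks funny" might mean, the Python sketch below takes per-port counters copied out of a management page and flags ports with a high error ratio. The counter values, the thresholds and the `suspect_ports` helper are all assumptions for the example; how you actually read the counters depends on your switch.

```python
def suspect_ports(counters, max_error_ratio=0.01, min_packets=100):
    """Return port names whose error ratio exceeds the threshold, worst first.

    `counters` maps a port name to (packets_seen, packets_with_errors).
    Ports with fewer than `min_packets` packets are skipped as too quiet to judge.
    """
    flagged = []
    for port, (packets, errors) in counters.items():
        if packets < min_packets:
            continue
        ratio = errors / packets
        if ratio > max_error_ratio:
            flagged.append((ratio, port))
    return [port for ratio, port in sorted(flagged, reverse=True)]


# Made-up counters for illustration only.
counters = {
    "port-1": (50_000, 12),
    "port-7": (4_000, 800),  # roughly one packet in five has an error
    "port-9": (30, 10),      # too little traffic to judge
}
print(suspect_ports(counters))
```

The 1% threshold is arbitrary; the point is simply that a port like the one described above stands out immediately next to healthy ports.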
Another way to trace it would be to wait until it happens again and start unplugging cables from your main switch, one at a time, and see when the problem clears. That will tell you which satellite switch is causing the problem, so you can go and repeat the process on that switch to identify the hardware fault or the port(s) which have been linked with a loopback.
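The unplug-and-check procedure can be sketched in Python as below. Everything here is hypothetical: `find_bad_uplink` is just a name for the procedure, and `network_ok` stands in for whatever check you do by hand (pinging a server, trying a logon) after each cable is pulled.

```python
def find_bad_uplink(uplinks, network_ok):
    """Unplug uplinks one at a time (leaving them unplugged) until the network recovers.

    `network_ok(unplugged)` is a caller-supplied check that returns True when
    the network is healthy with the given set of uplinks unplugged.
    Returns the uplink whose removal cleared the fault, or None if none did.
    """
    unplugged = set()
    for uplink in uplinks:
        unplugged.add(uplink)
        if network_ok(unplugged):
            return uplink
    return None


# Simulated fault: pretend the switch on "lab-3" is the culprit.
faulty = "lab-3"
uplinks = ["office", "lab-1", "lab-2", "lab-3", "library"]
print(find_bad_uplink(uplinks, lambda unplugged: faulty in unplugged))
```

The same function applies one level down: pass it the ports of the satellite switch it identifies to narrow the fault to a single port or cable.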