Wireless Networks Thread, Have you ever seen anything like this.... strange switch behaviour... in Technical; hello everyone,
28th January 2011, 08:17 PM #1
Have you ever seen anything like this.... strange switch behaviour...
i have recently run into a very, very strange problem and i'm hoping at least one of you have run into the same or similar... it might just save my sanity!
let me explain:
we've been having very brief periods of complete packet loss across our main backbone switch (all our switches are HP Procurves of various vintages). during this phenomenon the cpu on the switch spikes and the activity LEDs illuminate in a more solid fashion. This loss of traffic rarely lasts longer than 30 seconds, usually only 5-10. This happens very sporadically, sometimes 2-3 times a day, sometimes not at all.
now, i appreciate this *looks* like it might be a loop in the network, but we have STP enabled on all our switches, and artificially inducing loops in the lan, across switches, causes no such loss. the switch log accessible from telnet reveals that the ports are instantly blocked by STP.
Anyway, further investigation seems to have identified the root cause of this issue -
It's *something* to do with the Marvell Yukon onboard LAN cards in some of our machines.
I had a machine in the office which had been reported as periodically displaying the no domain error and dropping off the network, and when i tried to reimage it, it would, with 100% reproducibility, cause a complete loss of traffic on the backbone switch - every single time. replace the lan card. problem goes away completely for a few weeks.
now, we have over 60 of these machines, but fortunately, these are all (mostly) contained on a couple of switches, so for a test, i've disconnected the switch where all these computers are plugged in and the problem (so far) has gone away.
so i'm speculating this is likely caused by the lan card spewing either loads of broadcasts or some other malformed transmissions, maybe during bootup, causing the switch to be temporarily overloaded dealing with those broadcasts and causing it to drop a few packets - the times of the network drops seem to mostly coincide with lesson start too. it doesn't seem to be related to a specific driver, i've tried several for the card, including the DOS one, and they all cause the effect on the defective machine. Note: i'm not using multicast or anything else like that on the machine, it was a simple image download.
but, when this happens, it seems that the switch gets so bogged down with it all that it's very difficult to see what's going on from the switch console as the connection is usually dropped.
is there going to be an easy way for me to work out where the rogue device is? other than by turning off each port in turn on the switch and seeing if the problem goes away?
any info/feedback greatly appreciated.
Last edited by 35mm; 28th January 2011 at 08:23 PM.
28th January 2011, 10:36 PM #2
Several things come to mind, first up firmware upgrade for the switches. Next would be forcing 100mbit duplex on the cards to see if that alleviates it. Third would be having wireshark running constantly to see if there is any weird traffic causing it.
28th January 2011, 11:04 PM #3
Thanks Synack. i will indeed leave wireshark running on a laptop plugged into the switch - that's a great idea. it should at least reveal the origin mac address of the broadcasts so it might make the job of tracking down the faulty card a bit easier. will do that on monday.
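For anyone wanting to do the same kind of tally outside the wireshark GUI, here is a minimal sketch of the idea, assuming the capture has already been exported as raw Ethernet frames (a list of bytes objects). The function names and the sample frames are made up for illustration; nothing here comes from the thread itself.

```python
from collections import Counter

BROADCAST = b"\xff" * 6  # destination ff:ff:ff:ff:ff:ff


def fmt_mac(raw):
    """Render 6 raw bytes as a colon-separated MAC string."""
    return ":".join(f"{b:02x}" for b in raw)


def top_broadcast_senders(frames, limit=5):
    """Count broadcast frames per source MAC, busiest sender first.

    frames: iterable of raw Ethernet frames (bytes), each starting with
    6 bytes destination MAC, 6 bytes source MAC, 2 bytes EtherType.
    """
    counts = Counter()
    for frame in frames:
        if len(frame) < 14:  # too short to hold an Ethernet header
            continue
        dst, src = frame[0:6], frame[6:12]
        if dst == BROADCAST:
            counts[fmt_mac(src)] += 1
    return counts.most_common(limit)


def make_frame(dst, src, payload=b""):
    """Build a fake frame for testing (hypothetical helper, IPv4 EtherType)."""
    to_bytes = lambda mac: bytes.fromhex(mac.replace(":", ""))
    return to_bytes(dst) + to_bytes(src) + b"\x08\x00" + payload


if __name__ == "__main__":
    # Invented sample traffic: one chatty sender, one quiet one, one unicast frame.
    sample = (
        [make_frame("ff:ff:ff:ff:ff:ff", "00:11:22:33:44:55")] * 3
        + [make_frame("ff:ff:ff:ff:ff:ff", "00:00:00:00:00:00")]
        + [make_frame("00:11:22:33:44:55", "66:77:88:99:aa:bb")]
    )
    for mac, n in top_broadcast_senders(sample):
        print(mac, n)
```

The source MAC that tops the list is the first place to look, although it only helps if the faulty card is actually sending true broadcasts.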
31st January 2011, 02:26 PM #4
think i've identified root cause of the issue.
there are two machines causing the problem, both have mac addresses which are 000000000000
when you ping the machine's ip address, the response gets broadcast to the entire network along with anything else you do (such as browse the file shares etc)
there are loads of other really strange things just going in a loop too. so it seems this strange problem is causing the switch to get overloaded under certain circumstances.
very, very strange - thanks for the tips guys!
never seen that before.
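One possible reading of why the replies end up going to everyone, not confirmed anywhere in this thread: a switch that never records 00:00:00:00:00:00 in its MAC table has no port to forward to, so it treats every frame for that address as unknown unicast and floods it. The toy model below is only a sketch under that assumption; the class, the port numbers and the MAC addresses are invented, and a real ProCurve may behave differently.

```python
ZERO_MAC = "00:00:00:00:00:00"


class LearningSwitch:
    """Toy model of a MAC-learning switch (illustration only)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # learned MAC address -> port number

    def receive(self, in_port, src, dst):
        """Process one frame and return the list of ports it is sent out of."""
        # ASSUMPTION: the switch declines to learn an all-zero source address.
        if src != ZERO_MAC:
            self.table[src] = in_port

        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        # Known destination: forward to one port (or drop if it is the ingress port).
        return [] if out_port == in_port else [out_port]


if __name__ == "__main__":
    sw = LearningSwitch([1, 2, 3, 4])
    healthy, server = "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02"

    sw.receive(1, healthy, "ff:ff:ff:ff:ff:ff")   # healthy PC announces itself
    sw.receive(3, ZERO_MAC, "ff:ff:ff:ff:ff:ff")  # faulty PC does the same

    print("to healthy PC:", sw.receive(2, server, healthy))
    print("to faulty PC: ", sw.receive(2, server, ZERO_MAC))
```

In this model traffic for the healthy machine leaves on a single port, while everything addressed to the all-zero MAC goes out of every other port, which would match seeing one machine's SMB traffic all over the lan.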
31st January 2011, 02:33 PM #5
If I remember correctly, we had this issue a few years ago with some RM computers we had thrust upon us. The NICs were SIS and the drivers were causing the problem.
Downloading the drivers from the manufacturer's site cured it.
31st January 2011, 10:53 PM #6
it seems to be something very specific with those cards - i've tried the latest drivers, including the vista driver (the machines run XP normally) and even the old DOS driver - with any of them the card still has a 00-00-00-00-00-00 mac address.
i've picked up about 4 of these today now - it's such a strange thing seeing all the SMB traffic which is supposed to be going to that one machine being broadcast to everyone. as soon as the machine starts doing anything heavy you can see the cpu on the switch jump up to between 15-30% instantly.
it's under very rare circumstances that there are sufficient broadcasts that the network just grinds to a halt - like downloading a ghost image (not as a multicast, just as a file) you can see all that traffic going over the lan. this really kills the switch.
sadly, as i have discovered today... we have about 120 of these computers (they are Acer Power F1s purchased in 2007 iirc), i'm going to do a bit more digging to see if there is a known specific problem but realistically, it's just going to be a waiting game and just changing the cards as and when.
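With that many machines, one way to find the affected ones ahead of time is to run whatever MAC inventory is available through a quick sanity check. This is a rough sketch, assuming the addresses have already been collected as text (for example from DHCP leases or an asset list); the machine names and the inventory dict are hypothetical.

```python
import re


def normalise_mac(text):
    """Strip common separators and return 12 lowercase hex digits."""
    digits = re.sub(r"[-:.\s]", "", text).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {text!r}")
    return digits


def mac_problem(text):
    """Return a short description of what is wrong with a MAC, or None if it looks sane."""
    mac = normalise_mac(text)
    if mac == "0" * 12:
        return "all-zero"
    if mac == "f" * 12:
        return "broadcast"
    # The lowest bit of the first octet marks a group (multicast) address,
    # which should never be a NIC's own address.
    if int(mac[:2], 16) & 1:
        return "multicast bit set"
    return None


def suspect_machines(inventory):
    """inventory: dict of machine name -> MAC string. Returns only the suspect ones."""
    suspects = {}
    for name, mac in inventory.items():
        problem = mac_problem(mac)
        if problem is not None:
            suspects[name] = problem
    return suspects


if __name__ == "__main__":
    # Hypothetical inventory; both spellings of the zero MAC seen in this thread are handled.
    inventory = {
        "room1-pc01": "00-00-00-00-00-00",
        "room1-pc02": "00:1a:2b:3c:4d:5e",
        "room1-pc03": "000000000000",
    }
    print(suspect_machines(inventory))
```

This only flags cards that are already reporting a bad address, so it will not predict which of the remaining machines might fail later.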
31st January 2011, 11:11 PM #7
31st January 2011, 11:13 PM #8
Maybe a mobo firmware flash/upgrade - if it's only a couple of machines it could either be dodgy firmware or just a couple of dodgy cards.
but i have had strange issues with network cards before. I had a whole batch of pc's (intel nics) which would not work with a Nortel switch i had - swapped it with an HP and it worked fine, back to the nortel and it didn't work. We were going to replace the switch in the summer but it just happened in may rather than august.
1st February 2011, 10:43 AM #9
If these are integrated MB cards then this is a standard issue with junk oem gear that has not been fully configured. The MAC in those cases should be stored in the bios but never gets entered. The toolkit from the solution section here - The DMI Discontinuity and the Perils of Brand X Computing - Blogs - EduGeek.net - has tools to program this in. “BNOBTC v6” is what you want to search for, can't post a link as there are silly copyright issues.