HELP PLEASE! - random network issue with PCs going offline
I'm hoping someone here might be able to help me work this out. We have an HP ProCurve network with a 5406 at the core and a mixture of ProCurve edge switches.
Last week PCs started going offline at random, just briefly (a couple of seconds) - long enough for the PCs to lose contact with all the servers and then ask to be resynced so they could pick up their desktop items off the server again, etc. Annoying at the best of times, but not great for those using software connected to the SQL box.
It is happening randomly across the entire site. It's not a whole switch at a time - I've observed it happen on my PC without affecting my colleague's PC even though we're connected to the same switch, and vice versa. There doesn't appear to be a set interval between occurrences. Nothing untoward, as far as I can tell, is showing up on the switches (using the switches' GUI).
We have two VLANs, one for our VoIP system and one for data. Spanning Tree is enabled.
Has anyone else had similar experiences, or have any idea what could be causing it? We are struggling, and as more people come back from their hols we are going to see it more often, just because more PCs will be in use.
Sounds like a broadcast storm of some sort. Look at the logs in the web interface of each switch; they may tell you of any issues, though you may need to turn up the sensitivity of the error detection. Also have a look at the logs from the command prompt on your switches, especially the core switch - there will be more information in there than on the web interface. Get Wireshark and start looking for anything that's making a lot of noise.
I've seen it happen before where a PC on the network had a dodgy NIC and whenever that was switched on it would cause all manner of weird connection drop-outs on the network. I'd also suggest the Wireshark route, see if anything is broadcasting ridiculous amounts.
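If the Wireshark output is hard to read by eye, a rough way to spot a noisy sender is to count broadcast frames per source MAC. Below is a minimal sketch, not a definitive tool. It assumes the capture was saved in the classic pcap format (Wireshark defaults to pcapng, so use Save As and pick pcap) and that it is an Ethernet capture. The function name count_broadcasts and the file name capture.pcap are made up for illustration.

```python
import struct
from collections import Counter

BROADCAST = b"\xff" * 6  # Ethernet broadcast destination ff:ff:ff:ff:ff:ff


def count_broadcasts(pcap_bytes):
    """Return a Counter of broadcast frame counts keyed by source MAC."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic (microsecond) pcap file")

    # The link type is the last field of the 24-byte global header; 1 = Ethernet.
    (linktype,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet captures")

    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        # Bytes 0-5 are the destination MAC, bytes 6-11 the source MAC.
        if len(frame) >= 12 and frame[:6] == BROADCAST:
            counts[frame[6:12].hex(":")] += 1
    return counts


# Example use (capture.pcap is a hypothetical file name):
# with open("capture.pcap", "rb") as f:
#     for mac, n in count_broadcasts(f.read()).most_common(10):
#         print(mac, n)
```

One MAC sitting far above the rest would be the first machine to unplug and test. Some broadcast traffic (ARP, DHCP) is normal, so it's the outliers that matter, not the totals.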
I've run Wireshark, and to be honest I'm not sure what I'm looking at, but the traffic looks fairly untoward - however I've only had my machine go offline once today and I didn't have Wireshark running at the time...
Had a look at the logs on the core switch through the CLI and there is nothing of note in there.
Will see what happens over the next couple of days as more people return.
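Since the drops only last a couple of seconds and are easy to miss, it may help to leave a small logger running on a few PCs so there are exact times to compare against the switch logs. A rough sketch, assuming a server that accepts TCP connections on a known port. The host name fileserver.example and port 445 are placeholders, and the function names are made up for illustration.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outages(samples):
    """Collapse (timestamp, ok) samples into (first_fail, last_fail) windows."""
    windows = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts  # first failed probe of a new outage
            last_fail = ts
        elif start is not None:
            windows.append((start, last_fail))
            start = None
    if start is not None:
        windows.append((start, last_fail))  # still down at the end of the data
    return windows


def monitor(host, port, interval=1.0):
    """Probe forever, printing a timestamped line whenever the state changes."""
    was_ok = True
    while True:
        ok = probe(host, port)
        if ok != was_ok:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(stamp, "UP" if ok else "DOWN", flush=True)
            was_ok = ok
        time.sleep(interval)


# Example use (placeholder host and port):
# monitor("fileserver.example", 445)
```

If several PCs log DOWN at the same second, that points at something shared (an uplink, the core, the server). If the times never line up, it points back at individual PCs, ports or cables.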
I'd check to see if this is a layer 1 issue: check your cabling between the PCs and the wall sockets, and from the wall sockets back to the switches. If you have an Ethernet tester, test your cabling. Also, what about the DHCP logs? Is your network dropping IPs briefly for some reason? Where are your IPs coming from? Check lease times etc. Could it be a dodgy NIC on the server which serves DHCP?
When you say it loses contact, does that mean the PC actually disconnects from the network, or is it still connected but cannot talk?
I note that recent Windows Updates made substantial version jumps in the TCP/IP stack - a coincidence worth exploring, perhaps? Consider rolling back to a restore point prior to the last round of Windows updates.