@witch - Has the server itself frozen, or is it that they can't access the shares? Are you able to log on to the server at the console?
I don't know if this is the right place to post this, as I'm not sure it is a server problem, but here goes...
The server is an HP ProLiant running Windows Server 2008 R2.
It has been pretty reliable up until now.
Randomly, every few days, someone will come and tell me that they can't save because everything has frozen. I have to restart the server to unfreeze it.
There is nothing that I can see in the logs that would explain it.
I thought it happened when the load increased, e.g. when a classful of children were logging on together, but it has just done it at lunchtime when very few people were logged on. (Three weeks running it happened at 1.20pm on a Monday.)
I have fixed it once by power-cycling the switches, but it could have been the removal of the load from the server that made it all unfreeze.
Does anyone have ANY ideas what I can look at?
The switches are unmanaged and I have only managed to get hold of two spares that I could swap in. I had targeted the switch with most of the IT suite connections on it, but today's event might make that a bit pointless.
I am really stuck this time as I have absolutely no idea where to start.
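If you want to test the timing pattern rather than rely on memory, a minimal Python sketch like the one below can tally incident times by weekday and hour so any recurring slot stands out. The timestamps are hypothetical placeholders; you would fill in the dates and times people actually reported a freeze.

```python
from collections import Counter
from datetime import datetime


def summarise_incidents(timestamps):
    """Count incidents per (weekday, hour) slot, most frequent first."""
    slots = Counter()
    for ts in timestamps:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        slots[(when.strftime("%A"), when.hour)] += 1
    return slots.most_common()


# Hypothetical example data, not taken from the thread
incidents = [
    "2014-05-19 13:20",
    "2014-05-26 13:21",
    "2014-06-02 13:19",
    "2014-06-11 12:40",
]
for (day, hour), count in summarise_incidents(incidents):
    print(f"{day} {hour:02d}:00-{hour:02d}:59  {count} incident(s)")
```

A slot that keeps topping the list is worth comparing against timetables, scheduled tasks and backups.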
Sorry, should have said: yes, I can log in at the server, so "server freeze" is probably the wrong description.
This sounds similar to a problem we had last year... we had to update the drivers for the network card on the server, and that fixed the issue. It's worth mentioning, though, that we also had a lengthy remote session with an engineer who tried various other fixes, some of which he didn't log with us, but I believe the driver update is what resolved it. I have put my notes from when we had the problem below; hopefully they will be helpful.
Serv1 had the following problems:
The network is spiking heavily and grinds to a halt. We are getting completely disconnected on the client side: the mapped drives just drop out and we cannot ping the main server. All other servers are running fine.
I restarted the server, and when I launched the DNS console it told me the server was unavailable and gave me a box to connect to a DNS server. When I selected 'This computer' it failed again. Restarting the DNS service fixed the issue and the network began to work again.
I have pulled a log from our main switch just in case that is causing an issue, which I doubt. We are also seeing a pattern: the server seems to go down every 2 days, between 12:00am and 7:00am. I am going to try rebooting the server tomorrow night to see if that delays the problem on Wednesday.
Rebooting the server the night before delayed the issue; it reappeared on Thursday.
The engineer reviewed the log from the switch and found no issues.
Remote session with a Partnership engineer: they looked at DNS and advised that we update the Ethernet drivers.
Updated to the latest Ethernet drivers; the server has now been running for 4 days solid with no issues.
Last edited by Griff; 11th June 2014 at 01:57 PM. Reason: When I say "spiking heavily", I mean what I saw when I opened Resource Monitor to view activity on the server.
Are the switches on a UPS?
What else happens at 13:20 on a Monday?
Do the affected switches share a distribution board / supply with something like the canteen or other equipment with a high surging power draw?
If this were on our site, I'd be looking at power first.
I would also be looking at the NIC drivers; there are some versions for HP servers that are really borked. Be careful when updating, though: do some research to check whether anyone is having problems with the version you are going to update to.
ProLiant ML/DL3?0 G?: depending on the model, it may have an integrated log viewer which will tell you about most hardware problems that are afoot.
I would also look at the drivers, and possibly set up performance counters to log during the day for things like read/write activity on the NIC and the drives, along with the queue length on each.
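As a rough illustration of what to do with such a log: Performance Monitor data collector sets and the built-in `typeperf` command (e.g. `typeperf -cf counters.txt -si 60 -f CSV -o serv1.csv`, where the file names are placeholders) can write counters such as `\PhysicalDisk(_Total)\Avg. Disk Queue Length` and `\Network Interface(*)\Output Queue Length` to CSV. The Python sketch below flags samples where any queue-length column goes above a threshold. The threshold of 2 is an assumed rule of thumb, and the sample data and NIC instance name are hypothetical.

```python
import csv
import io

# Assumed threshold: a common rule of thumb, not a hard limit; tune it for your hardware.
QUEUE_THRESHOLD = 2.0


def flag_queue_spikes(csv_text, threshold=QUEUE_THRESHOLD):
    """Return (timestamp, counter, value) for each queue-length sample above threshold.

    Expects Performance Monitor / typeperf style CSV: the first column holds the
    sample time and every other column is one counter path.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    spikes = []
    for row in reader:
        timestamp = row[0]
        for counter, raw in zip(header[1:], row[1:]):
            if "Queue Length" not in counter:
                continue  # only queue counters are checked here
            try:
                value = float(raw)
            except ValueError:
                continue  # blank or missing sample
            if value > threshold:
                spikes.append((timestamp, counter, value))
    return spikes


# Hypothetical two-sample log; "NIC1" stands in for the real adapter instance name.
sample = r'''"(PDH-CSV 4.0)","\\SERV1\PhysicalDisk(_Total)\Avg. Disk Queue Length","\\SERV1\Network Interface(NIC1)\Output Queue Length"
"06/11/2014 13:19:00.000","0.4","0"
"06/11/2014 13:20:00.000","7.9","0"
'''
for when, counter, value in flag_queue_spikes(sample):
    print(when, counter, value)
```

Lining the flagged timestamps up against the times people reported a freeze would show whether the disks or the NIC were backing up at that moment.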
I would also look at the AV and make sure it is not collapsing connections that it does not like. We had all sorts of fun with Norton AV cutting off connections from machines it thought were infected, rightly or wrongly.
Another thing to check is turning off TCP acceleration and offloading in the NIC driver, especially if you have Intel network cards in there, and doubly so if it hosts VMs. We had all sorts of issues with shoddy Intel drivers going nuts and dropping connections to VMs.
EDIT: Watch the firmware versions too, as people have said; there was a firmware update a while back that bricked onboard NICs in HP servers.
Last edited by SYNACK; 11th June 2014 at 04:07 PM.
We had some external support in over half term and we looked at the HP info together. We updated the NIC drivers to rule them out, but couldn't find anything wrong. When it all goes wrong and I reboot either the server or the switches (which aren't on a UPS, BTW), it all just quietly comes back. Once the server is back up, the shares re-establish themselves and away it all goes.
However, and I have only just remembered this, the reason I thought the server had frozen was that when a whole class was trying to log on, all the PCs got stuck at what I think was the "user profile" stage.
So we have that issue alongside the shares dropping out. No red crosses on the mapped drives, though; just no ability to save a doc to the share or open one from it.
I am very confused
At 13.20 on a Monday a whole class logs in in the IT suite. But they weren't doing it today and it still all went wrong.
I'll check whether they are Intel drivers. No VMs, thank goodness. I've never had an issue with ESET; where would I look, please?
Get a laptop and do some testing.
Can you ping the server from a spare port on the main switch the server connects to? What about trying other switches to see if there is a problem?
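A ping only proves the NIC answers ICMP, so while testing from the laptop it may also help to check the file-sharing port itself. The minimal Python sketch below tries a TCP connection to port 445 (SMB) and prints a timestamped line whenever the result changes, which gives you an exact time for the next drop-out. The host name, interval and function names are hypothetical choices for illustration.

```python
import socket
import time
from datetime import datetime


def server_reachable(host, port=445, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds.

    Port 445 is SMB (the file shares); OSError covers refusals, timeouts
    and name-lookup failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port=445, interval=30, rounds=None):
    """Poll host:port, printing a timestamped line whenever reachability changes.

    rounds=None polls forever; pass a number to stop after that many checks.
    """
    last = None
    count = 0
    while rounds is None or count < rounds:
        up = server_reachable(host, port)
        if up != last:
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp}  {host}:{port} {'UP' if up else 'DOWN'}")
            last = up
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)
```

For example, `watch("SERV1")` (substitute your server's real name) left running on a spare port would log exactly when the shares stop answering, and running a second copy from a PC on another switch shows whether the drop-out is network-wide.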
Have you checked the server logs to see if anything stands out?
Are some users unaffected during the problem times? I'd like to think that if it's a server problem, everyone is going to be having problems, not just an isolated group.
Is the problem isolated to certain hardware? Have any configuration changes been applied that could be ruled out?
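One way to answer the isolation questions is to note which PCs were affected during an incident and which switch each one is patched into, then see whether the failures cluster. A minimal Python sketch; the PC names and switch labels are hypothetical.

```python
from collections import defaultdict


def affected_by_switch(pc_to_switch, affected):
    """Return {switch: (affected_count, total_count)} for every switch."""
    totals = defaultdict(lambda: [0, 0])
    for pc, switch in pc_to_switch.items():
        totals[switch][1] += 1  # every PC counts towards its switch's total
        if pc in affected:
            totals[switch][0] += 1
    return {switch: tuple(counts) for switch, counts in totals.items()}


# Hypothetical inventory: which unmanaged switch each PC is patched into
pc_to_switch = {
    "ICT-01": "suite-switch",
    "ICT-02": "suite-switch",
    "ICT-03": "suite-switch",
    "OFFICE-01": "office-switch",
    "OFFICE-02": "office-switch",
}
# Hypothetical list of PCs that reported the problem during one incident
affected = {"ICT-01", "ICT-02", "ICT-03"}

for switch, (hit, total) in sorted(affected_by_switch(pc_to_switch, affected).items()):
    print(f"{switch}: {hit}/{total} affected")
```

If every switch shows all its PCs affected, that points back at the server or the core; if only one switch lights up, that switch (or its power supply) is the better suspect.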
Have checked the server logs: nothing. I have no idea whether some users are unaffected, as they aren't all logged on or working in their "My Documents" or the shared resources drive at the time.
No changes in hardware, only Microsoft updates.
Will try pinging on Friday; I'm not in tomorrow.
If you can log on to the server, what do you see in terms of CPU, memory and network bandwidth use? Is the storage in rude health? (Sometimes things that are failing but haven't quite failed can cause bottlenecks.) What else happens at 1:20pm? (It sounds like a good time for a shedful of activity where I am: start of class, someone telling all the kids "OK, copy the Lord of The Rings MKV that I ripped from Blu-ray to your local drive from the server".)
Everything looks fine according to me and the external support guy who is very good.
What happens at 1.20 is the class logs on, but as I said, it happened today at lunchtime when no more than 10 people, and probably fewer, were active on the network.
Could be a dying disk controller. We had similar problems a few years ago.
Eep. ESET. Lots of people like it; I'm not one of them after their updates hosed our Exchange server twice in a row and caused all sorts of networking issues. Everyone else seems to like it, though.
I would look for anything labelled intrusion detection, network activity checking, connection throttling, etc., and disable all of it.
I have also had it kill the entire network stack on client machines and only release it after a removal and reinstall. Luckily, as it works most of the time, it may just be hitting a connections-per-minute throttle and locking clients out.
There should be detection logs somewhere in the application on the server, and a section for network-related events; throttling should be listed there, but don't rule it out even if it's not.
They do change their engine behaviour without warning, so ESET could easily have tweaked something in an update to wreck your day. They did it to me enough times.
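To illustrate how a connections-per-minute throttle behaves, here is a toy sliding-window limiter in Python. It is purely illustrative: the limit of 60 connections per 60 seconds, the keying and the class name are assumptions, not how ESET actually implements its protection. It shows that a burst of connections in a few seconds can be cut off even though the same number spread over a quiet period would all get through. A real throttle may also keep a client blocked for a while afterwards, which this toy does not model.

```python
from collections import defaultdict, deque


class ConnectionThrottle:
    """Toy sliding-window limiter: at most `limit` connections per `window` seconds per key.

    Illustrative only; the limit, window and keying are assumptions.
    """

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # key -> times of accepted connections

    def allow(self, key, now):
        recent = self.history[key]
        # Forget connections that have slid out of the window
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: this connection would be dropped
        recent.append(now)
        return True


throttle = ConnectionThrottle(limit=60, window=60.0)
# Quiet spell: 10 connections spread over a minute, all allowed
quiet = [throttle.allow("server", t * 6.0) for t in range(10)]
# Burst (e.g. a class logging on): 90 connection attempts within about 9 seconds
burst = [throttle.allow("server", 1000.0 + i * 0.1) for i in range(90)]
print(sum(quiet), sum(burst))  # 10 60
```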