Power Surges, Cuts and server downtime... URGENT help please...
I am in desperate need of some ideas..
Over the past 2 weeks we have suffered from loads of problems. It started with a huge power surge: loads of buildings had power outages, and when my network manager got into work on Sunday morning to see why our VPN was not responding, he discovered everything had faults. The servers (4 of them) were all down. When booted up, 2 came up straight away, one took several attempts, and one had RAID failures on 2 out of 4, which basically made it a write-off.
ALL of the network devices needed to be reset; all had returned to factory defaults, including WAPs, modems, routers etc.
We have a replacement server coming and are waiting for HP to build it soon. Then we do the whole migration of databases (SIMS etc.), as we had to install them all onto the domain server to get back up and running ASAP. We have also bought another UPS, daisy chained it and tested it to make sure we get over 15 minutes for the main 2 servers.
Then over the last 2 weeks we have had several power outages and lightning strikes in the area. On several occasions, one server has reported the morning after that it rebooted overnight due to a power failure, while another server on the same UPS did not. Tonight we had another power surge of some sort and all servers shut down. This time they did not reboot, as the main power fuse tripped...
So, what to do? I have a principal who is livid and wants answers... well solutions really....
My initial thoughts are:
1) Bring in a specialist electrician/electrical company who can put in place some monitoring systems and help tell us whether we are tripping systems internally at night time (we have issues during the day) due to flood lights and room lights being on, OR whether it is spikes/surges/cuts coming in from outside.
2) Put in some more surge protection. Currently everything is run off standard circuit breakers (trip switches / fuses). There is no specialised surge protection; perhaps there is something we need to invest in? We have individual surge protectors (5 socket extension leads with built-in surge protection) on every PC, and none of these have tripped or caused problems. Is there something like this that we can use for the servers?
Please can anyone offer any suggestions? Thoughts?
Clearly the root of your problems is power related; however, any decent UPS 'should' be protecting your servers from this sort of damage. It's unclear why your servers were damaged if they were plugged into a UPS.
I would highly recommend you get an electrician in to spot anything obvious. Your servers should be on their own power circuit and not shared with anything else. If you have many servers/switches, you should try to work out approximately how much power they're using and look at the wiring configuration you're using.
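As a rough illustration of working out the load, here is a minimal Python sketch. Every figure in it is a hypothetical placeholder (the device names, the wattages and the 230 V mains voltage are assumptions, not values from this thread), so swap in readings from your own kit. It also ignores power factor, so the real current on the circuit may be somewhat higher than the figure it gives.

```python
# Hypothetical power draw per device in watts; replace with measured
# or nameplate figures for your own equipment.
loads_w = {
    "server_1": 350,
    "server_2": 350,
    "server_3": 300,
    "core_switch": 120,
}


def total_load_w(loads):
    """Sum the draw of every device, in watts."""
    return sum(loads.values())


def circuit_current_a(load_w, mains_voltage=230.0):
    """Approximate current on the circuit (I = P / V), ignoring power factor."""
    return load_w / mains_voltage


def ups_load_fraction(load_w, ups_capacity_w):
    """Fraction of the UPS's rated watt capacity that the load uses."""
    return load_w / ups_capacity_w


if __name__ == "__main__":
    total = total_load_w(loads_w)
    print(f"Total load: {total} W")
    print(f"Approx. current at 230 V: {circuit_current_a(total):.1f} A")
```

Comparing the total against the rated watt capacity of the UPS (from its datasheet) and against the breaker rating of the circuit gives a first idea of whether anything is being maxed out.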
It's interesting that one server rebooted while another connected to the same physical UPS did not. I would replace the power leads, but also monitor the temperature of the server. There's also a possibility the PSU is faulty or is being maxed out, forcing a restart.
Might sound like an odd question, but are your UPSs actually surge protected? Not all of them are, so you might even need a surge plug on a UPS, believe it or not.
Power surges on unprotected UPSs would burn out the batteries. You can of course test these if you want: I'd plug 1 desktop, and only a desktop, into a UPS, then turn the power to the UPS off and see what happens. The new UPS might die instantly.
You also don't mention whether the servers have two PSUs etc. and whether both are plugged into 1 circuit.
As Achandler has said, check you've actually got surge protection on your UPSs - some ports are surge only, some battery only etc...
Also, you say you 'daisy chained' UPSs? This is a big no-no from what I remember, as the output of one unit will be a simulated sine wave (when on battery), which the second UPS will interpret as a power surge and channel the power back into the first UPS, potentially damaging it and possibly the second one as well.
Instead, you either need to split the devices between the 2 UPSs or you need to replace both with a single UPS large enough to handle all the devices attached.
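If you go for splitting the devices, one simple way to get a roughly even split is to take the devices from largest draw to smallest and always give the next one to the UPS with the lighter load so far. The Python sketch below does that. The device names and wattages are hypothetical, and it deliberately ignores things like dual PSUs, so treat it as a starting point only.

```python
def split_loads(loads_w, n_ups=2):
    """Greedy split: biggest device first, each to the least-loaded UPS."""
    groups = [{"devices": [], "total_w": 0} for _ in range(n_ups)]
    by_draw = sorted(loads_w.items(), key=lambda item: item[1], reverse=True)
    for name, watts in by_draw:
        # Pick the UPS that currently carries the smallest total load.
        target = min(groups, key=lambda group: group["total_w"])
        target["devices"].append(name)
        target["total_w"] += watts
    return groups


if __name__ == "__main__":
    # Hypothetical draws in watts; use your own measurements.
    example = {"server_1": 400, "server_2": 300, "switch": 200, "router": 100}
    for number, group in enumerate(split_loads(example), start=1):
        print(f"UPS {number}: {group['devices']} ({group['total_w']} W)")
```

Check each group's total against the watt rating of the UPS it is going on; if either group is near the limit, a single larger UPS is probably the better route.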
Work out which electrical circuit/phase your servers are on. You may need electricians for this. Work out what else is on that circuit. Enquire as to recent electrical work on that phase. Trace it back through smaller electrical boards to see if there's an isolated local issue.
If you have access to the main breaker panel where the phases go in (we have three armoured cables going into ours), is the outer cable sheath warm? Hot? A bit tacky/melty? If one cable is very hot but two are barely warm / cool, you could probably do with having the load balanced. Test this at peak load.
Round up all the electric heaters in the school and lock them away.
If you managed to get hold of one of those British Gas energy meters, dump the .csv from the web interface and look at the voltage fluctuations during the day and overnight. Ask the kitchen when they start cooking - that's a major load spike for us.
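For going through the .csv, something like the Python sketch below would pick out the readings worth looking at. I don't know the actual export format of that meter, so the `timestamp` and `voltage` column names are assumptions, as are the 230 V nominal supply and the ±10% band; adjust all of them to match your file and your supplier's stated tolerance.

```python
import csv


def find_voltage_excursions(csv_file, nominal_v=230.0, tolerance=0.10):
    """Return (timestamp, voltage) for readings outside nominal +/- tolerance.

    Assumes a header row with 'timestamp' and 'voltage' columns
    (hypothetical names; rename to match the real export).
    """
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    excursions = []
    for row in csv.DictReader(csv_file):
        volts = float(row["voltage"])
        if volts < low or volts > high:
            excursions.append((row["timestamp"], volts))
    return excursions


if __name__ == "__main__":
    # 'meter_export.csv' is a placeholder file name.
    with open("meter_export.csv", newline="") as handle:
        for timestamp, volts in find_voltage_excursions(handle):
            print(f"{timestamp}: {volts} V")
```

Lining the flagged timestamps up against when the servers rebooted, and against things like the kitchen starting up, should show whether the dips and spikes follow your own loads or arrive from outside.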
As you're finding out, IT gear is much more picky about the power supply than nearly anything else. As others have said, having a dedicated circuit from the main panel to your server room does negate most issues that co-workers can introduce via excessive load.
Last edited by pete; 4th November 2011 at 10:33 AM.
Over here (note, the OP seems to be in Portugal) I'd be on the case of my electricity supplier demanding recompense for damaged kit. Beyond that, a UPS specialist would be a good idea if you don't have the skills (not many really do) to specify a system to protect the critical servers.
Thanks for the advice. I will start putting together an urgent action plan and get things moving; I'll post it here when it's complete too..
One note I have already checked up on: our daisy chained UPS solution is possible because they are both APC units, and the software, when setting up the new one of the two, suggests this configuration and runs through doing it. So the units know this is happening... clever stuff by the sound of it..