garethedmondson (25th February 2009)
I was in the usual weekly ICT meeting this morning and the Deputy Head has got me all twitchy.
Basically he asked how long it would take to get the PDC or any server back up and running if we had a disaster.
I replied approximately 2 to 3 days, as long as everything went well. The reply to that was that it needs to be back up and running within a day or less.
I promised that I would look into it and one of my suggestions was mirroring all the servers onto a Virtual Server. I have no idea if this would work or even how to go about it.
Also any other suggestions please? I'm thinking that money would not be too much of an issue, but I don't want to go stupid on it as I have plenty of other plans on the go that need to be implemented to bring the school up to date.
Thanks in advance everyone,
2 to 3 days is a reasonable time span.
Yes, a virtual copy would help - but what if the server room burnt down?
Not much cop then - an offsite backup and some hastily borrowed servers would be more use.
Virtualise as many roles as you can and have 2 separate host servers, each able to run all the virtual servers at once, possibly with redundancy built into the servers (so for instance having 2 DHCP servers live at any one time), but with each host only running half of the servers during normal operations. Place them in separate buildings if possible. Have each server back up to the other one and to a tape.
Some servers will be tricky to virtualise en masse; just ensure they're well backed up and, if possible, have some spare machinery to restore to.
Time for a disaster recovery plan, methinks!
One thing we've done is split the servers between two rooms in separate buildings, so that if one building goes down we can still run from the other building.
The other idea I've been thinking about is to maybe club together with other schools in the LEA to have 1 or 2 spare servers capable of running a few virtual machines kept at the LEA data center so that if disaster struck we could grab these servers and get the core services back up and running quickly. This would be especially useful for stuff like getting SIMS back up quickly.
I have managed to get a PDC with DHCP, file shares, printers and stuff fully back up in a few hours. The problem with that is they may expect that next time...
Tell me about clustering then (well, point me to a website if you can so I can read up on it).
Any starting guides to virtualisation as well?
I'm pretty impressed the Deputy head seems to know what he is talking about.
This is what DRBD is for. It mirrors block devices (disks, partitions, RAID arrays, encrypted partitions) between machines over a standard TCP/IP network. The documentation is comprehensive and well written; check the website. I use DRBD in conjunction with the Xen virtual machine system running on CentOS, all of which is free. Just download the CentOS CD images and install on your server - the install wizard even gives you an option to install Xen with no further setup needed.
DRBD works as a standard Linux disk driver, so should work with any Linux distribution or VM system you want. It will help performance if you have a dedicated network card in each machine for DRBD to use for mirroring. Don't worry too much, though - people seem to forget that disk reads generally far outweigh disk writes in normal operation, and it's only the disk writes DRBD needs to send across the network.
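To put a rough number on the point about writes, here's a back-of-envelope sketch in Python. The function name and the example figures are made up for illustration, and it ignores protocol overhead and bursts, so treat it as a sanity check only, not a sizing tool.

```python
def replication_link_utilisation(write_mb_per_s, link_mbit_per_s):
    """Rough fraction of a network link used by mirrored disk writes.

    Only writes are counted, because reads are served from the local disk.
    Assumption: protocol overhead and write bursts are ignored.
    """
    if link_mbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    write_mbit_per_s = write_mb_per_s * 8  # megabytes/s -> megabits/s
    return write_mbit_per_s / link_mbit_per_s


# Hypothetical example: 5 MB/s of sustained writes over a gigabit link.
print(replication_link_utilisation(5, 1000))
```

Measure the real write rate on your own servers before trusting any figure like this; a value well below 1.0 just suggests the link is unlikely to be the bottleneck on average.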
I'd say split your machines between two locations in your school and aim to get as fast a network connection between them as you can - it might be worth spending some of that spare cash on a couple of decent switches linked with fibre so you can mirror drives. Also worth investing in decent UPSes, ones that don't trip out during power surges.
The Xen documentation suggests that live switch-over of a running VM is possible with around 0.8 seconds of downtime, although I am not inclined to go and unplug one of the servers to test this right now.
One question no one has asked (amazingly) is what type of setup do you currently have?
I can only presume you operate Active Directory, DNS, DHCP and that it also acts as a File and Print server. What other roles does this server have?
- How many users are in your network?
- How many hard drive(s) do you have in your server and how much disk space (roughly) are you using?
- What backup solution(s) do you currently use?
As a recommendation, RAID is a cost-effective way of adding redundancy to a server. There are different types of RAID; however, I would have thought RAID1 or RAID5 would be suitable for most setups. The most common point of failure in a server (in my opinion) is the hard drives themselves, so it makes a lot of sense to implement RAID1 at least on the system drive, which would host Windows, Active Directory, DNS and DHCP.
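If it helps when costing disks, the usable space for those two RAID levels can be worked out as below. This is just a sketch in Python with a made-up function name; it assumes identical disks and ignores formatting overhead.

```python
def usable_capacity_gb(level, disks, disk_gb):
    """Usable space for an array of identical disks.

    RAID1: every disk holds the same data, so you get one disk's worth.
    RAID5: one disk's worth of space goes to parity.
    """
    if level == "RAID1":
        if disks < 2:
            raise ValueError("RAID1 needs at least 2 disks")
        return disk_gb
    if level == "RAID5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (disks - 1) * disk_gb
    raise ValueError("only RAID1 and RAID5 are covered in this sketch")


print(usable_capacity_gb("RAID1", 2, 500))  # 500
print(usable_capacity_gb("RAID5", 4, 500))  # 1500
```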
Installing Windows 2003 from scratch, adding any required drivers and then performing a System State restore can easily take a few hours in itself. If you implement at least RAID1, you dramatically reduce the risk of having to rebuild from scratch.
This then leaves just restoring user data, network shares and any applications you may be hosting. Backing up to tape or a NAS is ideally what you should be looking at. NAS boxes are more popular these days and restoring data from them is much faster too. Another advantage of a NAS is that you can strategically install it anywhere in the building, away from the server room, and the cost per GB is much lower than with a tape drive. It's not uncommon for even small schools to have a couple of hundred gigabytes of data, and for secondary schools to have several terabytes.
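For a feel of how data volume turns into restore time, here's a rough Python sketch. The function name and the throughput figure are assumptions for illustration only; plug in whatever your own tape drive or NAS actually sustains during a test restore.

```python
def restore_hours(data_gb, throughput_mb_per_s):
    """Hours needed to copy data_gb back at a sustained throughput.

    Assumption: a constant transfer rate with no verification pass,
    and 1 GB taken as 1024 MB.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Hypothetical example: 500 GB restored at a sustained 50 MB/s.
print(round(restore_hours(500, 50), 2))
```

The only way to get a trustworthy throughput number is to time a real restore of a decent-sized share.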
First thing I'd ask is "What's the budget for equipment and software to be able to implement a less than 1 day disaster recovery plan"? Just because someone high up demands it, doesn't mean it is possible, unless appropriate funding is made available.
For example, the best way to deal with this, in my mind, would be to have:
a) all servers virtualised, and the disks stored on a SAN.
b) you take snapshots and store them on a second SAN, somewhere else on-site
c) you have a set of redundant spare servers sat waiting somewhere - presumably in the same place as the secondary SAN (or, as dhicks mentions, mirror the block devices, though I can see this being a possible bottleneck).
d) you have dual core switches for the servers, and all edge switches connect to both, and possibly via different routes. A restored server is not much use if you don't have a network to connect across...
A thing to remember here is that you should have this formalised in a disaster recovery plan.
The server room floods or there is a fire.
How long now?
You can't eliminate disruption in the event of a disaster; the closer you get, the more it's all going to cost.
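One way to get something on paper for the plan is to list the recovery steps for each scenario, put an honest time against each, and see whether the total fits the target. A minimal Python sketch follows; the step names and hours are invented placeholders, not measurements from anyone's network.

```python
def total_recovery_hours(steps):
    """Add up the estimated hours for each recovery step."""
    return sum(steps.values())


def meets_target(steps, target_hours):
    """True if the whole recovery fits inside the target window."""
    return total_recovery_hours(steps) <= target_hours


# Hypothetical figures for a "server room fire" scenario.
fire_scenario = {
    "source replacement hardware": 8,
    "install OS and drivers": 2,
    "System State restore": 2,
    "restore user data and shares": 6,
}

print(total_recovery_hours(fire_scenario))  # 18
print(meets_target(fire_scenario, 24))      # True
```

Change the hardware line to a couple of days' delivery time and the one-day target is gone, which is the argument for having spare kit sat waiting in another building.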
Neverfail can mirror Windows servers and allow a passive server to take over if the production one fails. No idea on pricing, but it might be worth a look - it supports domain controllers, Exchange, file servers etc.
On the disaster recovery side, it would obviously be great if the backup server sat in another building.
Last edited by ssiruuk2; 24th February 2009 at 09:37 PM.
Thanks to everyone so far. Plenty of reading there. Thanks David for your comments, and also Michael and Localzuk.
I'll also be looking at ssiruuk2's idea as well. For the moment I just want to get some stuff on paper and present it to the Deputy Head. I'm all for it, but I don't think any of the school staff understand that a rebuild is not just a one-button push.
Any more suggestions are more than welcome.