12th April 2011, 10:23 AM #1
Managing Downtime In A School Environment
This is a bit of a random one but I thought I'd throw it out there and see how EduGeekers handle it:
What are your school's expectations for uptime, what maintenance windows are you given, and how do you manage planned and unplanned downtime, particularly on a school budget?
For example, some of the issues we are facing:
- In the past, pretty much all students would finish at 15:00 and after that we could do server maintenance that wouldn't affect staff. Most staff would be done by 16:30 or at least would understand downtime after then and so we could do maintenance at the end of the day. Now, lessons go on until 16:00 or 16:30 and controlled assessment goes on until 17:00 or 17:30 so we can't take anything down during the majority of school days.
- We used to be able to get away with quick reboots of devices at breaktime or lunchtime but clubs and activities now run through these.
- School holidays used to be free for us to do as much maintenance as we needed as long as we warned people, but now even a single day causes problems. We're in the Easter break currently and I spent yesterday from 7:30am to 6:30pm re-racking a server cabinet and despite the fact we warned people everything would be down (core switch was turned off) we still had a few complaints.
- I need a few more hours to finish re-cabling the above mentioned cabinet properly, and I'm currently either going to have to work a night or a weekend, and getting time off in lieu for that will be difficult.
The school basically expects close to 100% uptime, yet I currently have no budget, only half the funding I needed for virtualisation and no money for redundant storage or a proper backup solution.
Is this unusual, or are most schools facing this problem? Our school heavily runs on email and SIMS so any downtime in the day is seen as a major issue. A few months ago the network was down for most of the day (bad switching issue took out most of the network and was a nightmare to trace) and I'm still being reminded by management how big a deal that was and how it must never happen again.
12th April 2011, 10:29 AM #2
We were directed by SLT.
Basically 0% downtime between 8am and 4.30pm unless something has failed and needs fixing.
Any updates etc have to be done out of these hours. During the holidays I can pull the strings so long as I give 24 hours notice.
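A rule like this is simple enough to check mechanically before booking a job. The following is only a rough sketch: the 08:00-16:30 protected window and the 24 hours' notice come from the post above, while the function names, the weekday-only assumption and the `in_holiday` flag are made up for illustration.

```python
from datetime import datetime, time, timedelta

# Values taken from the post above; everything else here is hypothetical.
PROTECTED_START = time(8, 0)           # no planned downtime from 08:00...
PROTECTED_END = time(16, 30)           # ...until 16:30
HOLIDAY_NOTICE = timedelta(hours=24)   # notice required for holiday work


def overlaps_protected_hours(start: datetime, end: datetime) -> bool:
    """Return True if the job touches the protected window on any weekday."""
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 .. Friday=4 (assumes no weekend lessons)
            window_start = datetime.combine(day, PROTECTED_START)
            window_end = datetime.combine(day, PROTECTED_END)
            if start < window_end and end > window_start:
                return True
        day += timedelta(days=1)
    return False


def is_allowed(start: datetime, end: datetime,
               announced: datetime, in_holiday: bool) -> bool:
    """Apply the policy: notice period in holidays, protected hours in term."""
    if in_holiday:
        return start - announced >= HOLIDAY_NOTICE
    return not overlaps_protected_hours(start, end)
```

For example, a term-time job from 17:00 to 19:00 passes the check, while one from 15:00 to 17:00 does not because it overlaps the protected window.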
12th April 2011, 10:32 AM #3
I do know of schools that claim 100% uptime, but they also have very very large IT budgets and the backing to ensure that they have spares on hand for most things. I think the only realistic thing to do is to be honest, set out the true cost of a 100% (or 99.999%) uptime environment and then the cost of environments that get close to that with a few less 0's at the end of the price. Make sure you show them why it's impossible to expect the network to be perfect with a less than perfect budget.
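To put numbers on those uptime targets when making the case, it can help to turn each percentage into the downtime it actually allows. This is only a back-of-envelope sketch: the function name and the choice of a 365-day, 24-hour year are assumptions, and a school might prefer to count term-time hours only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in the period at a given availability."""
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(target)
        print(f"{target}% uptime allows {hours:.2f} hours "
              f"({hours * 60:.1f} minutes) of downtime a year")
```

At 99.999% the budget works out to a little over five minutes a year, which is less than a single server reboot, whereas 99% leaves more than three and a half days.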
Originally Posted by Duke
I know that at my last school IT had been heavily underfunded for many many years to the point where everything was being held together with gaffer tape. In the end I turned round to the head and the governors and showed them that they were expecting me to keep £1 - 2 million worth of IT equipment running without any money, when the majority of that equipment only had 3-5 year life spans and we were already past that with 80% of the kit. When asked what alternatives they had to investing money I was honest and told them the alternative was to remove IT suites and interactive whiteboards until we hit a level of IT penetration that suited the budget they had in mind. That shocked them a little, but the next day I had the green light for massive IT investment over five years.
I think many schools are still treating IT as something that can be strung along on a shoestring and as you point out there is an ever greater reliance on IT systems to the point where it's just not feasible to underfund to the levels that are sometimes talked about.
I hope my above ramblings helped a little
12th April 2011, 10:33 AM #4
The way I manage it here is this - I give notice of downtime, and do it. It's not possible to run the system otherwise.
I try not to do any downtime during term, as this would unfairly damage T&L, but out of term time I have to do it. I don't get paid to work evenings or nights, so I'm certainly not going to do it then.
The options as far as I'm concerned are as follows:
1. Do downtime as and when needed, warning people in advance, and work with existing budgets.
2. Demand a larger budget to allow you to cluster everything, and remove the effects of downtime.
3. Don't do any maintenance and watch the systems crash and burn, causing massive disruption and unwanted downtime.
The way to present it to SMT is this - you wouldn't let a small piece of rust on a car stay without treatment. You'd fix it as soon as you could, taking the hour or so required to fix it. If you didn't, that bit of rust soon turns into a massive problem and you end up with your car off the road for days.
Remember, whatever you do, you'll always get people complaining. Just remember Scotty's rule - estimate the time to do the task as being twice what you really think, then finish in half that time, and you'll forever be seen as a miracle worker.
Last edited by localzuk; 12th April 2011 at 10:47 AM.
12th April 2011, 10:42 AM #5
And to answer the original question properly - this is how I normally manage downtime. There are occasional bits that I may do out of hours during term, but I have quite a good relationship with my line manager so could reclaim TOIL during the holidays for this sort of thing.
Originally Posted by localzuk
I'll have to remember that analogy for future reference. And I follow a similar estimation rule - much better to deliver early and be thanked than deliver on time/late and get complaints!
Originally Posted by localzuk
12th April 2011, 10:45 AM #6
We try not to do things (unless vital) between 8am and 4.30pm. If we plan to do something later on I usually get someone to come in late and stay late but that's only for vital stuff. Other than that we have an agreement that the 1st Monday of the holidays the system is "at risk" - I confirm two weeks before the impending holiday if that means small areas down or a whole network outage while we re-arrange the server rack etc.
People seem happy enough to work with that. I think as long as people are given plenty of notice it's fair enough to take things down occasionally.
I do also end up remoting in at night to do some work too but the school are very flexible when it comes to time off so I can't complain.
12th April 2011, 10:56 AM #7
I've worked in places which expected similar things of me and to be honest the ones who complained were generally the ones who hadn't listened when told that maintenance was scheduled and would then come in and complain that they couldn't use the system. You will always get people who complain and you have to ignore most of them, though in my case downtime was mostly kept to the holidays. If people were warned that there was going to be downtime they should listen.
However as others have said, if you are being pressured into feeling that you need to provide 100% uptime of services you need to break down the true cost of this. Without sufficient budget these things often aren't realistic and highlighting what you can realistically get with your budget should help the SMT realise they are asking too much.
In addition, I wouldn't work over for nothing either, the more you do that the more they expect you to do that. But is this your SMT that are making these demands or all staff? If this is your SMT and you have neither the budget nor the manpower then they need to revise what they expect you to do. If they still want you to provide support 24/7 then they need to give you the resources to do this. If it is all staff who are complaining tell them to speak to your SMT.
12th April 2011, 11:01 AM #8
We don't do anything that could affect the system between 8 & 16:00. Thursdays are marked down as maintenance or patch day with notice. So we let everyone know via a traffic light system whether it will be SIMS or the whole network. Summer we usually publish a week where the system will be unreliable so staff don't come in with the expectation of working. This allows us to drop the servers or switches when we want, do a full reboot of stations, etc, without having to check everyone is ok.
I know the feeling though, more and more we are being expected to keep the system up & carry out anything in our own time, but with no overtime & TOIL is a joke. Same as you, holidays used to be our time with a few exceptions but now that is being eroded. Add to that we don't have that much time as we have to take holiday in holiday time, summer is restricted as that is when we do most and have to take xmas, it really is going to need a shift in expectations or school policy at some point.
12th April 2011, 11:49 AM #9
I had much the same issue, wanting as near to 100% up-time as possible on a limited budget. This is where I found open source virtualisation and storage come into their own - it turns out you don't actually need to spend any money at all on anything except hardware, all the software to do what you want is available for free.
Originally Posted by Duke
12th April 2011, 12:04 PM #10
Wow, thanks for all the quick responses everyone!
I can generally live with that and think no planned downtime in school hours (8:00-15:00) is perfectly reasonable. However, if I finish at 16:30 and can't do any maintenance between 15:00 and 16:30 then I'd have to either do a weekend (which I'm willing to do, but it inconveniences the Site Office having to take off alarms for me and I'd rather not make it a regular occurrence because it'll become expected of me) or work late (which I don't mind doing either, but again I can only work until 18:00 because of alarms and I don't want to make it a regular thing). If I don't do weekends or working late then it's 6 weeks between each holiday until I can do real maintenance.
Originally Posted by bodminman
Definitely, thank you. Management are generally pretty good at realising we need proper funding, but the last three years have been pretty tight and everyone is facing budget cuts this year and next year. I've just been promoted to head of department so I'll be working out an IT strategy that'll lay out what budget we need - unfortunately because we haven't had much budget the last few years a lot of desktops really do need replacing while I've also got major infrastructure upgrades to do.
Originally Posted by Soulfish
I think management just have this idealistic view that it never breaks and never needs maintenance - I wish! (although I suppose that would put me out of a job) 1. = my preferred option, management have just made it very difficult to do. 2. = not happening this year or next year due to budgets. 3. = my conscience wouldn't let me.
Originally Posted by localzuk
I think I need to set up a day a month when maintenance is expected. Generally speaking people are fine with the downtime as long as it's properly planned, but yesterday's maintenance was calendared for a while ago, then cancelled because someone needed the network up, then it was moved again, cancelled for the same reason, and now I'm trying to fit it in when it should have been done a while ago.
Originally Posted by jcollings
Completely agree, the realistic costs to match expectations will go on my new plans...
Originally Posted by penfold
As I mentioned, I think I need to organise a maintenance day - e.g. server reboots will be happening between 15:30 - 16:30 on a certain day. Don't think management will like this though. Technically my job doesn't get me any overtime pay or TOIL (except for weekends) now I've hit a certain grade, however management are flexible on this. My bigger concern is that it would become expected of me and it would be assumed I'd be happy to work whatever hours are needed.
Originally Posted by TechMonkey
I agree up to a point. I find if you want true redundancy and failover then VMware offers some very nice options, but at the end of the day the biggest costs for this type of thing have been hardware which is hard to avoid if you want reliable kit with a full warranty.
Originally Posted by dhicks
Many thanks everyone,
12th April 2011, 12:10 PM #11
It's the old story. Schools expect to have the latest kit but cannot afford it and it doesn't help that the government keep pushing for new IT courses which need up to date resources; the students and staff then get upset when it doesn't work or is so slow it may as well not!! We only reboot in term time as a last resort and then never during lesson time but we have the expectation that holidays can be used for 'proactive' maintenance and that there will be disruption for those who cannot bear to be away or have no life outside school!!
You could always be a bit sneaky and have a 'practice' network crash and blame it on not having enough time for maintenance ;-)
Computers are stupid machines which can do clever things. Programmers are clever people who can do stupid things - this combination can cause mayhem!!
12th April 2011, 12:43 PM #12
If you pay out for the kit, then look at the free software - Linux KVM is an open source virtualization platform which can do things such as live migration. The only thing lacking at the moment is automatic failover but I'm sure it will appear soon.
Originally Posted by Duke
We currently run approx 70% of the network on the platform and it's a lot quicker than ESXi IMHO (plus a few articles on the net)
12th April 2011, 12:57 PM #13
In my experience, a warranty is just there to stop you fixing stuff yourself. Modern computers aren't dangerous mechanical devices, they're just a bunch of standard components shoved in a case - build your own servers and you don't have to worry about invalidating a guarantee, you can just swap the broken component out and carry on. Saying that, modern off-the-shelf components are also, generally, very reliable.
Originally Posted by Duke
12th April 2011, 01:32 PM #14
I disagree. For example on our Dell servers we have a 4-hour warranty. If the motherboard dies in one they will be here within 4 hours to fix it, any time of the day.
Originally Posted by dhicks
If I were to fix it, I would first have to:
- 100% make sure that it is the board
- Find a replacement - may take a few days
- Replace the board myself - may take longer and may break something else.
Particular parts like motherboards can be hard to find when the computer/server is getting older. With a warranty you know they will be available fairly quickly.
Meanwhile the days and hours are ticking and this will have a knock-on effect on lots of children’s education.
You also have to take into account the time it takes you to do these jobs. You have to remember you're not free.
The extra time that is spent resolving the issue could also be better used in improving the children’s education.
Last edited by FN-GM; 12th April 2011 at 01:37 PM.
12th April 2011, 01:32 PM #15
Downtime is a critical issue for any business and unless you have everything covered with total fail-over and redundancy you are in fact going to incur downtime.
How you manage that downtime is something that you and the SLT are going to have to sit down and work out.
Then write up an SLA that you and the SLT both agree to and there you have it.
You as the NM are required to be as flexible as feasibly possible and the school also has to be flexible in its thinking, give and take should result in a very manageable process which gives both you and the SLT total faith in the IT infrastructure and the management of it.
Work-life balance is something we should all be aware of, as neglecting it can lead to some serious health issues for people, but if it is managed properly by the SLT and yourself then it should be a progressive partnership.
The whole school then benefits from this and you will be recognised for the efforts of you and your team.