Results 16 to 30 of 39
How do you do....it? Thread, Managing Downtime In A School Environment in Technical
  1. #16

    dhicks's Avatar
    Join Date
    Aug 2005
    Location
    Knightsbridge
    Posts
    5,683
    Thank Post
    1,268
    Thanked 789 Times in 686 Posts
    Rep Power
    237
    Quote Originally Posted by FN-GM
    For example, on our Dell servers we have a 4-hour warranty. If the motherboard dies in one, they will be here within 4 hours to fix it, any time of the day.
    That's still 4 hours - with cheaper hardware you can afford to have spare capacity, so a hardware malfunction doesn't give you any downtime at all. In that situation it doesn't matter if it takes a day or two to get hold of some parts. Modern computer equipment does fail, but relatively rarely - the chances of another server conking out while you're repairing the first one should be vanishingly small.
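
    To put a rough number on "vanishingly small", here is a minimal sketch. It assumes failures are independent and happen at a constant rate, and the 5% annual failure rate and two-day repair window are made-up illustration figures, not measurements from any real kit.

```python
import math


def overlap_failure_probability(other_servers: int,
                                annual_failure_rate: float,
                                repair_days: float) -> float:
    """Chance that at least one other server fails during the repair window.

    Assumes independent failures at a constant rate (exponential model).
    """
    if other_servers < 0 or annual_failure_rate < 0 or repair_days < 0:
        raise ValueError("inputs must be non-negative")
    # Expected number of failures among the remaining servers in the window.
    expected_failures = other_servers * annual_failure_rate * repair_days / 365.0
    return 1.0 - math.exp(-expected_failures)


if __name__ == "__main__":
    # Hypothetical figures: one surviving server, 5% annual failure rate,
    # two days to source parts and rebuild.
    p = overlap_failure_probability(1, 0.05, 2)
    print(f"Chance of a second failure during the repair: {p:.4%}")
```

    Swap in your own failure rates and repair times; the model ignores correlated failures such as a power surge taking out both boxes at once.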

    Quote
    Particular parts like motherboards can be hard to find when the computer/server is getting older.
    An excellent reason for using off-the-shelf parts - I can nip down the road to Maplins and get a motherboard, hard drive, etc. in under an hour, no problem - in four hours I reckon I could have an entire server built and ready to start running virtual machines.

  2. #17

    GrumbleDook's Avatar
    Join Date
    Jul 2005
    Location
    Gosport, Hampshire
    Posts
    9,992
    Thank Post
    1,359
    Thanked 1,828 Times in 1,135 Posts
    Blog Entries
    19
    Rep Power
    602
    There is a difference between downtime and routine maintenance. Downtime during the working day would only ever be for very clearly stated circumstances.

    1- There is a fault which is already affecting most/all of the school and it can only be rectified / fixed / worked around with some immediate action. The difficult choice comes when deciding whether it is better to do a workaround (i.e. get the service running again even though there is a risk that the fault could occur again) or whether you should do a full fix. If you *have* to do a workaround you should get an allocated period of time to investigate and plan how to solve the problem on a permanent basis, and then get the allocated time to do that chunk of work too.

    2- There is a significant risk (e.g. data protection, security, elf & safety, etc.) which needs to be rectified, otherwise the school may end up in breach of statutory / legal duties to staff, students, parents, etc.

    3- You are instructed to do so by the legal authority in the school (Head or other person with delegated authority such as the Child Protection Officer), or an external authority with legal rights (e.g. Bailiff or Police).

    Between them, the above cover you for zero-day exploits, virus attacks, serious hardware failures, etc.

    Scheduled maintenance should be what it says. The schedule should be authorised by you and agreed on by the Head (or the person the Head gives authority to). It should fit round T&L activities ... but it is a careful balancing act to ensure you get what you need done and the staff / students who are in during the hols can still do stuff. A change management group can help here. You don't have to shoulder the whole burden on your own.

    A few things that can help.
    Have a test system so you can show that in your planning you have fairly accurate data for how long patching / updating / work takes. The Scotty rule works well, but if you are always seen as a miracle worker then they will always expect miracles. Managing expectations properly is better IME. When you publish scheduled works, give a rollback time too, so that if someone wants to stop you just as you start you can let them know when you are past the point of no return, or how long it will take to get things back to where they were before.
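
    The rollback idea can be sketched as a small published plan. This is only an illustration: the class name, the fields and the fixed rollback duration are assumptions, and the timings would come from whatever you measured on your test system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MaintenancePlan:
    start: datetime
    work: timedelta             # how long the job took on the test system
    rollback: timedelta         # time to get back to the starting state
    no_return_after: timedelta  # elapsed time after which rollback is impossible

    def expected_end(self) -> datetime:
        return self.start + self.work

    def point_of_no_return(self) -> datetime:
        return self.start + self.no_return_after

    def restore_time_if_stopped(self, at: datetime) -> Optional[datetime]:
        """When service is back if you are told to stop at `at`.

        Returns None once the point of no return has passed.
        """
        if at < self.start:
            return at  # nothing has been touched yet
        if at >= self.point_of_no_return():
            return None
        return at + self.rollback


if __name__ == "__main__":
    plan = MaintenancePlan(
        start=datetime(2011, 4, 18, 16, 0),
        work=timedelta(hours=2),
        rollback=timedelta(minutes=30),
        no_return_after=timedelta(minutes=45),
    )
    print("Expected finish:    ", plan.expected_end())
    print("Point of no return: ", plan.point_of_no_return())
    print("Stopped at 16:20 -> ", plan.restore_time_if_stopped(datetime(2011, 4, 18, 16, 20)))
```

    Publishing the point of no return alongside the start time means anyone who wants to call a halt knows how long they have to say so.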

    Hope these help.

  3. Thanks to GrumbleDook from:

    Duke (12th April 2011)

  4. #18

    GrumbleDook's Avatar
    @dhicks ... a warranty is also about risk management ... You are putting the risk onto an external party and reducing your own. If you have the capability to do your own repairs and have the capacity then that is fine, but not everyone does. Most try to sit somewhere in the middle. It is sometimes cheaper to have a warranty than your own set of spare parts to use to fix things. I know on servers I would not expect schools to have a complete set of spares, and the time it takes to order a spare part is the same as it takes to get an engineer out to do the fix. For desktops most people will have spare parts (even if just cannibalised from old machines) which they can use until they get replacement parts in. A number of resellers / vendors like this and will make life easy for you, but the warranty is your backup on the desktop.

    Warranty on other hardware is also vital ... Switches, projectors, etc ...

    Horses ... courses ...

  5. #19

    FN-GM's Avatar
    Join Date
    Jun 2007
    Location
    UK
    Posts
    16,228
    Thank Post
    894
    Thanked 1,779 Times in 1,533 Posts
    Blog Entries
    12
    Rep Power
    462
    Quote
    An excellent reason for using off-the-shelf parts - I can nip down the road to Maplins and get a motherboard, hard drive, etc. in under an hour, no problem - in four hours I reckon I could have an entire server built and ready to start running virtual machines.
    It's not ideal if you are running something that can't be virtualised. For example, say you have a big SQL server. The board goes and you replace it with another board. The chances are you will have to reinstall the OS, and it will take more time.

  6. #20

    dhicks's Avatar
    Quote Originally Posted by GrumbleDook
    If you have the capability to do your own repairs and have the capacity then that is fine, but not everyone does.
    That's the thing, I think if people skipped paying for 4-hour-response-time warranties there'd be more than enough money to have spare capacity to cover in case of hardware failure.

    Quote
    It is sometimes cheaper to have a warranty than your own set of spare parts to use to fix things. I know on servers I would not expect schools to have a complete set of spares, and the time it takes to order a spare part is the same as it takes to get an engineer out to do the fix.
    I don't see any reason to use anything much different in a server than in a desktop, and therefore no reason not to have a spare power supply, hard drive, etc. to hand. Servers are just computers, you can throw them together out of any old bits.

    Quote
    Warranty on other hardware is also vital ... Switches, projectors, etc ...
    I guess this depends on the size of the school, the level of service expected and the importance attached to system failure. However, I think there are a lot of schools that would manage just fine with a half-decent spare switch (£300 worth of equipment) in case another fails, and a reliable next-day delivery company for projector bulbs. I get the impression lots of schools might be paying for somewhat over-the-top guarantees that promise to replace equipment within 4 hours when there's no point - four hours is still the best part of a school day that you've lost. If you want proper reliability then go for spare server capacity and on-site spares ready to swap in.

  7. #21

    dhicks's Avatar
    Quote Originally Posted by FN-GM
    It's not ideal if you are running something that can't be virtualised. For example, say you have a big SQL server.
    Why wouldn't you be able to virtualise a big SQL server?

  8. #22
    Duke's Avatar
    Join Date
    May 2009
    Posts
    1,017
    Thank Post
    300
    Thanked 174 Times in 160 Posts
    Rep Power
    58
    Thanks again for all the replies everyone - nice to see I'm not alone in having to deal with these high expectations and budgets that don't necessarily match them.

    I completely agree with all the points about planned downtime during the day, and let me make clear this isn't something we do here. My point about breaktime/lunchtime outages was if you really needed to reboot a server to get something working properly you could get away with it during lunch, whereas now there's no chance and I can't even do it at the end of the school day either.

    Although I have no problem with open-source solutions and home-rolled hardware in the right situation, I do side with FN-GM and GrumbleDook on this one in most cases. I think a lot of it is about the peace of mind that things will work, and whose fault it is when something breaks. For example: If I build my own in-house web filter using off-the-shelf hardware - who gets blamed when an OS upgrade doesn't support the RAID card I've used, or when the open-source filtering project I use gets discontinued, or when the filtering doesn't catch something it should have and it becomes a child protection issue? I don't mind taking responsibility for my network, but at the same time I don't want to have personal responsibility for every single component and configuration on the network. I pay Company XYZ for a solution and support so that when it does break I pick up the phone and they deal with it in 4 hours or next business day - I don't have the time any more to know every single bit of hardware inside out and how to fix it and what to buy when it breaks.

    Take our SAN for example - I could have built something using off-the-shelf components in a Backblaze-style device using FreeNAS or OpenFiler. However, the Oracle solution does everything for me, works extremely well, and I don't need to worry about keeping it running because that's what I pay them for. On the other hand, with budgets being so tight I wouldn't object to building my own D2D backup storage solution as it would have no immediate impact on users if it did break, so it depends on the scenario for me.

    Chris

  9. #23

    dhicks's Avatar
    Quote Originally Posted by Duke
    For example: If I build my own in-house web filter using off-the-shelf hardware - who gets blamed when an OS upgrade doesn't support the RAID card I've used, or when the open-source filtering project I use gets discontinued, or when the filtering doesn't catch something it should have and it becomes a child protection issue?
    You're confusing hardware and software issues there. Software, or the support of it - proper expertise - is worth paying for. There should be no reason a web filter can't run as a virtual machine, though - your VM system should be able to assign a physical interface to a specific VM, ensuring that no network traffic can bypass the filter.

    Quote
    I don't have the time any more to know every single bit of hardware inside out and how to fix it and what to buy when it breaks.
    The idea is that the hardware all works the same, it's just a bunch of off-the-shelf components. If something breaks you chuck it out and stick a new part in place.

    Quote
    Take our SAN for example
    Exactly - why even have a SAN in the first place? Why not just mirror disk volumes between two servers? Depending on the size of the school, two servers to do processing could well be all you need, so you'd save on the running costs of a whole server as well as the purchase price of your SAN device. If one server went down, for planned or unplanned reasons, the other one could simply take over with no loss of service. Each VM would also have local-speed read access to its associated disk volumes instead of accessing data over an iSCSI connection.

  10. #24

    GrumbleDook's Avatar
    There are a lot of risks and issues with building your own tech and then supporting it.

    Firstly you have the time issue. My time is precious and was when I was in a school. Most people on here say they don't have enough time as it is, so you have to weigh the time spent building your own desktops from off-the-shelf parts against any cost saving compared with buying pre-built kit from someone like Dell. If I am kitting out 2 rooms each year then it is a better value option to buy it pre-made.

    This is before you get to the fact that off-the-shelf parts do not always stay the same ... and whilst I would love to say that you can just swap any hardware around, we all know that you can't. Some software does not like it ... and I am not going to spend days trying to rebuild a system just because a RAID controller is having a fit.

    Then you have to remember that you may not be at that particular school forever. Every NM and Tech will have a slightly different skillset. As much as we would all love to say that we could build systems (and then document them, obviously) that *anybody* could come in and support, this is not always the case. I would be doing a school a disservice by leaving them with a system that could not be picked up easily by someone else. By easily, I mean that should I win the lottery and go on a long holiday, and my replacement comes in the day after I have gone, then when the admin server goes down on their first afternoon it should be easy to get it going again. If the server was built 4 years ago and the PSU has gone, how the heck am I going to find a replacement PSU *and* motherboard to fit it? If we have already established that building it yourself might not save enough to keep a spare of everything (why not have two servers then?) then you have to get this off-the-shelf kit from somewhere ... if the kit is no longer available then I am looking at building a new system when a warranty would have been an easier option.

    It is a careful balancing act to decide which solutions can be home-made / school-engineered and which need the risk put onto the supplier ...

  11. Thanks to GrumbleDook from:

    FN-GM (13th April 2011)

  12. #25

    SYNACK's Avatar
    Join Date
    Oct 2007
    Posts
    11,240
    Thank Post
    882
    Thanked 2,742 Times in 2,316 Posts
    Blog Entries
    11
    Rep Power
    784
    Transferability is a big one. My Windows sites could be easily handed off to someone else who could get to grips with most of the config, even without documentation, in a reasonably short timeframe. Linux is simply so customizable that it can very quickly turn into an undecipherable maze of interlocking systems with enough config files to comfortably wallpaper the infinite chasms of hell.

    Having gone into a school with a relatively simple custom build of Linux on it, built by one person then stumbled through by several others before I got to it, I found the sheer scale of the unknown elements astonishing. That, and the helpful Linux community who take joy in just posting the man pages online, make it a total horror show.

    Give me a supported and somewhat standardised solution any day, as at least that will not absorb so much time to build and document, or take so much longer to hand over.

  13. #26

    dhicks's Avatar
    Quote Originally Posted by GrumbleDook
    Most people on here say they don't have enough time as it is so the time spent building your own desktops from off-the-shelf parts
    All of the above in this thread was regarding servers. For desktops, I think the best way to ensure uptime is to have a spare standing by ready to plug in. You don't need a four-hour response-time guarantee for that, though, just a standard replace-if-it-conks-out guarantee is fine, and any "extended" guarantee for longer than three years is probably not worth the bother.

    Quote
    whilst I would love to say that you can just swap any hardware around, we all know that you can't.
    I think we might be looking at this with somewhat different ideas as to how to design a system. A broken server isn't something to fix, it's a bunch of generic components that you can quickly use to build a new server - shove a few new parts in and you have a new server. You don't have to worry about getting data off the hard drives, as that's already mirrored somewhere else and your spare processing capability is taking care of running the VMs that were running on that server - there's no need to fret about downtime if your system is designed with spare capacity, having hardware break doesn't stop your system running. If your RAID controller no longer works, chuck it out and buy a new one - this is still cheaper than paying Dell for a four-hour response time.
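
    A quick way to sanity-check "designed with spare capacity" is to ask whether the VMs would still fit if the biggest host vanished. The sketch below is a simplification that only adds up RAM (it ignores CPU, disk and how VMs pack onto individual hosts), and the host and VM names and sizes are hypothetical.

```python
from typing import Mapping


def survives_single_host_failure(host_ram_gb: Mapping[str, int],
                                 vm_ram_gb: Mapping[str, int]) -> bool:
    """True if total VM RAM still fits after losing the largest host.

    Worst case is assumed: the host with the most RAM is the one that dies.
    """
    if not host_ram_gb:
        raise ValueError("need at least one host")
    remaining = sum(host_ram_gb.values()) - max(host_ram_gb.values())
    return sum(vm_ram_gb.values()) <= remaining


if __name__ == "__main__":
    # Hypothetical two-host setup with mirrored storage.
    hosts = {"vmhost1": 32, "vmhost2": 32}
    vms = {"dc": 4, "files": 8, "sql": 16}
    print("Can lose a host without downtime:",
          survives_single_host_failure(hosts, vms))
```

    With only two hosts this boils down to "either box must be able to run everything on its own", which is the spare capacity you are paying for instead of the warranty.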

    Quote
    Every NM and Tech will have a slightly different skillset. As much as we would all love to say that we could build systems (and then document them, obviously) that *anybody* could come in and support, this is not always the case.
    But this is the whole point of virtualisation - you don't have to worry about the hardware, that can be any old generic slot-together equipment, your servers and so on actually run on top of your virtual machine system, which is hardware-agnostic. Anyone can build a modern computer, you just chuck parts in a case.

  14. #27
    Duke's Avatar
    Quote Originally Posted by dhicks
    I think we might be looking at this with somewhat different ideas as to how to design a system. A broken server isn't something to fix, it's a bunch of generic components that you can quickly use to build a new server - shove a few new parts in and you have a new server. You don't have to worry about getting data off the hard drives, as that's already mirrored somewhere else and your spare processing capability is taking care of running the VMs that were running on that server - there's no need to fret about downtime if your system is designed with spare capacity, having hardware break doesn't stop your system running. If your RAID controller no longer works, chuck it out and buy a new one - this is still cheaper than paying Dell for a four-hour response time.
    I kind of agree with this in principle, but I think it assumes a properly configured, redundant, documented setup that in the real world none of us have. "A broken server isn't something to fix, it's a bunch of generic components that you can quickly use to build a new server" - for a VM host I can maybe get behind this, but are you saying you have no physical standalone servers, or that if you do then you will lose nothing by scrapping them and building new ones to replace them? If a component fails that takes down a server, in theory I can have it up and running next business day under the warranty. To me that seems vastly better than having to build a new physical server, install the OS and all the apps, then restore all the data to it, again assuming the process to do all that is properly documented.

    If everything is virtualised, my concern would be the shared storage. In theory having it mirrored between two home-built boxes would cover you, but personally I wouldn't be comfortable having no warranty, guarantees or support on those devices.

    I can see where you're coming from - there's an appeal to having a Google/Facebook-style of network where no single server is of any importance, and if it dies there's no immediate impact. All I need to do is build up something that has the same functionality as the server that died and add it back into the 'pool' of servers or storage. My concern is that building an infrastructure like that properly isn't something most of us have the time or money to set up...

    Chris

  15. #28

    SYNACK's Avatar
    In reply to the OP, if I need to reboot something during the day to fix something major, I do. If I get any complaints I simply outline the costs involved in creating the infrastructure necessary to attain higher uptimes. It's not just about the hardware but the software too: at least two mail servers with synced DBs for access, at least two mirrored sets of the docs, and clustered configuration for various other things, all of which add complexity and cost. Buying your storage space three times over at least, all of the additional power, software licences, software and setup complexity, increased management overhead to stage upgrades on all of the additional backup/cluster member servers (virtual or not), etc, etc, etc.

    It is at this point, where they can actually hear the sides of the school cheque book collapsing in from the imagined vacuum inside, that you ask them if they are serious and really would like to proceed towards a fully redundant system like that. If conveyed properly (the look of horror on their faces is the usual indicator) this usually quells any further outlandish requests like that for at least a few weeks.

  16. Thanks to SYNACK from:

    Duke (18th April 2011)

  17. #29

    dhicks's Avatar
    Quote Originally Posted by Duke
    If everything is virtualised, my concern would be the shared storage. In theory having it mirrored between two home-built boxes would cover you, but personally I wouldn't be comfortable having no warranty, guarantees or support on those devices.
    This is where "guarantees" can be dangerous - no guarantee will actually protect your data, that's up to you to sort out with backups and so forth. Paying for a hardware support contract doesn't make your data safer, it just means you'll get some new, blank hard drives turning up within four hours.

    Quote
    My concern is that building an infrastructure like that properly isn't something most of us have the time or money to set up...
    But that's my whole argument here, and the reason for my initial reply to your post - setting up exactly that kind of system is way easier and cheaper than you probably think. Setting up a virtual machine infrastructure with mirrored storage can be done entirely with free software (I prefer Xen and DRBD running on Debian, but I'm sure other options exist), and the whole point of using cheap, widely-available hardware is that it's, well, cheap and you can replace a part by running down to your local PC shop.

    Admittedly, the biggest barrier to having such a setup is probably the fiddliness of installing Debian, Xen and DRBD - there's definitely a gap in the market there for a half-decent server management console built around the idea of mirrored storage. However, once you've built a couple of servers that way you realise it's actually dead easy - I reckon it should be possible to build a complete server, from scratch, starting from the point of going to the PC shop to get the bits and ending with a VM server added to the resources pool, in around 4 hours.

  18. #30
    ezzauk's Avatar
    Join Date
    Jul 2007
    Location
    Redditch
    Posts
    109
    Thank Post
    18
    Thanked 10 Times in 10 Posts
    Rep Power
    17
    All our services should have 0% downtime all year round. I am given the first Tuesday of every month, a 2-hour slot between 6 and 8am, to do any server outages.
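
    For anyone wanting to publish a schedule like that, here is a minimal sketch that works out the first Tuesday of each month and what the slot costs in availability if every minute of it is used. The function names are made up for illustration; only the Python standard library is used.

```python
import calendar
from datetime import date


def first_tuesday(year: int, month: int) -> date:
    """Date of the first Tuesday in the given month."""
    first_weekday, _ = calendar.monthrange(year, month)  # Monday == 0
    offset = (calendar.TUESDAY - first_weekday) % 7
    return date(year, month, 1 + offset)


def availability_ceiling(window_hours_per_month: float) -> float:
    """Best-case availability if the whole window is used every month."""
    hours_per_year = 365 * 24
    return 1.0 - (12 * window_hours_per_month) / hours_per_year


if __name__ == "__main__":
    for month in range(1, 13):
        print(first_tuesday(2011, month).strftime("%a %d %b %Y"), "06:00-08:00")
    print(f"Availability if every slot is used in full: {availability_ceiling(2):.3%}")
```

    Two hours a month is 24 hours a year, so even a fully used window still leaves roughly 99.7% availability, and all of it falls outside the school day.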
