How NOT to recover from a corrupt drive
Thought I'd share this as a tale to warn others of the dangers of over-stupidity..
Got a call from a primary where suddenly everyone's Home folders had disappeared and it seemed login times were going through the roof too...
Odd, thinks I, so I pop into school, do a bit of digging around and discover that yet again staff have filled their partition with AVI and image files of silly size. So I did a bit of work and applied my image resizing script.
Job done, thinks I... Anyone care to guess as to how wrong I was... :doh:
Anyway, desperate phone call next day that it's now worse... Nothing working, and had I actually done anything the day before.. *gulp*.
Trying to gain access to the server is getting nowhere fast, and when I check with the IT-literate TA (worth her weight in gold I might add!) the server appears to be on a go-slow mission and just not coming up.
It's then that I remember I'd seen that the front panel had been removed from the server the day before... Uh-oh!... So I do a little more digging, discover I can get to the event log for the server in question remotely, and find a ton of file corruption and MFT errors, etc... *oops!*
So, thinking a bit harder, I realise that I dropped the ball and should have checked that the hot swap SCSI caddies hadn't been knocked out or damaged... (Lesson #1).
Anyway, I wait most of the day, trying to get into the server remotely, until 5pm rolls around and success, I'm in, but it's taking 5 minutes between clicks. Somehow, after 2 hours, I get to the Computer Management > Disk Management section and force a disk check (chkdsk) repair, which gets it back up and running...
Good news at last... So I apply my backup routine for my image resizer script and set it to work overnight, robocopying everything across to the alternate, newer server...
... except I forgot it was only set to copy JPEG files... :doh:
I remember this yesterday evening and attempt to stop Robocopy and change the routine, but the whole thing locks up on me, so a visit is required today.
It's then that I discover that Robocopy will stall if it hits any permissions issues (by default it retries a failed file a million times, waiting 30 seconds between attempts, unless you set /R and /W yourself). It seems a couple of files got damaged and needed an admin to take ownership of them and then set them to re-inherit permissions... This stalled the whole backup process by a stunning 6 hours!
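For anyone wanting to avoid the same stall, here's a rough sketch of how the copy and the permission fix-up could be scripted. It's Python purely for illustration and it is not the script I actually ran. The paths, server name and the helper names (`robocopy_command`, `permission_repair_commands`) are all made up. The switches themselves are standard ones: /R and /W cap Robocopy's retries, /L does a list-only dry run, `takeown` hands ownership to the Administrators group and `icacls /reset` puts the inherited ACLs back.

```python
from typing import List, Optional, Sequence


def robocopy_command(
    source: str,
    dest: str,
    patterns: Sequence[str] = ("*.*",),  # file filter; "*.jpg" alone would skip everything else
    retries: int = 2,                    # /R:n  retries per failed file
    wait_seconds: int = 5,               # /W:n  seconds to wait between retries
    log_file: Optional[str] = None,
    list_only: bool = False,             # /L    list what would be copied, copy nothing
) -> List[str]:
    """Build a Robocopy argument list with bounded retries."""
    cmd = ["robocopy", source, dest, *patterns, "/E",
           f"/R:{retries}", f"/W:{wait_seconds}"]
    if log_file:
        cmd.append(f"/LOG:{log_file}")
    if list_only:
        cmd.append("/L")
    return cmd


def permission_repair_commands(path: str) -> List[List[str]]:
    """Commands to take ownership of a damaged tree and re-inherit its ACLs."""
    return [
        # /A = owner becomes the Administrators group, /R = recurse,
        # /D Y = answer "yes" to the prompt on folders you can't list
        ["takeown", "/F", path, "/A", "/R", "/D", "Y"],
        # /reset = replace ACLs with default inherited ones,
        # /T = recurse, /C = carry on past errors
        ["icacls", path, "/reset", "/T", "/C"],
    ]


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    print(" ".join(robocopy_command(r"D:\Home", r"\\NEWSERVER\Home",
                                    log_file=r"C:\Logs\home-copy.log",
                                    list_only=True)))
    for repair in permission_repair_commands(r"D:\Home\Staff"):
        print(" ".join(repair))
```

Running it just prints the command lines so you can eyeball them first. To actually run one you'd hand the list to `subprocess.run` from an elevated prompt on the server, ideally with /L still on for the first pass.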
Anyway, the point of this little tale of woe... Well, here are some pointers that I'm going to apply if I ever end up here again..
1. If the server has any part dislodged, check it thoroughly... Don't just plug the bit back on.
2. If you're going to run a script, check it first on a small subset of files to make sure it's doing what it's supposed to!
3. If a hard drive has needed a chkdsk repair, do a little checking of security permissions and re-inherit permissions on the folders that are key permission set points. This helps identify any problem files.
Oh and if you like irony...
4. Make sure you discuss updating backup procedures with the head in question BEFORE the server decides to test your existing process and thus identifies weaknesses in it.
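On pointer 2, something like the following is the sort of check I have in mind: pull a small random sample of files into a scratch folder, point the script at that, and compare file types going in and coming out. Again it's a Python sketch and not my actual resizer routine. `copy_sample` and `extension_counts` are names invented for this post, and the scratch folder is assumed to sit outside the source tree.

```python
import random
import shutil
from collections import Counter
from pathlib import Path
from typing import Iterable, List


def copy_sample(source, scratch, count: int = 20, seed: int = 0) -> List[Path]:
    """Copy a random sample of files from source into scratch, keeping sub-folders."""
    source, scratch = Path(source), Path(scratch)
    files = sorted(p for p in source.rglob("*") if p.is_file())
    # Fixed seed so the same sample can be re-created after tweaking the script.
    sample = random.Random(seed).sample(files, min(count, len(files)))
    copied = []
    for src in sample:
        target = scratch / src.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also keeps timestamps
        copied.append(target)
    return copied


def extension_counts(files: Iterable[Path]) -> Counter:
    """Tally files by lower-cased extension, e.g. to spot a JPEG-only filter."""
    return Counter(Path(f).suffix.lower() for f in files)
```

Run the script against the scratch copy, then compare `extension_counts` for the sample with the same tally for whatever the script produced. If the AVIs and documents have gone missing from the output, the filter is narrower than you thought, and you've found out in minutes instead of overnight.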
Job -> Love -> Oh yes indeedy... :censored: