Currently, user shares on one server are showing 128GB of duplicates out of 380GB of user data, and on another server 45GB of duplicates out of 187GB of data. That's only two servers and doesn't cover cross-server duplication - I suspect I've got an easy 500GB of dupes in user data alone.
My Windows dedupe currently uses a script that calls subinacl and imagex and dumps areas to .wim in a flagrant abuse of unintended functionality. To call it hacktastic is an understatement - I need to replace it with something more maintainable, and I also need to extend the functionality to Linux and OS X. Requirements:
- Software product, not hardware appliance
- Cross-platform - Linux, OS X, Windows
- File-level dedupe
- Able to choose at a per-volume / per-folder-tree level whether to include in dedupe or not (i.e. just dedupe d:\shares\userdata).
- Keep a record of file checksums rather than recalculating from scratch if the file is unchanged (see the sketch below).
- Not cost the moon on a stick (I'm looking at you, Symantec).
- Able to point the backup target at arbitrary storage that doesn't have to be formatted into a magical backup volume, i.e. I can give it space on an existing backup server.
- Not designed by people who've never had to use the product
Would be nice:
- Cross-backup dedupe at a later date (say, during the day after the nightlies have run)
- Free pony
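To illustrate the checksum-record point in the list above, here's a minimal sketch in Python of the sort of thing I mean - cache each file's hash keyed on size and mtime so unchanged files never get re-read. The cache file name and helper names are made up for the example:

    import hashlib
    import json
    import os

    CACHE_PATH = "checksums.json"  # made-up cache location for the example

    def load_cache(path=CACHE_PATH):
        # Returns the previous run's cache, or an empty one on first run.
        try:
            with open(path) as f:
                return json.load(f)
        except (OSError, ValueError):
            return {}

    def save_cache(cache, path=CACHE_PATH):
        with open(path, "w") as f:
            json.dump(cache, f)

    def file_checksum(path, cache):
        # Only re-hash a file if its size or mtime has changed since
        # the cache entry was written.
        st = os.stat(path)
        entry = cache.get(path)
        if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime:
            return entry["sha1"]
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        cache[path] = {"size": st.st_size, "mtime": st.st_mtime, "sha1": h.hexdigest()}
        return cache[path]["sha1"]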
I'm poking at BackupPC at the moment, but I remember looking at it before and something put me off (I can't remember what, exactly). What else are you using that's decent and not hateful?
The above only works for read-only backup folders, obviously, as you'll wind up with a number of files all hard-linked to the same central copy - fine if you just want to restore a file, but not for in-place editing. There's no need for any kind of database; a standard file system is fine, and you can share out your backup folder as a Windows share so people can easily restore their own files - no messing around with having to ask IT to restore a file.
I originally wrote the above script to do its own file transfers, but on investigation rsync would probably be well suited - it's supported on Windows and OS X as well as Linux, and Windows supports hard links just fine. The script I wrote took a couple of pages of Python, nothing more, and was pretty much a couple of basic "for each ..." loops.
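Something like this, roughly - a sketch of those two loops, assuming a file_checksum() helper like the one sketched earlier in the thread, with all the path names made up:

    import os
    import shutil

    def backup_tree(source_root, backup_root, store_root, cache):
        # Walk the source tree, keeping one real copy of each unique
        # file in the central storage folder (named by its hash) and
        # hard-linking everything else to it.
        for dirpath, dirnames, filenames in os.walk(source_root):
            rel = os.path.relpath(dirpath, source_root)
            dest_dir = os.path.join(backup_root, rel)
            os.makedirs(dest_dir, exist_ok=True)
            for name in filenames:
                src = os.path.join(dirpath, name)
                digest = file_checksum(src, cache)
                stored = os.path.join(store_root, digest)
                if not os.path.exists(stored):
                    # First time we've seen this content: copy it in.
                    shutil.copy2(src, stored)
                # Duplicates just become extra hard links to the stored copy.
                os.link(stored, os.path.join(dest_dir, name))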
Edit: don't forget to have a periodic process that clears out older backups and, importantly, once it has, removes orphaned files from the central storage folder. Orphan files are simple to spot: they have a reference count of 1, so all that's needed is a loop that runs through the central storage folder at the end of each backup job and checks for orphans.
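That check is a one-liner off the stat structure - a sketch, again assuming the store_root layout from above:

    import os

    def remove_orphans(store_root):
        # A link count of 1 means only the central copy is left - no
        # backup tree references this content any more.
        for name in os.listdir(store_root):
            path = os.path.join(store_root, name)
            if os.path.isfile(path) and os.stat(path).st_nlink == 1:
                os.remove(path)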