7th April 2011, 02:12 PM #1
Backup software with cross-platform agents that supports file-level deduplication?
Currently user shares on one server are showing 128GB of duplicates out of 380GB of user data and on another server 45GB of duplicates out of 187GB of data. That's only on two servers and doesn't cover cross-server duplication. I suspect I've got an easy 500GB of dupes that's just user data.
My Windows dedupe uses a script that calls subinacl and imagex and dumps areas to .wim in a flagrant abuse of unintended functionality. To call it hacktastic is an understatement - I need to replace it with something more maintainable. I also need to extend the functionality to Linux and OS X.
- Software product, not hardware appliance
- Cross-platform - Linux, OS X, Windows
- File-level dedupe
- Able to choose at a per-volume / per-folder-tree level whether or not to include it in dedupe (i.e. just dedupe d:\shares\userdata).
- Keep a record of file checksums rather than recalculating from scratch if the file is unchanged (see the sketch further down this post).
- Not cost the moon on a stick (I'm looking at you Symantec).
- Able to point the backup target at arbitrary storage that doesn't have to be formatted into a magical backup volume, i.e. I can give it space on an existing backup server.
- Not designed by people who've never had to use the product
Would be nice
- Cross-backup dedupe at a later date (say during the day after the nightlies have run)
- Free pony
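For the checksum-record point above, this is roughly what I mean (just a sketch in Python, not how any particular product does it; the cache file name and layout are made up): keep the MD5 alongside the file's size and mtime, and only re-hash when either changes.

Code:
# Hypothetical sketch: cache file checksums so unchanged files are not re-hashed.
# Cache key is (size, mtime); if either changes, the checksum is recomputed.
import hashlib
import json
import os

CACHE_PATH = "checksums.json"  # assumption: a simple JSON cache next to the script

def load_cache(path=CACHE_PATH):
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}

def save_cache(cache, path=CACHE_PATH):
    with open(path, "w") as f:
        json.dump(cache, f)

def checksum(filename, cache):
    st = os.stat(filename)
    entry = cache.get(filename)
    if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime_ns:
        return entry["md5"]  # file unchanged since last run, reuse the stored hash
    md5 = hashlib.md5()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    cache[filename] = {"size": st.st_size, "mtime": st.st_mtime_ns, "md5": md5.hexdigest()}
    return cache[filename]["md5"]

# usage: cache = load_cache(); call checksum(path, cache) per file; save_cache(cache)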
I'm poking BackupPC at the moment, but I remember looking at it before and something putting me off (can't remember what exactly). What else are you using that's decent and not hateful?
7th April 2011, 02:48 PM #2
At my last school, I had a script that did file-level deduplication on a backup server. For each file under a given location it would compute an MD5 checksum and check whether a file with that checksum already existed in a central folder. If so, the file was a duplicate and the script would replace it with a hard link to the central copy; if not, the script would move the new file to the central folder and then make a hard link back to it.
The above only works for read-only backup folders, obviously, as you'll wind up with a number of files all hard-linked to the same central point - fine if you just want to restore a file, but not for in-place editing. There's no need for any kind of database, just a standard file system is fine, and you can share out your backup folder as a Windows share so people can easily restore their own files - no messing around with having to ask IT to restore a file.
I originally wrote the above script to do its own file transfers, but upon investigation rsync would probably be well suited. Rsync is supported on Windows and OS X as well as Linux, and Windows does support hard links just fine. The script I wrote took a couple of pages of Python, nothing more, and was pretty much a couple of basic "for each ..." loops.
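Something like this is the core of it (a rough sketch from memory rather than the actual script; the folder names are made up, and the central store has to sit on the same volume as the backups for hard links to work):

Code:
# Hypothetical sketch of the hard-link dedupe pass described above.
# Walks a backup folder, moves the first copy of each file into a central store
# named after its MD5 hash, and replaces every copy with a hard link to it.
import hashlib
import os
import shutil

STORE = r"D:\backups\store"    # assumption: central single-instance folder
BACKUP = r"D:\backups\latest"  # assumption: the just-written backup tree

def md5sum(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

os.makedirs(STORE, exist_ok=True)
for root, _dirs, files in os.walk(BACKUP):
    for name in files:
        path = os.path.join(root, name)
        central = os.path.join(STORE, md5sum(path))
        if not os.path.exists(central):
            shutil.move(path, central)  # first time we've seen this content
        else:
            os.remove(path)             # duplicate: drop it ...
        os.link(central, path)          # ... and hard-link back to the central copy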
Edit: don't forget to have a periodic process that clears out older backups and, importantly, once it has done so, removes orphaned files from the central storage folder. Orphaned files are simple to spot: they have a reference count of 1, so all that's needed is a loop that runs through the central storage folder at the end of each backup job and checks for orphans.
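The orphan sweep is just as short (again a made-up sketch; st_nlink is the hard-link count, and works on NTFS as well as Linux filesystems):

Code:
# Hypothetical orphan sweep: any file in the central store whose hard-link count
# has dropped to 1 is no longer referenced by any backup, so it can be deleted.
import os

STORE = r"D:\backups\store"  # assumption: same central folder as above

for name in os.listdir(STORE):
    path = os.path.join(STORE, name)
    if os.path.isfile(path) and os.stat(path).st_nlink == 1:
        os.remove(path)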
Last edited by dhicks; 7th April 2011 at 02:50 PM.