  1. #1


    pete

    Backup software with cross-platform agents that supports file-level deduplication?

    Currently, user shares on one server are showing 128GB of duplicates out of 380GB of user data, and on another server 45GB of duplicates out of 187GB. That's only on two servers and doesn't cover cross-server duplication. I suspect I've got an easy 500GB of dupes that's just user data.

    My Windows dedupe uses a script that calls subinacl and imagex and dumps areas to .wim in a flagrant abuse of unintended functionality. To call it hacktastic is an understatement - I need to replace it with something more maintainable. I also need to extend the functionality to Linux and OS X.

    Need:
    • Software product, not hardware appliance
    • Cross-platform - Linux, OS X, Windows
    • File-level dedupe
    • Able to choose on a per-volume / per-folder-tree level whether to include data in dedupe or not (e.g. just dedupe d:\shares\userdata).
    • Keep a record of file checksums rather than recalculating them from scratch if the file is unchanged.
    • Not cost the moon on a stick (I'm looking at you Symantec).
    • Able to point the backup target at arbitrary storage that doesn't have to be formatted into a magical backup volume, i.e. I can give it space on an existing backup server.
    • Not designed by people who've never had to use the product
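    For the checksum-record requirement, here is a minimal Python sketch of the idea: cache each file's MD5 keyed on its path, size and modification time, and only rehash when one of those changes. The cache layout and the function names (`cached_checksum`, `load_cache`, `save_cache`) are hypothetical, and size plus mtime is only a heuristic for "unchanged" (a tool that rewrites a file and then restores its old mtime would fool it).

```python
import hashlib
import json
import os
from typing import Dict, Tuple

# Hypothetical cache layout: path -> (size in bytes, mtime in ns, MD5 hex digest).
ChecksumCache = Dict[str, Tuple[int, int, str]]


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_checksum(path: str, cache: ChecksumCache) -> str:
    """Return the file's MD5, rehashing only if its size or mtime has changed."""
    st = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == st.st_size and entry[1] == st.st_mtime_ns:
        return entry[2]  # unchanged since last run: reuse the stored checksum
    digest = md5_of_file(path)
    cache[path] = (st.st_size, st.st_mtime_ns, digest)
    return digest


def load_cache(cache_file: str) -> ChecksumCache:
    """Read a cache written by save_cache; start empty if none exists yet."""
    try:
        with open(cache_file) as f:
            return {path: tuple(entry) for path, entry in json.load(f).items()}
    except FileNotFoundError:
        return {}


def save_cache(cache: ChecksumCache, cache_file: str) -> None:
    """Persist the cache as JSON so the next backup run can reuse it."""
    with open(cache_file, "w") as f:
        json.dump(cache, f)
```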


    Would be nice

    • Cross-backup dedupe at a later date (say, during the day after the nightlies have run)
    • Free pony


    I'm poking BackupPC at the moment, but I remember looking at it before and something putting me off (can't remember what exactly). What else are you using that's decent and not hateful?

  2. #2

    dhicks

    Quote, originally posted by pete:
    My Windows dedupe uses a script that calls subinacl and imagex and dumps areas to .wim in a flagrant abuse of unintended functionality. To call it hacktastic is an understatement - I need to replace it with something more maintainable. I also need to extend the functionality to Linux and OS X.
    At my last school, I had a script that did file-level deduplication on a backup server. For each file in a given location (recursing into subfolders), it would compute the file's MD5 checksum and check whether a file with that checksum already existed in a central folder. If so, the file was a duplicate and the script replaced it with a hard link to the central copy; if not, the script moved the new file into the central folder and then created a hard link to it at the file's original location.
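    A minimal Python sketch of that loop, assuming the backup tree and the central folder sit on the same filesystem (hard links can't span volumes) and that the central folder is not inside the tree being walked. `dedupe_tree` and the digest-named store layout are illustrative, not dhicks's actual script. Like the described approach, it trusts the MD5 alone; a more cautious version would byte-compare before linking. Note also that all hard links to a file share its permissions and ownership.

```python
import hashlib
import os
import shutil


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_tree(backup_root: str, store_dir: str) -> int:
    """Replace every file under backup_root with a hard link into store_dir.

    Central copies are named by MD5 digest (hypothetical layout).
    Returns the number of duplicate files found on this run.
    """
    os.makedirs(store_dir, exist_ok=True)
    duplicates = 0
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            central = os.path.join(store_dir, md5_of_file(path))
            if os.path.exists(central):
                if os.path.samefile(path, central):
                    continue  # already linked on a previous run
                os.remove(path)  # duplicate: drop this copy of the data
                duplicates += 1
            else:
                shutil.move(path, central)  # first sighting: move it into the store
            os.link(central, path)  # either way, link the central copy back in place
    return duplicates
```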

    The above only works for read-only backup folders, obviously, as you'll wind up with a number of files all hard-linked to the same central point - fine if you just want to restore a file, but not for in-place editing. There's no need for any kind of database, just a standard file system is fine, and you can share out your backup folder as a Windows share so people can easily restore their own files - no messing around with having to ask IT to restore a file.
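    A quick Python illustration of why in-place editing is a problem: writing through one hard link changes the data seen through every other link, because they are all names for the same file. The temporary paths and the `edit_through_link` name exist only for this demo.

```python
import os
import tempfile


def edit_through_link() -> str:
    """Write via one hard link, then read back via the other."""
    folder = tempfile.mkdtemp()
    central = os.path.join(folder, "central_copy")
    restored = os.path.join(folder, "users_backup_copy")
    with open(central, "w") as f:
        f.write("original")
    os.link(central, restored)  # a second name for the same file
    with open(restored, "w") as f:  # "edit the backup" in place
        f.write("edited")
    with open(central) as f:
        return f.read()


print(edit_through_link())  # prints "edited"
```

    Restoring by copying the file out (for example with `shutil.copy2`) gives the user an independent file that is safe to edit.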

    I originally wrote the above script to do its own file transfers, but on investigation rsync would probably be well suited to that job. Rsync is available for Windows and OS X as well as Linux, and Windows (NTFS) supports hard links just fine. The script I wrote took a couple of pages of Python, nothing more, and was essentially a couple of basic "for each ..." loops.

    Edit: don't forget to have a periodic process that clears out older backups and then, importantly, removes orphaned files from the central storage folder. Orphans are simple to spot: they have a link count of 1 (only the central folder's own entry is left), so all that's needed is a loop at the end of each backup job that runs through the central storage folder and deletes them.
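    A sketch of that housekeeping pass in Python, assuming each backup run lives in its own dated subfolder (for example `2011-04-07`) so that sorting the names sorts by age; `prune_old_backups`, `remove_orphans` and the `keep` parameter are hypothetical names. Pruning has to run first, since deleting a snapshot is what drops a central file's link count back to 1.

```python
import os
import shutil
import stat


def prune_old_backups(backups_root: str, keep: int) -> list:
    """Delete all but the newest `keep` snapshot folders; return the names removed.

    Assumes snapshot folders are named so that sorting by name sorts by date.
    """
    snapshots = sorted(
        name for name in os.listdir(backups_root)
        if os.path.isdir(os.path.join(backups_root, name))
    )
    expired = snapshots[:-keep] if keep > 0 else snapshots
    for name in expired:
        shutil.rmtree(os.path.join(backups_root, name))  # drops those hard links
    return expired


def remove_orphans(store_dir: str) -> list:
    """Delete central files that no backup links to any more (link count of 1)."""
    removed = []
    for name in os.listdir(store_dir):
        path = os.path.join(store_dir, name)
        st = os.lstat(path)
        # The store's own entry accounts for one link; 1 means nothing else uses it.
        if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
            os.remove(path)
            removed.append(name)
    return removed
```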
    Last edited by dhicks; 7th April 2011 at 02:50 PM.


