
No, that's the whole point of a deduplicating file system: you store replicated data once and have multiple pointers to it. At the block device or dedicated deduplicating file system level, this means calculating a checksum on each block of data. This is the approach used by ZFS and by a bunch of assorted FUSE-based filesystems available for Linux. Unfortunately, I found ZFS for Linux rather unreliable, and I think development has now stalled, and the FUSE-based systems available are experimental and/or horribly slow. Therefore, the best approach is probably file-level deduplication, where you simply calculate a checksum on each individual file and replace duplicates with a hard link. This won't work for a live filesystem (because you might end up changing a file that you think is separate but that actually shares its data with another file through a hard link), but it's ideal for a read-only backup.
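To make the file-level idea concrete, here's a rough sketch in Python of what I mean. It is not a finished tool: the function names (`file_digest`, `dedupe_tree`) and the `.dedupe-tmp` suffix are made up for illustration, and I'm assuming SHA-256 as the checksum and that everything under the root sits on one filesystem (hard links can't cross filesystems).

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_tree(root):
    """Replace files with identical content under root by hard links.

    Returns the number of files that were replaced with a link.
    """
    seen = {}  # digest -> path of the first file found with that content
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Only regular files are considered; symlinks are left alone.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            original = seen.setdefault(file_digest(path), path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            # Create the link under a temporary name, then rename it over
            # the duplicate so the path never goes missing.
            tmp = path + ".dedupe-tmp"  # hypothetical suffix
            os.link(original, tmp)
            os.replace(tmp, path)
            replaced += 1
    return replaced
```

Calling `dedupe_tree("/backups")` (path is just an example) would link up every duplicate it finds. Bear in mind that linked files share one set of permissions and timestamps, which is another reason to only do this on a read-only backup.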
rsync seems to operate in exactly the way you'd want for the above system, i.e. it creates a whole new file when updating an existing one, so it doesn't end up writing into the data shared with the linked file. Therefore, all you need is a very simple script that, every night, duplicates the previous day's backup folder into a new folder by recreating the directory structure and creating hard links to the files, then runs rsync to bring the new folder up to date with your live file server. Each day you only use however much space it takes to duplicate the directory structure, plus the space for changed files. I'm guessing your 28TB server could store daily backups of a 1TB file share going back several years.
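Something along these lines is what I have in mind for the nightly script, sketched in Python. The function names and the paths you'd pass in are hypothetical, and it assumes the old and new backup folders are on the same filesystem and that rsync is installed. I've only used the standard `-a` and `--delete` options here; you'd want to adjust them for your own setup.

```python
import os
import shutil
import subprocess


def hardlink_snapshot(previous, new):
    """Recreate previous's directory tree at new, hard-linking every file."""
    # copy_function=os.link makes copytree link each file rather than copy
    # its data; symlinks=True keeps symbolic links as symbolic links.
    shutil.copytree(previous, new, symlinks=True, copy_function=os.link)


def rsync_command(source, dest):
    """Build the rsync command line that updates dest from source."""
    # -a preserves metadata, --delete removes files that are gone from the
    # source. Deliberately no --inplace: rsync has to write a new file and
    # rename it, otherwise it would modify data shared with older backups.
    return ["rsync", "-a", "--delete",
            source.rstrip("/") + "/", dest.rstrip("/") + "/"]


def nightly_backup(source, previous, new):
    """Link yesterday's backup into a new folder, then update it."""
    hardlink_snapshot(previous, new)
    subprocess.run(rsync_command(source, new), check=True)
```

You'd run something like `nightly_backup("/mnt/fileserver", "/backups/2012-01-30", "/backups/2012-01-31")` from cron, with the folder names generated from the date. Files that haven't changed stay as hard links to the previous day's copy, so they cost no extra space.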
Interestingly, I built a backup box a little while ago: 8x2TB disks (plus 2x80GB for the OS) with space for 2 more disks if I need them. Total cost was around £1.5k, though it would cost a lot more now due to HDD prices. The case was purchased from xcase (significantly cheaper than the Supermicros, a little lower build quality, but it does the job), the hard drives are low-power ones (it's only a backup server, we're not expecting significant IOPS), and I'm hitting fairly decent speeds (about 80-90Mb/s sustained across a gigabit link). OS-wise, I grabbed NexentaStor Community Edition, though I could have gone with Nexenta Core and it would have done the job as well. Actually, I just remembered I blogged about it: Willog » Terabytes on a budget… 2U 14.5TB usable backup device, although I haven't updated that post with the OS HDDs (which increased the network throughput; the USBs were causing some issues on some aspect of the system).
Depending on what exactly you're backing up, @dhicks' idea is a good one; we're looking at backup software with dedupe capability because of our backup arrangements. An expansion of dhicks' idea could be to use the built-in ZFS snapshotting (so rsync just has to do the update, and you don't need the nightly step that carries the older files forward), and possibly robocopy on Windows (to keep ACLs and use CIFS for file transfers).
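As a rough illustration of the ZFS variant, the nightly job could boil down to two commands: rsync into the live dataset, then take a dated snapshot. This is only a sketch in Python; the function names are made up, and the dataset name and mount point you pass in (e.g. `tank/backup` mounted at `/tank/backup`) are assumptions for the example, not what I actually run.

```python
import datetime
import subprocess


def backup_commands(source, mountpoint, dataset, day):
    """Return the two command lines for one nightly run."""
    snapshot = f"{dataset}@{day.isoformat()}"  # e.g. tank/backup@2012-01-31
    return [
        # Update the live copy of the share inside the dataset.
        ["rsync", "-a", "--delete",
         source.rstrip("/") + "/", mountpoint.rstrip("/") + "/"],
        # Freeze the result as a read-only, dated snapshot.
        ["zfs", "snapshot", snapshot],
    ]


def run_backup(source, mountpoint, dataset):
    """Run tonight's rsync and snapshot, stopping on the first failure."""
    for argv in backup_commands(source, mountpoint, dataset,
                                datetime.date.today()):
        subprocess.run(argv, check=True)
```

Because the snapshot only holds on to blocks that change afterwards, there's no separate hard-link step to carry old files forward.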