apaton (18th April 2010)

Mind having a look to see how much performance overhead FUSE adds when you get a moment?

How would I tell?
I can tell you that delete performance is really slow - this is understandable, and pointed out in the documentation, but you tend to forget. I had to leave the server overnight to complete an "rm" operation on a 2GB image file. Not an issue on a backup server running batch jobs overnight, but not something for everyday use.
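One rough way to tell, as a sketch only: time the same sequential write on the FUSE mount and on the plain file system underneath it, then compare the two figures. The function name `write_throughput` and the mount paths in the usage comment are made up for illustration; the test data is random so a deduplicating file system can't shortcut the write.

```python
import os
import time


def write_throughput(directory, size_mb=64, chunk_kb=32):
    """Return approximate MB/s for a sequential write of size_mb into directory."""
    path = os.path.join(directory, "throughput_test.tmp")
    chunk_bytes = chunk_kb * 1024
    chunk_count = (size_mb * 1024) // chunk_kb
    # Generate the data before starting the clock; every chunk is different
    # so block-level deduplication cannot collapse them.
    chunks = [os.urandom(chunk_bytes) for _ in range(chunk_count)]
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the file system
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    return size_mb / elapsed


# Hypothetical usage (paths are assumptions, not from this thread):
#   fuse_speed = write_throughput("/mnt/lessfs")
#   plain_speed = write_throughput("/mnt/plain")
#   print(fuse_speed, plain_speed)
```

The difference between the two numbers includes the deduplication work as well as FUSE itself, so it is an upper bound on the FUSE overhead rather than a clean measurement of it.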
--
David Hicks

Now on 32K block size, current deduplication ratio of 1:8 and my 54GB of disk image data should hopefully be copied over by midnight. I'm worried that write performance is going to go down the more data is stored, but I guess I'll find out (I'll come back tomorrow and see whether the files have copied across or not). LessFS is an inline deduplication system (SDFS had a batch-mode option, too), but a workaround is to have data written to a non-LessFS volume and copy files over to the LessFS volume overnight, deduplicating as it goes. The documentation mentions having separate volumes to store block data and metadata (because this is a FUSE-based file system, it stores its data as database files in an existing file system) - nice idea, but I don't have two separate RAID arrays to store data on (and storing metadata on a non-RAID disk seems a little pointless).
--
David Hicks
I've also come across the same block size issues with ZFS (the Sun S7000 refers to blocks as database records). Smaller blocks (< 32K) can give a better dedup ratio, but they require more memory, and disk/volume throughput is reduced for data transfers.
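To put rough numbers on that trade-off before committing to a block size, something like the sketch below can be run over a sample file. It is not how ZFS or LessFS work internally: it just cuts the file into fixed-size blocks and counts distinct SHA-256 digests, and both the hash choice and the name `dedup_stats` are my assumptions.

```python
import hashlib


def dedup_stats(path, block_size):
    """Return (total_blocks, unique_blocks) for fixed-size blocks of a file."""
    seen = set()  # one digest per distinct block; this is the part that costs memory
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            seen.add(hashlib.sha256(block).digest())
            total += 1
    return total, len(seen)


# Hypothetical usage: compare a few block sizes on the same image file.
#   for size in (4096, 32768, 131072):
#       total, unique = dedup_stats("sample.img", size)
#       print(size, total, unique, total / unique)
```

Dividing total by unique gives the estimated dedup ratio, and the unique count shows how many index entries would have to be tracked, which is where the extra memory for small blocks goes.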
Personally I'm only 60% convinced that dedup for general user file systems is an efficient use of resources on a NAS file server.
General user file systems at my customers' sites all have different numbers of users and data profiles, so dedup returns are almost impossible to predict. So it's become a case of "let's try and see!" Not very scientific.
I do know dedup works best when you have a large amount of duplicated data, such as backups and virtual machine images.
dhicks, I think you're on the right path with your use of dedup; it's just a pity you didn't have much success with ZFS.
dhicks - have you tried NexentaStor Community Edition (NexentaStor Project)? Seems pretty good to me.
apaton (18th April 2010)
I've known about Nexenta for a while, but must admit didn't know they had a Community Edition. Another one to have a look at but time is a finite resource.

As in, have a staging area that I copy files from over to the LessFS volume overnight? Now I've just got to find a bigger hard drive to make a useful-sized staging area...
I think ZFS worked fine, I think the problem was with Solaris on our backup server - I tried OpenSolaris before, a couple of years ago when I built the machine, and that time it didn't even boot from the install CD.
--
David Hicks

I remember trying something all-in-one and Solaris-based a few months ago, swearing at it lots, and using OpenSolaris instead. Could have been NexentaStor, I can't remember. I don't really care all that much which OS I wind up using, it's just that until I get this server fixed I have no way to restore our media suite full of machines with mangled disk images...
--
David Hicks

Right, I'm giving up on inline, block-level deduplicating file systems (ZFS, SDFS, LessFS, etc) and going back to a plain Ubuntu 9.10 server with an mdadm RAID-5 array containing an ext3 filesystem. Re-reading the documentation for rsync, it looks like it should be perfectly compatible with having a periodically running script check for and eliminate duplicate files by means of hard links. Deduplication ratios on file system backups should be similar to what I'd have got with block-level deduplication anyway, and the 1:1.5 ratio I was getting on the 2GB disk-image files wasn't worth the massive performance hit - an order of magnitude for reads, and simply incredibly slow for writes.
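For what it's worth, a minimal sketch of the sort of script I have in mind is below. It is an illustration under assumptions, not a finished tool: the function names are made up, duplicates are detected by size plus SHA-256 of the contents, and everything under the root has to live on one file system because hard links can't cross file systems.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace duplicate regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    """
    first_seen = {}  # (size, digest) -> path of the first copy found
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and anything that is not a regular file
            key = (os.path.getsize(path), file_digest(path))
            original = first_seen.setdefault(key, path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            # Link under a temporary name, then rename over the duplicate,
            # so the file name never disappears part-way through.
            tmp = path + ".dedup-tmp"
            os.link(original, tmp)
            os.replace(tmp, path)
            linked += 1
    return linked


# Hypothetical usage (path is an assumption):
#   print(hardlink_duplicates("/srv/backups"))
```

Things to be aware of: linked files share one set of permissions, ownership and timestamps, and anything that modifies a file in place changes every linked name at once. Hashing every file is the simple approach; grouping by size first and only hashing files whose sizes match would save a lot of reading.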
--
David Hicks