ZFS - Deduplication (Dedup) vs Compression (*nix forum, Technical)
Page 2 of 2 (posts 16 to 28 of 28)
#16 - pete
    Mind having a look to see how much performance overhead FUSE adds when you get a moment?

#17 - dhicks
    Quote Originally Posted by pete View Post
    Mind having a look to see how much performance overhead FUSE adds when you get a moment?
    How would I tell?

    I can tell you that delete performance is really slow - this is understandable, and pointed out in the documentation, but you tend to forget. I had to leave the server overnight to complete an "rm" operation on a 2GB image file. Not an issue on a backup server running batch jobs overnight, but not something for everyday use.

    --
    David Hicks
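One rough way to answer pete's question is to run the same sequential write workload against the FUSE (LessFS) mount and against a directory on the underlying volume, then compare throughput; the gap is a crude measure of the FUSE plus dedup overhead. A minimal sketch in Python, with placeholder mount points rather than the actual paths from this thread:

```python
#!/usr/bin/env python3
"""Rough sequential write-throughput comparison between two directories.

Point one path at the FUSE (LessFS) mount and the other at a directory on
the underlying RAID/ext3 volume, then compare MB/s. The mount points below
are placeholders, not the poster's actual paths.
"""
import os
import time

CHUNK = 128 * 1024           # write in 128 KiB chunks
TOTAL = 512 * 1024 * 1024    # 512 MiB test file

def write_speed(directory):
    path = os.path.join(directory, "fuse_bench.tmp")
    data = os.urandom(CHUNK)
    start = time.time()
    with open(path, "wb") as f:
        for i in range(TOTAL // CHUNK):
            # vary each chunk slightly so a deduplicating filesystem
            # cannot collapse the whole file into a single block
            f.write(i.to_bytes(8, "little") + data[8:])
        f.flush()
        os.fsync(f.fileno())             # make sure the data really hit the disk
    elapsed = time.time() - start
    os.remove(path)
    return (TOTAL / (1024 * 1024)) / elapsed

if __name__ == "__main__":
    for target in ("/mnt/lessfs", "/mnt/raid"):   # placeholder mount points
        print(f"{target}: {write_speed(target):.1f} MB/s sequential write")
```

Running the same script with reads instead of writes would give the other half of the picture; the idea is only to compare the two numbers, not to treat either as an absolute benchmark.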

#18 - dhicks
    Quote Originally Posted by dhicks View Post
    I can tell you that delete performance is really slow
Copying files from a USB hard drive to the LessFS drive seems to be happening at around 6.5 megabytes per second - so at this rate, my 54GB of disk image data is going to take around 6 days to copy. That's quite slow...

    --
    David Hicks

#19 - dhicks
    Quote Originally Posted by dhicks View Post
Copying files from a USB hard drive to the LessFS drive seems to be happening at around 6.5 megabytes per second
Ah, found the problem - the default block size is 4K, which is great for deduplication but bad for disk performance. Setting block size to 128K has the file copy whizzing along - I'll maybe try 32K or 64K and see what the best size/performance tradeoff is.

    --
    David Hicks

#20 - dhicks
    Quote Originally Posted by dhicks View Post
    Setting block size to 128K has the file copy whizzing along
...but with a deduplication ratio of around 1:1.01. Switched to 64K blocks; the current ratio is around 1:10, though that figure comes from a quick script I wrote and I'm not sure it's correct yet.

    --
    David Hicks
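The post above mentions a quick ratio-checking script without showing it. The idea can be sketched as follows (an illustrative example, not the poster's script): split every file into fixed-size blocks, hash each block, and compare the total block count with the number of unique hashes.

```python
#!/usr/bin/env python3
"""Estimate a block-level dedup ratio for a directory tree at several block sizes.

Illustrative sketch only, not the "quick script" mentioned in the post above.
Every unique block hash is kept in memory, which is fine for tens of GB of
data but would need an on-disk index for much larger data sets.
"""
import hashlib
import os
import sys

def dedup_ratio(root, block_size):
    total_blocks = 0
    unique = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while True:
                        block = f.read(block_size)
                        if not block:
                            break
                        total_blocks += 1
                        unique.add(hashlib.sha1(block).digest())
            except OSError:
                continue                    # skip unreadable files
    return total_blocks, len(unique)

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for kib in (4, 32, 64, 128):            # one full pass per block size
        total, uniq = dedup_ratio(root, kib * 1024)
        if uniq:
            print(f"{kib:>3} KiB blocks: {total} total, {uniq} unique, "
                  f"ratio 1:{total / uniq:.2f}")
```

Reading the tree once per block size is slow but keeps the sketch simple, and the sweep makes the tradeoff in the thread visible: the larger the block, the less likely two blocks are byte-for-byte identical, so the ratio tends to drop as block size grows.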

#21 - dhicks
Now on a 32K block size, with a current deduplication ratio of 1:8; my 54GB of disk image data should hopefully be copied over by midnight. I'm worried that write performance is going to go down the more data is stored, but I guess I'll find out (I'll come back tomorrow and check whether the files have copied across or not).

LessFS is an inline deduplication system (SDFS had a batch-mode option, too), but one solution is to have data written to a non-LessFS volume during the day and copied over to the LessFS volume overnight, deduplicating as it goes. The documentation mentions keeping block data and metadata on separate volumes (because this is a FUSE-based file system, it stores its data as database files in an existing file system) - a nice idea, but I don't have two separate RAID arrays to store data on, and storing the metadata on a non-RAID disk seems a little pointless.

    --
    David Hicks
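The staging-area arrangement described above is straightforward to automate: a job in the small hours that moves everything from the plain volume onto the LessFS mount, so the slow deduplicating writes never collide with daytime work. A minimal sketch, assuming rsync is installed and using placeholder paths:

```python
#!/usr/bin/env python3
"""Nightly job: move everything from a fast staging volume onto the
deduplicating (LessFS) volume. Paths are placeholders for illustration."""
import subprocess
import sys

STAGING = "/srv/staging"        # plain ext3/RAID volume written to during the day
DEDUP = "/mnt/lessfs/backups"   # LessFS mount, deduplicated overnight

def main():
    # -a preserves permissions and timestamps; --remove-source-files empties
    # the staging area as each file lands on the dedup volume.
    result = subprocess.run(
        ["rsync", "-a", "--remove-source-files", f"{STAGING}/", f"{DEDUP}/"],
        check=False,
    )
    sys.exit(result.returncode)

if __name__ == "__main__":
    main()
```

Run from cron overnight this gets the batch-style behaviour the post describes; note that --remove-source-files leaves empty directories behind in the staging area, which would need a separate clean-up step.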

#22 - apaton
I've also come across the same block size issues with ZFS (the Sun S7000 refers to blocks as database records). Smaller blocks (< 32K) can give a better dedup ratio, but they require more memory, and disk/volume throughput is reduced for data transfers.

Personally I'm only 60% convinced that dedup for general users' file systems is an efficient use of resources on a NAS file server.

General user filesystems at my customers' sites all have different numbers of users and different data profiles, so the dedup return is almost impossible to predict. It's become "let's try and see" - not very scientific.

I do know dedup works best when you have a large amount of duplicated data, such as backups and virtual machine images.

dhicks, I think you're on the right path with your use of dedup; it's just a pity you didn't have much success with ZFS.
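The memory cost mentioned above can be put in rough numbers for ZFS: the dedup table (DDT) holds one entry per unique block, and roughly 320 bytes of RAM per entry is the commonly quoted rule of thumb (an estimate, not a specification, and it varies by pool layout and ZFS version). A back-of-envelope sketch:

```python
#!/usr/bin/env python3
"""Back-of-envelope RAM estimate for the ZFS dedup table (DDT).

Uses the commonly quoted ~320 bytes of RAM per unique block; treat the
output as an order-of-magnitude guide only.
"""
GIB = 1024 ** 3
BYTES_PER_DDT_ENTRY = 320      # rule-of-thumb estimate, not an exact figure

def ddt_ram_gib(data_gib, block_kib, dedup_ratio=1.0):
    # number of unique blocks times the per-entry cost, expressed in GiB
    unique_blocks = (data_gib * GIB) / (block_kib * 1024) / dedup_ratio
    return unique_blocks * BYTES_PER_DDT_ENTRY / GIB

if __name__ == "__main__":
    for block_kib in (4, 32, 64, 128):
        est = ddt_ram_gib(data_gib=1024, block_kib=block_kib)  # 1 TiB of unique data
        print(f"{block_kib:>4} KiB records: ~{est:.1f} GiB of DDT per TiB")
```

At 128K records this lands at a few GiB of table per TiB of unique data; at 4K it balloons to around 80 GiB per TiB, which is exactly the small-blocks-need-more-memory point being made here.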

#23 - j17sparky
dhicks - have you tried the NexentaStor Community Edition? It seems pretty good to me.


#24 - apaton
I've known about Nexenta for a while, but I must admit I didn't know they had a Community Edition. Another one to have a look at, but time is a finite resource.

#25 - j17sparky
    Quote Originally Posted by apaton View Post
I've known about Nexenta for a while, but I must admit I didn't know they had a Community Edition. Another one to have a look at, but time is a finite resource.
I'll save you some hassle then: it isn't going to run properly on anything but decent server gear with 4GB+ RAM. I tried it on a cheapo Dell T105 and it was having none of it.

#26 - dhicks
Quote Originally Posted by apaton View Post
I think you're on the right path with your use of dedup
As in having a staging area that I copy files from over to the LessFS volume overnight? Now I've just got to find a bigger hard drive to make a usefully sized staging area...

it's just a pity you didn't have much success with ZFS.
I think ZFS worked fine; I think the problem was with Solaris on our backup server. I tried OpenSolaris before, a couple of years ago when I built the machine, and that time it didn't even boot from the install CD.

    --
    David Hicks

#27 - dhicks
    Quote Originally Posted by j17sparky View Post
I remember trying something all-in-one and Solaris-based a few months ago, swearing at it lots, and using OpenSolaris instead. It could have been NexentaStor, I can't remember. I don't really care all that much which OS I wind up using; it's just that until I get this server fixed I have no way to restore our media suite full of machines with mangled disk images...

    --
    David Hicks

#28 - dhicks
Right, I'm giving up on inline, block-level deduplicating file systems (ZFS, SDFS, LessFS, etc.) and going back to a plain Ubuntu 9.10 server with an mdadm RAID-5 array containing an ext3 filesystem. Re-reading the documentation for rsync, it looks like it should be perfectly compatible with a periodically running script that checks for and eliminates duplicate files by means of hard links. Deduplication ratios on file system backups should be similar to what I'd have got with block-level deduplication anyway, and the 1:1.5 ratio I was getting on the 2GB disk-image files wasn't worth the massive performance hit: an order of magnitude slower for reads, and simply incredibly slow for writes.

    --
    David Hicks
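The periodic hard-link pass described above can be handled by existing tools (fdupes, hardlink, rdfind) or by a short script along these lines: group files by size, confirm duplicates with a hash, then re-link them to a single inode. A rough sketch of the idea, not a drop-in tool:

```python
#!/usr/bin/env python3
"""Replace duplicate files under a backup tree with hard links.

Rough sketch: group by size, confirm with a hash, then hard-link duplicates
to the first copy seen. Real tools (fdupes, hardlink, rdfind) handle edge
cases such as permissions, races and cross-device files more carefully.
"""
import hashlib
import os
import sys
from collections import defaultdict

def file_hash(path, chunk=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.digest()

def hardlink_dupes(root):
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue                       # ignore symlinks
            by_size[os.path.getsize(path)].append(path)

    saved = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue                           # no possible duplicates
        first_by_hash = {}
        for path in paths:
            try:
                digest = file_hash(path)
            except OSError:
                continue                       # skip unreadable files
            original = first_by_hash.setdefault(digest, path)
            if original is path:
                continue                       # first copy seen, keep it
            if os.path.samefile(original, path):
                continue                       # already hard-linked together
            tmp = path + ".dedup-tmp"
            os.link(original, tmp)             # link first, then swap into place
            os.replace(tmp, path)
            saved += size
    return saved

if __name__ == "__main__":
    freed = hardlink_dupes(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"reclaimed roughly {freed / 1024**2:.1f} MiB")
```

This sits well with rsync because, unless --inplace is used, rsync writes an updated file to a temporary name and renames it into place, so changing one copy breaks the hard link instead of silently modifying every linked backup; rsync -H preserves the links if the deduplicated tree is ever copied elsewhere.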
