  #1 apaton

    ZFS - Deduplication (Dedup) Vs Compression

    Some more basic tests on ZFS Dedup, this time to see what it's like with real office data.

    Setup: OpenSolaris build 131
    Sun X4200, 2 x dual-core Opteron 2.6GHz, 8GB RAM, 4 x 73GB SAS 10krpm

    Created two ZFS datasets, one with dedup enabled and the other with compression (at the default level).
    Code:
    root@osol:~# zfs list
    NAME                       USED  AVAIL  REFER  MOUNTPOINT
    compress                    72K  66.9G    21K  /compress
    dedupe                      72K  66.9G    21K  /dedupe
    
    root@osol:~# zfs set compression=on compress
    root@osol:~# zfs set dedup=on dedupe
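
    For anyone wanting to recreate a similar layout, something like this should do it. The device names are placeholders, and I'm assuming one disk per pool (which matches the 68G pool sizes further down):
    Code:
    # Placeholder device names - use whatever format(1M) reports on your box.
    zpool create compress c1t1d0
    zpool create dedupe   c1t2d0
    
    # Then set the properties as above.
    zfs set compression=on compress
    zfs set dedup=on dedupe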
    Real data transferred to the drives.
    I loaded my company's project/software folders: 68,000 files (Visio/PDF/Project/Word/OpenOffice/Excel, ISOs...), a total of 38.9GB.

    Load times: copying files from the local UFS filesystem to each ZFS dataset.

    Copy data to a ZFS dedup dataset
    Code:
    root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /dedupe/ ; tar xf - )
    real    19:51.930407394
    user        5.807881662
    sys      1:48.025965013
    38.8GB 0:19:51 [33.3MB/s] 
    Copy data to a ZFS compressed dataset
    Code:
    root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /compress/ ; tar xf - )
    real    18:46.544321180
    user        3.368262960
    sys      1:52.065809786
    38.8GB 0:18:46 [35.3MB/s]
    The ZFS dedup dataset was about 65 seconds slower than the compressed dataset for the data transfer.

    Let's see how much space we saved with each method.
    ZFS Dedup
    Code:
    root@osol:/ufs# zpool list
    NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
    compress    68G  36.1G  31.9G    53%  1.00x  ONLINE  -
    dedupe      68G  38.4G  29.6G    56%  1.02x  ONLINE  -
    

    ZFS Compression
    Code:
     root@osol:/ufs# zfs get compressratio compress
    NAME      PROPERTY       VALUE  SOURCE
    compress  compressratio  1.08x  -
    


    Conclusion
    The compressed dataset did a better job than dedup, using about 6% less storage. So ZFS dedup doesn't deliver any real benefit for this kind of real "office data", and compression is the better option. Also check this thread about compression on the 7110.

    Now why would you want dedup? Well, just look at my dedup ratio of 2.28x for an NFS share with VMware; now this is exciting!

    Code:
    root@osol:~$ zpool list vm-dedupe
    NAME        SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
    vm-dedupe    68G  16.1G  51.9G    23%  2.28x  ONLINE  -
    So all I can say about deduplication is: "Some data is more equal than others".
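
    (As an aside: if you want an idea of how much dedup would save on an existing pool before switching it on, zdb can simulate it and print the dedup ratio it would expect. It walks the whole pool, so it can take a while and use a fair amount of memory.)
    Code:
    # Simulate dedup on an existing pool ("mypool" is a placeholder for your pool name)
    # and print a DDT histogram plus the expected dedup ratio.
    root@osol:~# zdb -S mypool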

    It is possible to have dedup and compression enabled on the same ZFS dataset, but I haven't tested that combination yet.
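
    For reference, enabling both on the same dataset is just a matter of setting the two properties (the dataset name here is only an example):
    Code:
    zfs create dedupe/both
    zfs set compression=on dedupe/both
    zfs set dedup=on dedupe/both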

    Andy
    PS: Should deduplication be shortened to "dedupe" or "dedup"? I can't decide which.
    Last edited by apaton; 8th February 2010 at 09:12 AM. Reason: Typo


  #2

    I vote for dedupe

    We're running compression on our 7110 and currently achieve a compression ratio of roughly 1.3x. We've got a mix of LZJB and GZIP-2 depending on the share, and the 7110 barely breaks a sweat.

    The data is a mix of VMs/home directories/general storage. I imagine dedupe will give big benefits on things like shared storage and home directories, where we've got people storing the same file in multiple locations.
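
    For anyone curious, on a plain OpenSolaris box that mix is just the compression property set per dataset (on the 7110 itself it's done per share through the management interface). The dataset names here are made up:
    Code:
    # Lightweight LZJB for busy shares, gzip-2 where space matters more.
    zfs set compression=lzjb   pool/home
    zfs set compression=gzip-2 pool/archive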

  #3 dhicks

    Quote Originally Posted by apaton View Post
    Now why would you want to dedup? Well just look at my dedup ratio of 2.28 for a NFS share with VMware, now this is exciting!
    I'm going to try copying our workstation disk images (all of which are Windows XP, same software installed, just different workstation names) to our new OpenSolaris file server tomorrow, closely followed by the daily backups of our main file servers. I'll see how much of a space saving we get with that - with my file-level deduplicating backup system we've been able to store daily file server backups for the past two years, so hopefully we'll do better than that.

    --
    David Hicks

  #4

    Quote Originally Posted by dhicks View Post
    I'm going to try copying our workstation disk images (all of which are Windows XP, same software installed, just different workstation names) to our new OpenSolaris file server tomorrow, closely followed by the daily backups of our main file servers. I'll see how much of a space saving we get with that - with my file-level deduplicating backup system we've been able to store daily file server backups for the past two years, so hopefully we'll do better than that.

    --
    David Hicks
    This is where dedupe is supposed to really help: incompressible data (videos, ISOs, etc.) with very little difference between copies. In this scenario it will get you considerably more space savings than compression.

  #5 apaton

    Quote Originally Posted by dhicks View Post
    I'm going to try copying our workstation disk images (all of which are Windows XP, same software installed, just different workstation names) to our new OpenSolaris file server tomorrow, closely followed by the daily backups of our main file servers. I'll see how much of a space saving we get with that - with my file-level deduplicating backup system we've been able to store daily file server backups for the past two years, so hopefully we'll do better than that.

    --
    David Hicks

    Just curious how you got on with this.
    Did you get the space saving you expected?

    Thanks

    Andy

  #6 dhicks

    Quote Originally Posted by apaton View Post
    Just curious how you got on with this. Did you get the space saving you expected?
    Hmm. Since moving our machine images over to the ZFS filesystem I've kept getting image corruption issues. This could very well be to do with the client side of things rather than the server, but until I've figured out exactly what the problem is and how to stop it happening, I'm going to be careful about what I trust to the backup server for now.

    --
    David Hicks


  #7 dhicks

    Quote Originally Posted by dhicks View Post
    Hmm. Since moving our machine images over to the ZFS filesystem I've kept getting image corruption issues.
    And double Hmm. After a reimage, none of the media suite machines want to boot. This isn't looking good...

    --
    David Hicks

  #8 dhicks

    Quote Originally Posted by dhicks View Post
    This isn't looking good...
    ...Does anyone want to recommend a deduplicating file system for Linux instead? I plan to move the backup server back to an Ubuntu Server install, but I see there are a couple of add-on filesystems available that do deduplication - ZFS for Linux, SDFS, maybe others that Google didn't find. Any recommendations? SDFS looks pretty good, so I'll probably try that first.

    --
    David Hicks

  #9 pete

    Quote Originally Posted by dhicks View Post
    And double Hmm. After a reimage, none of the media suite machines want to boot. This isn't looking good...

    --
    David Hicks
    What file format are your images in? (I have a mental image of the RIS groveler service and ZFS clashing horribly.)
    Do you have before/after md5sums, and what happens if you copy the image to a non-ZFS share?

  #10 apaton

    Things are still looking good at this end. I'm running two VMware vSphere Windows XP VMs on a dedup NFS mount.
    I've also checked some ISO files with "digest -a md5 <filename>".
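    For example (the paths here are just placeholders), checksumming the source file and the copy on the dedup dataset and comparing the two:
    Code:
    root@osol:~# digest -a md5 /ufs/iso/example.iso
    root@osol:~# digest -a md5 /dedupe/iso/example.iso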

    OpenSolaris Build 131.

    Andy

  #11 dhicks

    Quote Originally Posted by pete View Post
    What file format are your images in?
    Workstation images are created/restored from a Samba file share with Partimage, included on SystemRescueCD. Disk images written after the move to Solaris have a tendency to be corrupted, while disk images created before that and simply copied over to the new server seem to be fine. The issue seems to be in writing data reliably to the Samba share on Solaris rather than in the data storage itself - this could be something to do with disk I/O performance, or network I/O performance, or something else entirely.

    --
    David Hicks

  #12 dhicks

    I'm copying my workstation images off the backup server onto an external hard drive in preparation for reinstalling with Ubuntu 9.10 Server. After removing some old directories of files, I notice that they are still listed as present in the Samba share. The file system not being sure what has and hasn't been deleted would go quite a long way towards explaining why disk images are getting corrupted when they are updated.

    --
    David Hicks

  #13 pete

    There's always Debian-BSD (Debian GNU/kFreeBSD) if you want ZFS with a Debian userland and packages. Though if you're having issues working out whether it's Samba, Samba+Solaris or Samba+Solaris+ZFS causing the corruption, you may not want it.

    NexentaStor bumped the limit of its community edition to 12TB recently as well: NexentaStor License Versions


  #14 dhicks

    Quote Originally Posted by pete View Post
    There's always Debian-BSD?
    I know there's a ZFS-on-Linux project, but it doesn't seem to support deduplication yet. I'm just trying out a couple of FUSE-based file systems: SDFS (Opendedup), which doesn't seem to work on drives over 250GB, and LessFS, which looks a bit more promising. I'm now back on Ubuntu 9.10 (64-bit) and have an mdadm RAID-5 array of six 500GB hard drives with an ext3 file system, which acts as the base storage for LessFS's FUSE-based file system. I'm just going to set up Samba using the LessFS file system and see what happens.
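
    For anyone setting up something similar, the array and base filesystem come down to a few commands. The device names below are examples rather than exactly what's in this box:
    Code:
    # Create a six-disk RAID-5 array, format it as ext3 and mount it.
    mdadm --create /dev/md0 --level=5 --raid-devices=6 /dev/sd[a-f]1
    mkfs.ext3 /dev/md0
    mkdir -p /mnt/md0
    mount /dev/md0 /mnt/md0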

    --
    David Hicks

  #15 dhicks

    Quote Originally Posted by dhicks View Post
    I'm just going to set up Samba using the LessFS file system and see what happens.
    Ah ha - it turns out that if you're sharing any FUSE-based file system with Samba, you need to remember to pass the "allow_other" option through to the FUSE file system when you mount it; otherwise Samba comes along, tries to access the share as the authenticated user and fails, giving a "path cannot be found" message in Windows, which is very confusing. So, for instance, I have this line in /etc/rc.local to mount the LessFS file system:

    /usr/local/bin/lessfs /etc/lessfs.cfg /data -o allow_other

    You also need to edit /etc/fuse.conf and make sure the line "user_allow_other" is un-commented, which it isn't by default.

    Samba is configured as detailed in this other post:

    Configuring Samba
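
    The relevant bit of smb.conf is just a share pointing at the LessFS mount point, roughly like this (the share name and allowed user are placeholders; the full config is in the post linked above):
    Code:
    [backups]
        path = /data
        browseable = yes
        read only = no
        valid users = backupuser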

    The contents of my /etc/lessfs.cfg file look like this:

    Code:
    BLOCKDATA_PATH=/mnt/md0/dta
    BLOCKDATA_BS=1048576
    #
    BLOCKUSAGE_PATH=/mnt/md0/mta
    BLOCKUSAGE_BS=1048576
    #
    DIRENT_PATH=/mnt/md0/mta
    DIRENT_BS=1048576
    #
    FILEBLOCK_PATH=/mnt/md0/mta
    FILEBLOCK_BS=1048576
    #
    META_PATH=/mnt/md0/mta
    META_BS=1048576
    #
    HARDLINK_PATH=/mnt/md0/mta
    HARDLINK_BS=1048576
    #
    SYMLINK_PATH=/mnt/md0/mta
    SYMLINK_BS=1048576
    #
    LISTEN_IP=127.0.0.1
    LISTEN_PORT=100
    MAX_THREADS=2
    # Cache size in megabytes.
    CACHESIZE=128
    # Flush data to disk after X seconds.
    COMMIT_INTERVAL=30
    #
    MINSPACEFREE=10
    # Consider SYNC_RELAX=1 or SYNC_RELAX=2 when exporting lessfs with NFS.
    SYNC_RELAX=0
    ENCRYPT_DATA=off
    # ENCRYPT_META on or off, default is off
    # Requires ENCRYPT_DATA=on and is otherwise ignored.
    ENCRYPT_META=off

    /mnt/md0 is simply the mount point for an mdadm array containing an ext3 filesystem, mounted via /etc/fstab:

    Code:
    # <file system> <mount point>   <type>          <options>                       <dump>  <pass>
    proc            /proc           proc            defaults                        0       0
    /dev/sdg1       /               ext2            errors=remount-ro               0       1
    /dev/sdg5       none            swap            sw                              0       0
    /dev/scd0       /media/cdrom0   udf,iso9660     user,noauto,exec,utf8           0       0
    /dev/fd0        /media/floppy0  auto            rw,user,noauto,exec,utf8        0       0
    /dev/md0        /mnt/md0        ext3            defaults                        0       0
    --
    David Hicks
