Some more basic tests on ZFS dedup, this time to see what it's like with real office data.
Setup: OpenSolaris build 131
Sun X4200, 2 x dual-core Opteron 2.6GHz, 8GB RAM, 4 x 73GB SAS 10K rpm
Created two ZFS datasets, one with dedup enabled and the other with compression (default level).
Code:
root@osol:~# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
compress    72K  66.9G    21K  /compress
dedupe      72K  66.9G    21K  /dedupe
root@osol:~# zfs set compression=on compress
root@osol:~# zfs set dedup=on dedupe
Real data transferred to the drives.
I loaded my company's project/software folders: 68,000 files (Visio, PDF, Project, Word, OpenOffice, Excel, ISOs...), 38.9GB in total.
Load times, copying files from a local UFS filesystem to each ZFS dataset:
Copy data to a ZFS dedup dataset
Code:
root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /dedupe/ ; tar xf - )
real    19:51.930407394
user        5.807881662
sys      1:48.025965013
38.8GB 0:19:51 [33.3MB/s]
Copy data to a ZFS compressed dataset
Code:
root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /compress/ ; tar xf - )
real    18:46.544321180
user        3.368262960
sys      1:52.065809786
38.8GB 0:18:46 [35.3MB/s]
The copy to the ZFS dedup dataset was about 65 seconds (roughly 6%) slower than the copy to the compressed dataset.
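A quick way to double-check the timing figures is to redo the arithmetic from the ptime "real" values. This Python sketch assumes pv reports binary units (1GB = 1024MB), which is consistent with the 33.3MB/s and 35.3MB/s rates it printed.

```python
# Recompute copy times and throughput from the ptime/pv output.
# Assumption: pv's "38.8GB" and "MB/s" are binary units (1GB = 1024MB).


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a ptime 'real' value of the form mm:ss.fraction to seconds."""
    return minutes * 60 + seconds


DATA_MB = 38.8 * 1024  # total data copied, as reported by pv

dedup_s = to_seconds(19, 51.930407394)     # copy to /dedupe
compress_s = to_seconds(18, 46.544321180)  # copy to /compress

dedup_rate = DATA_MB / dedup_s
compress_rate = DATA_MB / compress_s
slowdown = dedup_s / compress_s - 1  # relative extra time for the dedup copy

print(f"dedup:    {dedup_s:7.1f} s  {dedup_rate:.1f} MB/s")
print(f"compress: {compress_s:7.1f} s  {compress_rate:.1f} MB/s")
print(f"dedup copy took {dedup_s - compress_s:.1f} s longer ({slowdown:.1%} slower)")
```

On these numbers the dedup copy comes out roughly 65 seconds, or about 6%, slower.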
Let's see how much space we saved with each method.
Code:
root@osol:/ufs# zpool list
NAME       SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
compress    68G  36.1G  31.9G   53%  1.00x  ONLINE  -
dedupe      68G  38.4G  29.6G   56%  1.02x  ONLINE  -
Code:
root@osol:/ufs# zfs get compressratio compress
NAME      PROPERTY       VALUE  SOURCE
compress  compressratio  1.08x  -
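The space figures can be checked the same way. A ratio such as 1.08x is logical size divided by physical size, so the fraction of space saved is 1 - 1/ratio. This Python sketch treats the two zpool ALLOC values as directly comparable, which is only approximate because ALLOC also counts pool metadata (including the dedup table).

```python
# Compare space usage from the zpool list / zfs get output.
# Assumption: ALLOC values (GB) are comparable between the two pools;
# they include pool metadata, so this is a rough comparison only.

ALLOC_COMPRESS = 36.1
ALLOC_DEDUPE = 38.4


def space_saved(ratio: float) -> float:
    """Fraction of physical space saved for a logical/physical ratio like 1.08x."""
    return 1 - 1 / ratio


# How much less the compressed pool allocates, relative to the dedup pool
pool_difference = (ALLOC_DEDUPE - ALLOC_COMPRESS) / ALLOC_DEDUPE

print(f"compress pool uses {pool_difference:.1%} less space than the dedupe pool")
for name, ratio in [("compressratio", 1.08), ("dedupratio", 1.02)]:
    print(f"{name:13} {ratio:.2f}x -> {space_saved(ratio):.1%} saved")
```

A 1.08x compression ratio works out to about 7% of space saved, while 1.02x from dedup saves only about 2%.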
The compressed dataset did a better job than dedup, using about 6% less storage (36.1G vs 38.4G allocated). Dedup only managed 1.02x on this data, so it doesn't deliver any real benefit for real "office data", and compression is the better option. Also check this thread about compression on the 7110.
Now why would you want to dedup? Well, just look at my dedup ratio of 2.28x for an NFS share with VMware. Now this is exciting!
Code:
root@osol:~$ zpool list vm-dedupe
NAME        SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
vm-dedupe    68G  16.1G  51.9G   23%  2.28x  ONLINE  -
Therefore I can only say of deduplication: "Some data is more equal than others."
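Why do VM images dedup so much better than office files? Dedup works at the block level: each block is checksummed, and a block whose checksum is already in the dedup table is stored only once. VM disk images share many identical operating system blocks, while office documents rarely contain identical blocks. The Python sketch below is a simplified model, not ZFS's implementation: it assumes fixed-size blocks, uses SHA-256 (the checksum dedup=on uses) and ignores compression and metadata. The `make_blocks` helper and the toy "VM images" are hypothetical.

```python
import hashlib
from typing import Iterable


def dedup_ratio(files: Iterable[bytes], block_size: int = 128 * 1024) -> float:
    """Referenced bytes / stored bytes for a simplified block-level dedup.

    Each file is split into fixed-size blocks; a block is stored only the
    first time its SHA-256 checksum is seen."""
    stored = {}     # checksum -> block length (a toy "dedup table")
    referenced = 0  # bytes the files refer to, duplicates included
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            referenced += len(block)
            stored.setdefault(hashlib.sha256(block).digest(), len(block))
    stored_bytes = sum(stored.values())
    return referenced / stored_bytes if stored_bytes else 1.0


def make_blocks(tag: str, count: int, block_size: int) -> bytes:
    """Hypothetical helper: build `count` distinct blocks of filler data."""
    return b"".join(
        hashlib.sha256(f"{tag}-{i}".encode()).digest() * (block_size // 32)
        for i in range(count)
    )


BLOCK = 4096
os_blocks = make_blocks("os", 8, BLOCK)         # shared guest OS data
vm1 = os_blocks + make_blocks("vm1", 2, BLOCK)  # two VMs built from the same OS
vm2 = os_blocks + make_blocks("vm2", 2, BLOCK)
docs = [make_blocks("doc1", 5, BLOCK), make_blocks("doc2", 5, BLOCK)]  # unrelated files

print(f"VM images:    {dedup_ratio([vm1, vm2], BLOCK):.2f}x")
print(f"office files: {dedup_ratio(docs, BLOCK):.2f}x")
```

The two toy VMs reference 20 blocks but only 12 are unique, giving about 1.67x, while the unrelated files stay at 1.00x.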
It is possible to have both dedup and compression enabled on a ZFS dataset, but I haven't tested that combination yet.
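As I understand ZFS's write path, when both properties are set a block is compressed first and the dedup checksum is taken over the compressed block, so identical data still dedups as long as it is compressed the same way. This untested Python sketch models that order; zlib stands in for ZFS's LZJB/gzip compressors, `store_blocks` is a hypothetical helper rather than a ZFS API, and metadata and other on-disk details are ignored.

```python
import hashlib
import zlib
from typing import Iterable, Tuple


def store_blocks(blocks: Iterable[bytes], level: int = 6) -> Tuple[int, int]:
    """Simplified compress-then-dedup model.

    Returns (logical_bytes, physical_bytes): the size the data appears to have
    versus the bytes actually stored after compression and dedup."""
    table = {}  # checksum of compressed block -> compressed length
    logical = 0
    for block in blocks:
        logical += len(block)
        compressed = zlib.compress(block, level)  # stand-in for lzjb/gzip
        table.setdefault(hashlib.sha256(compressed).digest(), len(compressed))
    return logical, sum(table.values())


blocks = [b"A" * 4096] * 3 + [b"B" * 4096]  # three identical blocks, one different
logical, physical = store_blocks(blocks)
print(f"logical {logical} bytes, physical {physical} bytes, "
      f"combined ratio {logical / physical:.1f}x")
```

Only two compressed blocks end up in the table, so the combined saving is the product of both effects.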
PS: Should deduplication be shortened to "dedupe" or "dedup"? I can't decide which.
Last edited by apaton; 8th February 2010 at 09:12 AM. Reason: Typo
I vote for dedupe
We're running compression on our 7110 and currently achieve a compression ratio of roughly 1.3x. We've got a mix of both LZJB and GZIP-2 depending on the share, and the 7110 barely breaks a sweat.
The data is a mix of VMs/Home Directories/General storage. I imagine the dedupe will give big benefits on things like shared storage and home directories where we've got people storing the same file across multiple locations.
apaton (7th April 2010)
Things are still looking good at this end. I'm running two VMware vSphere Windows XP VMs on a dedup NFS mount.
Also checked some ISO files with "digest -a md5".
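`digest -a md5 <file>` prints a file's MD5 checksum, so checking an ISO means comparing the checksum of the original with that of the copy on the dedup dataset. For anyone without `digest`, here is a rough Python equivalent; the function names are my own, and files are read in chunks so large ISOs don't need to fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def md5_of_file(path: PathLike, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(original: PathLike, copy: PathLike) -> bool:
    """True if both files have the same MD5 digest."""
    return md5_of_file(original) == md5_of_file(copy)
```

For example, `same_content("/ufs/iso/example.iso", "/dedupe/iso/example.iso")` would compare a source ISO with its copy; the file name there is hypothetical.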
OpenSolaris Build 131.
I'm copying my workstation images off the backup server onto an external hard drive in preparation for reinstalling with Ubuntu 9.10 Server. After removing some old directories of files, I noticed that they are still being listed as present in the Samba share. If the file system isn't sure what has and hasn't been deleted, that would go quite a long way to explaining why disk images are getting corrupted when they are being updated.
There's always Debian-BSD, i.e. Debian GNU/kFreeBSD, if you want ZFS with a Debian userland and packages. Though if you're having trouble working out whether it's Samba, Samba+Solaris or Samba+Solaris+ZFS causing the corruption, you may not want it.
NexentaStor recently bumped the limit of the community edition to 12TB as well (see NexentaStor License Versions).
dhicks (14th April 2010)
/usr/local/bin/lessfs /etc/lessfs.cfg /data -o allow_other
You also need to edit /etc/fuse.conf and make sure the line "user_allow_other" is un-commented, which it isn't by default.
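The edit itself is a one-line change. If you prefer to script it, a minimal Python sketch could look like this; `enable_user_allow_other` is a hypothetical helper that works on the file's text, and it assumes the option sits on a line of its own, optionally commented out with "#".

```python
import re

# Matches a commented-out option line such as "#user_allow_other" or "# user_allow_other"
_COMMENTED = re.compile(r"^\s*#\s*user_allow_other\s*$")


def enable_user_allow_other(conf_text: str) -> str:
    """Return fuse.conf text with 'user_allow_other' enabled.

    A commented-out '#user_allow_other' line is uncommented; if the option is
    missing altogether it is appended. Other lines are left unchanged."""
    lines = conf_text.splitlines()
    enabled = any(line.strip() == "user_allow_other" for line in lines)
    if not enabled:
        for i, line in enumerate(lines):
            if _COMMENTED.match(line):
                lines[i] = "user_allow_other"
                enabled = True
                break
        if not enabled:
            lines.append("user_allow_other")
    return "\n".join(lines) + "\n"
```

To apply it, read /etc/fuse.conf, pass its text through the function and write the result back as root.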
Samba is configured as detailed in this other post:
The contents of my /etc/lessfs.cfg file look like this:
Code:
BLOCKDATA_PATH=/mnt/md0/dta
BLOCKDATA_BS=1048576
#
BLOCKUSAGE_PATH=/mnt/md0/mta
BLOCKUSAGE_BS=1048576
#
DIRENT_PATH=/mnt/md0/mta
DIRENT_BS=1048576
#
FILEBLOCK_PATH=/mnt/md0/mta
FILEBLOCK_BS=1048576
#
META_PATH=/mnt/md0/mta
META_BS=1048576
#
HARDLINK_PATH=/mnt/md0/mta
HARDLINK_BS=1048576
#
SYMLINK_PATH=/mnt/md0/mta
SYMLINK_BS=1048576
#
LISTEN_IP=127.0.0.1
LISTEN_PORT=100
MAX_THREADS=2
# Cache size in megabytes.
CACHESIZE=128
# Flush data to disk after X seconds.
COMMIT_INTERVAL=30
#
MINSPACEFREE=10
# Consider SYNC_RELAX=1 or SYNC_RELAX=2 when exporting lessfs with NFS.
SYNC_RELAX=0
ENCRYPT_DATA=off
# ENCRYPT_META on or off, default is off
# Requires ENCRYPT_DATA=on and is otherwise ignored.
ENCRYPT_META=off
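lessfs.cfg is a plain KEY=VALUE file with "#" comments. To sanity-check a configuration like this one, a Python sketch can load it into a dictionary and apply the rule stated in its own comments, that ENCRYPT_META is ignored unless ENCRYPT_DATA=on. This is a hypothetical helper, not lessfs's own parser, which may treat edge cases differently.

```python
def parse_lessfs_cfg(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            config[key.strip()] = value.strip()
    return config


def meta_encryption_active(config: dict) -> bool:
    """ENCRYPT_META only takes effect when ENCRYPT_DATA=on (per the cfg comments)."""
    return config.get("ENCRYPT_DATA") == "on" and config.get("ENCRYPT_META") == "on"


# A subset of the configuration above
SAMPLE = """\
BLOCKDATA_PATH=/mnt/md0/dta
BLOCKDATA_BS=1048576
# Cache size in megabytes.
CACHESIZE=128
# Flush data to disk after X seconds.
COMMIT_INTERVAL=30
SYNC_RELAX=0
ENCRYPT_DATA=off
ENCRYPT_META=off
"""

cfg = parse_lessfs_cfg(SAMPLE)
print(cfg["BLOCKDATA_PATH"], cfg["CACHESIZE"], meta_encryption_active(cfg))
```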
/mnt/md0 is simply a mount point for an mdadm array containing an ext3 filesystem, mounted via /etc/fstab:
proc       /proc           proc          defaults                  0 0
/dev/sdg1  /               ext2          errors=remount-ro         0 1
/dev/sdg5  none            swap          sw                        0 0
/dev/scd0  /media/cdrom0   udf,iso9660   user,noauto,exec,utf8     0 0
/dev/fd0   /media/floppy0  auto          rw,user,noauto,exec,utf8  0 0
/dev/md0   /mnt/md0        ext3          defaults                  0 0
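To confirm which device and filesystem sit behind /mnt/md0 (where the lessfs data and metadata paths in the cfg live), the fstab entries can be parsed field by field. This Python sketch assumes the standard whitespace-separated fields, with dump and pass defaulting to 0 when omitted; `parse_fstab` and `find_by_mountpoint` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class FstabEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: str
    dump: int
    passno: int


def parse_fstab(text: str) -> List[FstabEntry]:
    """Parse fstab text into entries, skipping blank and comment lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line: ignore it
        fields += ["0"] * (6 - len(fields))  # dump/pass default to 0
        device, mountpoint, fstype, options, dump, passno = fields[:6]
        entries.append(FstabEntry(device, mountpoint, fstype, options, int(dump), int(passno)))
    return entries


def find_by_mountpoint(entries: List[FstabEntry], mountpoint: str) -> Optional[FstabEntry]:
    """Return the first entry mounted at `mountpoint`, or None."""
    return next((e for e in entries if e.mountpoint == mountpoint), None)


# A subset of the fstab above
FSTAB = """\
proc       /proc           proc          defaults                  0 0
/dev/sdg1  /               ext2          errors=remount-ro         0 1
/dev/md0   /mnt/md0        ext3          defaults                  0 0
"""

md0 = find_by_mountpoint(parse_fstab(FSTAB), "/mnt/md0")
print(md0)
```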