ZFS - Deduplication (Dedup) Vs Compression
Some more basic tests on ZFS dedup, this time to see what it's like with real office data.
Setup: OpenSolaris build 131
Sun X4200, 2 x dual-core Opteron 2.6GHz, 8GB RAM, 4 x 73GB SAS 10Krpm
Created two ZFS datasets, one with dedup enabled and the other with compression enabled at the default level.
Code:
root@osol:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
compress 72K 66.9G 21K /compress
dedupe 72K 66.9G 21K /dedupe
root@osol:~# zfs set compression=on compress
root@osol:~# zfs set dedup=on dedupe
Real data transferred to the drives.
I loaded my company's project/software folders: 68,000 files (Visio/PDF/Project/Word/OpenOffice/Excel, ISOs...), 38.9GB in total.
Load times when copying the files from a local UFS filesystem to each ZFS dataset.
Copy data to a ZFS dedup dataset
Code:
root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /dedupe/ ; tar xf - )
real 19:51.930407394
user 5.807881662
sys 1:48.025965013
38.8GB 0:19:51 [33.3MB/s]
Copy data to a ZFS compressed dataset
Code:
root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /compress/ ; tar xf - )
real 18:46.544321180
user 3.368262960
sys 1:52.065809786
38.8GB 0:18:46 [35.3MB/s]
The ZFS dedup dataset was about 65 seconds (roughly 6%) slower than the compressed dataset for the data transfer.
Let's see how much space we saved with each method.
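The gap between the two runs can be checked directly from the ptime "real" values. This is only a minimal sketch: the helper name to_seconds is my own, and it assumes the timings are in mm:ss.fraction form as shown in the output above (no hours field).

```shell
# Convert a ptime-style "mm:ss.fraction" value to seconds.
# Assumption: no hours component, as in the runs above.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%.2f", $1 * 60 + $2 }'
}

dedup_secs=$(to_seconds "19:51.930407394")
compress_secs=$(to_seconds "18:46.544321180")

# Difference in seconds, and as a percentage of the compression run.
diff_secs=$(awk -v d="$dedup_secs" -v c="$compress_secs" \
  'BEGIN { printf "%.2f", d - c }')
diff_pct=$(awk -v d="$dedup_secs" -v c="$compress_secs" \
  'BEGIN { printf "%.1f", (d - c) / c * 100 }')

echo "dedup took ${diff_secs}s longer (${diff_pct}%)"
```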
ZFS Dedup
Code:
root@osol:/ufs# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
compress 68G 36.1G 31.9G 53% 1.00x ONLINE -
dedupe 68G 38.4G 29.6G 56% 1.02x ONLINE -
ZFS Compression
Code:
root@osol:/ufs# zfs get compressratio compress
NAME PROPERTY VALUE SOURCE
compress compressratio 1.08x -
Conclusion
The compressed dataset did a better job than dedup, using about 6% less storage (36.1G allocated against 38.4G). So ZFS dedup doesn't deliver any real benefit for real "office data", and compression is the better option. Also check this thread about compression on the 7110.
Now why would you want to dedup? Well, just look at my dedup ratio of 2.28x for an NFS share used by VMware. Now this is exciting!
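The storage comparison comes from the ALLOC column of the zpool list output. As a rough sketch, with the two values typed in by hand (the variable names are just for illustration):

```shell
# ALLOC values in GB, copied from the zpool list output above.
dedup_alloc=38.4
compress_alloc=36.1

# Space the compressed pool saves relative to the dedup pool, in percent.
saving_pct=$(awk -v d="$dedup_alloc" -v c="$compress_alloc" \
  'BEGIN { printf "%.1f", (d - c) / d * 100 }')

echo "compression used ${saving_pct}% less space than dedup"
```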
Code:
root@osol:~$ zpool list vm-dedupe
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
vm-dedupe 68G 16.1G 51.9G 23% 2.28x ONLINE -
So for deduplication I can only say: "Some data is more equal than others."
It is possible to have both dedup and compression enabled on a ZFS dataset, but I haven't tested that combination yet.
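For anyone who wants to try the combination, a sketch of what it might look like follows. The dataset name tank/office is hypothetical, and this is untested here, so treat it as a starting point rather than a recipe. Both properties only apply to data written after they are set.

```shell
# Hypothetical dataset name; substitute your own pool/dataset.
DATASET="tank/office"

# Enable both features on the same dataset.
zfs set compression=on "$DATASET"
zfs set dedup=on "$DATASET"

# After copying data in, check the per-dataset compression ratio...
zfs get compression,dedup,compressratio "$DATASET"

# ...and the pool-wide dedup ratio (DEDUP column).
zpool list "${DATASET%%/*}"
```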
Andy
PS: Should deduplication be shortened to "dedupe" or "dedup"? I can't decide which.