Some more basic tests of ZFS dedup, this time to see what it's like with real office data.
Setup: OpenSolaris build 131
Sun X4200, 2 x dual-core Opteron 2.6GHz, 8GB RAM, 4 x 73GB SAS 10K rpm
Created two ZFS datasets, one with dedup enabled and the other with compression (default level):
Code:
root@osol:~# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
compress    72K  66.9G    21K  /compress
dedupe      72K  66.9G    21K  /dedupe
root@osol:~# zfs set compression=on compress
root@osol:~# zfs set dedup=on dedupe
Real data was then transferred to the drives.
I loaded my company's project/software folders: 68,000 files (Visio, PDF, Project, Word, OpenOffice, Excel, ISOs...), 38.9GB in total.
Load times, copying the files from a local UFS file system to each ZFS dataset:
Copy data to a ZFS dedup dataset
Code:
root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /dedupe/ ; tar xf - )
real    19:51.930407394
user        5.807881662
sys      1:48.025965013
38.8GB 0:19:51 [33.3MB/s]
Copy data to a ZFS compressed dataset
Code:
root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /compress/ ; tar xf - )
real    18:46.544321180
user        3.368262960
sys      1:52.065809786
38.8GB 0:18:46 [35.3MB/s]
The ZFS dedup dataset was about 65 seconds slower than the compressed dataset for the data transfer (19:52 versus 18:47 wall-clock time).
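The timing comparison is simple arithmetic on the ptime "real" values, and can be reproduced with a small shell sketch. The function names are hypothetical helpers, and the throughput figure assumes pv reports binary units (1 GB = 1024 MB).

```shell
# Convert a ptime "real" value (ss.frac, m:ss.frac or h:mm:ss.frac)
# into seconds, rounded to two decimal places.
to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}

# Seconds by which run B ($2) was slower than run A ($1).
time_diff() {
    a=$(to_seconds "$1")
    b=$(to_seconds "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.2f\n", b - a }'
}

# Average rate in MB/s for $1 GB copied in $2 (a ptime "real" value).
# Assumption: pv uses binary units, so 1 GB = 1024 MB here.
throughput() {
    secs=$(to_seconds "$2")
    awk -v gb="$1" -v s="$secs" 'BEGIN { printf "%.1f\n", gb * 1024 / s }'
}

time_diff 18:46.544321180 19:51.930407394   # compress run vs dedup run
throughput 38.8 19:51.930407394             # dedup run
throughput 38.8 18:46.544321180             # compress run
```

With the two runs above this gives a difference of 65.39 seconds and rates of 33.3 and 35.3 MB/s, which line up with what pv printed.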
Let's see how much space we saved with each method.
ZFS Dedup
Code:
root@osol:/ufs# zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
compress    68G  36.1G  31.9G    53%  1.00x  ONLINE  -
dedupe      68G  38.4G  29.6G    56%  1.02x  ONLINE  -
ZFS Compression
Code:
root@osol:/ufs# zfs get compressratio compress
NAME      PROPERTY       VALUE  SOURCE
compress  compressratio  1.08x  -
Conclusion
The compressed dataset did a better job than dedup, using about 6% less storage (36.1G allocated versus 38.4G). So for real "office data", ZFS dedup doesn't deliver any real benefit and compression is the better option. Also check this thread about compression on the 7110.
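The 6% figure can be reproduced from the ALLOC column of the zpool list output. This is a rough comparison sketch: ALLOC is pool-wide and includes metadata, and the helper name is hypothetical. It assumes both values carry the same unit suffix.

```shell
# Percentage less space allocated by pool A than pool B, from the ALLOC
# column of "zpool list". Both values must use the same unit suffix.
space_saving() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        sub(/[KMGTP]$/, "", a)   # drop a trailing unit letter such as G
        sub(/[KMGTP]$/, "", b)
        printf "%.1f\n", (b - a) / b * 100
    }'
}

space_saving 36.1G 38.4G   # compress pool vs dedupe pool
```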
Now why would you want to dedup? Well, just look at my dedup ratio of 2.28x for an NFS share with VMware. Now this is exciting!
Code:
root@osol:~$ zpool list vm-dedupe
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vm-dedupe   68G  16.1G  51.9G    23%  2.28x  ONLINE  -
So all I can say about deduplication is: "Some data is more equal than others."
It is possible to enable both dedup and compression on the same ZFS dataset, but I haven't tested that combination yet.
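Since the combination hasn't been tested here, the following is only an untested sketch of how it would be switched on. The helper names and the dataset tank/office are hypothetical. Both properties only affect blocks written after they are set, so existing data would need to be copied in again. compressratio is a dataset property, while dedupratio is reported per pool.

```shell
# Hypothetical helper: enable both compression and dedup on one dataset.
# Only data written after this point is compressed and deduplicated.
enable_compress_dedup() {
    dataset=$1
    zfs set compression=on "$dataset" || return 1
    zfs set dedup=on "$dataset" || return 1
}

# Show what each feature is achieving: compressratio per dataset,
# dedupratio for the whole pool (the part of the name before the first /).
report_ratios() {
    dataset=$1
    pool=${dataset%%/*}
    zfs get compressratio "$dataset"
    zpool get dedupratio "$pool"
}
```

Usage would be `enable_compress_dedup tank/office`, then reloading the data and running `report_ratios tank/office`.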
Andy
PS: Should deduplication be shortened to "dedupe" or "dedup"? I can't decide which.
Last edited by apaton; 08-02-2010 at 09:12 AM. Reason: Typo
I vote for dedupe.
We're running compression on our 7110 and currently achieve a compression ratio of roughly 1.3x. We've got a mix of LZJB and GZIP-2 depending on the share, and the 7110 barely breaks a sweat.
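The 7110 sets compression per share through its own management interface. On a plain OpenSolaris box the equivalent is the per-dataset compression property, sketched below. The helper name and the datasets tank/vms and tank/home are hypothetical, and the check only accepts the LZJB and GZIP levels mentioned above.

```shell
# Hypothetical helper: set a per-dataset compression algorithm, accepting
# only LZJB or GZIP (default level, or levels 1-9).
set_compression() {
    dataset=$1
    algo=$2
    case $algo in
        lzjb|gzip|gzip-[1-9]) ;;
        *) echo "unsupported algorithm: $algo" >&2; return 1 ;;
    esac
    zfs set compression="$algo" "$dataset"
}
```

For example, `set_compression tank/vms lzjb` for a share where speed matters and `set_compression tank/home gzip-2` where a better ratio is worth some CPU.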
The data is a mix of VMs, home directories and general storage. I imagine dedupe will give big benefits for things like shared storage and home directories, where people store the same file in multiple locations.
I'm going to try copying our workstation disk images (all of which are Windows XP with the same software installed, just different workstation names) to our new OpenSolaris file server tomorrow, closely followed by the daily backups of our main file servers. I'll see how much space we save with that. With my file-level deduplicating backup system we've been able to store daily file server backups for the past two years, so hopefully we'll do better than that.
--
David Hicks
Hmm. Since moving our machine images over to the ZFS file system, I keep getting image corruption issues. This could very well be down to the client side of things rather than the server, but until I've figured out exactly what the problem is and how to stop it happening, I'm going to be careful about what I trust to the backup server.
--
David Hicks
apaton (07-04-2010)
...Does anyone want to recommend a deduplicating file system for Linux instead? I plan to move the backup server back to an Ubuntu Server install, and I see there are a couple of add-on file systems that do deduplication: ZFS for Linux, SDFS, and maybe others that Google didn't find. Does anyone have any recommendations? SDFS looks pretty good, so I'll probably try that first.
--
David Hicks
Things are still looking good at this end. I'm running two Windows XP VMs under VMware vSphere on a dedup NFS mount.
I also checked some ISO files with "digest -a md5 <filename>".
OpenSolaris Build 131.
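A checksum spot-check like this can be scripted so the original and the copy on the dedup share are compared directly. This is a minimal sketch with hypothetical helper names; it prefers GNU md5sum where installed and otherwise falls back to Solaris `digest -a md5`, assuming that prints only the hash when given a single file.

```shell
# Print the MD5 checksum of a file: GNU md5sum where available (reading
# from stdin so only the hash is printed), else Solaris "digest -a md5".
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -d' ' -f1
    else
        digest -a md5 "$1"
    fi
}

# Succeed if two files (e.g. the UFS original and the copy on the dedup
# share) have identical MD5 checksums.
same_md5() {
    [ "$(md5_of "$1")" = "$(md5_of "$2")" ]
}
```

For example, `same_md5 /ufs/iso/example.iso /dedupe/iso/example.iso && echo match`, where example.iso stands in for any file that was copied.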
Andy
Workstation images are created on, and restored from, a Samba file share using Partimage, which is included on SystemRescueCD. Disk images written after the move to Solaris tend to be corrupted, while disk images created before that and simply copied over to the new server seem to be fine. The issue seems to lie in writing data reliably to the Samba share on Solaris rather than in the data storage itself. It could be something to do with disk I/O performance, network I/O performance, or something else entirely.
--
David Hicks
I'm copying my workstation images off the backup server onto an external hard drive in preparation for reinstalling with Ubuntu 9.10 Server. After removing some old directories, I noticed that they were still listed as present in the Samba share. If the file system isn't sure what has and hasn't been deleted, that would go a long way toward explaining why disk images get corrupted when they are updated.
--
David Hicks
There's always Debian GNU/kFreeBSD, if you want ZFS with a Debian userland and packages. Though if you're having trouble working out whether Samba, Samba+Solaris or Samba+Solaris+ZFS is causing the corruption, you may not want it.
NexentaStor recently bumped the limit of the community edition to 12TB as well (see "NexentaStor License Versions").
dhicks (14-04-2010)
I know there's a ZFS-on-Linux project, but it doesn't seem to support deduplication yet. I'm trying out a couple of FUSE-based file systems: SDFS (Opendedup), which doesn't seem to work on drives over 250GB, and LessFS, which looks a bit more promising. I'm now back on Ubuntu 9.10 (64-bit), with an mdadm RAID-5 array of six 500GB hard drives. The array carries an ext3 file system, which acts as the base storage for LessFS's FUSE-based file system. Next I'm going to set up Samba on top of the LessFS file system and see what happens.
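For sizing, the usable space of a RAID-5 array is one disk's worth less than the raw total, since that much goes to parity. A small arithmetic sketch (hypothetical helper names) for six 500GB drives; the result is before ext3 overhead, and the GiB conversion is there because drive labels use decimal GB while `df -h` reports binary units.

```shell
# Usable space of a RAID-5 array: one disk's worth holds parity,
# so usable = (disks - 1) * size, before file system overhead.
raid5_usable() {
    disks=$1
    size=$2
    if [ "$disks" -lt 2 ]; then
        echo "need at least 2 disks" >&2
        return 1
    fi
    echo $(( (disks - 1) * size ))
}

# Convert decimal (drive-label) GB to binary GiB.
gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.1f\n", gb * 1000000000 / 1073741824 }'
}

raid5_usable 6 500   # six 500GB drives
gb_to_gib 2500
```

This gives 2500GB usable, or roughly 2328.3GiB.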
--
David Hicks
Ah ha, it turns out that if you're sharing any FUSE-based file system with Samba, you need to remember to pass the "allow_other" option to the FUSE file system when you mount it. Otherwise, when Samba tries to access the share as the authenticated user, it fails and Windows shows a "path can not be found" message, which is very confusing. For instance, I have this line in /etc/rc.local to mount the LessFS file system:
/usr/local/bin/lessfs /etc/lessfs.cfg /data -o allow_other
You also need to edit /etc/fuse.conf and make sure the line "user_allow_other" is uncommented, which it isn't by default.
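The fuse.conf step can be checked and applied from a script before the rc.local mount runs. This is a sketch with hypothetical helper names; the in-place edit uses GNU sed's `-i`, as found on Ubuntu, and should be run as root.

```shell
# Succeed if user_allow_other is present and not commented out in the
# given fuse.conf (defaults to /etc/fuse.conf).
fuse_allows_other() {
    conf=${1:-/etc/fuse.conf}
    grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf"
}

# Uncomment the user_allow_other line in place (GNU sed syntax).
enable_allow_other() {
    conf=${1:-/etc/fuse.conf}
    sed -i 's/^#[[:space:]]*user_allow_other/user_allow_other/' "$conf"
}
```

For example, `fuse_allows_other || enable_allow_other` could run just before the lessfs mount line with `-o allow_other`.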
Samba is configured as detailed in this other post:
Configuring Samba
The contents of my /etc/lessfs.cfg file look like this:
Code:
BLOCKDATA_PATH=/mnt/md0/dta
BLOCKDATA_BS=1048576
#
BLOCKUSAGE_PATH=/mnt/md0/mta
BLOCKUSAGE_BS=1048576
#
DIRENT_PATH=/mnt/md0/mta
DIRENT_BS=1048576
#
FILEBLOCK_PATH=/mnt/md0/mta
FILEBLOCK_BS=1048576
#
META_PATH=/mnt/md0/mta
META_BS=1048576
#
HARDLINK_PATH=/mnt/md0/mta
HARDLINK_BS=1048576
#
SYMLINK_PATH=/mnt/md0/mta
SYMLINK_BS=1048576
#
LISTEN_IP=127.0.0.1
LISTEN_PORT=100
MAX_THREADS=2
# Cache size in megabytes.
CACHESIZE=128
# Flush data to disk after X seconds.
COMMIT_INTERVAL=30
#
MINSPACEFREE=10
# Consider SYNC_RELAX=1 or SYNC_RELAX=2 when exporting lessfs with NFS.
SYNC_RELAX=0
ENCRYPT_DATA=off
# ENCRYPT_META on or off, default is off
# Requires ENCRYPT_DATA=on and is otherwise ignored.
ENCRYPT_META=off
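The config points every *_PATH key at one of two directories on the array (dta and mta). A sketch that lists the distinct directories a lessfs.cfg refers to and creates them; the helper names are hypothetical, and the assumption that lessfs expects these directories to exist beforehand is mine, not something stated above.

```shell
# List the distinct directories named by uncommented *_PATH keys
# in a lessfs config (defaults to /etc/lessfs.cfg).
lessfs_paths() {
    cfg=${1:-/etc/lessfs.cfg}
    sed -n 's/^[[:space:]]*[A-Z_]*_PATH=//p' "$cfg" | sort -u
}

# Create those directories (assumption: lessfs will not create them itself).
make_lessfs_dirs() {
    lessfs_paths "$1" | while IFS= read -r dir; do
        mkdir -p "$dir"
    done
}
```

With the config above, `lessfs_paths` would print /mnt/md0/dta and /mnt/md0/mta.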
/mnt/md0 is simply the mount point for an mdadm array containing an ext3 file system, mounted via /etc/fstab:
Code:
# <file system> <mount point>   <type>       <options>                  <dump>  <pass>
proc            /proc           proc         defaults                   0       0
/dev/sdg1       /               ext2         errors=remount-ro          0       1
/dev/sdg5       none            swap         sw                         0       0
/dev/scd0       /media/cdrom0   udf,iso9660  user,noauto,exec,utf8      0       0
/dev/fd0        /media/floppy0  auto         rw,user,noauto,exec,utf8   0       0
/dev/md0        /mnt/md0        ext3         defaults                   0       0
--
David Hicks