
Reading the OpenSolaris forums, I notice a few people are seeing significant performance hits when turning on dedupe (OpenSolaris Forums : [zfs-discuss] ZFS Dedup Performance ... and Re: [zfs-discuss] Troubleshooting dedup performance).
The "fix" appears to be loads of RAM or an SSD for cache, though it looks like it still needs work ([zfs-discuss] $100 SSD = >5x faster dedupe).
I'm building out a test vm now to see if I get similar issues.
a) Has anyone else run into this?
b) Bearing in mind the 7110s have no SSD, will it be sensible to turn on dedupe?
I've played with dedupe, but only under OpenSolaris build 128a running on VirtualBox; I haven't done any performance testing yet. I'm building an X4200 with OpenSolaris b132 next week and will let you know how I get on with performance.
With regard to the OpenSolaris forum posts listed, I'm struggling to get my head around how an SSD used as L2ARC improves performance with dedupe. L2ARC is a read cache; I thought the performance hit would come from writing deduped data to disk, not from reading it. So some homework for me to do.
With reference to the 7110, I feel the limitation will be a combination of the 8GB memory maximum and the lack of a read SSD for L2ARC. (L2ARC is a secondary read cache behind main system memory.)
But we are still in development and the code will change, so it's a case of wait and see.
Andy
Basic performance testing of dedupe
Dedup on a Sun X4200: 2 x dual-core Opteron 2.6GHz, 8GB memory and 4 x 73GB SAS drives (10K rpm)
OS
Code:
apaton@osol:/export/home/iso/linux$ uname -a
SunOS osol 5.11 snv_131 i86pc i386 i86pc Solaris
Created 2 ZFS pools (dedupe,zfs) and a UFS Filesystem
- /dedupe - ZFS with Dedup enabled (SHA256 - Verify off)
- /zfs - ZFS with Dedup disabled (Defaults)
- /ufs - UFS filesystem (Defaults)
Data files, (CentOS-4.4.ServerCD-x86_64 is duplicated 3 times) (7.5Gb)
Code:
apaton@osol:/export/home/iso/linux$ ls -ilh *
462 -rwxrwxrwx 1 root   root  601M 2010-02-01 16:31 CentOS-4.4.ServerCD-x86_64_A.iso
463 -rwxrwxrwx 1 root   root  601M 2010-02-01 16:48 CentOS-4.4.ServerCD-x86_64_B.iso
459 -rwxrwxrwx 1 root   root  601M 2010-02-01 13:28 CentOS-4.4.ServerCD-x86_64.iso
457 -rwxrwxrwx 1 root   root  3.6G 2010-02-01 13:27 CentOS-5.1-i386-bin-DVD.iso
455 -rwxrwxrwx 1 root   root  628M 2010-02-01 13:25 rhel-3-u6-i386-es-disc2.iso
464 -rwxr-xr-x 1 apaton staff  68M 2010-02-01 23:27 rhel-3-u7-i386-as-disc1.iso
461 -rwxrwxrwx 1 root   root   36M 2010-02-01 13:29 rhel-5.2-server-x86_64-dvd.iso
453 -rwxrwxrwx 1 root   root  177M 2010-02-01 13:22 RHEL4-U2-x86_64-ES-disc1.iso
458 -rwxrwxrwx 1 root   root  699M 2010-02-01 13:28 ubuntu-8.10-desktop-i386.iso
456 -rwxrwxrwx 1 root   root  638M 2010-02-01 13:25 ubuntu-8.10-server-i386.iso
Initial copy of files to DEDUP enable ZFS dataset.
Code:
apaton@osol:/export/home/iso/linux$ pfexec ptime tar cf - . | pv | ( cd /dedupe; tar xf - )
real     5:28.688240194
user        0.572326624
sys        14.953198730
7.53GB 0:05:28 [23.5MB/s]
Initial copy of files default ZFS dataset.
Code:
apaton@osol:/export/home/iso/linux$ pfexec ptime tar cf - . | pv | ( cd /zfs; tar xf - )
real     2:37.843718899
user        0.575755074
sys        14.397860363
7.61GB 0:02:37 [49.3MB/s]
Initial copy of files default UFS filesystem.
Code:
apaton@osol:/export/home/iso/linux$ pfexec ptime tar cf - . | pv | ( cd /ufs; tar xf - )
real     2:53.149996234
user        0.711298116
sys        14.034587988
7.61GB 0:02:53 [ 45MB/s]
Third copy of files to DEDUP enable ZFS dataset.
Code:
apaton@osol:/export/home/iso/linux$ pfexec ptime tar cf - . | pv | ( cd /dedupe/d3; tar xf - )
real     2:49.402522619
user        0.859008763
sys        11.766251301
7.53GB 0:02:49 [45.5MB/s]
apaton@osol:/$ zpool list dedupe
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
dedupe   68G  6.38G  61.6G   9%  3.56x  ONLINE  -
Summary
The above results look conclusive: a ZFS dataset with dedup enabled slows the initial copy down dramatically, to roughly half the throughput of plain ZFS or UFS. But notice that performance on the dedupe dataset did improve on the second and third copies.
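As a rough cross-check on the figures above, the throughput and the slowdown can be recomputed from the ptime and pv output. This is only a sketch: the function names `rate_mibs` and `slowdown` are made up for illustration, and it assumes pv is reporting binary units (GiB and MiB).

```shell
#!/bin/sh
# Hypothetical helpers for re-deriving the numbers pv printed.

# rate_mibs SIZE_GIB SECONDS -> average throughput in MiB/s
rate_mibs() {
    awk -v g="$1" -v s="$2" 'BEGIN { printf "%.1f\n", g * 1024 / s }'
}

# slowdown SLOW_SECONDS FAST_SECONDS -> how many times longer the slow run took
slowdown() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Sizes from pv, elapsed times from ptime "real" converted to seconds.
rate_mibs 7.53 328.688    # first copy to /dedupe
rate_mibs 7.61 157.844    # copy to /zfs
rate_mibs 7.61 173.150    # copy to /ufs
rate_mibs 7.53 169.403    # third copy to /dedupe/d3
slowdown 328.688 157.844  # first dedup copy versus plain ZFS
```

The recomputed rates should land close to what pv printed, and the last line puts the first dedup copy at about twice the elapsed time of the plain ZFS copy.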
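I've researched the critical performance factors for dedup, and memory seems to be the key one. The deduplication table (DDT) needs to be held in memory (ARC) or at least on SSD (L2ARC/Readzilla), because every write has to look up its block checksum in that table. That lookup is a read, which is why a read cache helps write performance. Simple answer: the more hardware the better!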
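To get a feel for how much memory the DDT might want, here is a back-of-envelope sketch. It rests on assumptions that are not from my tests: the commonly quoted rule of thumb of roughly 320 bytes of RAM per DDT entry, and one entry per unique block. The function name `ddt_ram_mib` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical estimator: RAM needed to hold the dedup table in memory.
# ddt_ram_mib UNIQUE_DATA_GIB BLOCK_SIZE_KIB BYTES_PER_ENTRY -> MiB of RAM
# ASSUMPTION: BYTES_PER_ENTRY of about 320 is a rule of thumb, not a measured value.
ddt_ram_mib() {
    awk -v g="$1" -v b="$2" -v e="$3" 'BEGIN {
        entries = g * 1024 * 1024 / b      # unique blocks = data size / block size
        printf "%d\n", entries * e / 1048576
    }'
}

ddt_ram_mib 1024 128 320   # 1 TiB of unique data at the default 128K recordsize
ddt_ram_mib 1024 8 320     # same data in 8K blocks (e.g. small files or zvols)
```

On those assumptions, 1 TiB of unique data in 128K blocks needs about 2.5 GiB for the table, and the same data in 8K blocks needs about 40 GiB, which is far more than the 8GB in a 7110. The ARC also has to hold everything else, so treat these as lower bounds.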
These are the same findings Pete pointed out in his post. (I now understand them better.)
So in my opinion the 7110 will give usable but not great performance for dedup. A Sun 7310 or above will be the best option for serious dedupe performance requirements.
Andy
Last edited by apaton; 3rd February 2010 at 12:17 AM. Reason: spelling mistake

It would be interesting to see if there was a difference in the initial write speed for lots of smaller files (but similar large overall size) instead of ISOs. For example, lots of office documents, images and audio files to represent as close to a "real world" environment as possible.
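One rough way to build that kind of test set is sketched below. It is an illustration only: the helper name `make_small_files` and the file naming are made up, random data stands in for real documents, and every file gets one exact copy so dedup has something to find.

```shell
#!/bin/sh
# Hypothetical helper: fill a directory with many small files, each with one duplicate.
# make_small_files DIR COUNT SIZE_KIB
make_small_files() {
    dir=$1; count=$2; size_kib=$3
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # Unique random content, SIZE_KIB KiB per file.
        dd if=/dev/urandom of="$dir/file_$i.bin" bs=1024 count="$size_kib" 2>/dev/null
        # A byte-for-byte copy stands in for a duplicated document.
        cp "$dir/file_$i.bin" "$dir/file_${i}_copy.bin"
        i=$((i + 1))
    done
}
```

For example, `make_small_files /export/home/small 10000 64` would create 20,000 files of 64 KiB each (the path is just a placeholder), which could then be copied with the same `tar | pv` pipeline used for the ISOs.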

I've just been investigating using OpenSolaris for our backup server. Our backup system currently handles de-duplication at the file level, calculating a hash value for each file, but block-level de-duplication would seem rather more space-efficient. Searching Google, I came across this post:
ZFS Deduplication : Jeff Bonwick's Blog
Which points out that you can change the hashing algorithm used from SHA256 to, say, fletcher4, but you should then turn on verification, i.e. make the file system compare the actual blocks whenever two hashes match, so a hash collision cannot corrupt data. If memory or CPU performance is an issue you could pick a hashing algorithm that produces a range that fits better into RAM and/or takes less time to calculate.
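Going by that blog post, the choice is made through the `dedup` property on the dataset. The sketch below only prints the commands for review rather than running them; the helper name `dedup_cmd` is made up, the dataset name is a placeholder, and whether a given build still accepts `fletcher4` for dedup is something to check in `man zfs` first.

```shell
#!/bin/sh
# Hypothetical helper: print the zfs command for a dedup setting (does not run it).
# dedup_cmd DATASET CHECKSUM VERIFY(yes|no)
dedup_cmd() {
    dataset=$1; checksum=$2; verify=$3
    value=$checksum
    if [ "$verify" = "yes" ]; then
        # verify makes ZFS compare the actual blocks when two checksums match
        value="$value,verify"
    fi
    echo "zfs set dedup=$value $dataset"
}

dedup_cmd tank sha256 no       # strong hash, trust the checksum
dedup_cmd tank sha256 yes      # strong hash plus a byte-for-byte check
dedup_cmd tank fletcher4 yes   # cheaper hash, so verification matters
```

Once the printed line looks right it can be pasted in with `pfexec` in front, as with the other commands in this thread.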
Going by the post above, block-level de-duplication has only been available in ZFS file systems since November just gone. The OpenSolaris install available on Sun's site seems to be from back in June. Can I actually use de-duplication right now on OpenSolaris, does it install straight from the CD, or do I have to do something complicated to install de-duplication support?
--
David Hicks

Look here for newer builds: Genunix
I also found the EON nas build, but only really poked around with it to see if it worked. E O N
I suppose we could always campaign for Sun to let us bump up the ram on the 7110s. It's not like there aren't free slots in the chassis.
On ZFS I was hoping for a delayed dedupe functionality (as you can have with file-based dedupe), so it could spend the weekend/overnight sorting out duplicates.

Is anyone using ZFS deduplication with Samba home directories? We've been using Samba for some time now, and I'd certainly consider a Solaris migration. Just wondered if anyone had had success here.
Build 128 was the first to have dedupe; 128a was released to fix a bug in Fletcher4.
The current binary release is build 131.
To upgrade to the latest build release:
pfexec pkg set-authority -O http://pkg.opensolaris.org/dev/ opensolaris.org
pfexec pkg image-update
Andy
dhicks (2nd February 2010)

Okay, un-scratch the above, then, it seems it does work. If / when I get OpenSolaris working I guess I'll find out myself. No luck so far: EON booted just fine as a live CD but seems to be missing instructions on how to get it on to my hard drive, the OpenSolaris CD didn't get past GRUB, and I managed to screw up the CentOS install (to run OpenSolaris as a Xen VM) 3 times in one day. I'll try again tomorrow.
--
David Hicks

Start here: EON ZFS Storage (NAS), halfway down the page. Specifically:
After the image (eon.iso) is burned to a CD and booted, the login info is:
user: admin pass: eonstore
user: root pass: eonsolaris
Type and run the following. This script prompts the user through configuration questions like hostname, IP/DHCP, netmask, domain name and more. This step will ask questions to configure and ID the system for live image use.
# /usr/bin/setup
This step is optional but necessary if the configuration changes made are to be preserved beyond a reboot or power off. This requires a writable destination, USB or CF drive attached before the command is run. The command will facilitate formatting and installing the live image (image on the CD) to the USB or CF drive.
# /usr/bin/install.sh
This step should be done after install.sh, or whenever configuration changes made to the image need to be preserved. It preserves the original image as /mnt/eonX/boot/x86.eon.orig (bootable via the OEM choice in GRUB) and saves a new default boot image to /boot/x86.eon. It will move the live image to x86.eon.1, x86.eon.2 and so on each time it is run.
# /usr/bin/updimg.sh
dhicks (3rd February 2010)
I've found EON to be by far the easiest and most reliable for doing testing in VirtualBox.
There's a web interface for it too. No substitute for the CLI quite yet, but great for doing general maintenance on an OS you hardly ever touch.
// napp-it free ZFS NAS-SAN-Server: installed quickly - ready to run - easy to manage