I've run dedupe on ZFS on a BSD fileserver before now and I have to warn you to be careful. You need to look at your workloads and check whether dedupe is really worth it to you. In my case the system overhead of deduping caused a performance hit that wasn't worth the space saved. With hard drives so cheap per GB, in the long run I found it better overall not to dedupe and just throw more disk space at the problem.
I only really use it on our (120TB) archive system currently. It is rarely accessed and using space efficiently matters more than speed of access. That system also runs with compression on parts of the file system, although this is rapidly becoming irrelevant as file formats adopt compression natively (WAV -> MP3, Office 97 -> Office Open XML). Our day-to-day working file set (6TB) runs uncompressed and with no deduplication.
Windows Server 2012's deduplication is a very different beast to ZFS's. The performance hit is extremely minimal* and you don't need huge amounts of RAM. Read the TechNet links below for details.
- Introduction to Data Deduplication in Windows Server 2012 « The Storage Team Blog
- Deduplication Cmdlets in Windows PowerShell
* Assuming you don't do something stupid like setting MinimumFileAgeDays to 0, which forces Windows to constantly dedupe the volume(s) it's enabled on.
For volumes with a lot of redundant data, the space savings can be massive.
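Setting it up is only a couple of cmdlets. A minimal sketch, assuming a data volume on E: (the drive letter and the three-day minimum age are just example values):

# Enable deduplication on the volume (needs the Data Deduplication role feature installed)
Enable-DedupVolume -Volume "E:"
# Only process files older than 3 days; as noted above, never set this to 0
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 3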
@Geoff. I'm guessing your Oracle SAN was under-specced for the amount of data you were deduping? From what I have read, 1TB of deduplicated data (4KB average block size) would require around 80GB RAM to hold the dedupe tables and need a beast of a CPU to do the SHA256 checksum calculations.
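The back-of-an-envelope maths behind that figure, assuming the ~320 bytes per dedupe-table entry usually quoted for ZFS (the real entry size varies from pool to pool):

# Rough ZFS dedupe-table (DDT) RAM estimate
$dataBytes = 1TB    # data to dedupe (PowerShell understands the TB/KB/GB suffixes)
$blockSize = 4KB    # average block size, as per the estimate above
$entrySize = 320    # approximate bytes per DDT entry -- an assumption, not a fixed value
($dataBytes / $blockSize) * $entrySize / 1GB    # => 80, i.e. ~80GB of RAM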
Got it in production here.
33% on file server
55% on VLE server.
No performance hits either.
Long, long ago I co-hosted my user data with the Windows 2000 RIS service.
This meant I got de-dupe on my user shares. It wasn't entirely painless, and about six months after I first encountered a non-critical problem Microsoft added a note saying it was not a recommended configuration. I never lost data, but restores could be 'fun'. It gave me about 40% more space than I had paid for and kept backup sizes down.
I'm intrigued to see it come back into the Windows File Server role 14 years after its initial outing. Looks like they've done a lot of work on it: Data Deduplication Overview
In this incarnation I'd be surprised if it isn't a by-product of work done making Azure VMs space efficient.
I have deduplication enabled on the data drive for my Win8 box at home and see between 55 and 60% savings.
I will be getting Server 2012 later this summer and plan to move all my shares to it. The eval tool says I’ll get 58% on my WDS / Software share and 27% on my user data share.
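For anyone who hasn't tried it, the eval tool is DDPEval.exe, which ships with Server 2012; you just point it at a volume or folder (the path below is made up, and ddpeval /? lists the full options):

# Estimate potential savings before enabling dedupe on a volume
C:\Windows\System32\ddpeval.exe E:\Shares\UserData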
From an elevated PowerShell session, these commands will get you the info you're looking for.
Deduplication Cmdlets in Windows PowerShell
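For example, something along these lines should do it (cmdlet and property names as per the reference linked above):

# Per-volume deduplication status and savings
Get-DedupStatus | Format-List
# Saved space and savings rate for each enabled volume
Get-DedupVolume | Format-List Volume, SavedSpace, SavingsRate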
Just moved our staff files over to our 2012 server, and deduplication has run on the data. Total saving of 47% or 478GB.