I'm replying to an old post here so, with the benefit of hindsight, I'm going to be avoiding block-level deduplication for the moment. I tried OpenSolaris on standard Intel hardware and got data corruption problems with disk images being stored on ZFS deduplicated volumes. It might well be that I'd done something wrong, but I couldn't afford to spend any more time figuring out what the matter was. I then switched back to Linux, which was reliable, but the de-duplicating file systems available for Linux are more on the "experimental" side and run as FUSE modules, which gave dreadful performance.
Originally Posted by j17sparky
In the end, I figured the easiest thing was going to be to write my own file-level deduplication script on top of a standard EXT4 filesystem. Nice and simple. Hardware-wise (just going back to the original post), I plan to install a large Antec 1200 case stuffed with twelve 2TB hard disks over the summer, running Linux, rsync and my own file-level deduplicating Python script as a backup solution. 22TB of backup storage should hopefully last us for a while.
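For anyone wondering what a file-level deduplication script along those lines might look like, here is a minimal sketch. It is not the actual script mentioned above: the function names, the SHA-256 hashing and the hard-link approach are all assumptions made for illustration. It groups files by size, hashes the candidates, and replaces duplicates with hard links, which only works within a single filesystem such as one EXT4 volume.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def deduplicate(root):
    """Replace duplicate regular files under root with hard links.

    Hypothetical helper for illustration. Returns the number of files
    that were replaced by a link to an identical file.
    """
    # Files of different sizes cannot be identical, so group by size
    # first and only hash files that share a size with another file.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            by_size[os.path.getsize(path)].append(path)

    linked = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        first_seen = {}  # digest -> first path seen with that content
        for path in sorted(paths):
            keeper = first_seen.setdefault(file_digest(path), path)
            if keeper == path or os.path.samefile(keeper, path):
                continue  # first copy, or already linked to it
            # Link under a temporary name, then rename over the
            # duplicate so the path never goes missing.
            tmp = path + ".dedup-tmp"
            os.link(keeper, tmp)
            os.replace(tmp, path)
            linked += 1
    return linked
```

Calling `deduplicate("/path/to/backups")` after each rsync run would be one way to use it. Bear in mind that hard-linked files share their contents and metadata, so this only suits a store where files are not edited in place.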
Might be worth trying nexenta?
It's open solaris with a gui though... so might not do it?
Hmmm. The S7000 does de-duplication as standard (it uses ZFS and OpenSolaris with an intuitive interface). It does block-level de-dupe, and you can turn it on and off at either a share or project (read template) level.
The advantage of investing in a product such as the S7000 from Oracle is that it is fully supported, installs and works in a very short space of time and has an excellent roadmap going forward (am I sounding slightly biased here?).
Ask the other edugeekers about it and see what response you get.
Don't forget that you'll lose some of your 24TB capacity to RAID. Because the S7000 uses ZFS for RAID, it has the best usable capacity of any comparable RAID system.
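As a rough illustration of the capacity point, the arithmetic can be sketched as below. The layouts are assumptions for the example (twelve 2TB disks in a single parity group, ignoring filesystem overhead and hot spares), not S7000 specifications, and `usable_tb` is a hypothetical helper.

```python
def usable_tb(drives, drive_tb, parity_drives):
    """Raw capacity minus the share given over to parity, in TB."""
    if not 0 <= parity_drives < drives:
        raise ValueError("parity_drives must be between 0 and drives - 1")
    return (drives - parity_drives) * drive_tb


# Twelve 2TB disks: 24TB raw in every case.
for label, parity in (("no redundancy", 0),
                      ("single parity", 1),
                      ("double parity", 2)):
    print(f"{label}: {usable_tb(12, 2, parity)}TB usable")
```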
If you need / want to chat about it then please drop me a line on here or reach me via the Oracle main switchboard.
How much are we looking at for an S7000, Hebdenlad? Rough price is fine.
Just looking at options on a few boxes and the S7000 comes up quite a bit
See your other thread, Andy should be able to get you an accurate quote based on your requirements. ;)
EDIT: The other thread is HERE for those reading this.
As you can see Duke has pipped me to the post on a reply. He must spend all day just watching the threads on here. In terms of a price it depends on how much capacity / performance you need / want.
The S7120 starts at 12TB using 1TB drives (there is a 2TB drive version as well) and can scale out to 60 drives. Moving up the range, we can support clustering for high availability and read- and write-optimised SSDs for massive IOPS, and with the top model you will be able to scale out capacity-wise to just under 3PB.
Andy on here should be able to give you a good indication of price / capacity / performance matched to your budget. I'm the technical one for the S7000 range.
If I can help any more then please let me know and if you need me to come and see you I'm not that far away from Liverpool - live near Halifax and am always glad of local work.
Microsoft and VeryPC launched their EcoVault range, which is pretty well priced?
Just a word of warning regarding the use of snapshots as a method of backing up Domain Controllers. Please remember that it is possible to really screw the Active Directory after restoring a snapshotted server. If the AD time stamps are out of sync for too long, the system will mark the server as having a degraded AD, and that will effectively stop AD replication and updates. Whilst not fatal, it can be a right pig to fix: it involves the forced grabbing of FSMO roles and the forced removal of the server from the AD, before adding it back in and re-allocating the various roles.