Duplicate file replacer
Further to my recent post about my battles with duplicate files, does anyone know of a piece of software that lets you choose a location for a group of duplicates, then deletes all the duplicates and replaces them with shortcuts to the location you specified?
The reason I ask is that doing it manually is fine for the few larger suspects, but as there are currently 236,000 files deemed duplicates, it may take a few thousand years for me to do it by hand...
I don't know of a third-party app that'll do it, but Storage Server 2008 has a Single Instance Storage option (like the Exchange one) built in.
I'm going to start looking at it this month, so will report back if it's any good!
From what I've read, it isn't included in 2008, only Storage Server, Unified Storage Server and Home Server...
Originally Posted by Domino
Okay, so maybe I could have put that clearer...
Hmm, that's a tricky one - pretty much all apps that clean up duplicates assume you want them gone because you've forgetfully made copies all over the shop, not that you actually want the files still linked.
One of the best apps I've used is Duplicate Cleaner (imaginative name). It's freeware, it's fast, has a decent interface and a great selection assistant for choosing which dupes to wipe and which to keep, which is great when people insist on creating parallel folder structures.
Realistically, I'd assume a lot of your dupes are caused by people doing the same and copying the entire tree across - so you end up with what would be 1 shortcut followed by say 200 dupe files under it. Might be worth giving it a whizz over with this and using the sort options to verify, then script the copying of a shortcut to your folder (\\server\teachers\pubdocs is the classic one here) after nuking 100% CRC matched dupes?
Duplicate Cleaner can export the list of matched files to CSV - with a bit of a quick tidy up and search/replace in Excel you ought to be able to create a batch file to sort everything out quite easily?
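If you'd rather not do the search/replace in Excel, here is a rough sketch of the same idea in Python. It is only an illustration: the column names "Group" and "Path" are made up, so check them against whatever the export actually contains, and it assumes the first file listed in each group is the one you want to keep. It writes hard links with mklink /H (available on Vista / Server 2008 and later) rather than .lnk shortcuts. Paths containing % or other characters that are special in batch files would need extra escaping, so read through the output before running it.

```python
import csv
import io


def batch_lines_from_csv(csv_file, group_column="Group", path_column="Path"):
    """Turn a duplicate-list CSV into batch commands (column names are assumptions)."""
    groups = {}
    for row in csv.DictReader(csv_file):
        # Collect the paths of each duplicate group in the order they are listed.
        groups.setdefault(row[group_column], []).append(row[path_column])

    lines = ["@echo off"]
    for paths in groups.values():
        # Assumption: the first path in a group is the copy to keep.
        keeper, *duplicates = paths
        for duplicate in duplicates:
            lines.append(f'del "{duplicate}"')
            # mklink /H <link> <target> creates an NTFS hard link.
            lines.append(f'mklink /H "{duplicate}" "{keeper}"')
    return lines


if __name__ == "__main__":
    sample = io.StringIO(
        "Group,Path\n"
        "1,C:\\docs\\a.doc\n"
        "1,C:\\copy\\a.doc\n"
        "2,C:\\docs\\b.doc\n"
    )
    print("\n".join(batch_lines_from_csv(sample)))
```

Groups with only one file produce no commands, so anything that isn't really duplicated is left alone.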
Edit: Just realised I read 'free' into your question when it wasn't necessarily there. NoClone is the classic shareware duplicate file remover, and it definitely has an option for creating shortcuts and NTFS hard links on removal.
Lol, that'll teach me to read it too fast. It isn't really an option though, as you can't use it for other, non-storage, tasks - which we need also.
Originally Posted by Domino
Apparently Moodle 2.0 has a filesystem that does de-duplication.
Just need to link it into an SMB/CIFS frontend like Alfresco.
I know of a clever piece of free software that helps you identify space being wasted across the network: duplicates, files no-one is using any more and files that should not be on there (e.g. holiday snaps). Look at MailMeter File Archiver.
It's actually dead easy to write a script to do this. Just calculate the MD5 checksum of each file, place that file in a central repository and create a hard link to the file from the original location. If you find a duplicate file, skip the copy stage and just create the hard link.
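For what it's worth, a minimal sketch of that approach in Python might look like the following. The function names and the layout of the repository (one file per checksum, named after the MD5) are my own assumptions, not anything standard. Instead of copying the first file into the repository it hard-links it there, which comes to the same thing without moving any data. I've also added a byte-for-byte comparison before replacing anything, in case two different files ever share an MD5. Bear in mind that hard links only work within a single volume, and that every link shares the same content, so editing one "copy" changes them all.

```python
import filecmp
import hashlib
import os
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_tree(root, repository):
    """Hard-link every file under root into repository, keyed by checksum.

    Returns the number of duplicate files replaced by hard links.
    """
    root = Path(root).resolve()
    repository = Path(repository).resolve()
    repository.mkdir(parents=True, exist_ok=True)
    replaced = 0
    for current_dir, _subdirs, names in os.walk(root):
        folder = Path(current_dir)
        if folder == repository or repository in folder.parents:
            continue  # never process the repository itself
        for name in names:
            original = folder / name
            if original.is_symlink() or not original.is_file():
                continue
            stored = repository / file_md5(original)
            if not stored.exists():
                # First time this content is seen: link it into the repository.
                os.link(original, stored)
                continue
            if os.path.samefile(original, stored):
                continue  # already linked on an earlier run
            if not filecmp.cmp(original, stored, shallow=False):
                continue  # same MD5 but different bytes: leave it alone
            # Build the link under a temporary name, then swap it into place.
            temporary = original.with_name(original.name + ".dedupe-tmp")
            os.link(stored, temporary)
            os.replace(temporary, original)
            replaced += 1
    return replaced
```

Calling dedupe_tree on a share and a repository folder on the same volume would be the whole job. Try it on a copy of the data first.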
Originally Posted by localzuk