Does anyone know of any software that will look at two documents (in the end it's going to be 100+ documents, but let's start with two) and find common phrases between them? The problem is I don't know what the phrases are. I think I might be asking for too much, and may have to find common words, then compare the context and start building a list of phrases to check for. Any ideas gratefully received.
russdev (20th September 2011)
Sadly I don't think that will work, as the phrases are not going to be in their database (almost certainly not; they are chat logs, publicly available, before someone says something). It has to look at the documents we have and compare them against each other. As I said, I think I am being too ambitious.
I'll bet it exists, probably used in unis for plagiarism checking. I'll bet it costs £Mega though.
Who's running it? You or someone non-technical?
As Tom alludes, crude functionality is trivial to achieve quickly if you've access to decent tools, but if you want a GUI it gets trickier.
There's similarity-tester in the Ubuntu universe repos, designed for catching people nicking code, but it also works on natural language.
You'd probably want to prepare the (Microsoft?) documents beforehand by running them through wv so they are readable as plain text.
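For the crude version, something like this might do as a starting point. It's only a rough sketch, assuming the documents are already plain text: it breaks each one into overlapping runs of n words and keeps the runs that appear in both. The function names and the default of three words per phrase are my own choices, not anything from an existing tool.

```python
import re


def tokenize(text):
    # Lowercase and keep only runs of letters, digits and apostrophes,
    # so punctuation and capitalisation don't stop phrases matching.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(words, n):
    # Every run of n consecutive words, joined back into a phrase.
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def common_phrases(text_a, text_b, n=3):
    # Phrases of exactly n words that occur in both texts.
    return ngrams(tokenize(text_a), n) & ngrams(tokenize(text_b), n)


if __name__ == "__main__":
    a = "Meet me at the usual place tonight"
    b = "I said meet me at the usual time"
    for phrase in sorted(common_phrases(a, b)):
        print(phrase)
```

Raising n gives fewer, more distinctive matches; at n=2 or n=3 you mostly get everyday filler like "at the", so expect to tune it.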
Quite a lot of VLEs have plagiarism modules available.
@tom time to brush up on my Perl then...
It will just be me, so technical is fine. It has to be standalone, as I'm not running it as part of a LP. I will check out similarity-tester...
Crude is fine at the moment, to be honest. Because of the content (sounds cagey, doesn't it?) I don't fancy reading all 500+ documents...
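With hundreds of documents, comparing every pair gets slow, so one option is to count how many documents each phrase turns up in and only list the ones seen in at least a couple. A rough sketch along those lines, again assuming plain text; the names `shared_phrases` and `min_docs` are made up for illustration.

```python
import re
from collections import Counter


def phrases_in(text, n):
    # Set of distinct n-word phrases in one document (lowercased,
    # punctuation dropped), so each document counts a phrase once.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(docs, n=3, min_docs=2):
    # docs maps a document name to its text.
    # Returns (phrase, number_of_documents) pairs, most widespread first.
    counts = Counter()
    for text in docs.values():
        counts.update(phrases_in(text, n))
    return [(p, c) for p, c in counts.most_common() if c >= min_docs]


if __name__ == "__main__":
    docs = {
        "log1": "See you at the old mill",
        "log2": "Be at the old mill by ten",
        "log3": "Nothing shared here at all",
    }
    for phrase, count in shared_phrases(docs):
        print(count, phrase)
```

That gives you a shortlist of phrases to check rather than 500+ documents to read, though it will only catch word-for-word repeats.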