Computer Science – Databases
Scientific paper
2007-02-01
Sixth International Conference on Data Mining (ICDM'06), Dec 2006
10.1109/ICDM.2006.126
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
Gehrke Johannes
Ginsparg Paul
Sorokina Daria
Warner Simeon
Plagiarism Detection in arXiv