Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model


The KIPS Transactions:PartA, Vol. 16, No. 6, pp. 453-462, Dec. 2009
DOI: 10.3745/KIPSTA.2009.16.6.453

Abstract

Research on detecting, preventing, and adjudicating software plagiarism has become widespread with the growing interest in protecting and authenticating software intellectual property. Many previous studies focused on comparing all pairs of submitted programs using attribute counting, token patterns, program parse trees, and similarity-measuring algorithms. It is also important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm that uses a probability model based on an extreme value distribution. First, we propose an asymmetric distance measure pdist(Pa,Pb) to measure the similarity of programs Pa and Pb. We then construct the Plagiarism Direction Graph (PDG) for a given program set, using pdist(Pa,Pb) as edge weights, and transform the PDG into a Gumbel Distance Graph (GDG) model, because we found that the distribution of pdist(Pa,Pb) scores resembles the well-known Gumbel distribution. Second, we define pseudo-plagiarism, a kind of virtual plagiarism forced by very strong functional requirements in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.
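The abstract does not give the definition of pdist(Pa,Pb) or the exact PDG-to-GDG transformation, so the following Python sketch only illustrates the general idea of scoring directed edges against a fitted Gumbel distribution. Everything in it is an assumption made for illustration: the function names (fit_gumbel, gumbel_tail, flag_edges), the method-of-moments fit, the threshold alpha, and the convention that a higher score means a more similar pair. It is not the authors' implementation.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Assumption: the paper may use a different estimator.
    Returns (mu, beta), the location and scale parameters.
    """
    beta = stdev(scores) * math.sqrt(6) / math.pi
    mu = mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_tail(x, mu, beta):
    """Upper-tail probability P(X >= x) under Gumbel(mu, beta)."""
    return 1.0 - math.exp(-math.exp(-(x - mu) / beta))


def flag_edges(pdg, alpha=0.01):
    """Return the directed edges whose score is improbably high.

    pdg maps a directed pair (a, b) to a hypothetical pdist-like score.
    Because the measure is asymmetric, (a, b) and (b, a) are separate keys.
    """
    mu, beta = fit_gumbel(list(pdg.values()))
    flagged = {}
    for edge, score in pdg.items():
        p = gumbel_tail(score, mu, beta)
        if p < alpha:
            flagged[edge] = p
    return flagged
```

Edges with a small tail probability are those that the fitted background distribution cannot easily explain, which is the intuition behind using an extreme value model. Separating real plagiarism from pseudo-plagiarism would need further steps that the abstract does not describe.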



Cite this article
[IEEE Style]
J. H. Ji, G. Woo, H. G. Cho, "Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model," The KIPS Transactions:PartA, vol. 16, no. 6, pp. 453-462, 2009. DOI: 10.3745/KIPSTA.2009.16.6.453.

[ACM Style]
Jeong Hoon Ji, Gyun Woo, and Hwan Gue Cho. 2009. Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model. The KIPS Transactions:PartA, 16, 6, (2009), 453-462. DOI: 10.3745/KIPSTA.2009.16.6.453.