Plagiarism

The Experts below are selected from a list of 27684 Experts worldwide ranked by ideXlab platform

Bela Gipp - One of the best experts on this subject based on the ideXlab platform.

  • Reducing Computational Effort for Plagiarism Detection by Using Citation Characteristics to Limit Retrieval Space
    ACM IEEE Joint Conference on Digital Libraries, 2014
    Co-Authors: Norman Meuschke, Bela Gipp
    Abstract:

    This paper proposes a hybrid approach to plagiarism detection in academic documents that integrates detection methods using citations, semantic argument structure, and semantic word similarity with character-based methods to achieve higher detection performance for disguised forms of plagiarism. Currently available plagiarism detection software exclusively performs text string comparisons. These systems find copies but fail to identify disguised plagiarism, such as paraphrases, translations, or idea plagiarism. Detection approaches that consider semantic similarity at the word and sentence level exist and have consistently achieved higher detection accuracy for disguised plagiarism than character-based approaches. However, the high computational effort of these semantic approaches makes them infeasible for real-world plagiarism detection scenarios. The proposed hybrid approach uses citation-based methods as a preliminary heuristic to reduce the retrieval space with a relatively low loss in detection accuracy. This preliminary step can then be followed by a computationally more expensive semantic and character-based analysis. We show that such a hybrid approach makes semantic plagiarism detection feasible even on large collections for the first time.

  • Citation-based Plagiarism Detection: Practicability on a Large-Scale Scientific Corpus
    Journal of the Association for Information Science and Technology, 2014
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger
    Abstract:

    The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character-based similarity. As a result, heavily disguised forms of plagiarism, such as paraphrases, translated plagiarism, or structural and idea plagiarism, remain undetected. A recently proposed language-independent approach to plagiarism detection, Citation-based Plagiarism Detection (CbPD), allows the detection of semantic similarity even in the absence of text overlap by analyzing the citation placement in a document's full text. This article evaluates the performance of CbPD in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles. We benchmark CbPD against two character-based detection approaches using a ground truth approximated in a user study. Our evaluation shows that the citation-based approach achieves superior ranking performance for heavily disguised forms of plagiarism. Additionally, we demonstrate CbPD to be computationally more efficient than character-based approaches. Finally, upon combining the citation-based with the traditional character-based document similarity visualization methods in a hybrid detection prototype, we observe a reduction in the user effort required for document verification.

  • Citation-based Plagiarism Detection: Detecting Disguised and Cross-language Plagiarism Using Citation Pattern Analysis
    2014
    Co-Authors: Bela Gipp
    Abstract:

    Plagiarism is a problem with far-reaching consequences for the sciences. However, even today's best software-based systems can only reliably identify copy-and-paste plagiarism. Disguised forms of plagiarism, including paraphrased text, cross-language plagiarism, and structural and idea plagiarism, often remain undetected. This weakness of current systems results in a large percentage of scientific plagiarism going undetected. Bela Gipp provides an overview of the state of the art in plagiarism detection and an analysis of why these approaches fail to detect disguised forms of plagiarism. The author proposes Citation-based Plagiarism Detection to address this shortcoming. Unlike character-based approaches, this approach does not rely on text comparisons alone, but analyzes citation patterns within documents to form a language-independent "semantic fingerprint" for similarity assessment. The practicability of Citation-based Plagiarism Detection was demonstrated by its ability to identify plagiarism in scientific publications that had so far not been machine-detectable.

  • Web-based Demonstration of Semantic Similarity Detection Using Citation Pattern Visualization for a Cross-Language Plagiarism Case
    International Conference on Enterprise Information Systems, 2014
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger, Jim Pitman, Andreas Nurnberger
    Abstract:

    In a previous paper, we showed that analyzing citation patterns in the well-known plagiarized thesis by K. T. zu Guttenberg clearly outperformed current detection methods in identifying cross-language plagiarism. However, the experiment was a proof of concept and we did not provide a prototype. This paper presents a fully functional, web-based visualization of citation patterns for this verified cross-language plagiarism case, allowing the user to interactively experience the benefits of citation pattern analysis for plagiarism detection. Using examples from the Guttenberg plagiarism case, we demonstrate that the citation pattern visualization reduces the examiner effort required to verify the extent of plagiarism.

  • Demonstration of Citation Pattern Analysis for Plagiarism Detection
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger, Mario Lipinski, Andreas Nurnberger
    Abstract:

    Limitations of Plagiarism Detection Systems: State-of-the-art plagiarism detection approaches capably identify copy-and-paste plagiarism and, to some extent, slightly modified plagiarism. However, they cannot reliably identify strongly disguised forms of plagiarism, including paraphrases, translated plagiarism, and idea plagiarism, which are the forms more commonly found in scientific texts. This weakness of current systems results in a large fraction of today’s scientific plagiarism going undetected.
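
The two-stage idea from the first abstract in this list (a cheap citation-based filter that shrinks the retrieval space, followed by a costlier comparison on the surviving candidates) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the filter criterion, so the sketch uses the number of shared references, and `expensive_similarity` is a hypothetical stand-in (word-level Jaccard overlap) for the semantic and character-based analysis. The document IDs, reference IDs, and the `min_shared` threshold are invented.

```python
from typing import Dict, List, Set, Tuple

# A document is modeled as (full text, set of reference identifiers).
Document = Tuple[str, Set[str]]


def shared_references(refs_a: Set[str], refs_b: Set[str]) -> int:
    """Cheap citation-based signal: number of references two documents share."""
    return len(refs_a & refs_b)


def expensive_similarity(text_a: str, text_b: str) -> float:
    """Hypothetical stand-in for a costly analysis: word-level Jaccard overlap."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def hybrid_detect(
    query: Document,
    collection: Dict[str, Document],
    min_shared: int = 2,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, float]]:
    """Filter by shared references first, then score only the survivors."""
    query_text, query_refs = query
    # Stage 1: limit the retrieval space with the cheap citation-based check.
    candidates = [
        doc_id
        for doc_id, (_, refs) in collection.items()
        if shared_references(query_refs, refs) >= min_shared
    ]
    # Stage 2: run the expensive comparison on the reduced candidate set only.
    scored = [
        (doc_id, expensive_similarity(query_text, collection[doc_id][0]))
        for doc_id in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


collection = {
    "d1": ("citations reveal hidden similarity", {"r1", "r2", "r3"}),
    "d2": ("an unrelated text on cooking", {"r9"}),
    "d3": ("similarity of documents", {"r1", "r2"}),
}
query = ("hidden similarity in citations", {"r1", "r2", "r4"})
ranking = hybrid_detect(query, collection)
print(ranking)
```

In this toy collection, `d2` shares no references with the query and is never passed to the expensive stage, which is the point of the filter. How much detection accuracy such a filter costs depends on the criterion and threshold, which the sketch does not attempt to evaluate.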

Norman Meuschke - One of the best experts on this subject based on the ideXlab platform.

  • Reducing Computational Effort for Plagiarism Detection by Using Citation Characteristics to Limit Retrieval Space
    ACM IEEE Joint Conference on Digital Libraries, 2014
    Co-Authors: Norman Meuschke, Bela Gipp

  • Citation-based Plagiarism Detection: Practicability on a Large-Scale Scientific Corpus
    Journal of the Association for Information Science and Technology, 2014
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger

  • Web-based Demonstration of Semantic Similarity Detection Using Citation Pattern Visualization for a Cross-Language Plagiarism Case
    International Conference on Enterprise Information Systems, 2014
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger, Jim Pitman, Andreas Nurnberger

  • Demonstration of Citation Pattern Analysis for Plagiarism Detection
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger, Mario Lipinski, Andreas Nurnberger

  • Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection: Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence
    Document Engineering, 2011
    Co-Authors: Bela Gipp, Norman Meuschke
    Abstract:

    Plagiarism detection systems have been developed to locate instances of plagiarism, e.g., within scientific papers. Studies have shown that the existing approaches deliver reasonable results in identifying copy-and-paste plagiarism, but fail to detect more sophisticated forms such as paraphrased plagiarism, translation plagiarism, or idea plagiarism. The authors of this paper demonstrated in recent studies that the detection rate can be significantly improved by not only relying on text analysis, but by additionally analyzing the citations of a document. Citations are valuable language-independent markers that are similar to a fingerprint. In fact, our examinations of real-world cases have shown that the order of citations in a document often remains similar even if the text has been strongly paraphrased or translated in order to disguise plagiarism. This paper introduces three algorithms and discusses their suitability for citation-based plagiarism detection. Due to the numerous ways in which plagiarism can occur, these algorithms need to be versatile. They must be capable of detecting transpositions, scaling, and combinations in a local and global form. The algorithms are named Greedy Citation Tiling, Citation Chunking, and Longest Common Citation Sequence. The evaluation showed that if these algorithms are combined, common forms of plagiarism can be detected reliably.
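
The abstract above names the three algorithms without spelling them out. As an illustration of the third, the following Python sketch assumes that a Longest Common Citation Sequence can be computed as the classic longest common subsequence over two documents' citations, listed in order of appearance. That reading is an assumption, not the authors' published definition, and the reference identifiers (`r1`, `r2`, ...) are invented.

```python
from typing import List, Sequence


def longest_common_citation_sequence(
    a: Sequence[str], b: Sequence[str]
) -> List[str]:
    """Longest sequence of citations that appears in the same order in both lists.

    Implemented as a standard longest-common-subsequence dynamic program.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table to recover one longest sequence.
    result: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Citations in order of appearance; the second document inserts r7 and r8
# and drops r4, as might happen after paraphrasing or translation.
suspicious = ["r1", "r2", "r3", "r4", "r5"]
source = ["r1", "r7", "r2", "r3", "r8", "r5"]
lccs = longest_common_citation_sequence(suspicious, source)
print(lccs)  # ['r1', 'r2', 'r3', 'r5']
```

A plain subsequence match of this kind tolerates inserted or dropped citations but preserves order, so on its own it does not capture transposed blocks of citations. That is consistent with the abstract's point that the algorithms need to be combined.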

Corinna Breitinger - One of the best experts on this subject based on the ideXlab platform.

  • Citation-based Plagiarism Detection: Practicability on a Large-Scale Scientific Corpus
    Journal of the Association for Information Science and Technology, 2014
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger

  • Web-based Demonstration of Semantic Similarity Detection Using Citation Pattern Visualization for a Cross-Language Plagiarism Case
    International Conference on Enterprise Information Systems, 2014
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger, Jim Pitman, Andreas Nurnberger

  • Demonstration of Citation Pattern Analysis for Plagiarism Detection
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger, Mario Lipinski, Andreas Nurnberger

Andreas Nurnberger - One of the best experts on this subject based on the ideXlab platform.

  • Web-based Demonstration of Semantic Similarity Detection Using Citation Pattern Visualization for a Cross-Language Plagiarism Case
    International Conference on Enterprise Information Systems, 2014
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger, Jim Pitman, Andreas Nurnberger

  • Demonstration of Citation Pattern Analysis for Plagiarism Detection
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013
    Co-Authors: Bela Gipp, Norman Meuschke, Corinna Breitinger, Mario Lipinski, Andreas Nurnberger

Joeran Beel - One of the best experts on this subject based on the ideXlab platform.

  • Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches Using GuttenPlag
    ACM IEEE Joint Conference on Digital Libraries, 2011
    Co-Authors: Bela Gipp, Norman Meuschke, Joeran Beel
    Abstract:

    Various approaches to plagiarism detection exist. All are based on more or less sophisticated text analysis methods such as string matching, fingerprinting, or style comparison. In this paper, a new approach called Citation-based Plagiarism Detection is evaluated using a doctoral thesis in which a volunteer crowd-sourcing project called GuttenPlag identified substantial amounts of plagiarism through careful manual inspection. This new approach is able to identify similar and plagiarized documents based on the citations used in the text. It is shown that citation-based plagiarism detection performs significantly better than text-based procedures in identifying strong paraphrasing, translation, and some idea plagiarism. Detection rates can be improved by combining citation-based with text-based plagiarism detection.
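
To make the contrast between text-based and citation-based signals concrete, here is a small Python sketch. It is not the evaluation setup of the paper: the text-based score is a simple word-trigram fingerprint overlap, the citation-based score is the overlap of reference sets, and taking the maximum of the two as a combined score is an assumption chosen only for illustration. The example texts and reference identifiers are invented.

```python
from typing import Set, Tuple


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of word n-grams, a simple form of text fingerprinting."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def text_score(text_a: str, text_b: str) -> float:
    """Text-based similarity: overlap of word-trigram fingerprints."""
    return jaccard(word_ngrams(text_a), word_ngrams(text_b))


def citation_score(refs_a: Set[str], refs_b: Set[str]) -> float:
    """Citation-based similarity: overlap of cited references."""
    return jaccard(refs_a, refs_b)


def combined_score(text_sim: float, citation_sim: float) -> float:
    """Assumed combination rule: flag a pair if either signal is strong."""
    return max(text_sim, citation_sim)


# A paraphrased pair: no trigram survives, but most references do.
original = "the order of citations often stays similar after paraphrasing"
paraphrase = "even heavily reworded passages tend to keep their cited sources in sequence"
refs_original = {"r1", "r2", "r3"}
refs_paraphrase = {"r1", "r2", "r3", "r4"}

t = text_score(original, paraphrase)
c = citation_score(refs_original, refs_paraphrase)
print(t, c, combined_score(t, c))
```

For this pair the trigram overlap is zero while three of the four distinct references are shared, so only the citation-based signal would flag it. A verbatim copy with its citations stripped would show the opposite pattern, which is one way to read the abstract's closing remark about combining both kinds of detection.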