Vandalism

The Experts below are selected from a list of 291 Experts worldwide, ranked by the ideXlab platform.

Martin Potthast - One of the best experts on this subject based on the ideXlab platform.

  • Wikidata Vandalism Corpus 2015 (WDVC-15)
    2020
    Co-Authors: Benno Stein, Martin Potthast, Stefan Heindorf, Gregor Engels
    Abstract:

    The Wikidata Vandalism Corpus 2015 (WDVC-15) is a corpus for the evaluation of automatic vandalism detectors for Wikidata. For research purposes, the corpus can be used free of charge.

  • Debiasing Vandalism Detection Models at Wikidata
    The Web Conference (WWW '19), 2019
    Co-Authors: Stefan Heindorf, Gregor Engels, Yan Scholten, Martin Potthast
    Abstract:

    Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from established users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC-AUC and 0.316 PR-AUC.
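The bias ratio mentioned above can be read as the factor by which a detector over-scores benign edits from anonymous or new users relative to benign edits from established users. A minimal sketch under that reading (the exact definition used for WDVD and FAIR-S may differ; all names and values here are illustrative):

```python
# Hypothetical sketch of a "bias ratio": the mean vandalism score a
# detector assigns to benign edits from anonymous users, divided by the
# mean score it assigns to benign edits from registered users.
# The paper's precise definition may differ.

def bias_ratio(scores, is_anonymous, is_benign):
    """Ratio of mean benign-edit scores: anonymous vs. registered users."""
    anon = [s for s, a, b in zip(scores, is_anonymous, is_benign) if a and b]
    reg  = [s for s, a, b in zip(scores, is_anonymous, is_benign) if not a and b]
    return (sum(anon) / len(anon)) / (sum(reg) / len(reg))

# Toy example: benign anonymous edits are scored far higher than
# benign registered edits, yielding a large bias ratio.
scores       = [0.90, 0.80, 0.02, 0.04]
is_anonymous = [True, True, False, False]
is_benign    = [True, True, True, True]
print(bias_ratio(scores, is_anonymous, is_benign))  # -> ~28.33
```

An unbiased detector would score benign edits similarly regardless of account status, driving this ratio toward 1.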

  • Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017
    arXiv: Information Retrieval, 2017
    Co-Authors: Stefan Heindorf, Martin Potthast, Gregor Engels, Benno Stein
    Abstract:

    We report on the Wikidata vandalism detection task at the WSDM Cup 2017. The task received five submissions; this paper describes their evaluation and a comparison to state-of-the-art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, requiring participant software to predict vandalism in near real time. The best-performing approach achieves a ROC-AUC of 0.947 at a PR-AUC of 0.458. In particular, this task was organized as a software submission task: to maximize reproducibility as well as to foster future research and development on this task, the participants were asked to submit their working software to the TIRA experimentation platform along with the source code for open-source release.
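The headline metric above, ROC-AUC, has a simple probabilistic reading: the probability that a randomly chosen vandalism edit outscores a randomly chosen benign edit (ties counting half). A dependency-free sketch on toy data (not cup data):

```python
# Minimal ROC-AUC via pairwise comparison: the probability that a
# random positive (vandalism) outscores a random negative (benign),
# ties counting half. PR-AUC is computed analogously from the
# precision-recall curve; libraries such as scikit-learn provide both.
def roc_auc(y_true, y_score):
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true  = [0, 0, 0, 0, 1, 1]               # 1 = vandalism
y_score = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]   # detector scores
print(roc_auc(y_true, y_score))  # -> 0.875
```

A random scorer lands at 0.5; the cup winner's 0.947 means a vandalism edit outscores a benign one about 95% of the time.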

  • Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
    Co-Authors: Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels
    Abstract:

    We report on the construction of the Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism in knowledge bases. Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia. Among Wikidata's 24 million manual revisions, we have identified more than 100,000 cases of vandalism. An in-depth corpus analysis lays the groundwork for research and development on automatic vandalism detection in public knowledge bases. Our analysis shows that 58% of the vandalism revisions can be found in the textual portions of Wikidata, and the remainder in structural content, e.g., subject-predicate-object triples. Moreover, we find that some vandals also target Wikidata content whose manipulation may impact content displayed on Wikipedia, revealing potential vulnerabilities. Given today's importance of knowledge bases for information systems, this shows that public knowledge bases must be used with caution.
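Rollback-based labeling is one plausible way to mine vandalism cases from a revision history like the one described above: a revision undone by an administrative rollback is a strong vandalism candidate. A hedged sketch (the field names and labeling rule are illustrative, not the WDVC-2015 construction procedure):

```python
# Hedged sketch of rollback-based labeling over a revision stream.
# Field names ("id", "comment") are invented for illustration and are
# not the WDVC-2015 schema.
import re

# MediaWiki rollback summaries typically begin "Reverted edits by ..."
ROLLBACK_RE = re.compile(r"^Reverted edits by", re.IGNORECASE)

def label_vandalism(revisions):
    """Mark every revision whose immediate successor is a rollback."""
    vandalism_ids = set()
    for i, rev in enumerate(revisions[:-1]):
        if ROLLBACK_RE.match(revisions[i + 1]["comment"]):
            vandalism_ids.add(rev["id"])
    return vandalism_ids

revisions = [
    {"id": 1, "comment": "Added label 'Berlin'"},
    {"id": 2, "comment": "asdfjkl"},                      # damaging edit
    {"id": 3, "comment": "Reverted edits by 192.0.2.7"},  # rollback
]
print(label_vandalism(revisions))  # -> {2}
```

A real pipeline would also attribute multi-edit rollbacks to every reverted revision, not just the immediate predecessor.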

Kathleen R. McKeown - One of the best experts on this subject based on the ideXlab platform.

  • "Got You!": Automatic Vandalism Detection in Wikipedia with Web-Based Shallow Syntactic-Semantic Modeling
    COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics, 2010
    Co-Authors: William Yang Wang, Kathleen R. McKeown
    Abstract:

    Discriminating vandalism edits from non-vandalism edits in Wikipedia is a challenging task, as ill-intentioned edits can include a variety of content and be expressed in many different forms and styles. Previous studies are limited to rule-based methods and learning based on lexical features, lacking in linguistic analysis. In this paper, we propose a novel Web-based shallow syntactic-semantic modeling method, which utilizes Web search results as a resource and trains topic-specific n-tag and syntactic n-gram language models to detect vandalism. By combining basic task-specific and lexical features, we achieve high F-measures using logistic boosting and logistic model tree classifiers, surpassing the results reported by major Wikipedia vandalism detection systems.
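The core language-model intuition above — text that is improbable under a topic-specific model is suspicious — can be sketched with a much simpler word-bigram model. The paper's actual models (topic-specific n-tag and syntactic n-gram models built from Web search results) are considerably richer; this toy version only shows the scoring mechanism:

```python
# Toy word-bigram language model: train on trusted topic text, then
# score candidate edit text by average smoothed bigram log-probability.
# Low scores suggest off-topic or nonsense insertions.
import math
from collections import Counter

def train_bigram(corpus_tokens):
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab = len(unigrams)
    def logprob(tokens):
        # add-one smoothed bigram log-probability, averaged per bigram
        lp = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                 for a, b in zip(tokens, tokens[1:]))
        return lp / max(len(tokens) - 1, 1)
    return logprob

topic_text = "the city of berlin is the capital of germany".split()
score = train_bigram(topic_text)

good_edit = "berlin is the capital".split()
bad_edit  = "lol u suck haha".split()
print(score(good_edit) > score(bad_edit))  # -> True
```

In the paper's setting, the training text would come from Web search results for the article's topic rather than from the article itself.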

William Yang Wang - One of the best experts on this subject based on the ideXlab platform.

  • "Got You!": Automatic Vandalism Detection in Wikipedia with Web-Based Shallow Syntactic-Semantic Modeling
    COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics, 2010
    Co-Authors: William Yang Wang, Kathleen R. McKeown
    Abstract:

    Discriminating vandalism edits from non-vandalism edits in Wikipedia is a challenging task, as ill-intentioned edits can include a variety of content and be expressed in many different forms and styles. Previous studies are limited to rule-based methods and learning based on lexical features, lacking in linguistic analysis. In this paper, we propose a novel Web-based shallow syntactic-semantic modeling method, which utilizes Web search results as a resource and trains topic-specific n-tag and syntactic n-gram language models to detect vandalism. By combining basic task-specific and lexical features, we achieve high F-measures using logistic boosting and logistic model tree classifiers, surpassing the results reported by major Wikipedia vandalism detection systems.

Andrew G West - One of the best experts on this subject based on the ideXlab platform.

  • Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features
    International Conference on Computational Linguistics, 2011
    Co-Authors: B. Thomas Adler, Luca de Alfaro, Santiago M. Mola-Velasco, Paolo Rosso, Andrew G. West
    Abstract:

    Wikipedia is an online encyclopedia that anyone can edit. While most edits are constructive, about 7% are acts of vandalism: modifications made in bad faith, introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The resulting joint system improves on the state of the art set by all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.
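One straightforward way to fuse the three subsystem scores described above is a logistic meta-classifier over them. The sketch below fits one by plain gradient descent on toy data; the paper's actual combination method and feature set may differ, and every value here is invented:

```python
# Hedged sketch of score fusion: a tiny logistic meta-classifier over
# three subsystem scores (metadata, reputation, NLP), trained by plain
# stochastic gradient descent on toy data.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Columns: [metadata_score, reputation_score, nlp_score] (toy values)
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.9], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3]]
y = [1, 1, 0, 0]  # 1 = vandalism
combined = fit_logistic(X, y)
print(combined([0.85, 0.8, 0.8]) > combined([0.15, 0.1, 0.2]))  # -> True
```

The appeal of this design is that each subsystem stays independent; the meta-classifier only learns how much to trust each score.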

  • Spatio-Temporal Analysis of Wikipedia Metadata and the STiki Anti-Vandalism Tool
    International Symposium on Wikis and Open Collaboration, 2010
    Co-Authors: Andrew G West, Sampath Kannan
    Abstract:

    The bulk of Wikipedia anti-vandalism tools require natural language processing over the article or diff text. However, our prior work demonstrated the feasibility of using spatio-temporal properties to locate malicious edits. STiki is a real-time, on-Wikipedia tool leveraging this technique. The associated poster reviews STiki's methodology and performance. We find that competing anti-vandalism tools inhibit maximal performance. However, the tool proves particularly adept at mitigating long-term embedded vandalism. Further, its robust and language-independent nature makes it well suited for use in less-patrolled wiki installations.

  • STiki: An Anti-Vandalism Tool for Wikipedia Using Spatio-Temporal Analysis of Revision Metadata
    International Symposium on Wikis and Open Collaboration, 2010
    Co-Authors: Andrew G West, Sampath Kannan
    Abstract:

    STiki is an anti-vandalism tool for Wikipedia. Unlike similar tools, STiki does not rely on natural language processing (NLP) over the article or diff text to locate vandalism. Instead, STiki leverages spatio-temporal properties of revision metadata. The feasibility of utilizing such properties was demonstrated in our prior work, which found that they perform comparably to NLP efforts while being more efficient, robust to evasion, and language independent. STiki is a real-time, on-Wikipedia implementation based on these properties. It consists of (1) a server-side processing engine that examines revisions, scoring the likelihood that each is vandalism, and (2) a client-side GUI that presents likely vandalism to end users for definitive classification (and, if necessary, reversion on Wikipedia). Our demonstration will provide an introduction to spatio-temporal properties, demonstrate the STiki software, and discuss alternative research uses for the open-source code.

  • Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata
    European Workshop on System Security, 2010
    Co-Authors: Andrew G West, Sampath Kannan
    Abstract:

    Blatantly unproductive edits undermine the quality of the collaboratively edited encyclopedia Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language processing has been applied to combat these malicious edits, but, as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features to be effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits per second) and has been used to locate over 5,000 manually confirmed incidents of vandalism outside our labeled set.
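Metadata features of the kind described above can be extracted without ever touching the article text. A hedged sketch (the feature names are invented and the real STiki feature set is richer; the anonymity check is a crude IPv4 heuristic):

```python
# Illustrative extraction of spatio-temporal revision-metadata features:
# time of day, weekend flag, editor anonymity, and gap since the
# previous edit. Names and thresholds are invented for illustration.
from datetime import datetime, timezone

def metadata_features(rev_time_utc, prev_time_utc, user):
    is_anonymous = user.count(".") == 3  # crude IPv4 heuristic
    return {
        "hour_of_day": rev_time_utc.hour,
        "is_weekend": rev_time_utc.weekday() >= 5,
        "is_anonymous": is_anonymous,
        "secs_since_prev_edit": (rev_time_utc - prev_time_utc).total_seconds(),
    }

feats = metadata_features(
    datetime(2010, 3, 6, 23, 30, tzinfo=timezone.utc),  # Saturday, late night
    datetime(2010, 3, 6, 23, 29, tzinfo=timezone.utc),
    "192.0.2.7",
)
print(feats)
```

Feature vectors like this would then feed the rollback-labeled classifier the abstract describes.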

Gregor Engels - One of the best experts on this subject based on the ideXlab platform.

  • Wikidata Vandalism Corpus 2015 (WDVC-15)
    2020
    Co-Authors: Benno Stein, Martin Potthast, Stefan Heindorf, Gregor Engels
    Abstract:

    The Wikidata Vandalism Corpus 2015 (WDVC-15) is a corpus for the evaluation of automatic vandalism detectors for Wikidata. For research purposes, the corpus can be used free of charge.

  • Debiasing Vandalism Detection Models at Wikidata
    The Web Conference (WWW '19), 2019
    Co-Authors: Stefan Heindorf, Gregor Engels, Yan Scholten, Martin Potthast
    Abstract:

    Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from established users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC-AUC and 0.316 PR-AUC.

  • Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017
    arXiv: Information Retrieval, 2017
    Co-Authors: Stefan Heindorf, Martin Potthast, Gregor Engels, Benno Stein
    Abstract:

    We report on the Wikidata vandalism detection task at the WSDM Cup 2017. The task received five submissions; this paper describes their evaluation and a comparison to state-of-the-art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, requiring participant software to predict vandalism in near real time. The best-performing approach achieves a ROC-AUC of 0.947 at a PR-AUC of 0.458. In particular, this task was organized as a software submission task: to maximize reproducibility as well as to foster future research and development on this task, the participants were asked to submit their working software to the TIRA experimentation platform along with the source code for open-source release.

  • Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
    Co-Authors: Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels
    Abstract:

    We report on the construction of the Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism in knowledge bases. Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia. Among Wikidata's 24 million manual revisions, we have identified more than 100,000 cases of vandalism. An in-depth corpus analysis lays the groundwork for research and development on automatic vandalism detection in public knowledge bases. Our analysis shows that 58% of the vandalism revisions can be found in the textual portions of Wikidata, and the remainder in structural content, e.g., subject-predicate-object triples. Moreover, we find that some vandals also target Wikidata content whose manipulation may impact content displayed on Wikipedia, revealing potential vulnerabilities. Given today's importance of knowledge bases for information systems, this shows that public knowledge bases must be used with caution.