The experts below are selected from a list of 291 experts worldwide, as ranked by the ideXlab platform.
Martin Potthast - One of the best experts on this subject based on the ideXlab platform.
-
Wikidata Vandalism Corpus 2015 (WDVC-15)
2020
Co-Authors: Benno Stein, Martin Potthast, Stefan Heindorf, Gregor Engels
Abstract: The Wikidata Vandalism Corpus 2015 (WDVC-15) is a corpus for the evaluation of automatic vandalism detectors for Wikidata. For research purposes, the corpus can be used free of charge.
-
Debiasing Vandalism Detection Models at Wikidata
The Web Conference, 2019
Co-Authors: Stefan Heindorf, Gregor Engels, Yan Scholten, Martin Potthast
Abstract: Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from established users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC-AUC and 0.316 PR-AUC.
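The abstract's central quantity is a "bias ratio" between the scores a detector assigns to benign edits from different user groups. The sketch below is one plausible reading of that idea, not the paper's formal definition; the function name, data, and the exact ratio used are illustrative assumptions.

```python
# Hypothetical sketch: a "bias ratio" comparing the vandalism scores a
# detector assigns to *benign* edits from anonymous/new users versus
# established users. The definition here (ratio of mean scores) is an
# assumption for illustration; the paper's measure may differ.

def bias_ratio(scores, is_newcomer, is_benign):
    """Mean score on benign newcomer edits divided by mean score on
    benign established-user edits. A ratio near 1.0 means no group bias."""
    newcomer = [s for s, n, b in zip(scores, is_newcomer, is_benign) if n and b]
    veteran = [s for s, n, b in zip(scores, is_newcomer, is_benign) if not n and b]
    return (sum(newcomer) / len(newcomer)) / (sum(veteran) / len(veteran))

scores      = [0.90, 0.80, 0.10, 0.05, 0.95]
is_newcomer = [True, True, False, False, True]
is_benign   = [True, True, True, True, False]   # last edit is actual vandalism
print(bias_ratio(scores, is_newcomer, is_benign))  # well above 1.0 -> biased detector
```

On this toy data the benign newcomer edits average 0.85 while the benign established-user edits average 0.075, so the detector looks heavily biased against newcomers.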
-
Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017
arXiv: Information Retrieval, 2017
Co-Authors: Stefan Heindorf, Martin Potthast, Gregor Engels, Benno Stein
Abstract: We report on the Wikidata vandalism detection task at the WSDM Cup 2017. The task received five submissions, for which this paper describes their evaluation and a comparison to state-of-the-art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, requiring participant software to predict vandalism in near real-time. The best-performing approach achieves a ROC-AUC of 0.947 at a PR-AUC of 0.458. In particular, this task was organized as a software submission task: to maximize reproducibility as well as to foster future research and development on this task, the participants were asked to submit their working software to the TIRA experimentation platform along with the source code for open-source release.
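The task is evaluated with ROC-AUC and PR-AUC. As a minimal sketch of what these two measures compute, the plain-Python functions below implement the standard definitions (ROC-AUC as the probability a random positive outranks a random negative, PR-AUC as average precision); the scores and labels are made-up illustration data, not task results.

```python
# Minimal sketch of the two ranking measures reported for the task,
# computed in plain Python on invented data.

def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative
    (ties count half) -- equivalent to the area under the ROC curve."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pr_auc(labels, scores):
    """Average precision: precision averaged at each recalled positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    tp, precisions = 0, []
    for i, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            precisions.append(tp / i)
    return sum(precisions) / len(precisions)

labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(roc_auc(labels, scores), pr_auc(labels, scores))
```

The gap between the two numbers on the same ranking illustrates why vandalism detection, a heavily imbalanced problem, reports both: a high ROC-AUC can coexist with a much lower PR-AUC.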
-
Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
Co-Authors: Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels
Abstract: We report on the construction of the Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism in knowledge bases. Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia. Among Wikidata's 24 million manual revisions, we have identified more than 100,000 cases of vandalism. An in-depth corpus analysis lays the groundwork for research and development on automatic vandalism detection in public knowledge bases. Our analysis shows that 58% of the vandalism revisions can be found in the textual portions of Wikidata, with the remainder in structural content, e.g., subject-predicate-object triples. Moreover, we find that some vandals also target Wikidata content whose manipulation may impact content displayed on Wikipedia, revealing potential vulnerabilities. Given the importance of knowledge bases for today's information systems, this shows that public knowledge bases must be used with caution.
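A corpus like this needs ground-truth labels over millions of revisions. One common labeling idea for wiki histories, consistent with the rollback-based tagging mentioned in the STiki papers later in this listing, is to treat revisions undone by an administrative rollback as vandalism. The sketch below illustrates that idea on an invented, simplified revision format; the field names are assumptions, not the WDVC-2015 schema.

```python
# Hypothetical labeling sketch: in a revision history, edits undone via
# a "rollback" action are labeled vandalism, everything else is treated
# as (presumed) benign. The dict format is an invented stand-in for a
# real revision dump.

def label_revisions(revisions):
    """Mark a revision as vandalism if a later rollback targets it."""
    rolled_back = {rid
                   for rev in revisions if rev.get("is_rollback")
                   for rid in rev.get("undoes", [])}
    return [{"id": rev["id"], "vandalism": rev["id"] in rolled_back}
            for rev in revisions if not rev.get("is_rollback")]

history = [
    {"id": 1},
    {"id": 2},                                      # rolled back below
    {"id": 3, "is_rollback": True, "undoes": [2]},  # the rollback itself
]
print(label_revisions(history))  # revision 2 is labeled vandalism
```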
Kathleen R. McKeown - One of the best experts on this subject based on the ideXlab platform.
-
"Got You!": Automatic Vandalism Detection in Wikipedia with Web-based Shallow Syntactic-Semantic Modeling
International Conference on Computational Linguistics, 2010
Co-Authors: William Yang Wang, Kathleen R. McKeown
Abstract: Discriminating vandalism edits from non-vandalism edits in Wikipedia is a challenging task, as ill-intentioned edits can include a variety of content and be expressed in many different forms and styles. Previous studies are limited to rule-based methods and learning based on lexical features, lacking in linguistic analysis. In this paper, we propose a novel Web-based shallow syntactic-semantic modeling method, which utilizes Web search results as a resource and trains topic-specific n-tag and syntactic n-gram language models to detect vandalism. By combining basic task-specific and lexical features, we have achieved high F-measures using logistic boosting and logistic model tree classifiers, surpassing the results reported by major Wikipedia vandalism detection systems.
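To give a loose feel for the language-model intuition behind this approach (this is not the authors' system), the sketch below builds a bigram model from topic-relevant reference text and scores incoming edits by average log-probability: off-topic insertions fit the model poorly and score low. All text, the add-one smoothing, and the comparison are invented for illustration.

```python
# Loose sketch of the n-gram language-model intuition: an edit that fits a
# topic-specific model (trained here on one reference sentence) scores
# higher than an off-topic, spam-like insertion. Everything below is an
# illustrative assumption, not the paper's actual pipeline.
import math
from collections import Counter

def train_bigrams(text):
    words = text.lower().split()
    return Counter(zip(words, words[1:])), Counter(words)

def avg_logprob(text, bigrams, unigrams, vocab_size):
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    # add-one smoothing gives unseen bigrams a small, nonzero probability
    lp = [math.log((bigrams[p] + 1) / (unigrams[p[0]] + vocab_size)) for p in pairs]
    return sum(lp) / len(lp)

reference = "the solar system contains eight planets orbiting the sun"
bi, uni = train_bigrams(reference)
good = avg_logprob("eight planets orbiting the sun", bi, uni, len(uni))
bad = avg_logprob("buy cheap pills now lol", bi, uni, len(uni))
print(good > bad)  # the on-topic edit scores higher
```

The paper goes well beyond this: it retrieves topic text via Web search and uses syntactic n-grams over part-of-speech tags, but the scoring principle is the same.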
William Yang Wang - One of the best experts on this subject based on the ideXlab platform.
-
"Got You!": Automatic Vandalism Detection in Wikipedia with Web-based Shallow Syntactic-Semantic Modeling
International Conference on Computational Linguistics, 2010
Co-Authors: William Yang Wang, Kathleen R. McKeown
Abstract: Discriminating vandalism edits from non-vandalism edits in Wikipedia is a challenging task, as ill-intentioned edits can include a variety of content and be expressed in many different forms and styles. Previous studies are limited to rule-based methods and learning based on lexical features, lacking in linguistic analysis. In this paper, we propose a novel Web-based shallow syntactic-semantic modeling method, which utilizes Web search results as a resource and trains topic-specific n-tag and syntactic n-gram language models to detect vandalism. By combining basic task-specific and lexical features, we have achieved high F-measures using logistic boosting and logistic model tree classifiers, surpassing the results reported by major Wikipedia vandalism detection systems.
Andrew G West - One of the best experts on this subject based on the ideXlab platform.
-
Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features
International Conference on Computational Linguistics, 2011
Co-Authors: B. Thomas Adler, Luca De Alfaro, Santiago M. Mola-Velasco, Paolo Rosso, Andrew G West
Abstract: Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism: modifications made in bad faith that introduce spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The resulting joint system improves on the state of the art set by all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.
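One natural way to integrate independent detectors, sketched here as a hedged assumption rather than the paper's actual method, is stacking: treat each detector's score as a feature for a small meta-classifier. The logistic model below is trained with plain gradient descent on invented data.

```python
# Hedged sketch of the integration idea: the scores of three independent
# detectors (metadata-, reputation-, and NLP-based) become features of a
# tiny logistic meta-classifier. Training data and setup are invented;
# the actual joint system in the paper differs.
import math

def train_meta(features, labels, lr=0.5, epochs=2000):
    w = [0.0] * (len(features[0]) + 1)             # weights + bias
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
            p = 1 / (1 + math.exp(-z))
            g = p - y                               # gradient of log-loss
            for i, xi in enumerate(x):
                w[i] -= lr * g * xi
            w[-1] -= lr * g
    return w

def predict(w, x):
    z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))

# columns: [STiki-style metadata score, WikiTrust-style reputation score, NLP score]
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.1, 0.2, 0.3], [0.2, 0.1, 0.2]]
y = [1, 1, 0, 0]
w = train_meta(X, y)
print(predict(w, [0.85, 0.9, 0.8]) > 0.5)  # vandalism-like input
```

A virtue of stacking here is that the learned weights also quantify each approach's contribution, which mirrors the per-approach analysis the abstract describes.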
-
Spatio-Temporal Analysis of Wikipedia Metadata and the STiki Anti-Vandalism Tool
International Symposium on Wikis and Open Collaboration, 2010
Co-Authors: Andrew G West, Sampath Kannan
Abstract: The bulk of Wikipedia anti-vandalism tools require natural language processing over the article or diff text. However, our prior work demonstrated the feasibility of using spatio-temporal properties to locate malicious edits. STiki is a real-time, on-Wikipedia tool leveraging this technique. The associated poster reviews STiki's methodology and performance. We find that competing anti-vandalism tools inhibit maximal performance. However, the tool proves particularly adept at mitigating long-term embedded vandalism. Further, its robust and language-independent nature makes it well suited for use in less-patrolled wiki installations.
-
STiki: An Anti-Vandalism Tool for Wikipedia Using Spatio-Temporal Analysis of Revision Metadata
International Symposium on Wikis and Open Collaboration, 2010
Co-Authors: Andrew G West, Sampath Kannan
Abstract: STiki is an anti-vandalism tool for Wikipedia. Unlike similar tools, STiki does not rely on natural language processing (NLP) over the article or diff text to locate vandalism. Instead, STiki leverages spatio-temporal properties of revision metadata. The feasibility of utilizing such properties was demonstrated in our prior work, which found they perform comparably to NLP efforts while being more efficient, robust to evasion, and language-independent. STiki is a real-time, on-Wikipedia implementation based on these properties. It consists of (1) a server-side processing engine that examines revisions, scoring the likelihood that each is vandalism, and (2) a client-side GUI that presents likely vandalism to end users for definitive classification (and, if necessary, reversion on Wikipedia). Our demonstration will provide an introduction to spatio-temporal properties, demonstrate the STiki software, and discuss alternative research uses for the open-source code.
-
Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata
European Workshop on System Security, 2010
Co-Authors: Andrew G West, Sampath Kannan
Abstract: Blatantly unproductive edits undermine the quality of the collaboratively edited encyclopedia Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features to be effective in mitigating email spam while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits along numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits per second) and has been used to locate over 5,000 manually confirmed incidents of vandalism outside our labeled set.
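The key property of this line of work is that every feature comes from revision metadata, never from article text. As an illustrative sketch of metadata features of that general kind (the field names and the specific feature set are assumptions, not STiki's actual features):

```python
# Illustrative sketch of spatio-temporal metadata features -- computed
# from revision metadata only, never the article text. The input format
# and chosen features are assumptions for illustration.
from datetime import datetime

def metadata_features(rev, prev_rev_time):
    ts = datetime.fromisoformat(rev["timestamp"])
    return {
        "hour_of_day": ts.hour,                    # temporal: edit time of day
        "is_weekend": ts.weekday() >= 5,
        "anonymous": rev["user"].count(".") == 3,  # crude IPv4 check (spatial proxy)
        "secs_since_prev": (ts - datetime.fromisoformat(prev_rev_time)).total_seconds(),
        "comment_len": len(rev.get("comment", "")),
    }

rev = {"timestamp": "2010-03-06T02:14:00", "user": "203.0.113.7", "comment": ""}
print(metadata_features(rev, "2010-03-05T22:14:00"))
```

Feature vectors like this would then feed the classifier described in the abstract, with rollback-tagged revisions supplying the vandalism labels for training.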
Gregor Engels - One of the best experts on this subject based on the ideXlab platform.
-
Wikidata Vandalism Corpus 2015 (WDVC-15)
2020
Co-Authors: Benno Stein, Martin Potthast, Stefan Heindorf, Gregor Engels
Abstract: The Wikidata Vandalism Corpus 2015 (WDVC-15) is a corpus for the evaluation of automatic vandalism detectors for Wikidata. For research purposes, the corpus can be used free of charge.
-
Debiasing Vandalism Detection Models at Wikidata
The Web Conference, 2019
Co-Authors: Stefan Heindorf, Gregor Engels, Yan Scholten, Martin Potthast
Abstract: Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from established users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC-AUC and 0.316 PR-AUC.
-
Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017
arXiv: Information Retrieval, 2017
Co-Authors: Stefan Heindorf, Martin Potthast, Gregor Engels, Benno Stein
Abstract: We report on the Wikidata vandalism detection task at the WSDM Cup 2017. The task received five submissions, for which this paper describes their evaluation and a comparison to state-of-the-art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, requiring participant software to predict vandalism in near real-time. The best-performing approach achieves a ROC-AUC of 0.947 at a PR-AUC of 0.458. In particular, this task was organized as a software submission task: to maximize reproducibility as well as to foster future research and development on this task, the participants were asked to submit their working software to the TIRA experimentation platform along with the source code for open-source release.
-
Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
Co-Authors: Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels
Abstract: We report on the construction of the Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism in knowledge bases. Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia. Among Wikidata's 24 million manual revisions, we have identified more than 100,000 cases of vandalism. An in-depth corpus analysis lays the groundwork for research and development on automatic vandalism detection in public knowledge bases. Our analysis shows that 58% of the vandalism revisions can be found in the textual portions of Wikidata, with the remainder in structural content, e.g., subject-predicate-object triples. Moreover, we find that some vandals also target Wikidata content whose manipulation may impact content displayed on Wikipedia, revealing potential vulnerabilities. Given the importance of knowledge bases for today's information systems, this shows that public knowledge bases must be used with caution.