Disambiguation

The experts below are selected from a list of 32,772 experts worldwide ranked by the ideXlab platform.

Ted Pedersen - One of the best experts on this subject based on the ideXlab platform.

  • Maximizing semantic relatedness to perform word sense disambiguation
    2005
    Co-Authors: Ted Pedersen, Satanjeev Banerjee, Siddharth Patwardhan
    Abstract:

    This article presents a method of word sense disambiguation that assigns a target word the sense that is most related to the senses of its neighboring words. We explore the use of measures of similarity and relatedness that are based on finding paths in a concept network, information content derived from a large corpus, and word sense glosses. We observe that measures of relatedness are useful sources of information for disambiguation, and in particular we find that two gloss-based measures that we have developed are particularly flexible and effective measures for word sense disambiguation.

  • An adapted Lesk algorithm for word sense disambiguation using WordNet
    International Conference on Computational Linguistics, 2002
    Co-Authors: Satanjeev Banerjee, Ted Pedersen
    Abstract:

    This paper presents an adaptation of Lesk's dictionary-based word sense disambiguation algorithm. Rather than using a standard dictionary as the source of glosses for our approach, the lexical database WordNet is employed. This provides a rich hierarchy of semantic relations that our algorithm can exploit. This method is evaluated using the English lexical sample data from the SENSEVAL-2 word sense disambiguation exercise, and attains an overall accuracy of 32%. This represents a significant improvement over the 16% and 23% accuracy attained by variations of the Lesk algorithm used as benchmarks during the SENSEVAL-2 comparative exercise among word sense disambiguation systems.

  • A decision tree of bigrams is an accurate predictor of word sense
    arXiv: Computation and Language, 2001
    Co-Authors: Ted Pedersen
    Abstract:

    This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated using the sense-tagged corpora from the 1998 SENSEVAL word sense disambiguation exercise. It is more accurate than the average results reported for 30 of 36 words, and is more accurate than the best results for 19 of 36 words.

  • Lexical semantic ambiguity resolution with bigram-based decision trees
    International Conference on Computational Linguistics, 2001
    Co-Authors: Ted Pedersen
    Abstract:

    This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated using the sense-tagged corpora from the 1998 SENSEVAL word sense disambiguation exercise. It is more accurate than the average results reported for 30 of 36 words, and is more accurate than the best results for 19 of 36 words.
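
The gloss-based entries in the list above share one idea: pick the sense of the target word whose gloss is most related to the glosses of its neighbors' senses. The Python sketch below illustrates that idea only. The sense inventory TOY_GLOSSES, the stopword list, and the plain shared-word count are assumptions made for illustration; the adapted Lesk work draws its glosses and relations from WordNet and scores overlaps differently.

```python
# Hypothetical toy sense inventory; these glosses are NOT taken from WordNet.
TOY_GLOSSES = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a body of water such as a river",
    },
    "deposit": {
        "deposit#money": "money placed in an institution for safekeeping",
        "deposit#sediment": "matter laid down by water such as river mud",
    },
    "loan": {
        "loan#money": "money that an institution lends and expects back",
    },
}

# Assumed stopword list, kept tiny on purpose.
STOPWORDS = {"a", "an", "and", "as", "by", "for", "in", "of", "such", "that", "the"}


def content_words(gloss):
    """Return the set of non-stopword tokens in a gloss."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def overlap(gloss_a, gloss_b):
    """Relatedness proxy: number of content words shared by two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))


def disambiguate(target, neighbors, glosses=TOY_GLOSSES):
    """Pick the target sense with the highest total relatedness to its neighbors.

    For each neighbor, only its best-matching sense contributes to the score.
    Ties are resolved in favor of the sense listed first.
    """
    best_sense, best_score = None, -1
    for sense, gloss in glosses[target].items():
        score = sum(
            max((overlap(gloss, g) for g in glosses.get(n, {}).values()), default=0)
            for n in neighbors
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense, best_score


if __name__ == "__main__":
    print(disambiguate("bank", ["deposit", "loan"]))  # ('bank#finance', 5)
```

With these toy glosses the financial sense wins because its gloss shares words with both neighbors, while the river sense only matches the sediment reading of "deposit".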

Fabio Pammolli - One of the best experts on this subject based on the ideXlab platform.

  • Disambiguation of patent inventors and assignees using high resolution geolocation data
    arXiv: Digital Libraries, 2015
    Co-Authors: Massimo Riccaboni, Fabio Pammolli, Greg Morrison
    Abstract:

    Patent data represent a significant source of information on innovation and the evolution of technology through networks of citations, co-invention and co-assignment of new patents. A major obstacle to extracting useful information from this data is the problem of name disambiguation: linking alternate spellings of individuals or institutions to a single identifier to uniquely determine the parties involved in the creation of a technology. In this paper, we describe a new algorithm that uses high-resolution geolocation to disambiguate both inventors and assignees on more than 3.6 million patents found in the European Patent Office (EPO), under the Patent Cooperation Treaty (PCT), and in the US Patent and Trademark Office (USPTO). We show that our algorithm has both high precision and recall in comparison to a manual disambiguation of EPO assignee names in Boston and Paris, and show it performs well for a benchmark of USPTO inventor names that can be linked to a high-resolution address (but poorly for inventors that never provided a high-quality address). The most significant benefit of this work is the high-quality assignee disambiguation with worldwide coverage coupled with an inventor disambiguation that is competitive with other state-of-the-art approaches. To our knowledge this is the broadest and most accurate simultaneous disambiguation and cross-linking of the inventor and assignee names for a significant fraction of patents in these three major patent collections.
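
The abstract above does not spell out the algorithm, so the following Python sketch is only a guess at the general shape of geolocation-assisted name disambiguation, not the authors' method. It links two records when their normalized names are similar and their coordinates are close, then groups linked records with union-find. The thresholds min_name_sim and max_km, the record layout, and the sample records are all assumptions.

```python
import math
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and drop punctuation so spelling variants compare cleanly."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def cluster_records(records, min_name_sim=0.85, max_km=5.0):
    """Group (name, lat, lon) records that look like the same party.

    Two records are linked when name similarity >= min_name_sim AND
    distance <= max_km (both thresholds are illustrative assumptions).
    Returns a sorted list of clusters, each a list of record indices.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            name_i, lat_i, lon_i = records[i]
            name_j, lat_j, lon_j = records[j]
            sim = SequenceMatcher(None, normalize(name_i), normalize(name_j)).ratio()
            if sim >= min_name_sim and haversine_km(lat_i, lon_i, lat_j, lon_j) <= max_km:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical records: (name, latitude, longitude).
RECORDS = [
    ("Acme Robotics Inc", 42.3601, -71.0589),   # Boston
    ("ACME Robotics Inc.", 42.3611, -71.0570),  # Boston, spelling variant
    ("Acme Robotics Inc", 48.8566, 2.3522),     # Paris, same name but far away
    ("Beta Pharma", 42.3601, -71.0589),         # Boston, different name
]

if __name__ == "__main__":
    print(cluster_records(RECORDS))  # [[0, 1], [2], [3]]
```

The all-pairs loop is quadratic and would not scale to millions of patents; a real system would first block records by name key or by grid cell.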

Rada Mihalcea - One of the best experts on this subject based on the ideXlab platform.

  • Subjectivity word sense disambiguation
    Empirical Methods in Natural Language Processing, 2009
    Co-Authors: Cem Akkaya, Janyce Wiebe, Rada Mihalcea
    Abstract:

    This paper investigates a new task, subjectivity word sense disambiguation (SWSD), which is to automatically determine which word instances in a corpus are being used with subjective senses, and which are being used with objective senses. We provide empirical evidence that SWSD is more feasible than full word sense disambiguation, and that it can be exploited to improve the performance of contextual subjectivity and sentiment analysis systems.

  • Unsupervised graph-based word sense disambiguation using measures of word semantic similarity
    International Conference on Semantic Computing, 2007
    Co-Authors: R Sinha, Rada Mihalcea
    Abstract:

    This paper describes an unsupervised graph-based method for word sense disambiguation, and presents comparative evaluations using several measures of word semantic similarity and several algorithms for graph centrality. The results indicate that the right combination of similarity metrics and graph centrality algorithms can lead to performance competitive with the state of the art in unsupervised word sense disambiguation, as measured on standard data sets.

  • PageRank on semantic networks with application to word sense disambiguation
    International Conference on Computational Linguistics, 2004
    Co-Authors: Rada Mihalcea, Paul Tarau, Elizabeth Figa
    Abstract:

    This paper presents a new open text word sense disambiguation method that combines the use of logical inferences with PageRank-style algorithms applied on graphs extracted from natural language documents. We evaluate the accuracy of the proposed algorithm on several sense-annotated texts, and show that it consistently outperforms the accuracy of other previously proposed knowledge-based word sense disambiguation methods. We also explore and evaluate methods that combine several open-text word sense disambiguation algorithms.
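
The graph-based entries above rank candidate senses by a centrality score computed over a graph of senses. As a rough illustration, the Python sketch below runs a standard power-iteration PageRank on a small hand-made sense graph and keeps the top-ranked sense for each word. The graph SENSE_GRAPH, the candidate lists, and the parameter values are assumptions; the papers build their graphs from lexical resources and similarity measures that are not reproduced here.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency dict {node: [neighbors]}.

    Nodes without outgoing edges spread their rank uniformly over all nodes,
    so the scores always sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbors = graph[v]
            if neighbors:
                share = damping * rank[v] / len(neighbors)
                for u in neighbors:
                    new_rank[u] += share
            else:
                # Dangling node: distribute its rank evenly.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank


def choose_senses(candidates, rank):
    """For each word, keep the candidate sense with the highest rank."""
    return {word: max(senses, key=rank.get) for word, senses in candidates.items()}


# Hypothetical undirected sense graph for the context "bank loan money",
# written as a symmetric adjacency dict. Edges mean "semantically related".
SENSE_GRAPH = {
    "bank#finance": ["loan#debt", "money#currency"],
    "bank#river": [],
    "loan#debt": ["bank#finance", "money#currency"],
    "money#currency": ["bank#finance", "loan#debt"],
}

# Candidate senses per word (also hypothetical).
CANDIDATES = {
    "bank": ["bank#finance", "bank#river"],
    "loan": ["loan#debt"],
    "money": ["money#currency"],
}

if __name__ == "__main__":
    ranks = pagerank(SENSE_GRAPH)
    print(choose_senses(CANDIDATES, ranks))
```

In this toy graph the river sense has no related senses in the context, so it ends up with a much lower score than the financial sense, which is reinforced by both of its neighbors.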

P. Velardi - One of the best experts on this subject based on the ideXlab platform.

  • Structural semantic interconnections: a knowledge-based approach to word sense disambiguation
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
    Co-Authors: R. Navigli, P. Velardi
    Abstract:

    Word sense disambiguation (WSD) is traditionally considered an AI-hard problem. A breakthrough in this field would have a significant impact on many relevant Web-based applications, such as Web information retrieval, improved access to Web services, information extraction, etc. Early approaches to WSD, based on knowledge representation techniques, have been replaced in the past few years by more robust machine learning and statistical techniques. The results of recent comparative evaluations of WSD systems, however, show that these methods have inherent limitations. On the other hand, the increasing availability of large-scale, rich lexical knowledge resources seems to provide new challenges to knowledge-based approaches. In this paper, we present a method, called structural semantic interconnections (SSI), which creates structural specifications of the possible senses for each word in a context and selects the best hypothesis according to a grammar G, describing relations between sense specifications. Sense specifications are created from several available lexical resources that we integrated in part manually, in part with the help of automatic procedures. The SSI algorithm has been applied to different semantic disambiguation problems, like automatic ontology population, disambiguation of sentences in generic texts, disambiguation of words in glossary definitions. Evaluation experiments have been performed on specific knowledge domains (e.g., tourism, computer networks, enterprise interoperability), as well as on standard disambiguation test sets.

Jinseok Kim - One of the best experts on this subject based on the ideXlab platform.

  • Evaluating author name disambiguation for digital libraries: a case of DBLP
    Scientometrics, 2018
    Co-Authors: Jinseok Kim
    Abstract:

    Author name ambiguity in a digital library may affect the findings of research that mines authorship data of the library. This study evaluates author name disambiguation in DBLP, a widely used but insufficiently evaluated digital library for its disambiguation performance. In doing so, this study takes a triangulation approach that author name disambiguation for a digital library can be better evaluated when its performance is assessed on multiple labeled datasets with comparison to baselines. Tested on three types of labeled data containing 5000 to 6 M disambiguated names, DBLP is shown to assign author names quite accurately to distinct authors, resulting in pairwise precision, recall, and F1 measures around 0.90 or above overall. DBLP’s author name disambiguation performs well even on large ambiguous name blocks but deficiently on distinguishing authors with the same names. Compared to other disambiguation algorithms, DBLP’s disambiguation performance is quite competitive, possibly due to its hybrid disambiguation approach combining algorithmic disambiguation and manual error correction. A discussion follows on strengths and weaknesses of labeled datasets used in this study for future efforts to evaluate author name disambiguation on a digital library scale.

  • The impact of imbalanced training data on machine learning for author name disambiguation
    arXiv: Information Retrieval, 2018
    Co-Authors: Jinseok Kim, Jenna Kim
    Abstract:

    In supervised machine learning for author name disambiguation, negative training data are often dominantly larger than positive training data. This paper examines how the ratios of negative to positive training data can affect the performance of machine learning algorithms to disambiguate author names in bibliographic records. On multiple labeled datasets, three classifiers - Logistic Regression, Naive Bayes, and Random Forest - are trained through representative features such as coauthor names and title words extracted from the same training data but with various positive-negative training data ratios. Results show that increasing negative training data can improve disambiguation performance, but only by a few percent, and can sometimes degrade it. Logistic Regression and Naive Bayes learn optimal disambiguation models even with a base ratio (1:1) of positive and negative training data. Also, the performance improvement by Random Forest tends to quickly saturate roughly after 1:10 ~ 1:15. These findings imply that contrary to the common practice using all training data, name disambiguation algorithms can be trained using part of negative training data without much degradation in disambiguation performance while increasing computational efficiency. This study calls for more attention from author name disambiguation scholars to methods for machine learning from imbalanced data.

  • Evaluating author name disambiguation for digital libraries: a case of DBLP
    arXiv: Digital Libraries, 2018
    Co-Authors: Jinseok Kim
    Abstract:

    Author name ambiguity in a digital library may affect the findings of research that mines authorship data of the library. This study evaluates author name disambiguation in DBLP, a widely used but insufficiently evaluated digital library for its disambiguation performance. In doing so, this study takes a triangulation approach that author name disambiguation for a digital library can be better evaluated when its performance is assessed on multiple labeled datasets with comparison to baselines. Tested on three types of labeled data containing 5,000 ~ 700K disambiguated names and 6M pairs of disambiguated names, DBLP is shown to assign author names quite accurately to distinct authors, resulting in pairwise precision, recall, and F1 measures around 0.90 or above overall. DBLP's author name disambiguation performs well even on large ambiguous name blocks but deficiently on distinguishing authors with the same names. When compared to other disambiguation algorithms, DBLP's disambiguation performance is quite competitive, possibly due to its hybrid disambiguation approach combining algorithmic disambiguation and manual error correction. A discussion follows on strengths and weaknesses of labeled datasets used in this study for future efforts to evaluate author name disambiguation on a digital library scale.
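
The DBLP evaluations above report pairwise precision, recall, and F1. A minimal Python sketch of those measures follows, assuming each author mention carries one predicted cluster label and one ground-truth label. The function names and the choice to return 0.0 when a denominator is empty are assumptions made here, not details taken from the study.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_prf(predicted, truth):
    """Pairwise precision, recall, and F1 of a predicted clustering.

    precision = correctly paired mentions / all predicted pairs
    recall    = correctly paired mentions / all true pairs
    Empty denominators yield 0.0 (a convention assumed for this sketch).
    """
    predicted_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(predicted_pairs & true_pairs)
    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Five mentions of an ambiguous name; truth says authors A, A, A, B, B.
    truth = ["A", "A", "A", "B", "B"]
    predicted = [1, 1, 2, 2, 2]  # mention 2 was wrongly merged with author B
    print(pairwise_prf(predicted, truth))  # (0.5, 0.5, 0.5)
```

In the example, the prediction forms four pairs and the truth forms four pairs, and only two of them coincide, which gives 0.5 for all three measures.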