Semantic Similarity

14,000,000 Leading Edge Experts on the ideXlab platform

The experts below are selected from a list of 41,784 experts worldwide, ranked by the ideXlab platform.

Carlos A Iglesias - One of the best experts on this subject based on the ideXlab platform.

  • Sematch: Semantic Similarity framework for Knowledge Graphs
    Knowledge-Based Systems, 2017
    Co-Authors: Ganggao Zhu, Carlos A Iglesias
    Abstract:

    Sematch is an integrated framework for the development, evaluation and application of semantic similarity for knowledge graphs. The framework provides a number of similarity tools and datasets, and allows users to compute semantic similarity scores of concepts, words, and entities, as well as to interact with knowledge graphs through SPARQL queries. Sematch focuses on knowledge-based semantic similarity that relies on structural knowledge in a given taxonomy (e.g., depth, path length, least common subsumer) and on statistical information content. Researchers can use Sematch to develop and evaluate semantic similarity metrics and to exploit these metrics in applications.

  • Computing Semantic Similarity of Concepts in Knowledge Graphs
    IEEE Transactions on Knowledge and Data Engineering, 2017
    Co-Authors: Ganggao Zhu, Carlos A Iglesias
    Abstract:

    This paper presents a method for measuring the semantic similarity between concepts in knowledge graphs (KGs) such as WordNet and DBpedia. Previous work on semantic similarity methods has focused either on the structure of the semantic network between concepts (e.g., path length and depth) or only on the information content (IC) of concepts. We propose a semantic similarity method, namely wpath, that combines these two approaches, using IC to weight the shortest path length between concepts. Conventional corpus-based IC is computed from the distribution of concepts over a textual corpus, which requires preparing a domain corpus containing annotated concepts and has a high computational cost. Because instances have already been extracted from textual corpora and annotated with concepts in KGs, graph-based IC is proposed to compute IC from the distribution of concepts over instances. Through experiments on well-known word similarity datasets, we show that the wpath semantic similarity method produces a statistically significant improvement over other semantic similarity methods. Moreover, in a real category classification evaluation, the wpath method shows the best performance in terms of accuracy and F-score.
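
The two abstracts above name several structural and statistical ingredients: depth, path length, least common subsumer and information content. A minimal Python sketch can illustrate them. It does not use Sematch's API, and the taxonomy and instance counts are hypothetical.

The `wpath` function assumes the form sim = 1 / (1 + path(c1, c2) * k^IC(lcs(c1, c2))) with a weighting factor 0 < k <= 1. The `graph_ic` function assumes that an instance typed with a subconcept also counts toward its superconcepts. Treat both formulas, and the default k = 0.8, as assumptions to check against the paper.

```python
"""Taxonomy-based concept similarity on a toy is-a tree: path length,
Wu-Palmer (depth and least common subsumer), and an IC-weighted path
similarity in the spirit of wpath. Concepts and counts are hypothetical."""
import math

# Hypothetical taxonomy: concept -> parent (None marks the root).
PARENT = {"thing": None, "person": "thing", "place": "thing",
          "artist": "person", "athlete": "person", "city": "place"}

# Hypothetical number of KG instances typed directly with each concept.
INSTANCES = {"thing": 0, "person": 10, "place": 20,
             "artist": 30, "athlete": 40, "city": 100}


def ancestors(c):
    """Concepts from c up to the root, c first."""
    out = []
    while c is not None:
        out.append(c)
        c = PARENT[c]
    return out


def depth(c):
    return len(ancestors(c))  # the root has depth 1


def lcs(c1, c2):
    """Least common subsumer: the deepest ancestor shared by both concepts."""
    shared = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in shared)


def path_length(c1, c2):
    """Edges on the shortest path, which in a tree runs through the LCS."""
    common = lcs(c1, c2)
    return ancestors(c1).index(common) + ancestors(c2).index(common)


def path_sim(c1, c2):
    return 1.0 / (1.0 + path_length(c1, c2))


def wup_sim(c1, c2):
    """Wu-Palmer: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    return 2.0 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def graph_ic(c):
    """Assumed graph-based IC: -log of the share of instances typed with c
    or with one of its subconcepts."""
    total = sum(INSTANCES.values())
    freq = sum(n for x, n in INSTANCES.items() if c in ancestors(x))
    return -math.log(freq / total)


def wpath(c1, c2, k=0.8):
    """Assumed form: 1 / (1 + path * k ** IC(lcs)), with 0 < k <= 1."""
    return 1.0 / (1.0 + path_length(c1, c2) * k ** graph_ic(lcs(c1, c2)))


if __name__ == "__main__":
    for pair in [("artist", "athlete"), ("artist", "city")]:
        print(pair, path_sim(*pair), wup_sim(*pair), wpath(*pair))
```

With k = 1 the IC factor drops out and `wpath` reduces to `path_sim`. Smaller values of k let an informative least common subsumer raise the score of pairs that are equally far apart.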

Ganggao Zhu - One of the best experts on this subject based on the ideXlab platform.

  • Sematch: Semantic Similarity framework for Knowledge Graphs
    Knowledge-Based Systems, 2017
    Co-Authors: Ganggao Zhu, Carlos A Iglesias

  • Computing Semantic Similarity of Concepts in Knowledge Graphs
    IEEE Transactions on Knowledge and Data Engineering, 2017
    Co-Authors: Ganggao Zhu, Carlos A Iglesias

Vasile Rus - One of the best experts on this subject based on the ideXlab platform.

  • FLAIRS Conference - Opportunities and Challenges in Semantic Similarity
    2014
    Co-Authors: Vasile Rus
    Abstract:

    Semantic similarity has been increasingly adopted in the recent past as a viable, scalable alternative to the full-understanding approach to natural language understanding. We present here an overview of opportunities and challenges in semantic similarity research, with an emphasis on methods, data, and tools. We summarize a series of methods we developed over the past decade. These methods and others have been integrated into a semantic similarity toolkit called SEMILAR (www.SemanticSimilarity.org), which has been widely adopted by thousands of users since its launch in the summer of 2013 at the Annual Meeting of the Association for Computational Linguistics. Furthermore, we illustrate some drawbacks of current data sets that hamper a fair comparison of existing methods, and we make several suggestions for improving how future data sets for assessing semantic similarity approaches are built.

  • ACL (Conference System Demonstrations) - SEMILAR: The Semantic Similarity Toolkit
    2013
    Co-Authors: Vasile Rus, Mihai Lintean, Rajendra Banjade, Nobal Niraula, Dan Stefanescu
    Abstract:

    We present in this paper SEMILAR, the semantic similarity toolkit. SEMILAR implements a number of algorithms for assessing the semantic similarity between two texts. It is available as a Java library and as a Java standalone application offering GUI-based access to the implemented semantic similarity methods. Furthermore, it offers facilities for manual semantic similarity annotation by experts through its component SEMILAT (a Semantic Similarity Annotation Tool).

  • SEMILAR : The Semantic Similarity Toolkit
    Association for Computational Linguistics 2013, 2013
    Co-Authors: Vasile Rus, Mihai Lintean, Rajendra Banjade, Nobal Niraula, Dan Stefanescu

  • Measuring Semantic Similarity: representations and methods
    2011
    Co-Authors: Vasile Rus, Mihai Lintean
    Abstract:

    This dissertation investigates and proposes ways to quantify and measure semantic similarity between texts. The general approach is to rely on linguistic information at various levels, including lexical, lexico-semantic, and syntactic. The approach starts by mapping texts onto structured representations that include lexical, lexico-semantic, and syntactic information. The representation is then used as input to methods designed to measure the semantic similarity between texts based on the available linguistic information. While world knowledge is needed to properly assess the semantic similarity of texts, our approach does not use it, which is a weakness of the approach. We limit ourselves to answering the question of how successfully one can measure the semantic similarity of texts using only linguistic information. The lexical information in the original texts is retained by using the words in the corresponding representations of the texts. Syntactic information is encoded using dependency relation trees, which explicitly represent the syntactic relations between words. Word-level semantic information is relatively encoded through the use of semantic similarity measures like WordNet Similarity, or explicitly encoded using vectorial representations such as Latent Semantic Analysis (LSA). Several methods are studied to compare the representations, ranging from simple lexical overlap to more complex methods such as comparing semantic representations in vector spaces as well as syntactic structures. Furthermore, a few powerful kernel models are proposed for use in combination with Support Vector Machine (SVM) classifiers for the case in which the semantic similarity problem is modeled as a classification task.
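
The dissertation abstract mentions two comparison methods: lexical overlap and comparison in a vector space. Both can be sketched in a few lines of Python. This is not SEMILAR's Java API, and several choices below are assumptions made to keep the example self-contained:

- raw term-frequency vectors stand in for LSA vectors;
- tokenization is a naive whitespace split;
- normalizing the overlap by the shorter text is one choice among several.

```python
"""Lexical overlap and cosine similarity between two short texts.
Raw term-frequency vectors stand in for LSA vectors (an assumption)."""
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokens(text):
    """Naive tokenizer: whitespace split, strip punctuation, lowercase."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def lexical_overlap(t1, t2):
    """Shared word types divided by the word types of the shorter text."""
    a, b = set(tokens(t1)), set(tokens(t2))
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cosine(t1, t2):
    """Cosine similarity between the term-frequency vectors of two texts."""
    v1, v2 = Counter(tokens(t1)), Counter(tokens(t2))
    dot = sum(count * v2[w] for w, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


if __name__ == "__main__":
    s1, s2 = "The cat sat on the mat.", "A cat sat on a mat."
    print(lexical_overlap(s1, s2), cosine(s1, s2))
```

Moving from counts to LSA vectors would only change how each text is turned into a vector. The cosine computation itself stays the same.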

Dan Stefanescu - One of the best experts on this subject based on the ideXlab platform.

  • ACL (Conference System Demonstrations) - SEMILAR: The Semantic Similarity Toolkit
    2013
    Co-Authors: Vasile Rus, Mihai Lintean, Rajendra Banjade, Nobal Niraula, Dan Stefanescu

  • SEMILAR : The Semantic Similarity Toolkit
    Association for Computational Linguistics 2013, 2013
    Co-Authors: Vasile Rus, Mihai Lintean, Rajendra Banjade, Nobal Niraula, Dan Stefanescu

Francisco M Couto - One of the best experts on this subject based on the ideXlab platform.

  • Semantic Similarity Definition
    Encyclopedia of Bioinformatics and Computational Biology, 2019
    Co-Authors: Francisco M Couto, Andre Lamurias
    Abstract:

    In bioinformatics, semantic similarity has been used to compare different types of biomedical entities, such as proteins, compounds and phenotypes, based on their biological role instead of on what they look like. This manuscript presents a definition of semantic similarity between biomedical entities described by a common semantic base (e.g., an ontology), following an information-theoretic perspective of semantic similarity. It defines the amount of information content two entries share in a semantic base and, by extension, how to compare biomedical entities represented outside the semantic base but linked to it through a set of annotations. Software to check how semantic similarity works in practice is available at https://github.com/lasigeBioTM/DiShIn/

  • Semantic Similarity in biomedical ontologies
    PLoS Computational Biology, 2009
    Co-Authors: Catia Pesquita, Andre O. Falcao, P Lord, Daniel Faria, Francisco M Couto
    Abstract:

    In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies.

    Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.

  • CESSM: collaborative evaluation of Semantic Similarity measures
    2009
    Co-Authors: Catia Pesquita, Daniel Faria, Delphine Pessoa, Francisco M Couto
    Abstract:

    The application of semantic similarity measures to proteins annotated with Gene Ontology terms has become a common method in bioinformatics. However, the evaluation of these measures is still challenging, since no common standard of evaluation exists. We present CESSM, an online tool for the automated evaluation of GO-based semantic similarity measures. It enables the comparison of new measures against previously published ones, considering their relation to sequence, Pfam and EC similarity. The tool also has a collaborative component, by which the authors of published measures can contribute to the enrichment of the evaluation by providing their own results. CESSM is freely available at http://xldb.di.fc.ul.pt/tools/cessm/

    Background

    The creation of the Gene Ontology (GO) [1], a controlled vocabulary for the description of gene product functions, triggered the development of computational methods that take advantage of its structured information. One such method is the application of semantic similarity measures to GO terms, whereby the similarity between two terms is calculated according to their relationship in the ontology. Likewise, semantic similarity measures can also be used to calculate the similarity between gene products, provided they are annotated with GO terms. Several semantic similarity measures based on GO have been proposed in recent years [2-12], but the evaluation of their performance has been identified as a relevant problem in the field [13]. Various evaluation strategies have been proposed. These include investigating the relation between the semantic similarity measure and other gene product or protein similarities (such as sequence [2-7], family [12,7] or expression similarity [8,14,15]). They also include assessing the feasibility of using semantic similarity measures in such distinct scenarios as the prediction of subnuclear location [16], the characterization of human regulatory pathways [17], or gene clustering [9,10].

    This multiplicity of evaluation strategies arises from the lack of a gold standard suitable for this scenario. It drives researchers to use diverse data sets and to apply distinct evaluation strategies to them, making comparison among different works infeasible. We present CESSM (Collaborative Evaluation of Semantic Similarity Measures), an online tool for the collaborative and automated evaluation of semantic similarity measures in the context of GO. CESSM allows researchers to compare the performance of their novel semantic similarity measures against several existing ones, using the same protein and annotation dataset and according to three distinct aspects: relation with sequence, EC class and Pfam family similarities.

  • Metrics for GO based protein semantic similarity: a systematic evaluation
    BMC Bioinformatics, 2008
    Co-Authors: Catia Pesquita, Andre O. Falcao, Daniel Faria, Hugo P Bastos, Antonio E N Ferreira, Francisco M Couto
    Abstract:

    Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should be used in semantic similarity calculations. We conducted a systematic evaluation of GO-based semantic similarity measures, using the relationship with sequence similarity as a means to quantify their performance. We also assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled normal cumulative distribution function. Given that most semantic similarity measures capture the same behaviour but differ in resolution, we used resolution as the main criterion of evaluation. This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We also found that the average and maximum combination approaches are problematic, since both are inherently influenced by the number of terms being combined. We suspect that data circularity may directly influence the results that include electronic annotations, as a result of functional inference from sequence similarity.
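
The entries above discuss node-based, pairwise and groupwise strategies. All three can be sketched in Python on a toy ontology. Everything in the sketch is hypothetical:

- the terms, the proteins and their annotations are invented;
- IC is estimated from annotation frequencies in this tiny set;
- `resnik` uses the most informative common ancestor rather than DiShIn's approach to shared information.

`sim_gic` treats simGIC as an IC-weighted overlap of the two proteins' term sets, including ancestors. `bma` averages each term's best match in both directions. Treat these as readings of the measures, not as the implementations evaluated in the papers.

```python
"""Node-based semantic similarity over a toy ontology (all data hypothetical):
annotation-based IC, Resnik's measure, three pairwise combinations
(average, maximum, best-match average) and two groupwise measures (simUI, simGIC)."""
import math
from itertools import product

# Hypothetical ontology DAG: term -> set of direct parents.
PARENTS = {
    "root": set(), "binding": {"root"}, "catalysis": {"root"},
    "ion_binding": {"binding"}, "metal_binding": {"ion_binding"},
    "zinc_binding": {"metal_binding"}, "kinase": {"catalysis"},
}

# Hypothetical annotations: protein -> directly annotated terms.
ANNOTATIONS = {
    "P1": {"zinc_binding"}, "P2": {"metal_binding"},
    "P3": {"kinase"}, "P4": {"ion_binding", "kinase"},
}


def ancestors(term):
    """The term plus every term reachable through parent links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def closure(terms):
    """Union of the ancestor sets of a group of terms."""
    return set().union(*(ancestors(t) for t in terms))


def ic(term):
    """-log p(t), where p(t) is the share of proteins annotated with t or a
    descendant of t. Every term in this toy ontology has at least one."""
    hits = sum(1 for terms in ANNOTATIONS.values() if term in closure(terms))
    return -math.log(hits / len(ANNOTATIONS))


def resnik(t1, t2):
    """IC of the most informative common ancestor of two terms."""
    return max(ic(a) for a in ancestors(t1) & ancestors(t2))


def pairwise_avg(s1, s2):
    """Pairwise: mean term similarity over all cross pairs."""
    return sum(resnik(a, b) for a, b in product(s1, s2)) / (len(s1) * len(s2))


def pairwise_max(s1, s2):
    """Pairwise: best term similarity over all cross pairs."""
    return max(resnik(a, b) for a, b in product(s1, s2))


def bma(s1, s2):
    """Pairwise best-match average: best matches averaged in both directions."""
    row = sum(max(resnik(a, b) for b in s2) for a in s1) / len(s1)
    col = sum(max(resnik(a, b) for a in s1) for b in s2) / len(s2)
    return (row + col) / 2


def sim_ui(s1, s2):
    """Groupwise: Jaccard overlap of the two ancestor closures."""
    g1, g2 = closure(s1), closure(s2)
    return len(g1 & g2) / len(g1 | g2)


def sim_gic(s1, s2):
    """Groupwise: IC-weighted Jaccard overlap of the two ancestor closures."""
    g1, g2 = closure(s1), closure(s2)
    union_ic = sum(ic(t) for t in g1 | g2)
    return sum(ic(t) for t in g1 & g2) / union_ic if union_ic else 0.0


if __name__ == "__main__":
    p1, p4 = ANNOTATIONS["P1"], ANNOTATIONS["P4"]
    for measure in (pairwise_avg, pairwise_max, bma, sim_ui, sim_gic):
        print(measure.__name__, round(measure(p1, p4), 3))
```

The last abstract reports that average and maximum combinations depend on the number of combined terms. Calling `pairwise_avg` and `pairwise_max` on annotation sets of different sizes is a simple way to explore that effect.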