Vocabularies

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 207987 Experts worldwide ranked by ideXlab platform

Antoine Isaac - One of the best experts on this subject based on the ideXlab platform.

  • Recommendations for the Technical Infrastructure for Standardized International Rights Statements
    arXiv: Digital Libraries, 2015
    Co-Authors: Valentine Charles, Tom Johnson, Esmé Cowles, Karen Estlund, Mark A. Matienzo, Patrick Peiffer, Richard J. Urban, Antoine Isaac, Maarten Zeinstra
    Abstract:

    This white paper is the product of a joint Digital Public Library of America (DPLA)-Europeana working group organized to develop minimum rights statement metadata standards for organizations that contribute to DPLA and Europeana. This white paper deals specifically with the technical infrastructure of a common namespace (rightsstatements.org) that hosts the rights statements to be used by (at minimum) the DPLA and Europeana. These recommendations for a common technical infrastructure for rights statements outline a simple, flexible, and extensible framework to host the rights statements at rightsstatements.org. This white paper specifically outlines the management of rights statements as linked open data. The rights statements are published according to Best Practices for Publishing RDF Vocabularies. They are encoded into dereferenceable URIs, express further information encoded in RDF, and link to existing vocabularies and standards. The rights statements adhere to expressions of existing rights vocabularies. Furthermore, the paper reviews the publication and implementation needed to make the rights statements available through human-readable web pages augmented with machine-readable formats.

  • Finding Quality Issues in SKOS Vocabularies
    arXiv: Digital Libraries, 2012
    Co-Authors: Christian Mader, Bernhard Haslhofer, Antoine Isaac
    Abstract:

    The Simple Knowledge Organization System (SKOS) is a standard model for controlled vocabularies on the Web. However, SKOS vocabularies often differ in terms of quality, which reduces their applicability across system boundaries. Here we investigate how we can support taxonomists in improving SKOS vocabularies by pointing out quality issues that go beyond the integrity constraints defined in the SKOS specification. We identified potential quantifiable quality issues and formalized them into computable quality checking functions that can find affected resources in a given SKOS vocabulary. We implemented these functions in the qSKOS quality assessment tool, analyzed 15 existing vocabularies, and found possible quality issues in all of them.

  • Les référentiels : typologie et interopérabilité (Reference Resources: Typology and Interoperability)
    2012
    Co-Authors: Antoine Isaac
    Abstract:

    First, we clarify what the notion of a reference resource ("référentiel") can cover in a linked data approach, namely artifacts of the following types: "metadata element sets" (ontologies formalized in the manner of OWL), "value vocabularies" (thesauri, authority files...), and other "datasets" (data about "things in the world") (cf. http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset/). For each of these categories, we discuss the characteristics of typical reference resources. We look at their provenance and at how they were designed (top-down vs. bottom-up, modeling choices). We also emphasize what Linked Data technology (and in particular the crucial role of URIs) changes relative to the artifacts that correspond to these categories in more traditional approaches. For example: for thesauri and other "value vocabularies", the increasingly visible transition from an approach oriented toward terms (and names) to one oriented toward concepts, or even "real-world entities"; and, for data schemas, the possibilities for reuse, or even "distributed editing". While interoperability at the level of reference "datasets" may have been addressed in the morning, the afternoon can focus on interoperability at the level of "value vocabularies" and "metadata element sets". For these two families we examine what the notion of alignment can refer to, as well as scenarios of reuse by extension and/or constraint, in particular through the concept of "application profile" for data schemas.

  • A Web-Based Repository Service for Vocabularies and Alignments in the Cultural Heritage Domain
    International Semantic Web Conference, 2010
    Co-Authors: Lourens Van Der Meij, Antoine Isaac, Claus Zinn
    Abstract:

    Controlled vocabularies of various kinds (e.g., thesauri, classification schemes) play an integral part in making Cultural Heritage collections accessible. The various institutions participating in the Dutch CATCH programme maintain and make use of a rich and diverse set of vocabularies. This makes it hard to provide a uniform point of access to all collections at once. Our SKOS-based vocabulary and alignment repository aims at providing technology for managing the various vocabularies, and for exploiting semantic alignments across any two of them. The repository system exposes web services that effectively support the construction of tools for searching and browsing across vocabularies and collections or for collection curation (indexing), as we demonstrate.

  • Matching Multi-Lingual Subject Vocabularies
    European Conference on Research and Advanced Technology for Digital Libraries, 2009
    Co-Authors: Shenghui Wang, Antoine Isaac, Balthasar Schopman, Stefan Schlobach, Lourens Van Der Meij
    Abstract:

    Most libraries and other cultural heritage institutions use controlled knowledge organisation systems, such as thesauri, to describe their collections. Unfortunately, as most of these institutions use different such systems, unified access to heterogeneous collections is difficult. Things are even worse in an international context when concepts have labels in different languages. In order to overcome the multilingual interoperability problem between European Libraries, extensive work has been done to manually map concepts from different knowledge organisation systems, which is a tedious and expensive process. Within the TELplus project, we developed and evaluated methods to automatically discover these mappings, using different ontology matching techniques. In experiments on major French, English and German subject heading lists Rameau, LCSH and SWD, we show that we can automatically produce mappings of surprisingly good quality, even when using relatively naive translation and matching methods.

Olivier Bodenreider - One of the best experts on this subject based on the ideXlab platform.

Christian Mader - One of the best experts on this subject based on the ideXlab platform.

  • Assessing and Improving the Quality of SKOS Vocabularies
    Journal on Data Semantics, 2014
    Co-Authors: Osma Suominen, Christian Mader
    Abstract:

    Controlled vocabularies are increasingly made available on the Web of Data using the Simple Knowledge Organization System (SKOS) ontology. Assessment of vocabulary quality is important for determining the suitability of vocabularies for reuse in applications and for improving vocabulary development processes. We define 26 quality issues, i.e., computable functions that expose potential quality problems. In an analysis of a representative set of 24 SKOS vocabularies, we found all of them to contain structural errors and/or other quality problems. We propose a set of correction heuristics which we have used to automatically correct a significant proportion of the identified problems. Our reference implementations of these methods, the quality assessment tool qSKOS and the quality improvement tool Skosify, are available for reuse as open-source software.

  • Finding Quality Issues in SKOS Vocabularies
    arXiv: Digital Libraries, 2012
    Co-Authors: Christian Mader, Bernhard Haslhofer, Antoine Isaac
    Abstract:

    The Simple Knowledge Organization System (SKOS) is a standard model for controlled vocabularies on the Web. However, SKOS vocabularies often differ in terms of quality, which reduces their applicability across system boundaries. Here we investigate how we can support taxonomists in improving SKOS vocabularies by pointing out quality issues that go beyond the integrity constraints defined in the SKOS specification. We identified potential quantifiable quality issues and formalized them into computable quality checking functions that can find affected resources in a given SKOS vocabulary. We implemented these functions in the qSKOS quality assessment tool, analyzed 15 existing vocabularies, and found possible quality issues in all of them.
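
As a rough illustration of what a "computable quality checking function" over a SKOS vocabulary might look like, the sketch below runs two simple checks on a list of plain (subject, predicate, object) tuples. It is not the qSKOS implementation: the tuple representation, the function names, the example data, and the choice of checks (concepts with no hierarchical or associative links, and concepts with no preferred label) are all assumptions made for illustration.

```python
# Hypothetical, simplified representation: a vocabulary is a list of
# (subject, predicate, object) string tuples. A real tool would parse RDF.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# SKOS properties that link one concept to another.
RELATION_PROPERTIES = {SKOS + name for name in ("broader", "narrower", "related")}


def concepts_in(triples):
    """Return every resource explicitly typed as skos:Concept."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == SKOS + "Concept"}


def find_orphan_concepts(triples):
    """Return concepts that take part in no broader/narrower/related link."""
    connected = set()
    for s, p, o in triples:
        if p in RELATION_PROPERTIES:
            connected.add(s)
            connected.add(o)
    return concepts_in(triples) - connected


def find_unlabeled_concepts(triples):
    """Return concepts that have no skos:prefLabel."""
    labeled = {s for s, p, o in triples if p == SKOS + "prefLabel"}
    return concepts_in(triples) - labeled


# Made-up example vocabulary with one isolated, unlabeled concept.
EXAMPLE_TRIPLES = [
    ("ex:animals", RDF_TYPE, SKOS + "Concept"),
    ("ex:cats", RDF_TYPE, SKOS + "Concept"),
    ("ex:misc", RDF_TYPE, SKOS + "Concept"),
    ("ex:cats", SKOS + "broader", "ex:animals"),
    ("ex:animals", SKOS + "prefLabel", "Animals"),
    ("ex:cats", SKOS + "prefLabel", "Cats"),
]

if __name__ == "__main__":
    print("orphans:", find_orphan_concepts(EXAMPLE_TRIPLES))
    print("unlabeled:", find_unlabeled_concepts(EXAMPLE_TRIPLES))
```

Each check returns the affected resources rather than a pass/fail flag, which mirrors the abstract's description of functions that "find affected resources in a given SKOS vocabulary".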

Fleur Mougin - One of the best experts on this subject based on the ideXlab platform.

Kamel Smaïli - One of the best experts on this subject based on the ideXlab platform.

  • Adaptation of speech recognition vocabularies for improved transcription of YouTube videos
    Journal of International Science and General Applications, 2018
    Co-Authors: Denis Jouvet, Mohamed Amine Menacer, Odile Mella, Dominique Fohr, David Langlois, Kamel Smaïli
    Abstract:

    This paper discusses the adaptation of speech recognition vocabularies for automatic speech transcription. The context is the transcription of YouTube videos in French, English and Arabic. Baseline automatic speech recognition systems have been developed using previously available data. However, the available text data, including the GigaWord corpora from LDC, are getting quite old with respect to the recent YouTube videos that are to be transcribed. After a discussion of the performance of the ASR baseline systems, the paper presents the collection of recent textual data from the internet for updating the speech recognition vocabularies and for training the language models, as well as the elaboration of the development data sets necessary for the vocabulary selection process. The paper also compares the coverage of the training data collected from the internet, and of the GigaWord data, with finite-size vocabularies made of the most frequent words. Finally, the paper presents and discusses the number of out-of-vocabulary word occurrences, before and after the update of the speech recognition vocabularies, for the three languages. Moreover, some speech recognition evaluation results are provided and analyzed.

  • About vocabulary adaptation for automatic speech recognition of video data
    2017
    Co-Authors: Denis Jouvet, Mohamed Amine Menacer, Odile Mella, Dominique Fohr, David Langlois, Kamel Smaïli
    Abstract:

    This paper discusses the adaptation of vocabularies for automatic speech recognition. The context is the transcription of videos in French, English and Arabic. Baseline automatic speech recognition systems have been developed using available data. However, the available text data, including the GigaWord corpora from LDC, are getting quite old with respect to the recent videos that are to be transcribed. The paper presents the collection of recent textual data from the internet for updating the speech recognition vocabularies and training the language models, as well as the elaboration of the development data sets necessary for the vocabulary selection process. The paper also compares the coverage of the training data collected from the internet, and of the GigaWord data, with finite-size vocabularies made of the most frequent words. Finally, the paper presents and discusses the number of out-of-vocabulary word occurrences, before and after the update of the vocabularies, for the three languages.

  • TR-Classifier and kNN Evaluation for Topic Identification tasks
    International Journal on Information and Communication Technologies, 2010
    Co-Authors: Mourad Abbas, Kamel Smaïli, Daoud Berkani
    Abstract:

    This paper studies topic identification for the Arabic language using two methods. The first is the well-known kNN (k Nearest Neighbors) method, which is used as a baseline. The second is the TR-Classifier, which is mainly based on computing triggers. The experiments show that the TR-Classifier outperforms kNN while using much smaller topic vocabularies. TR-Classifier performance is enhanced by jointly increasing the number of triggers and the size of the topic vocabularies. Note that the TR-Classifier uses topic vocabularies, whereas kNN needs a general vocabulary, which is obtained by concatenating those used by the TR-Classifier. In addition to the standard Recall and Precision measures used for evaluation, we have drawn ROC curves for some topics to illustrate more clearly the difference in performance between the two classifiers. The corpus used in our experiments was downloaded from an online Arabic newspaper. Its size is about 10 million words, distributed over six selected topics: culture, religion, economy, local news, international news and sports.

  • Experiment Analysis in Newspaper Topic Detection
    2000
    Co-Authors: Armelle Brun, Kamel Smaïli, Jean-Paul Haton
    Abstract:

    This paper presents several methods for topic detection on newspaper articles, using either a general vocabulary or topic-specific vocabularies. Specific vocabularies are determined manually or statistically. In both cases, we aim at finding the most representative words of a topic. Several methods have been tested. The first one, based on perplexity, achieves a 100% topic identification rate on large test corpora when the first two propositions are taken into account. Other methods are based on statistical counts and achieve a 94% identification rate on smaller test corpora. The main challenge of this work is to identify topics from only a few words, so that the most suitable language model can be selected during speech recognition.
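
The two vocabulary adaptation abstracts above (2017 and 2018) refer to finite-size vocabularies made of the most frequent words and to counts of out-of-vocabulary (OOV) word occurrences. A minimal sketch of how such quantities could be computed is given below. It is not the authors' selection procedure: the function names, the pre-tokenized input, and the toy sentences are assumptions for illustration only.

```python
from collections import Counter


def build_vocabulary(training_tokens, size):
    """Keep the `size` most frequent words found in the training text."""
    counts = Counter(training_tokens)
    return {word for word, _ in counts.most_common(size)}


def oov_rate(vocabulary, test_tokens):
    """Fraction of test word occurrences that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for token in test_tokens if token not in vocabulary)
    return missing / len(test_tokens)


if __name__ == "__main__":
    # Toy data, split on whitespace; real systems would use proper tokenization.
    training = "the cat sat on the mat the cat".split()
    test = "the dog sat".split()

    vocabulary = build_vocabulary(training, size=2)
    print("vocabulary:", sorted(vocabulary))
    print("OOV rate:", oov_rate(vocabulary, test))
```

Coverage of a text by a vocabulary is then simply one minus the OOV rate, so comparing two text sources at a fixed vocabulary size amounts to calling `build_vocabulary` on each source and evaluating both vocabularies on the same development tokens.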