Structured Document

The experts below are selected from a list of 71,064 experts worldwide, ranked by the ideXlab platform.

Younes Bennani - One of the best experts on this subject based on the ideXlab platform.

  • Semi-structured document categorization with a semantic kernel
    Pattern Recognition, 2009
    Co-Authors: Sujeevan Aseervatham, Younes Bennani
    Abstract:

    Over the past decade, text categorization has become an active field of research in the machine learning community. Most approaches are based on term occurrence frequency. The performance of such surface-based methods can degrade when the texts are complex, i.e., ambiguous. One alternative is to use semantic-based approaches that process textual documents according to their meaning. Furthermore, research in text categorization has mainly focused on "flat texts", whereas many documents are now semi-structured, especially in XML format. In this paper, we propose a semantic kernel for semi-structured biomedical documents. The semantic meanings of words are extracted using the Unified Medical Language System (UMLS) framework. The kernel, combined with an SVM classifier, has been applied to a text categorization task on a medical corpus of free-text documents. The results show that the semantic kernel outperforms the linear kernel and the naive Bayes classifier. Moreover, this kernel was ranked among the top 10 of 44 classification methods at the 2007 Computational Medicine Center (CMC) Medical NLP International Challenge.
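
A minimal sketch of why a concept-based kernel can beat a surface-term kernel, assuming a hypothetical four-entry `CONCEPTS` dictionary in place of a real UMLS lookup (no actual UMLS identifiers or API are used). Both kernels are inner products of sparse count vectors; the only difference is whether raw tokens or their mapped concepts are counted. This illustrates the general idea only and is not the kernel defined in the paper.

```python
from collections import Counter

# Hypothetical term-to-concept map standing in for a UMLS lookup.
# The concept labels are invented; they are NOT real UMLS CUIs.
CONCEPTS = {
    "fever": "C_FEVER",
    "pyrexia": "C_FEVER",
    "cough": "C_COUGH",
    "tussis": "C_COUGH",
}

def bag_of_words(text):
    """Term-frequency vector over raw lowercase tokens."""
    return Counter(text.lower().split())

def bag_of_concepts(text):
    """Map each token to its concept; unknown tokens stay as themselves."""
    return Counter(CONCEPTS.get(tok, tok) for tok in text.lower().split())

def dot(u, v):
    """Inner product of two sparse vectors stored as Counters."""
    return sum(u[k] * v[k] for k in u if k in v)

def linear_kernel(a, b):
    """Surface-based kernel: shared raw terms only."""
    return dot(bag_of_words(a), bag_of_words(b))

def semantic_kernel(a, b):
    """Concept-based kernel: synonyms collapse onto the same feature."""
    return dot(bag_of_concepts(a), bag_of_concepts(b))

doc1 = "patient reports fever and cough"
doc2 = "pyrexia with tussis noted"
print(linear_kernel(doc1, doc2))    # 0: no shared surface terms
print(semantic_kernel(doc1, doc2))  # 2: two shared concepts
```

Because each kernel is an explicit inner product, it is positive semi-definite, so a Gram matrix built from `semantic_kernel` could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.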

Olivier De Vel - One of the best experts on this subject based on the ideXlab platform.

  • Learning semi-structured document categorization using bounded-length spectrum sub-sequence kernels
    Data Mining and Knowledge Discovery, 2006
    Co-Authors: Olivier De Vel
    Abstract:

    In this paper we report an investigation into learning semi-structured document categorization. We automatically discover low-level, short-range byte data structure patterns from a document data stream by extracting all byte sub-sequences within a sliding window to form an augmented (or bounded-length) string spectrum feature map, and we use a modified suffix trie data structure (called the coloured generalized suffix tree, or CGST) to efficiently store and manipulate the feature map. Using the CGST, we can efficiently compute the stream's bounded-length sequence spectrum kernel. We compare the performance of two classifier algorithms for categorizing the data streams, namely the SVM and Naive Bayes (NB) classifiers. Experiments yielded good classification performance on a variety of document byte streams, particularly when using the NB classifier under certain parameter settings. Results indicate that the bounded-length kernel is superior to the standard fixed-length kernel for semi-structured documents.
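
The bounded-length spectrum can be sketched with plain hash-map counting, assuming contiguous byte sub-sequences of length 1 to `k`. The paper stores this feature map in a CGST for efficiency; the `Counter` used here is a simpler, more memory-hungry stand-in, and all function names are hypothetical. `fixed_spectrum` gives the standard fixed-length (length exactly `k`) spectrum for comparison.

```python
from collections import Counter

def bounded_spectrum(data: bytes, k: int) -> Counter:
    """Count every contiguous byte sub-sequence of length 1..k."""
    counts = Counter()
    for start in range(len(data)):
        # All sub-sequences that begin at `start`, up to length k.
        for length in range(1, k + 1):
            if start + length > len(data):
                break
            counts[data[start:start + length]] += 1
    return counts

def fixed_spectrum(data: bytes, k: int) -> Counter:
    """Count contiguous byte sub-sequences of length exactly k."""
    return Counter(data[i:i + k] for i in range(len(data) - k + 1))

def kernel(fx: Counter, fy: Counter) -> int:
    """Inner product of two sparse spectrum feature maps."""
    if len(fx) > len(fy):
        fx, fy = fy, fx  # iterate over the smaller map
    return sum(count * fy[sub] for sub, count in fx.items())

x, y = b"abab", b"abba"
print(kernel(bounded_spectrum(x, 2), bounded_spectrum(y, 2)))  # 11
print(kernel(fixed_spectrum(x, 2), fixed_spectrum(y, 2)))      # 3
```

The two byte strings share the sub-sequences `a`, `b`, `ab`, and `ba`, all of which the bounded-length map counts, while the fixed-length map with k = 2 sees only `ab` and `ba`.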

Mounia Lalmas - One of the best experts on this subject based on the ideXlab platform.

  • Structural relevance: a common basis for the evaluation of structured document retrieval
    Conference on Information and Knowledge Management, 2008
    Co-Authors: M. S. Ali, Gabriella Kazai, Mariano P. Consens, Mounia Lalmas
    Abstract:

    This paper presents a unified framework for evaluating a range of structured document retrieval (SDR) approaches and tasks. The framework is based on a model of tree retrieval, evaluated using a novel extension of the Structural Relevance (SR) measure. The measure replaces the assumption of independence in traditional information retrieval (IR) with a notion of redundancy that takes into account the user's navigation inside documents while seeking relevant information. Unlike existing metrics for SDR, our proposed framework does not require the computation of an ideal ranking, a requirement that has thus far prevented the practical application of such measures. Instead, SR builds on a Markovian model of user navigation that can be estimated through structural summaries. The results of this paper (supported by experimental validation using INEX data) show that SR defined over a tree retrieval model can provide a common basis for evaluating SDR approaches across various structured search tasks.
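
The contrast between independence and redundancy can be illustrated with a toy score. This is not the SR measure from the paper: it has no Markovian navigation model and no structural summaries. It simply assumes, hypothetically, that a user who views an element also sees its whole subtree, so a relevant element is credited only the first time any result covers it. The XPath-like element paths and function names are invented for illustration.

```python
def is_within(path: str, ancestor: str) -> bool:
    """True if `path` equals `ancestor` or lies inside its subtree."""
    return path == ancestor or path.startswith(ancestor + "/")

def redundancy_aware_gain(ranking, relevant):
    """Cumulative count of relevant elements newly seen at each rank.

    Toy assumption: viewing an element exposes its whole subtree, so a
    relevant element counts only the first time any result covers it.
    """
    seen = set()
    cumulative, total = [], 0
    for result in ranking:
        newly_seen = {r for r in relevant if is_within(r, result)} - seen
        seen |= newly_seen
        total += len(newly_seen)
        cumulative.append(total)
    return cumulative

relevant = {"/a/sec1/p1", "/a/sec1/p2", "/a/sec2/p1"}
ranking = ["/a/sec1", "/a/sec1/p1", "/a/sec2/p1"]
print(redundancy_aware_gain(ranking, relevant))  # [2, 2, 3]
```

An independence-based count would credit the second result, `/a/sec1/p1`, as a new relevant hit even though the user already saw it inside `/a/sec1`; the redundancy-aware toy score does not.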

  • Focused Access to XML Documents: 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, Dagstuhl Castle, Germany, December 17-19, 2007, Selected Papers
    Lecture Notes in Computer Science, 2008
    Co-Authors: Norbert Fuhr, Mounia Lalmas, Jaap Kamps, Andrew Trotman
    Abstract:

    Ad Hoc Track: Overview of the INEX 2007 Ad Hoc Track; INEX 2007 Evaluation Measures; XML Retrieval by Improving Structural Relevance Measures Obtained from Summary Models; TopX @ INEX 2007; The Garnata Information Retrieval System at INEX'07; Dynamic Element Retrieval in the Wikipedia Collection; The Simplest XML Retrieval Baseline That Could Possibly Work; Using Language Models and Topic Models for XML Retrieval; UJM at INEX 2007: Document Model Integrating XML Tags; Phrase Detection in the Wikipedia; Indian Statistical Institute at INEX 2007 Adhoc Track: VSM Approach; A Fast Retrieval Algorithm for Large-Scale XML Data; LIG at INEX 2007 Ad Hoc Track: Using Collectionlinks as Context.

    Book Search Track: Overview of the INEX 2007 Book Search Track (BookSearch'07); Logistic Regression and EVIs for XML Books and the Heterogeneous Track; CMIC at INEX 2007: Book Search Track.

    XML-Mining Track: Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach; Probabilistic Methods for Structured Document Classification at INEX'07; Efficient Clustering of Structured Documents Using Graph Self-Organizing Maps; Document Clustering Using Incremental and Pairwise Approaches; XML Document Classification Using Extended VSM.

    Entity Ranking Track: Overview of the INEX 2007 Entity Ranking Track; L3S at INEX 2007: Query Expansion for Entity Ranking Using a Highly Accurate Ontology; Entity Ranking Based on Category Expansion; Entity Ranking from Annotated Text Collections Using Multitype Topic Models; An n-Gram and Initial Description Based Approach for Entity Ranking Track; Structured Document Retrieval, Multimedia Retrieval, and Entity Ranking Using PF/Tijah; Using Wikipedia Categories and Links in Entity Ranking; Integrating Document Features for Entity Ranking.

    Interactive Track: A Comparison of Interactive and Ad-Hoc Relevance Assessments; Task Effects on Interactive Search: The Query Factor.

    Link-the-Wiki Track: Overview of INEX 2007 Link the Wiki Track; Using and Detecting Links in Wikipedia; GPX: Ad-Hoc Queries and Automated Link Discovery in the Wikipedia; University of Waterloo at INEX2007: Adhoc and Link-the-Wiki Tracks; Wikipedia Ad Hoc Passage Retrieval and Wikipedia Document Linking.

    Multimedia Track: The INEX 2007 Multimedia Track.

  • Uniform representation of content and structure for structured document retrieval
    2001
    Co-Authors: Mounia Lalmas
    Abstract:

    Documents often display a hierarchical structure. For example, an SGML document contains a title and several sections, which themselves contain paragraphs. In this paper, we develop a formal model that represents structured documents uniformly by their content and structure. As a result, structured documents can be queried with respect to their content, their structure, or both. The model is based on a possible-worlds approach, modal operators, and uncertainty distributions.
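
A plain element tree with optional filters shows what querying by content, by structure, or by both looks like for the title/section/paragraph example above. This sketch does not implement the paper's possible-worlds model, modal operators, or uncertainty distributions; `Element` and `query` are hypothetical names, and content matching is a simple exact-token test on each element's own text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

@dataclass
class Element:
    tag: str                      # structural label, e.g. "section"
    text: str = ""                # content held directly by this element
    children: List["Element"] = field(default_factory=list)

    def walk(self) -> Iterator["Element"]:
        """Pre-order traversal of this element and its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()

def query(root: Element, tag: Optional[str] = None,
          term: Optional[str] = None) -> List[Element]:
    """Select elements by structure (tag), content (term), or both."""
    hits = []
    for el in root.walk():
        if tag is not None and el.tag != tag:
            continue  # structural condition fails
        if term is not None and term.lower() not in el.text.lower().split():
            continue  # content condition fails
        hits.append(el)
    return hits

doc = Element("document", children=[
    Element("title", "Structured retrieval"),
    Element("section", children=[Element("paragraph", "content and structure")]),
    Element("section", children=[Element("paragraph", "uncertainty in retrieval")]),
])
print([e.tag for e in query(doc, term="retrieval")])    # ['title', 'paragraph']
print(len(query(doc, tag="section")))                   # 2
print([e.text for e in query(doc, tag="paragraph", term="structure")])
# ['content and structure']
```

Omitting both filters returns every element, so content-only, structure-only, and combined queries are all special cases of the same call.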

Sujeevan Aseervatham - One of the best experts on this subject based on the ideXlab platform.

  • Semi-structured document categorization with a semantic kernel
    Pattern Recognition, 2009
    Co-Authors: Sujeevan Aseervatham, Younes Bennani

Axel Van Lamsweerde - One of the best experts on this subject based on the ideXlab platform.

  • Goal-oriented requirements engineering: a roundtrip from research to practice
    IEEE International Conference on Requirements Engineering, 2004
    Co-Authors: Axel Van Lamsweerde
    Abstract:

    The software industry is more than ever facing the challenge of delivering WYGIWYW software (What You Get Is What You Want). A well-structured document specifying adequate, complete, consistent, precise, and measurable requirements is a critical prerequisite for such software. Goals have been recognized as among the driving forces for requirements elicitation, elaboration, organization, analysis, negotiation, documentation, and evolution. Growing experience with goal-oriented requirements engineering suggests synergistic links between research in this area and good practice. We discuss one journey along this road, from influencing ideas and research results to tool developments to good practice in industrial projects. Along the way, we discuss some lessons learnt, obstacles to technology transfer, and challenges for better requirements engineering research and practice.