Full Text Search

14,000,000 Leading Edge Experts on the ideXlab platform

The experts below are selected from a list of 39,537 experts worldwide, ranked by the ideXlab platform.

Jayavel Shanmugasundaram - One of the best experts on this subject based on the ideXlab platform.

  • A TeXQuery-based XML full-text search engine
    2015
    Co-Authors: Chavdar Botev, Sihem Amer-Yahia, Jayavel Shanmugasundaram
    Abstract:

    We demonstrate an XML full-text search engine that implements the TeXQuery language. TeXQuery is a powerful full-text search extension to XQuery that provides a rich set of fully composable full-text primitives, such as phrase matching, proximity distance, stemming and thesauri. TeXQuery enables users to seamlessly query over both structured data and text, by embedding full-text primitives in XQuery and vice versa. TeXQuery also supports a flexible scoring construct that scores query results based on full-text predicates and permits top-k queries. TeXQuery is the precursor of the full-text language extension to XPath 2.0 and XQuery 1.0 currently being developed by the W3C.

  • Expressiveness and performance of full-text search languages
    Extending Database Technology, 2006
    Co-Authors: Chavdar Botev, Sihem Amer-Yahia, Jayavel Shanmugasundaram
    Abstract:

    We study the expressiveness and performance of full-text search languages. Our motivation is to provide a formal basis for comparing full-text search languages and to develop a model for full-text search that can be tightly integrated with structured search. We design a model based on the positions of tokens (words) in the input text, and develop a full-text calculus (FTC) and a full-text algebra (FTA) with equivalent expressive power; this suggests a notion of completeness for full-text search languages. We show that existing full-text languages are incomplete and identify a practical subset of the FTC and FTA that is more powerful than existing languages, but which can still be evaluated efficiently.

  • XML full-text search: challenges and opportunities
    Very Large Data Bases, 2005
    Co-Authors: Sihem Amer-Yahia, Jayavel Shanmugasundaram
    Abstract:

    An ever-growing number of XML repositories are being made available for search. Much activity in the past few years has been devoted to querying such repositories. In particular, full-text querying of text-rich XML documents has raised a wealth of issues that are being addressed by both the database (DB) and information retrieval (IR) communities. The DB community has traditionally focused on developing query languages and efficient evaluation algorithms for highly structured data. In contrast, the IR community has focused on searching unstructured data, and has developed various techniques for ranking query results and evaluating their effectiveness. Fortunately, recent trends in DB and IR research demonstrate a growing interest in adopting IR techniques in DBs and vice versa [1, 2, 3, 4, 5, 6, 7, 9].

  • TeXQuery: a full-text search extension to XQuery
    The Web Conference, 2004
    Co-Authors: Sihem Amer-Yahia, Chavdar Botev, Jayavel Shanmugasundaram
    Abstract:

    One of the key benefits of XML is its ability to represent a mix of structured and unstructured (text) data. Although current XML query languages such as XPath and XQuery can express rich queries over structured data, they can only express very rudimentary queries over text data. We thus propose TeXQuery, a powerful full-text search extension to XQuery. TeXQuery provides a rich set of fully composable full-text search primitives, such as Boolean connectives, phrase matching, proximity distance, stemming and thesauri. TeXQuery also enables users to seamlessly query over both structured and text data by embedding TeXQuery primitives in XQuery, and vice versa. Finally, TeXQuery supports a flexible scoring construct that can be used to score query results based on full-text predicates. TeXQuery is the precursor of the full-text language extensions to XPath 2.0 and XQuery 1.0 currently being developed by the W3C.
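
The abstracts above describe full-text primitives such as phrase matching and proximity distance, evaluated over a model based on token positions, together with scoring for top-k queries. The following is a minimal sketch of how such predicates can be computed over a positional index. It is not TeXQuery syntax and not the FTC/FTA from the papers; the whitespace tokenizer, the index layout, the helper names (build_positional_index, phrase_match, min_distance, top_k_by_proximity), and the toy score 1 / (1 + distance) are all hypothetical choices made for illustration.

```python
import heapq
from collections import defaultdict

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Hypothetical tokenizer: split on whitespace, strip punctuation, lowercase.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def build_positional_index(docs):
    """Map term -> doc_id -> ascending list of token positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def positions(index, term, doc_id):
    # .get avoids inserting empty entries into the defaultdicts.
    return index.get(term, {}).get(doc_id, [])

def phrase_match(index, doc_id, phrase):
    """True if the phrase's terms occur at consecutive positions in doc_id."""
    terms = tokenize(phrase)
    if not terms:
        return False
    starts = set(positions(index, terms[0], doc_id))
    for offset, term in enumerate(terms[1:], start=1):
        found = set(positions(index, term, doc_id))
        starts = {s for s in starts if s + offset in found}
    return bool(starts)

def min_distance(index, doc_id, t1, t2):
    """Smallest |p1 - p2| over occurrences of t1 and t2, or None if either is absent."""
    p1, p2 = positions(index, t1, doc_id), positions(index, t2, doc_id)
    if not p1 or not p2:
        return None
    return min(abs(a - b) for a in p1 for b in p2)

def top_k_by_proximity(index, doc_ids, t1, t2, k):
    """Rank documents by the toy proximity score 1 / (1 + min distance)."""
    scored = []
    for doc_id in doc_ids:
        dist = min_distance(index, doc_id, t1, t2)
        if dist is not None:
            scored.append((1.0 / (1 + dist), doc_id))
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]
```

For example, with docs = {"d2": "Full text search over XML data with XQuery"}, phrase_match(build_positional_index(docs), "d2", "full text search") is True. Because every predicate works on positions rather than on raw strings, phrase and proximity conditions compose freely, which is the property the position-based model is meant to capture.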

Alexander B Veretennikov - One of the best experts on this subject based on the ideXlab platform.

  • Proximity full-text search by means of additional indexes with multi-component keys: in pursuit of optimal performance
    arXiv: Information Retrieval, 2018
    Co-Authors: Alexander B Veretennikov
    Abstract:

    Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in a text, we use additional indexes to store information about nearby words whose distance from the given word is less than or equal to the MaxDistance parameter. We showed that additional indexes with three-component keys can be used to improve the average query execution time by up to 94.7 times if the queries consist of high-frequency words. In this paper, we present a new search algorithm with even greater performance gains. We consider several strategies for selecting multi-component key indexes for a specific query and compare these strategies with the optimal strategy. We also present the results of search experiments, which show that three-component key indexes enable much faster searches than two-component key indexes. This is a pre-print of a contribution "Veretennikov A.B. (2019) Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance." published in "Manolopoulos Y., Stupnikov S. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2018. Communications in Computer and Information Science, vol 1003" published by Springer, Cham. This book constitutes the refereed proceedings of the 20th International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2018, held in Moscow, Russia, in October 2018. The 9 revised full papers presented together with three invited papers were carefully reviewed and selected from 54 submissions. The final authenticated version is available online at this https URL.

  • Proximity full-text search with a response time guarantee by means of additional indexes
    arXiv: Information Retrieval, 2018
    Co-Authors: Alexander B Veretennikov
    Abstract:

    Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency. For a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average search query execution time is 44 to 45 times shorter than that required when using ordinary inverted indexes. This is a pre-print of a contribution "Veretennikov A.B. Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes" published in "Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868" published by Springer, Cham. The final authenticated version is available online at: this https URL. The work was supported by Act 211 Government of the Russian Federation, contract no 02.A03.21.0006.

  • Proximity full-text search by means of additional indexes with multi-component keys: in pursuit of optimal performance
    International Conference on Data Analytics and Management in Data Intensive Domains, 2018
    Co-Authors: Alexander B Veretennikov
    Abstract:

    Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in a text, we use additional indexes to store information about nearby words whose distance from the given word is less than or equal to the MaxDistance parameter. We showed that additional indexes with three-component keys can be used to improve the average query execution time by up to 94.7 times if the queries consist of high-frequency words. In this paper, we present a new search algorithm with even greater performance gains. We consider several strategies for selecting multi-component key indexes for a specific query and compare these strategies with the optimal strategy. We also present the results of search experiments, which show that three-component key indexes enable much faster searches than two-component key indexes.

  • Proximity full-text search with a response time guarantee by means of additional indexes
    SAI Intelligent Systems Conference, 2018
    Co-Authors: Alexander B Veretennikov
    Abstract:

    Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency. For a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average search query execution time is 44 to 45 times shorter than that required when using ordinary inverted indexes.

  • Proximity full-text search with a response time guarantee by means of additional indexes with multi-component keys
    DAMDID/RCDL, 2018
    Co-Authors: Alexander B Veretennikov
    Abstract:

    Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in the text, we use additional indexes to store information about nearby words whose distance from the given word is less than or equal to MaxDistance, which is a parameter. We have shown that additional indexes with three-component keys can be used to improve the average query execution time by up to 94.7 times if the queries consist of high-frequency words. In this paper, we present a new search algorithm with even greater performance gains. We also present results of search experiments, which show that three-component key indexes enable much faster searches than two-component key indexes.
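
The abstracts above describe additional indexes that, for each word, store information about nearby words within MaxDistance, so a proximity query does not have to intersect the long posting lists of frequent words. The sketch below illustrates that idea in a deliberately simplified form, assuming whitespace tokenization, two-component keys built for every word pair (the papers restrict such keys to frequently occurring words and go further with three-component keys), and hypothetical function names. It is not the author's index structure or search algorithm; a brute-force baseline is included only to check that both give the same answer.

```python
from collections import defaultdict

def build_pair_index(docs, max_distance):
    """Hypothetical two-component-key index.

    Key (w1, w2) -> list of (doc_id, pos1, pos2) for every occurrence of w2
    that follows w1 by at most max_distance positions.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i, w1 in enumerate(words):
            for j in range(i + 1, min(i + max_distance + 1, len(words))):
                index[(w1, words[j])].append((doc_id, i, j))
    return dict(index)

def proximity_docs(pair_index, w1, w2):
    """Documents where w1 and w2 occur within max_distance, in either order.

    Only the posting lists keyed by the pair itself are read, instead of
    combining the full posting lists of w1 and w2.
    """
    hits = pair_index.get((w1, w2), []) + pair_index.get((w2, w1), [])
    return {doc_id for doc_id, _, _ in hits}

def proximity_docs_baseline(docs, w1, w2, max_distance):
    """Baseline: compare every occurrence of w1 with every occurrence of w2."""
    result = set()
    for doc_id, text in docs.items():
        words = text.lower().split()
        p1 = [i for i, w in enumerate(words) if w == w1]
        p2 = [i for i, w in enumerate(words) if w == w2]
        if any(0 < abs(a - b) <= max_distance for a in p1 for b in p2):
            result.add(doc_id)
    return result
```

With docs = {"a": "to be or not to be", "b": "not now to see what to do"} and MaxDistance 2, proximity_docs(build_pair_index(docs, 2), "not", "to") returns {"a", "b"}. The trade-off is index size: the pair index grows with MaxDistance, which is why restricting multi-component keys to high-frequency words matters in practice.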

Hannah Bast - One of the best experts on this subject based on the ideXlab platform.

  • Semantic full-text search with Broccoli
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014
    Co-Authors: Hannah Bast, Florian Baurle, Bjorn Buchhold, Elmar Haussmann
    Abstract:

    We combine search in triple stores with full-text search into what we call "semantic full-text search". We provide a fully functional web application that allows the incremental construction of complex queries on the English Wikipedia combined with the facts from Freebase. The user is guided by context-sensitive suggestions of matching words, instances, classes, and relations after each keystroke. We also provide a powerful API, which may be used for research tasks or as a back end, e.g., for a question answering system. Our web application and public API are available under http://broccoli.cs.uni-freiburg.de.

  • An index for efficient semantic full-text search
    Conference on Information and Knowledge Management, 2013
    Co-Authors: Hannah Bast, Bjorn Buchhold
    Abstract:

    In this paper we present a novel index data structure tailored towards semantic full-text search. Semantic full-text search, as we call it, deeply integrates keyword-based full-text search with structured search in ontologies. Queries are SPARQL-like, with additional relations for specifying word-entity co-occurrences. In order to build such queries, the user needs to be guided. We believe that incremental query construction with context-sensitive suggestions in every step serves that purpose well. Our index has to answer queries and provide such suggestions in real time. We achieve this through a novel kind of posting lists and query processing, avoiding very long (intermediate) result lists and expensive (non-local) operations on these lists. In an evaluation of 8000 queries on the full English Wikipedia (40 GB XML dump) and the YAGO ontology (26.6 million facts), we achieve average query and suggestion times of around 150 ms.

  • A case for semantic full-text search
    Proceedings of the 1st Joint International Workshop on Entity-Oriented and Semantic Search, 2012
    Co-Authors: Hannah Bast, Florian Baurle, Bjorn Buchhold, Elmar Haussmann
    Abstract:

    We discuss the advantages and shortcomings of full-text search on the one hand and search in ontologies/triple stores on the other hand. We argue that each technique has an important quality that the other lacks. We advocate a deep integration of the two, and describe the associated requirements and challenges.

  • Broccoli: semantic full-text search at your fingertips
    arXiv: Information Retrieval, 2012
    Co-Authors: Hannah Bast, Florian Baurle, Bjorn Buchhold, Elmar Haussmann
    Abstract:

    We present Broccoli, a fast and easy-to-use search engine for what we call semantic full-text search. Semantic full-text search combines the capabilities of standard full-text search and ontology search. The search operates on four kinds of objects: ordinary words (e.g., edible), classes (e.g., plants), instances (e.g., Broccoli), and relations (e.g., occurs-with or native-to). Queries are trees, where nodes are arbitrary bags of these objects, and arcs are relations. The user interface guides the user in incrementally constructing such trees by instant (search-as-you-type) suggestions of words, classes, instances, or relations that lead to good hits. Both standard full-text search and pure ontology search are included as special cases. In this paper, we describe the query language of Broccoli, the main idea behind a new kind of index that enables fast processing of queries from that language as well as fast query suggestion, the natural language processing required, and the user interface. We evaluated query times and result quality on the full version of the English Wikipedia (40 GB XML dump) combined with the YAGO ontology (26 million facts). We have implemented a fully functional prototype based on our ideas and provide a web application to reproduce our quality experiments. Both are accessible via this http URL.
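
The Broccoli abstracts describe queries that mix an ontology condition (instances of a class, e.g., plants) with a text condition (co-occurrence with a word, e.g., edible). A naive sketch of that combination follows, assuming text is pre-split into contexts (segments with their words and mentioned entities) and facts are (entity, "is-a", class) triples. The data layout and function names are hypothetical, and this does not reproduce the papers' index, whose posting lists are designed precisely to avoid materializing long intermediate result lists like the ones built here.

```python
from collections import defaultdict

def build_word_entity_postings(contexts):
    """Hypothetical co-occurrence index: word -> set of entities mentioned in
    at least one context (text segment) that also contains that word.

    contexts: iterable of (set_of_words, set_of_entities) pairs.
    """
    postings = defaultdict(set)
    for words, entities in contexts:
        for word in words:
            postings[word] |= entities
    return dict(postings)

def entities_of_class_with_word(postings, facts, cls, word):
    """Instances of class `cls` that co-occur with `word`: the text side comes
    from the postings, the ontology side from the is-a facts."""
    candidates = postings.get(word, set())
    return sorted(e for e in candidates if (e, "is-a", cls) in facts)
```

For instance, with a context ({"broccoli", "is", "edible"}, {"Broccoli"}) and the fact ("Broccoli", "is-a", "Plant"), the query for class "Plant" and word "edible" returns ["Broccoli"]. Neither side alone can answer it: full-text search does not know what a plant is, and the triple store does not know what the text says is edible.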

Jamie Callan - One of the best experts on this subject based on the ideXlab platform.

  • Full-text federated search of text-based digital libraries in peer-to-peer networks
    Information Retrieval, 2006
    Co-Authors: Jamie Callan
    Abstract:

    Peer-to-peer (P2P) networks integrate autonomous computing resources without requiring a central coordinating authority, which makes them a potentially robust and scalable model for providing federated search capability to large-scale networks of text-based digital libraries. However, peer-to-peer networks have so far provided very limited support for full-text federated search with relevance-based document ranking. This paper provides solutions to full-text federated search of text-based digital libraries in hierarchical peer-to-peer networks. Existing approaches to full-text search are adapted and new methods are developed for the problems of resource representation, resource selection, and result merging according to the unique characteristics of hierarchical peer-to-peer networks. Experimental results demonstrate that the proposed approaches offer a better combination of accuracy and efficiency than more common alternatives for federated search of text-based digital libraries in peer-to-peer networks.

  • The FedLemur project: federated search in the real world
    Journal of the Association for Information Science and Technology, 2006
    Co-Authors: Thi Truong Avrahami, Lawrence Yau, Jamie Callan
    Abstract:

    Federated search and distributed information retrieval systems provide a single user interface for searching multiple full-text search engines. They have been an active area of research for more than a decade, but in spite of their success as a research topic, they are still rare in operational environments. This article discusses a prototype federated search system developed for the U.S. government's FedStats Web portal, and the issues addressed in adapting research solutions to this operational environment. A series of experiments explores how well prior research results, parameter settings, and heuristics apply in the FedStats environment. The article concludes with a set of lessons learned from this technology transfer effort, including observations about search engine quality in the “real world.” © 2006 Wiley Periodicals, Inc.
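
Both federated-search abstracts list result merging among the core problems: each engine returns scores on its own scale, so they cannot be compared directly. The sketch below shows one simple, generic baseline, min-max normalization per engine followed by an optional per-source weight (for example, a resource-selection score). It is an illustrative assumption, not the result-merging method developed in either paper, and the function names and input format are hypothetical.

```python
def min_max_normalize(results):
    """Rescale one engine's [(doc_id, raw_score), ...] list to [0, 1]."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every result as equally (maximally) relevant.
        return [(d, 1.0) for d, _ in results]
    return [(d, (s - lo) / (hi - lo)) for d, s in results]

def merge_results(result_lists, source_weights=None, k=10):
    """Merge ranked lists from several engines into one top-k list.

    result_lists: {source_name: [(doc_id, raw_score), ...]}
    source_weights: optional {source_name: weight}; missing sources default to 1.0.
    A document returned by several sources keeps its best weighted score.
    Ties are broken by doc_id so the output is deterministic.
    """
    weights = source_weights or {}
    best = {}
    for source, results in result_lists.items():
        w = weights.get(source, 1.0)
        for doc_id, score in min_max_normalize(results):
            best[doc_id] = max(best.get(doc_id, 0.0), w * score)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

With engine A returning raw scores 12, 8, 4 and engine B returning 0.9, 0.3, both top hits normalize to 1.0; giving B a weight of 0.4 then ranks A's second hit (0.5) above B's best (0.4). Min-max normalization is easy to apply but sensitive to outlier scores, which is one reason merging remains a research problem in its own right.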

Wang Shenkang - One of the best experts on this subject based on the ideXlab platform.