Full Text Search - Explore the Science & Experts

The experts below are selected from a list of 39,537 experts worldwide ranked by the ideXlab platform.

Jayavel Shanmugasundaram - One of the best experts on this subject based on the ideXlab platform.

A TeXQuery-Based XML Full-Text Search Engine

2015

Co-Authors: Chavdar Botev, Sihem Amer-Yahia, Jayavel Shanmugasundaram

Abstract:

We demonstrate an XML full-text search engine that implements the TeXQuery language. TeXQuery is a powerful full-text search extension to XQuery that provides a rich set of fully composable full-text primitives, such as phrase matching, proximity distance, stemming, and thesauri. TeXQuery enables users to seamlessly query over both structured data and text, by embedding full-text primitives in XQuery and vice versa. TeXQuery also supports a flexible scoring construct that scores query results based on full-text predicates and permits top-k queries. TeXQuery is the precursor of the full-text language extension to XPath 2.0 and XQuery 1.0 currently being developed by the W3C.
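
TeXQuery's scoring construct is defined in the paper and is considerably richer than anything shown here. As a loose illustration of what "scoring results by full-text predicates and keeping only the top k" means, the Python sketch below assumes a hypothetical scheme in which each query term carries a weight and a document's score is the sum of the weights of the terms it contains. The names `score` and `top_k` are illustrative only and are not part of TeXQuery.

```python
import heapq


def score(doc_tokens, weighted_terms):
    """Hypothetical score: sum of the weights of the query terms present in the document."""
    present = set(doc_tokens)
    return sum(weight for term, weight in weighted_terms.items() if term in present)


def top_k(docs, weighted_terms, k):
    """Return the k highest (score, doc_id) pairs, skipping documents that match no term."""
    scored = ((score(tokens, weighted_terms), doc_id) for doc_id, tokens in docs.items())
    return heapq.nlargest(k, (pair for pair in scored if pair[0] > 0))


docs = {
    "d1": ["xml", "full-text", "search"],
    "d2": ["xml", "schema"],
    "d3": ["relational", "tables"],
}
print(top_k(docs, {"xml": 1.0, "search": 2.0}, k=2))  # prints [(3.0, 'd1'), (1.0, 'd2')]
```

`heapq.nlargest` keeps a bounded heap of candidates instead of fully sorting every scored document, which is one common way to serve a top-k request.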

Expressiveness and Performance of Full-Text Search Languages

Extending Database Technology, 2006

Co-Authors: Chavdar Botev, Sihem Amer-Yahia, Jayavel Shanmugasundaram

Abstract:

We study the expressiveness and performance of full-text search languages. Our motivation is to provide a formal basis for comparing full-text search languages and to develop a model for full-text search that can be tightly integrated with structured search. We design a model based on the positions of tokens (words) in the input text, and develop a full-text calculus (FTC) and a full-text algebra (FTA) with equivalent expressive power; this suggests a notion of completeness for full-text search languages. We show that existing full-text languages are incomplete and identify a practical subset of the FTC and FTA that is more powerful than existing languages but can still be evaluated efficiently.
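
The FTC and FTA are defined formally in the paper. To make the idea of a position-based model concrete, here is a minimal Python sketch, assuming whitespace tokenization and hypothetical helper names (`positions`, `phrase_match`, `within_distance`). It shows that phrase matching and proximity can both be expressed as arithmetic conditions over the same token-position data.

```python
from collections import defaultdict


def positions(text):
    """Map each lower-cased token to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for pos, token in enumerate(text.lower().split()):
        index[token].append(pos)
    return dict(index)


def phrase_match(pos_index, phrase):
    """Start positions where the tokens of `phrase` occur at consecutive positions."""
    tokens = phrase.lower().split()
    if not tokens:
        return []
    # Sets give constant-time checks for "token t occurs at position s + i".
    occurs = {t: set(pos_index.get(t, [])) for t in tokens}
    return [s for s in pos_index.get(tokens[0], [])
            if all(s + i in occurs[t] for i, t in enumerate(tokens))]


def within_distance(pos_index, a, b, max_dist):
    """True if some occurrence of `a` and some occurrence of `b` are at most max_dist positions apart."""
    return any(abs(p - q) <= max_dist
               for p in pos_index.get(a, [])
               for q in pos_index.get(b, []))


idx = positions("full text search over xml text data")
print(idx["text"])                             # [1, 5]
print(phrase_match(idx, "text search"))        # [1]
print(within_distance(idx, "xml", "data", 2))  # True
```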

XML Full-Text Search: Challenges and Opportunities

Very Large Data Bases, 2005

Co-Authors: Sihem Amer-Yahia, Jayavel Shanmugasundaram

Abstract:

An ever-growing number of XML repositories are being made available for search. Much effort has been devoted in the past few years to querying such repositories. In particular, full-text querying of text-rich XML documents has raised a wealth of issues that are being addressed by both the database (DB) and information retrieval (IR) communities. The DB community has traditionally focused on developing query languages and efficient evaluation algorithms for highly structured data. In contrast, the IR community has focused on searching unstructured data, and has developed various techniques for ranking query results and evaluating their effectiveness. Fortunately, recent trends in DB and IR research demonstrate a growing interest in adopting IR techniques in DBs and vice versa [1, 2, 3, 4, 5, 6, 7, 9].

TeXQuery: A Full-Text Search Extension to XQuery

The Web Conference, 2004

Co-Authors: Sihem Amer-Yahia, Chavdar Botev, Jayavel Shanmugasundaram

Abstract:

One of the key benefits of XML is its ability to represent a mix of structured and unstructured (text) data. Although current XML query languages such as XPath and XQuery can express rich queries over structured data, they can only express very rudimentary queries over text data. We thus propose TeXQuery, a powerful full-text search extension to XQuery. TeXQuery provides a rich set of fully composable full-text search primitives, such as Boolean connectives, phrase matching, proximity distance, stemming, and thesauri. TeXQuery also enables users to seamlessly query over both structured and text data by embedding TeXQuery primitives in XQuery, and vice versa. Finally, TeXQuery supports a flexible scoring construct that can be used to score query results based on full-text predicates. TeXQuery is the precursor of the full-text language extensions to XPath 2.0 and XQuery 1.0 currently being developed by the W3C.

Alexander B Veretennikov - One of the best experts on this subject based on the ideXlab platform.

Proximity Full-Text Search by Means of Additional Indexes with Multi-Component Keys: In Pursuit of Optimal Performance

arXiv: Information Retrieval, 2018

Co-Authors: Alexander B Veretennikov

Abstract:

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in a text, we use additional indexes to store information about nearby words whose distance from the given word is less than or equal to the MaxDistance parameter. We showed that additional indexes with three-component keys can improve the average query execution time by a factor of up to 94.7 if the queries consist of high-frequency words. In this paper, we present a new search algorithm with even greater performance gains. We consider several strategies for selecting multi-component key indexes for a specific query and compare these strategies with the optimal strategy. We also present the results of search experiments, which show that three-component key indexes enable much faster searches than two-component key indexes. This is a pre-print of a contribution "Veretennikov A.B. (2019) Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance." published in "Manolopoulos Y., Stupnikov S. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2018. Communications in Computer and Information Science, vol 1003" by Springer, Cham. This book constitutes the refereed proceedings of the 20th International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2018, held in Moscow, Russia, in October 2018. The 9 revised full papers presented together with three invited papers were carefully reviewed and selected from 54 submissions. The final authenticated version is available online at this https URL.

Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

arXiv: Information Retrieval, 2018

Co-Authors: Alexander B Veretennikov

Abstract:

Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency. For a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average search query execution time is 44 to 45 times lower than when using ordinary inverted indexes. This is a pre-print of a contribution "Veretennikov A.B. Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes" published in "Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868" by Springer, Cham. The final authenticated version is available online at this https URL. The work was supported by Act 211 Government of the Russian Federation, contract no. 02.A03.21.0006.

Proximity Full-Text Search by Means of Additional Indexes with Multi-Component Keys: In Pursuit of Optimal Performance

International Conference on Data Analytics and Management in Data Intensive Domains, 2018

Co-Authors: Alexander B Veretennikov

Abstract:

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in a text, we use additional indexes to store information about nearby words whose distance from the given word is less than or equal to the MaxDistance parameter. We showed that additional indexes with three-component keys can improve the average query execution time by a factor of up to 94.7 if the queries consist of high-frequency words. In this paper, we present a new search algorithm with even greater performance gains. We consider several strategies for selecting multi-component key indexes for a specific query and compare these strategies with the optimal strategy. We also present the results of search experiments, which show that three-component key indexes enable much faster searches than two-component key indexes.
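
The papers define their own key-selection rules, aimed at frequently occurring words, and their own search algorithm. As a simplified, hypothetical illustration of why a multi-component key helps, the Python sketch below indexes every word together with each pair of words that follows it within MaxDistance positions (the `max_distance` argument here). A three-word proximity query then becomes a single dictionary lookup instead of an intersection of three potentially long posting lists. `build_key_index` and `proximity_query` are illustrative names, not taken from the papers.

```python
from collections import defaultdict
from itertools import combinations


def build_key_index(docs, max_distance, components=3):
    """Simplified multi-component key index (hypothetical, not the papers' exact scheme).

    For every position p, the word at p is combined with each choice of
    (components - 1) words among the next max_distance positions. The key is the
    sorted tuple of those words and the posting is (doc_id, p).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for p, word in enumerate(tokens):
            window = tokens[p + 1 : p + 1 + max_distance]
            for others in combinations(window, components - 1):
                index[tuple(sorted((word,) + others))].add((doc_id, p))
    return index


def proximity_query(index, words):
    """Documents where the query words (exactly `components` of them) all occur
    within max_distance positions of the leftmost one: one key lookup, no list merging."""
    key = tuple(sorted(w.lower() for w in words))
    return sorted({doc_id for doc_id, _ in index.get(key, set())})


docs = {"d1": "to be or not to be", "d2": "to live is to be"}
index = build_key_index(docs, max_distance=2)
print(proximity_query(index, ["to", "be", "or"]))  # ['d1']
print(proximity_query(index, ["be", "is", "to"]))  # ['d2']
```

Indexing every word this way makes the index grow with the number of word combinations per window, which is why a practical design would restrict such keys, for example to the high-frequency words the abstract highlights.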

Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

SAI Intelligent Systems Conference, 2018

Co-Authors: Alexander B Veretennikov

Abstract:

Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency. For a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average search query execution time is 44 to 45 times lower than when using ordinary inverted indexes.

Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes with Multi-Component Keys

DAMDID RCDL, 2018

Co-Authors: Alexander B Veretennikov

Abstract:

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in the text, we use additional indexes to store information about nearby words whose distance from the given word is less than or equal to MaxDistance, which is a parameter. We have shown that additional indexes with three-component keys can improve the average query execution time by a factor of up to 94.7 if the queries consist of high-frequency words. In this paper, we present a new search algorithm with even greater performance gains. We also present results of search experiments, which show that three-component key indexes enable much faster searches than two-component key indexes.

Hannah Bast - One of the best experts on this subject based on the ideXlab platform.

Semantic Full-Text Search with Broccoli

International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Co-Authors: Hannah Bast, Florian Baurle, Bjorn Buchhold, Elmar Haussmann

Abstract:

We combine search in triple stores with full-text search into what we call semantic full-text search. We provide a fully functional web application that allows the incremental construction of complex queries on the English Wikipedia combined with the facts from Freebase. The user is guided by context-sensitive suggestions of matching words, instances, classes, and relations after each keystroke. We also provide a powerful API, which may be used for research tasks or as a back end, e.g., for a question answering system. Our web application and public API are available at http://broccoli.cs.uni-freiburg.de.

An Index for Efficient Semantic Full-Text Search

Conference on Information and Knowledge Management, 2013

Co-Authors: Hannah Bast, Bjorn Buchhold

Abstract:

In this paper we present a novel index data structure tailored towards semantic full-text search. Semantic full-text search, as we call it, deeply integrates keyword-based full-text search with structured search in ontologies. Queries are SPARQL-like, with additional relations for specifying word-entity co-occurrences. To build such queries, the user needs to be guided. We believe that incremental query construction with context-sensitive suggestions in every step serves that purpose well. Our index has to answer queries and provide such suggestions in real time. We achieve this through a novel kind of posting lists and query processing, avoiding very long (intermediate) result lists and expensive (non-local) operations on these lists. In an evaluation of 8,000 queries on the full English Wikipedia (40 GB XML dump) and the YAGO ontology (26.6 million facts), we achieve average query and suggestion times of around 150 ms.

A Case for Semantic Full-Text Search

Proceedings of the 1st Joint International Workshop on Entity-Oriented and Semantic Search, 2012

Co-Authors: Hannah Bast, Florian Baurle, Bjorn Buchhold, Elmar Haussmann

Abstract:

We discuss the advantages and shortcomings of full-text search on the one hand and search in ontologies/triple stores on the other. We argue that each technique has an important quality that the other lacks. We advocate a deep integration of the two, and describe the associated requirements and challenges.

Broccoli: Semantic Full-Text Search at Your Fingertips

arXiv: Information Retrieval, 2012

Co-Authors: Hannah Bast, Florian Baurle, Bjorn Buchhold, Elmar Haussmann

Abstract:

We present Broccoli, a fast and easy-to-use search engine for what we call semantic full-text search. Semantic full-text search combines the capabilities of standard full-text search and ontology search. The search operates on four kinds of objects: ordinary words (e.g., edible), classes (e.g., plants), instances (e.g., Broccoli), and relations (e.g., occurs-with or native-to). Queries are trees, where nodes are arbitrary bags of these objects, and arcs are relations. The user interface guides the user in incrementally constructing such trees by instant (search-as-you-type) suggestions of words, classes, instances, or relations that lead to good hits. Both standard full-text search and pure ontology search are included as special cases. In this paper, we describe the query language of Broccoli, the main idea behind a new kind of index that enables fast processing of queries from that language as well as fast query suggestion, the natural language processing required, and the user interface. We evaluated query times and result quality on the full version of the English Wikipedia (40 GB XML dump) combined with the YAGO ontology (26 million facts). We have implemented a fully functional prototype based on our ideas and provide a web application to reproduce our quality experiments. Both are accessible via this http URL.
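
Broccoli answers such queries with a dedicated index. The brute-force Python sketch below only illustrates the semantics of a simple query tree, "instances of the class Plant that occur with the word edible", by combining an ontology lookup with a text co-occurrence check. The triples, the `is-a` relation name, the text contexts, and the function names are all hypothetical toy data, not Broccoli's actual data model or API.

```python
# Hypothetical toy ontology: (subject, relation, object) triples.
FACTS = {
    ("Broccoli", "is-a", "Plant"),
    ("Oleander", "is-a", "Plant"),
    ("Paris", "is-a", "City"),
}

# Hypothetical text contexts: words and entity mentions that occur together in one passage.
CONTEXTS = [
    {"words": {"broccoli", "is", "an", "edible", "vegetable"}, "entities": {"Broccoli"}},
    {"words": {"oleander", "is", "poisonous"}, "entities": {"Oleander"}},
    {"words": {"edible", "souvenirs", "from", "paris"}, "entities": {"Paris"}},
]


def instances_of(cls):
    """Ontology part of the query: all subjects with an is-a edge to `cls`."""
    return {s for s, rel, o in FACTS if rel == "is-a" and o == cls}


def occurs_with(cls, word):
    """Full-text part: instances of `cls` mentioned in a context that also contains `word`."""
    candidates = instances_of(cls)
    return sorted({entity
                   for ctx in CONTEXTS if word in ctx["words"]
                   for entity in ctx["entities"] & candidates})


print(occurs_with("Plant", "edible"))  # ['Broccoli']
```

Paris co-occurs with "edible" but is filtered out by the ontology constraint, which is the kind of result neither pure keyword search nor pure triple search produces on its own.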

Jamie Callan - One of the best experts on this subject based on the ideXlab platform.

Full-Text Federated Search of Text-Based Digital Libraries in Peer-to-Peer Networks

Information Retrieval, 2006

Co-Authors: Jamie Callan

Abstract:

Peer-to-peer (P2P) networks integrate autonomous computing resources without requiring a central coordinating authority, which makes them a potentially robust and scalable model for providing federated search capability to large-scale networks of text-based digital libraries. However, peer-to-peer networks have so far provided very limited support for full-text federated search with relevance-based document ranking. This paper provides solutions to full-text federated search of text-based digital libraries in hierarchical peer-to-peer networks. Existing approaches to full-text search are adapted and new methods are developed for the problems of resource representation, resource selection, and result merging according to the unique characteristics of hierarchical peer-to-peer networks. Experimental results demonstrate that the proposed approaches offer a better combination of accuracy and efficiency than more common alternatives for federated search of text-based digital libraries in peer-to-peer networks.
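
The paper develops its own result-merging methods for hierarchical P2P networks. For orientation only, the Python sketch below shows a much simpler, commonly used baseline: min-max normalizing each peer's scores to [0, 1] so that scores from differently calibrated search engines become comparable, then ranking the union. The peer names, scores, and function names are hypothetical, and this is not the method proposed in the paper.

```python
def min_max(scores):
    """Rescale one peer's scores to [0, 1]; a peer with all-equal scores maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_results(results_by_peer, k):
    """Merge per-peer result lists into one top-k list using normalized scores.

    A document returned by several peers keeps its best normalized score;
    ties are broken by document id so the output is deterministic.
    """
    merged = {}
    for scores in results_by_peer.values():
        if not scores:
            continue  # a peer that returned nothing contributes nothing
        for doc, s in min_max(scores).items():
            merged[doc] = max(merged.get(doc, 0.0), s)
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))[:k]


results = {
    "peer-a": {"a1": 10.0, "a2": 5.0, "a3": 0.0},  # e.g. unbounded raw scores
    "peer-b": {"b1": 0.9, "b2": 0.3},              # e.g. scores already in [0, 1]
}
print(merge_results(results, k=3))  # [('a1', 1.0), ('b1', 1.0), ('a2', 0.5)]
```

Min-max normalization ignores how relevant each peer is to the query as a whole, which is one reason research systems weight or learn the merge instead.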

The FedLemur Project: Federated Search in the Real World

Journal of the Association for Information Science and Technology, 2006

Co-Authors: Thi Truong Avrahami, Lawrence Yau, Jamie Callan

Abstract:

Federated search and distributed information retrieval systems provide a single user interface for searching multiple full-text search engines. They have been an active area of research for more than a decade, but in spite of their success as a research topic, they are still rare in operational environments. This article discusses a prototype federated search system developed for the U.S. government's FedStats Web portal, and the issues addressed in adapting research solutions to this operational environment. A series of experiments explore how well prior research results, parameter settings, and heuristics apply in the FedStats environment. The article concludes with a set of lessons learned from this technology transfer effort, including observations about search engine quality in the “real world.” © 2006 Wiley Periodicals, Inc.

Wang Shenkang - One of the best experts on this subject based on the ideXlab platform.

Research and Development of a Full-Text Search Engine Based on Lucene

Computer Engineering, 2006

Co-Authors: Wang Shenkang

Abstract:

The paper proposes a system model for a full-text search engine based on Jakarta Lucene. Compared with Google site search and the original database search engine, this model offers clear advantages. Its keyword segmentation and matching technology, its indexing speed, and its result ranking each have distinctive features.
