Information Retrieval

The experts below are selected from a list of 269,325 experts worldwide ranked by the ideXlab platform.

Jerry Chunwei Lin - One of the best experts on this subject based on the ideXlab platform.

  • Cluster-Based Information Retrieval Using Pattern Mining
    Applied Intelligence, 2020
    Co-Authors: Youcef Djenouri, Asma Belhadi, Djamel Djenouri, Jerry Chunwei Lin
    Abstract:

    This paper addresses the problem of responding to user queries by fetching the most relevant object from a clustered set of objects. It addresses the common drawbacks of cluster-based approaches and targets fast, high-quality information retrieval. For this purpose, a novel cluster-based information retrieval approach is proposed, named Cluster-based Retrieval using Pattern Mining (CRPM). This approach integrates various clustering and pattern mining algorithms. First, it groups similar objects into clusters. Three clustering algorithms, based on k-means, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and spectral clustering, are suggested to minimize the number of terms shared among the clusters. Second, frequent and high-utility pattern mining algorithms are applied to each cluster to extract the pattern bases. Third, the clusters are ranked for every query. In this context, two ranking strategies are proposed: i) Score Pattern Computing (SPC), which calculates a score representing the similarity between a user query and a cluster; and ii) Weighted Terms in Clusters (WTC), which calculates a weight for every term and uses the relevant terms to compute the score between a user query and each cluster. Irrelevant information derived from the pattern bases is also used to deal with unexpected user queries. To evaluate the proposed approach, extensive experiments were carried out on two use cases: a document corpus and a tweet corpus. The results showed that the designed approach outperformed traditional and cluster-based information retrieval approaches in terms of the quality of the returned objects while remaining very competitive in runtime.
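    The WTC idea of scoring each cluster from per-term weights can be sketched as follows. This is a minimal sketch under an assumed weighting (a term's weight in a cluster is the share of that term's total occurrences falling in the cluster); the abstract does not state the paper's formula, and `term_weights` and `rank_clusters` are hypothetical names.

```python
from collections import Counter

def term_weights(clusters):
    """Weight of term t in cluster c = occurrences of t in c / occurrences of t overall.

    This weighting is an illustrative assumption, not the WTC formula from the paper.
    Each cluster is a list of documents; each document is a list of terms.
    """
    counts = [Counter(t for doc in cluster for t in doc) for cluster in clusters]
    totals = Counter()
    for c in counts:
        totals.update(c)
    return [{t: n / totals[t] for t, n in c.items()} for c in counts]

def rank_clusters(query, clusters):
    """Score each cluster by summing the weights of the query terms it contains."""
    weights = term_weights(clusters)
    scores = [sum(w.get(t, 0.0) for t in query) for w in weights]
    order = sorted(range(len(clusters)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy clusters: one about pattern mining, one about tweets.
clusters = [
    [["pattern", "mining", "itemset"], ["frequent", "itemset", "mining"]],
    [["tweet", "hashtag", "retweet"], ["tweet", "mining"]],
]
order, scores = rank_clusters(["tweet", "mining"], clusters)
```

    Here "tweet" occurs only in the second cluster, so that cluster ranks first even though "mining" is more concentrated in the first.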

  • Fast and Effective Cluster-Based Information Retrieval Using Frequent Closed Itemsets
    Information Sciences, 2018
    Co-Authors: Youcef Djenouri, Asma Belhadi, Philippe Fournier-Viger, Jerry Chunwei Lin
    Abstract:

    Document information retrieval consists of finding the documents in a collection that are most relevant to a user query. Information retrieval techniques are widely used by organizations to facilitate the search for information. However, applying traditional information retrieval techniques to large document collections is time-consuming. Recently, cluster-based information retrieval approaches have been developed. Although these approaches are often much faster than traditional approaches for processing large document collections, the quality of the documents they retrieve is often lower than that of traditional approaches. To address this drawback, and to improve information retrieval in terms of both runtime and the quality of retrieved documents, this paper proposes a new cluster-based information retrieval approach named ICIR (Intelligent Cluster-based Information Retrieval). The proposed approach combines k-means clustering with frequent closed itemset mining to extract clusters of documents and find the frequent terms in each cluster. The patterns discovered in each cluster are then used to select the most relevant document clusters to answer each user query. Four alternative heuristics are proposed for selecting the most relevant clusters, and two for choosing documents within the selected clusters, yielding eight versions of the proposed approach. To validate the proposed approach, extensive experiments have been carried out on well-known document collections. Results show that the designed approach outperforms traditional and cluster-based information retrieval approaches in terms of both execution time and the quality of the returned documents.
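    The pattern-mining step can be illustrated by mining frequent closed itemsets (term sets with no proper superset of equal support) inside each cluster and then picking the cluster whose patterns best match the query. This is a minimal brute-force sketch: the miner is exponential and only suits toy data, and the selection rule (count the query terms covered by the cluster's closed itemsets) is a hypothetical stand-in for the paper's four heuristics. All function names are hypothetical.

```python
from itertools import combinations

def frequent_closed_itemsets(docs, min_support):
    """Brute-force frequent closed itemset mining over one cluster.

    docs: list of term sets. Returns {frozenset(itemset): support}.
    """
    vocab = sorted(set().union(*docs))
    frequent = {}
    for k in range(1, len(vocab) + 1):
        for combo in combinations(vocab, k):
            items = frozenset(combo)
            support = sum(1 for d in docs if items <= d)
            if support >= min_support:
                frequent[items] = support
    # Closed: no proper superset with exactly the same support.
    return {
        items: s
        for items, s in frequent.items()
        if not any(items < other and s == s2 for other, s2 in frequent.items())
    }

def select_cluster(query, clusters, min_support=2):
    """Pick the cluster whose closed itemsets cover the most query terms (hypothetical rule)."""
    scores = []
    for docs in clusters:
        closed = frequent_closed_itemsets(docs, min_support)
        pattern_terms = set().union(*closed)  # every term appearing in some closed itemset
        scores.append(len(query & pattern_terms))
    return scores.index(max(scores)), scores

cluster_a = [{"graph", "pagerank", "link"}, {"graph", "pagerank"}, {"graph", "eigenvector"}]
cluster_b = [{"tweet", "hashtag"}, {"tweet", "retweet"}, {"tweet", "hashtag", "user"}]
closed_a = frequent_closed_itemsets(cluster_a, 2)
best, scores = select_cluster({"pagerank", "graph"}, [cluster_a, cluster_b])
```

    In `cluster_a`, {pagerank} is frequent but not closed, because {graph, pagerank} has the same support; keeping only closed itemsets is what makes the pattern base compact.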

Rafail Ostrovsky - One of the best experts on this subject based on the ideXlab platform.

  • Universal Service-Providers for Private Information Retrieval
    Journal of Cryptology, 2001
    Co-Authors: Giovanni Di Crescenzo, Yuval Ishai, Rafail Ostrovsky
    Abstract:

    A private information retrieval scheme allows a user to retrieve a data item of his choice from a remote database (or several copies of a database) while hiding from the database owner which particular data item he is interested in. We consider the question of private information retrieval in the so-called "commodity-based" model, recently proposed by Beaver for practically oriented service-provider Internet applications. We present simple and modular schemes that dramatically reduce the overall communication involving users, and substantially reduce their computation, using off-line messages sent from service-providers to databases and users. The service-providers do not need to know the database contents or the users' future requests; all they need to know is an upper bound on the data size. Our solutions can be made resilient against collusions of databases with more than a majority (in fact, all-but-one) of the service-providers.
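    For background on what "hiding which item he is interested in" means with several database copies, the classic two-server XOR scheme is sketched below. This is not the commodity-based scheme from the paper (there are no service-providers or off-line messages here); it has linear communication and assumes the two servers do not collude. Function names are hypothetical.

```python
import secrets

def server_answer(db, subset):
    """A server returns the XOR of the database bits at the indices it was sent."""
    ans = 0
    for j in subset:
        ans ^= db[j]
    return ans

def pir_query(n, i):
    """User: pick a uniformly random index set S, send S to server 1 and S xor {i} to server 2.

    Each server alone sees a uniformly random subset, so it learns nothing about i.
    """
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}  # symmetric difference toggles membership of index i
    return s1, s2

def retrieve(db_copy1, db_copy2, i):
    """XOR of the two answers cancels every bit except bit i."""
    s1, s2 = pir_query(len(db_copy1), i)
    return server_answer(db_copy1, s1) ^ server_answer(db_copy2, s2)

db = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [retrieve(db, db, i) for i in range(len(db))]
```

    Each query here costs about n bits per server; schemes such as the one in this paper aim to cut that communication substantially.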

  • Replication Is Not Needed: Single Database, Computationally-Private Information Retrieval
    Foundations of Computer Science, 1997
    Co-Authors: Eyal Kushilevitz, Rafail Ostrovsky
    Abstract:

    We establish the following, quite unexpected, result: replication of data is not necessary for the computational private information retrieval problem. More specifically, based on the quadratic residuosity assumption, we present a single-database, computationally private information retrieval scheme with $O(n^{\varepsilon})$ communication complexity for any $\varepsilon > 0$.
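    A toy version of the basic quadratic-residuosity construction from this line of work is sketched below, assuming the database is viewed as an s × t bit matrix. The user sends one number per column: a non-residue with Jacobi symbol +1 for the wanted column and random residues elsewhere. For each row the server multiplies $y_j$ (bit 1) or $y_j^2$ (bit 0), so the wanted row's product is a residue exactly when the wanted bit is 0. The recursive refinement that reaches $O(n^{\varepsilon})$ is not shown. The primes are tiny and offer no security, and all names are hypothetical.

```python
import math
import random

# Toy primes known only to the user; far too small for real security.
P, Q = 499, 547
N = P * Q  # public modulus given to the server

def is_qr_mod_prime(a, p):
    """Euler's criterion: a (nonzero mod p) is a QR mod odd prime p iff a^((p-1)/2) = 1."""
    return pow(a, (p - 1) // 2, p) == 1

def is_qr(a):
    """With the factorization, a is a QR mod N iff it is a QR mod both P and Q."""
    return is_qr_mod_prime(a % P, P) and is_qr_mod_prime(a % Q, Q)

def random_unit(rng):
    while True:
        y = rng.randrange(2, N)
        if math.gcd(y, N) == 1:
            return y

def random_qr(rng):
    return pow(random_unit(rng), 2, N)

def random_qnr_jacobi_one(rng):
    """Non-residue mod both P and Q: Jacobi symbol +1, yet not a QR mod N."""
    while True:
        y = random_unit(rng)
        if not is_qr_mod_prime(y % P, P) and not is_qr_mod_prime(y % Q, Q):
            return y

def user_query(t, col, rng):
    return [random_qnr_jacobi_one(rng) if j == col else random_qr(rng) for j in range(t)]

def server_answer(matrix, ys):
    """For each row r: z_r = prod_j (y_j if bit == 1 else y_j^2) mod N."""
    answers = []
    for row in matrix:
        z = 1
        for bit, y in zip(row, ys):
            z = z * (y if bit else y * y) % N
        answers.append(z)
    return answers

def retrieve(matrix, row, col, rng):
    ys = user_query(len(matrix[0]), col, rng)
    z = server_answer(matrix, ys)[row]  # the server sends all rows; the user keeps one
    return 0 if is_qr(z) else 1

rng = random.Random(2024)  # deterministic toy randomness; real use needs a CSPRNG
db = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 1, 0, 0]]
bit = retrieve(db, row=1, col=3, rng=rng)
```

    With s = t = √n, the user sends t numbers and receives s numbers, which is the starting point that the paper's recursion improves to $O(n^{\varepsilon})$.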

Carl D Meyer - One of the best experts on this subject based on the ideXlab platform.

  • A Survey of Eigenvector Methods for Web Information Retrieval
    SIAM Review, 2005
    Co-Authors: Amy N Langville, Carl D Meyer
    Abstract:

    Web information retrieval is significantly more challenging than traditional information retrieval over small, well-controlled document collections. One main difference between the two is the Web's hyperlink structure, which has been exploited by several of today's leading Web search engines, particularly Google and Teoma. In this survey paper, we focus on Web information retrieval methods that use eigenvector computations, presenting three popular methods: HITS, PageRank, and SALSA.
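    As an example of an eigenvector method, PageRank can be computed by power iteration: repeatedly redistribute each page's rank along its outlinks, mix in a uniform teleportation term, and stop when the vector settles. This is a minimal sketch assuming a damping factor of 0.85 and uniform redistribution of rank from pages without outlinks; the function name and defaults are illustrative choices, not taken from the survey.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a graph given as {page: [outlinked pages]}.

    Pages with no outlinks ("dangling" pages) spread their rank uniformly.
    The returned ranks sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation plus the uniformly spread dangling mass.
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

    Page C receives links from both A and B, so it ends up with the highest rank; B, linked only by half of A's outflow, ends up lowest.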

D.S. Tudhope - One of the best experts on this subject based on the ideXlab platform.

  • Geographical Information Retrieval with Ontologies of Place
    D. Montello (ed.), Spatial Information Theory: Foundations of Geographic Information Science (COSIT 2001), Lecture Notes in Computer Science 2205, 2001
    Co-Authors: Christopher B Jones, Harith Alani, D.S. Tudhope
    Abstract:

    Geographical context is required for many information retrieval tasks in which the target of the search may be documents, images or records that are referenced to geographical space only by means of place names. Often there may be an imprecise match between the query name and the names associated with candidate sources of information. There is therefore a need for geographical information retrieval facilities that can rank the relevance of candidate information with respect to geographical closeness as well as semantic closeness to the topic of interest. Here we present an ontology of place that combines limited coordinate data with qualitative spatial relationships between places. This parsimonious model of place is intended to support information retrieval tasks that may be global in scope. The ontology has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology. A hierarchical distance measure is combined with Euclidean distance between place centroids to create a hybrid spatial distance measure. This can be combined with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used for relevance ranking of retrieved objects.
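    The hybrid measure can be illustrated with a small part-of hierarchy plus centroid coordinates. This is a minimal sketch under several assumptions not stated in the abstract: hierarchical distance is the number of edges through the lowest common ancestor, both distances are normalized by fixed maxima, and they are combined as a weighted sum. The places, coordinates, weights, and function names are all hypothetical.

```python
import math

# Hypothetical toy ontology: child -> parent ("part-of") links.
PARENT = {
    "TownA": "RegionX", "TownB": "RegionX", "TownC": "RegionY",
    "RegionX": "Country", "RegionY": "Country",
}
# Hypothetical centroid coordinates in arbitrary planar units.
CENTROID = {"TownA": (0.0, 0.0), "TownB": (30.0, 40.0), "TownC": (30.0, 0.0)}

def ancestors(place):
    """Return [place, parent, grandparent, ...] up to the root."""
    chain = [place]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def hierarchical_distance(a, b):
    """Number of edges from a to b through their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(p for p in up_a if p in up_b)  # assumes the places share a root
    return up_a.index(common) + up_b.index(common)

def hybrid_spatial_distance(a, b, w_hier=0.5, max_hier=4, max_euclid=100.0):
    """Weighted sum of normalized hierarchical and Euclidean centroid distances."""
    h = hierarchical_distance(a, b) / max_hier
    e = math.dist(CENTROID[a], CENTROID[b]) / max_euclid
    return w_hier * h + (1 - w_hier) * e

def semantic_closeness(spatial, thematic, w_spatial=0.5):
    """Combine spatial and thematic distances (both in [0, 1]) into a closeness score."""
    return 1.0 - (w_spatial * spatial + (1 - w_spatial) * thematic)

d_ab = hybrid_spatial_distance("TownA", "TownB")
d_ac = hybrid_spatial_distance("TownA", "TownC")
```

    TownC's centroid is nearer to TownA than TownB's is, but TownB lies in the same region, so with equal weights the hybrid measure judges TownB closer (0.5 versus 0.65). This is the kind of trade-off a purely coordinate-based distance cannot express.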

Barry Vercoe - One of the best experts on this subject based on the ideXlab platform.

  • Using User Models in Music Information Retrieval Systems
    International Symposium on Music Information Retrieval (ISMIR), 2000
    Co-Authors: Wei Chai, Barry Vercoe
    Abstract:

    To make multimedia data easy to retrieve, we use metadata to describe the information, so that search engines or other information filtering tools can effectively and efficiently locate and retrieve the multimedia content. Since many features of multimedia content are perceptual and user-dependent, user modeling is also necessary for multimedia information retrieval systems, e.g., music information retrieval systems. Furthermore, to make the user models sharable, we need a standardized language to describe them. In this paper, an XML-like language is proposed to describe the user model for music information retrieval purposes. We also propose some paradigms to acquire, deploy and share user information to improve current music information systems. A prototype system, MusicCat, is analyzed and implemented as a case study.
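    The idea of sharing a user model through an XML-based description can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names here (`userModel`, `preference`, `category`, `weight`) are hypothetical and do not reproduce the language proposed in the paper. The sketch only shows how such a model can be serialized by one system and parsed back by another.

```python
import xml.etree.ElementTree as ET

def user_model_to_xml(user_id, preferences):
    """Serialize {(category, value): weight} into a hypothetical <userModel> document."""
    root = ET.Element("userModel", {"id": user_id})
    for (category, value), weight in sorted(preferences.items()):
        pref = ET.SubElement(root, "preference", {"category": category, "weight": f"{weight:.2f}"})
        pref.text = value
    return ET.tostring(root, encoding="unicode")

def user_model_from_xml(text):
    """Parse the hypothetical format back into (user_id, {(category, value): weight})."""
    root = ET.fromstring(text)
    prefs = {
        (p.get("category"), p.text): float(p.get("weight"))
        for p in root.iter("preference")
    }
    return root.get("id"), prefs

model = {("genre", "jazz"): 0.8, ("tempo", "slow"): 0.35}
xml_text = user_model_to_xml("u42", model)
user_id, parsed = user_model_from_xml(xml_text)
```

    Because the model is plain XML, any system that knows the element names can read it. This is the sharability argument the abstract makes for a standardized description language.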