Web Search Engine

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 13416 Experts worldwide ranked by ideXlab platform

Mitsuru Ishizuka - One of the best experts on this subject based on the ideXlab platform.

  • A Web Search Engine-Based Approach to Measure Semantic Similarity between Words
    IEEE Transactions on Knowledge and Data Engineering, 2011
    Co-Authors: Danushka Tarupathi Bollegala, Yutaka Matsuo, Mitsuru Ishizuka
    Abstract:

    Measuring the semantic similarity between words is an important component in various tasks on the Web such as relation extraction, community mining, document clustering, and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring semantic similarity between two words (or entities) remains a challenging task. We propose an empirical method to estimate semantic similarity using page counts and text snippets retrieved from a Web search engine for two words. Specifically, we define various word co-occurrence measures using page counts and integrate those with lexical patterns extracted from text snippets. To identify the numerous semantic relations that exist between two given words, we propose a novel pattern extraction algorithm and a pattern clustering algorithm. The optimal combination of page counts-based co-occurrence measures and lexical pattern clusters is learned using support vector machines. The proposed method outperforms various baselines and previously proposed Web-based semantic similarity measures on three benchmark data sets, showing a high correlation with human ratings. Moreover, the proposed method significantly improves the accuracy in a community mining task.
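
    The abstract above mentions co-occurrence measures computed from page counts without naming them here. As a minimal sketch, assuming a Jaccard coefficient and pointwise mutual information (PMI) as two such measures, the computation might look like the following. The function names, the page counts, and the index size are hypothetical illustration values, not figures from the paper.

```python
import math


def web_jaccard(count_p, count_q, count_pq):
    """Jaccard coefficient from page counts for P, Q, and the query 'P AND Q'."""
    union = count_p + count_q - count_pq
    if count_pq == 0 or union <= 0:
        return 0.0
    return count_pq / union


def web_pmi(count_p, count_q, count_pq, index_size):
    """Pointwise mutual information (base 2) from page counts.

    index_size is an assumed estimate of the number of indexed pages.
    Returns 0.0 when any count is zero, so the logarithm stays defined.
    """
    if min(count_p, count_q, count_pq) == 0:
        return 0.0
    joint = count_pq / index_size
    independent = (count_p / index_size) * (count_q / index_size)
    return math.log2(joint / independent)


if __name__ == "__main__":
    # Hypothetical page counts for two words and their conjunction.
    p, q, pq, n = 1000, 500, 250, 1_000_000
    print("Jaccard:", web_jaccard(p, q, pq))
    print("PMI:", web_pmi(p, q, pq, n))
```

    In the setting the abstract describes, scores like these would be only part of the feature set, combined with lexical pattern clusters by a learned model.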

  • Graph-based Word Clustering using a Web Search Engine
    Empirical Methods in Natural Language Processing, 2006
    Co-Authors: Yutaka Matsuo, Takeshi Sakaki, Koki Uchiyama, Mitsuru Ishizuka
    Abstract:

    Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the Web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure by Web counts. Each pair of words is queried to a search engine, which produces a co-occurrence matrix. By calculating the similarity of words, a word co-occurrence graph is obtained. A new kind of graph clustering algorithm called Newman clustering is applied for efficiently identifying word clusters. Evaluations are made on two sets of word groups derived from a Web directory and WordNet.
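
    The pipeline in the abstract above (pairwise counts, then a similarity graph, then Newman clustering) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the similarity is a thresholded Jaccard coefficient (the abstract does not say which measure is used), the clustering is a plain greedy modularity merge in the spirit of Newman's method, and all words, counts, and the threshold are hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(word_counts, pair_counts, threshold):
    """Keep an edge between two words when their Jaccard similarity
    (an assumed measure) reaches the threshold."""
    edges = set()
    for (u, v), both in pair_counts.items():
        union = word_counts[u] + word_counts[v] - both
        if union > 0 and both / union >= threshold:
            edges.add(frozenset((u, v)))
    return edges


def newman_clusters(edges):
    """Greedy modularity merging: start with singleton communities and
    repeatedly merge the pair with the largest positive modularity gain.
    Words without any edge do not appear in the result."""
    edges = {frozenset(e) for e in edges}
    m = len(edges)
    if m == 0:
        return []
    degree = {}
    for e in edges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1
    communities = [{v} for v in sorted(degree)]
    while len(communities) > 1:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(communities)), 2):
            ci, cj = communities[i], communities[j]
            # Edges with exactly one endpoint in each community.
            between = sum(1 for e in edges if len(e & ci) == 1 and len(e & cj) == 1)
            e_ij = between / (2 * m)
            a_i = sum(degree[v] for v in ci) / (2 * m)
            a_j = sum(degree[v] for v in cj) / (2 * m)
            gain = 2 * (e_ij - a_i * a_j)
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break  # no merge improves modularity
        i, j = best_pair
        communities[i] |= communities[j]
        del communities[j]
    return communities


# Hypothetical page counts: every word alone, and every pair queried together.
WORD_COUNTS = {w: 100 for w in ("cat", "dog", "horse", "car", "bus", "train")}
PAIR_COUNTS = {pair: 5 for pair in combinations(sorted(WORD_COUNTS), 2)}
for pair in [("cat", "dog"), ("cat", "horse"), ("dog", "horse"),
             ("bus", "car"), ("bus", "train"), ("car", "train")]:
    PAIR_COUNTS[pair] = 50
PAIR_COUNTS[("car", "horse")] = 30  # one weaker cross-group link

if __name__ == "__main__":
    graph = cooccurrence_edges(WORD_COUNTS, PAIR_COUNTS, threshold=0.15)
    for cluster in newman_clusters(graph):
        print(sorted(cluster))
```

    The pairwise loop recomputes every gain on each pass, which is fine for a handful of words but would need the incremental bookkeeping of the published algorithm for larger vocabularies.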

Stefan Decker - One of the best experts on this subject based on the ideXlab platform.

  • Searching and browsing Linked Data with SWSE: The Semantic Web Search Engine
    Journal of Web Semantics, 2011
    Co-Authors: Aidan Hogan, Andreas Harth, Sheila Kinsella, Jürgen Umbrich, Axel Polleres, Stefan Decker
    Abstract:

    In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data - loosely also known as Linked Data - which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web - in terms of scale, unreliability, inconsistency and noise - are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.

  • Building a Semantic Web Search Engine: Challenges and Solutions
    2008
    Co-Authors: Andreas Harth, Aidan Hogan, Jürgen Umbrich, Stefan Decker
    Abstract:

    Current Web search engines return links to documents for user-specified keyword queries. Users then have to manually trawl through lists of links and glean the required information from documents. In contrast, semantic search engines allow more expressive queries over information integrated from multiple sources, and return specific information about entities, for example people, locations, news items. An entity-centric data model furthermore permits powerful query and browsing techniques. In this paper, we report on our experiences in collecting and integrating Web data from millions of sources, and describe both application-developer query services and end-user navigation services offered by SWSE, the Semantic Web Search Engine.

Amanda Spink - One of the best experts on this subject based on the ideXlab platform.

  • Time series analysis of a Web Search Engine transaction log
    Information Processing and Management, 2009
    Co-Authors: Ying Zhang, Bernard J. Jansen, Amanda Spink
    Abstract:

    In this paper, we use time series analysis to evaluate predictive scenarios using search engine transactional logs. Our goal is to develop models for the analysis of searchers' behaviors over time and investigate if time series analysis is a valid method for predicting relationships between searcher actions. Time series analysis is a method often used to understand the underlying characteristics of temporal data in order to make forecasts. In this study, we used a Web search engine transactional log and time series analysis to investigate users' actions. We conducted our analysis in two phases. In the initial phase, we employed a basic analysis and found that 10% of searchers clicked on sponsored links. However, from 22:00 to 24:00, searchers almost exclusively clicked on the organic links, with almost no clicks on sponsored links. In the second and more extensive phase, we used a one-step prediction time series analysis method along with a transfer function method. The period rarely affects navigational and transactional queries, while rates for transactional queries vary during different periods. Our results show that the average length of a searcher session is approximately 2.9 interactions and that this average is consistent across time periods. Most importantly, our findings show that searchers who submit the shortest queries (i.e., in number of terms) click on the highest-ranked results. We discuss implications, including predictive value, and future research.

  • Web Search Engine multimedia functionality
    Faculty of Science and Technology; Institute for Creative Industries and Innovation, 2008
    Co-Authors: Dian Tjondronegoro, Amanda Spink
    Abstract:

    Web search engines are beginning to offer access to multimedia searching, including audio, video and image searching. In this paper we report findings from a study examining the state of multimedia search functionality on major general and specialized Web search engines. We investigated 102 Web search engines to examine: (1) how many Web search engines offer multimedia searching, (2) the type of multimedia search functionality and methods offered, such as "query by example", and (3) the support for personalization or customization that is accessible as advanced search. Findings include: (1) few major Web search engines offer multimedia searching and (2) multimedia Web search functionality is generally limited. Our findings show that despite the increasing level of interest in multimedia Web search, those few Web search engines offering multimedia Web search provide limited multimedia search functionality. Keywords are still the only means of multimedia retrieval, while other methods such as "query by example" are offered by less than 1% of Web search engines examined.

  • Determining the user intent of Web search engine queries
    The Web Conference, 2007
    Co-Authors: Bernard J. Jansen, Danielle L Booth, Amanda Spink
    Abstract:

    Determining the user intent of Web searches is a difficult problem due to the sparse data available concerning the searcher. In this paper, we examine a method to determine the user intent underlying Web search engine queries. We qualitatively analyze samples of queries from seven transaction logs from three different Web search engines containing more than five million queries. From this analysis, we identified characteristics of user queries based on three broad classifications of user intent. The classifications of informational, navigational, and transactional represent the type of content destination the searcher desired as expressed by their query. We implemented our classification algorithm and automatically classified a separate Web search engine transaction log of over a million queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the classification to the results from our algorithm. This comparison showed that our automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is generally vague or multi-faceted, pointing to the need for probabilistic classification. We illustrate how knowledge of searcher intent might be used to enhance future Web search engines.
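
    The abstract above does not list the query characteristics the classifier relies on. Purely as an illustration of how a three-way, rule-based intent classifier and its check against manual coding could be wired together, here is a small sketch. The cue pattern, the term list, and both function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical cues: URL-like fragments suggest the searcher wants a specific site.
NAVIGATIONAL_CUES = re.compile(r"(www\.|https?://|\.(com|org|net|edu)\b)")
# Hypothetical terms suggesting the searcher wants to obtain or do something.
TRANSACTIONAL_TERMS = {"download", "buy", "order", "purchase"}


def classify_intent(query):
    """Assign one of three intent labels using simple surface rules."""
    q = query.lower().strip()
    if NAVIGATIONAL_CUES.search(q):
        return "navigational"
    if TRANSACTIONAL_TERMS & set(q.split()):
        return "transactional"
    return "informational"  # default when no other cue fires


def agreement(predicted, manual):
    """Fraction of queries where the automatic label matches the manual code."""
    matches = sum(p == m for p, m in zip(predicted, manual))
    return matches / len(manual)


if __name__ == "__main__":
    queries = ["www.example.com", "download free antivirus", "how do volcanoes form"]
    labels = [classify_intent(q) for q in queries]
    print(list(zip(queries, labels)))
```

    A hard rule set like this returns exactly one label per query; the abstract's remark about vague or multi-faceted intent is the motivation for replacing such rules with a probabilistic classifier.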

  • How are we searching the World Wide Web? A comparison of nine search engine transaction logs
    Information Processing and Management, 2006
    Co-Authors: Bernard J. Jansen, Amanda Spink
    Abstract:

    The Web and especially major Web search engines are essential tools in the quest to locate online information for many people. This paper reports results from research that examines characteristics and changes in Web searching from nine studies of five Web search engines based in the US and Europe. We compare interactions occurring between users and Web search engines from the perspectives of session length, query length, query complexity, and content viewed among the Web search engines. The results of our research show (1) users are viewing fewer result pages, (2) searchers on US-based Web search engines use more query operators than searchers on European-based search engines, (3) there are statistically significant differences in the use of Boolean operators and result pages viewed, and (4) one cannot necessarily apply results from studies of one particular Web search engine to another Web search engine. The widespread use of Web search engines, employment of simple queries, and decreased viewing of result pages may have resulted from algorithmic enhancements by Web search engine companies. We discuss the implications of the findings for the development of Web search engines and design of online content.

Enrico Motta - One of the best experts on this subject based on the ideXlab platform.

  • Watson, more than a Semantic Web Search Engine
    Semantic Web, 2011
    Co-Authors: Mathieu d'Aquin, Enrico Motta
    Abstract:

    In this tool report, we present an overview of the Watson system, a Semantic Web search engine providing various functionalities not only to find and locate ontologies and semantic data online, but also to explore the content of these semantic documents. Beyond the simple facade of a search engine for the Semantic Web, we show that the availability of such a component brings new possibilities in terms of developing semantic applications that exploit the content of the Semantic Web. Indeed, Watson provides a set of APIs containing high level functions for finding, exploring and querying semantic data and ontologies that have been published online. Thanks to these APIs, new applications have emerged that connect activities such as ontology construction, matching, sense disambiguation and question answering to the Semantic Web, developed by our group and others. In addition, we also describe Watson as an unprecedented research platform for the study of the Semantic Web, and of formalised knowledge in general.

Yutaka Matsuo - One of the best experts on this subject based on the ideXlab platform.

  • A Web Search Engine-Based Approach to Measure Semantic Similarity between Words
    IEEE Transactions on Knowledge and Data Engineering, 2011
    Co-Authors: Danushka Tarupathi Bollegala, Yutaka Matsuo, Mitsuru Ishizuka
    Abstract:

    Measuring the semantic similarity between words is an important component in various tasks on the Web such as relation extraction, community mining, document clustering, and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring semantic similarity between two words (or entities) remains a challenging task. We propose an empirical method to estimate semantic similarity using page counts and text snippets retrieved from a Web search engine for two words. Specifically, we define various word co-occurrence measures using page counts and integrate those with lexical patterns extracted from text snippets. To identify the numerous semantic relations that exist between two given words, we propose a novel pattern extraction algorithm and a pattern clustering algorithm. The optimal combination of page counts-based co-occurrence measures and lexical pattern clusters is learned using support vector machines. The proposed method outperforms various baselines and previously proposed Web-based semantic similarity measures on three benchmark data sets, showing a high correlation with human ratings. Moreover, the proposed method significantly improves the accuracy in a community mining task.

  • Graph-based Word Clustering using a Web Search Engine
    Empirical Methods in Natural Language Processing, 2006
    Co-Authors: Yutaka Matsuo, Takeshi Sakaki, Koki Uchiyama, Mitsuru Ishizuka
    Abstract:

    Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the Web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure by Web counts. Each pair of words is queried to a search engine, which produces a co-occurrence matrix. By calculating the similarity of words, a word co-occurrence graph is obtained. A new kind of graph clustering algorithm called Newman clustering is applied for efficiently identifying word clusters. Evaluations are made on two sets of word groups derived from a Web directory and WordNet.