Academic Search

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The experts below are selected from a list of 72,465 experts worldwide, ranked by the ideXlab platform

Neal R. Haddaway - One of the best experts on this subject based on the ideXlab platform.

  • Which Academic Search Systems are Suitable for Systematic Reviews or Meta-Analyses? Evaluating Retrieval Qualities of Google Scholar, PubMed and 26 Other Resources
    Research Synthesis Methods, 2020
    Co-Authors: Michael Gusenbauer, Neal R. Haddaway
    Abstract:

    Rigorous evidence identification is essential for systematic reviews and meta-analyses (evidence syntheses) because the sample selection of relevant studies determines a review's outcome, validity, and explanatory power. Yet the search systems allowing access to this evidence provide varying levels of precision, recall, and reproducibility, and also demand different levels of effort. To date, it remains unclear which search systems are most appropriate for evidence synthesis and why. Advice on which search engines and bibliographic databases to choose for systematic searches is limited and lacks systematic, empirical performance assessments. This study investigates and compares the systematic search qualities of 28 widely used academic search systems, including Google Scholar, PubMed, and Web of Science. A novel, query-based method tests how well users are able to interact with and retrieve records from each system. The study is the first to show the extent to which search systems can effectively and efficiently perform (Boolean) searches with regard to precision, recall, and reproducibility. We found substantial differences in the performance of search systems, meaning that their usability in systematic searches varies. Indeed, only half of the search systems analyzed, and only a few open access databases, can be recommended for evidence syntheses without substantial caveats. In particular, our findings demonstrate why Google Scholar is inappropriate as a principal search system. We call for database owners to recognize the requirements of evidence synthesis and for academic journals to reassess quality requirements for systematic reviews. Our findings aim to support researchers in conducting better searches for better evidence synthesis.
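The precision/recall comparison described in the abstract can be illustrated with a minimal sketch (not the authors' code): score one system's retrieved record set against a known gold-standard set of relevant records.

```python
# Toy illustration: precision and recall of a search system's result set
# against a known set of relevant record IDs.

def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved record IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# A system returns 8 records, 4 of which are among 5 known relevant ones:
p, r = precision_recall(range(1, 9), [2, 4, 6, 8, 100])
print(p, r)  # 0.5 0.8
```

Reproducibility, the study's third quality, would additionally require that the same query yields the same record set on repeated runs.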

Michael Gusenbauer - One of the best experts on this subject based on the ideXlab platform.

  • Which Academic Search Systems are Suitable for Systematic Reviews or Meta-Analyses? Evaluating Retrieval Qualities of Google Scholar, PubMed and 26 Other Resources
    Research Synthesis Methods, 2020
    Co-Authors: Michael Gusenbauer, Neal R. Haddaway
    Abstract:

    Rigorous evidence identification is essential for systematic reviews and meta-analyses (evidence syntheses) because the sample selection of relevant studies determines a review's outcome, validity, and explanatory power. Yet the search systems allowing access to this evidence provide varying levels of precision, recall, and reproducibility, and also demand different levels of effort. To date, it remains unclear which search systems are most appropriate for evidence synthesis and why. Advice on which search engines and bibliographic databases to choose for systematic searches is limited and lacks systematic, empirical performance assessments. This study investigates and compares the systematic search qualities of 28 widely used academic search systems, including Google Scholar, PubMed, and Web of Science. A novel, query-based method tests how well users are able to interact with and retrieve records from each system. The study is the first to show the extent to which search systems can effectively and efficiently perform (Boolean) searches with regard to precision, recall, and reproducibility. We found substantial differences in the performance of search systems, meaning that their usability in systematic searches varies. Indeed, only half of the search systems analyzed, and only a few open access databases, can be recommended for evidence syntheses without substantial caveats. In particular, our findings demonstrate why Google Scholar is inappropriate as a principal search system. We call for database owners to recognize the requirements of evidence synthesis and for academic journals to reassess quality requirements for systematic reviews. Our findings aim to support researchers in conducting better searches for better evidence synthesis.

  • Google Scholar to Overshadow Them All? Comparing the Sizes of 12 Academic Search Engines and Bibliographic Databases
    Scientometrics, 2019
    Co-Authors: Michael Gusenbauer
    Abstract:

    Information on the size of academic search engines and bibliographic databases (ASEBDs) is often outdated or entirely unavailable. Hence, it is difficult to assess the scope of specific databases, such as Google Scholar. While scientometric studies have estimated ASEBD sizes before, the methods employed were able to compare only a few databases. Consequently, there is no up-to-date comparative information on the sizes of popular ASEBDs. This study aims to fill this blind spot by providing a comparative picture of 12 of the most commonly used ASEBDs. In doing so, we build on and refine previous scientometric research by counting query hit data as an indicator of the number of accessible records. Iterative query optimization makes it possible to identify a maximum number of hits for most ASEBDs. The results were validated in terms of their capacity to assess database size by comparing them with official information on database sizes or previous scientometric studies. The queries used here are replicable, so size information can be updated quickly. The findings provide first-time size estimates of ProQuest and EBSCOhost and indicate that Google Scholar's size might have been underestimated so far by more than 50%. By our estimation, Google Scholar, with 389 million records, is currently the most comprehensive academic search engine.
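The query-hit-count idea can be sketched as follows; the `hit_count` callable standing in for a real database interface is hypothetical, and the counts are made up. The largest hit count reported across broad candidate queries serves as a lower-bound estimate of database size.

```python
# Sketch: iteratively try broad queries against a database and keep the
# largest reported hit count as a lower-bound size estimate.

def estimate_size(hit_count, candidate_queries):
    """hit_count: callable mapping a query string to a reported hit count."""
    best_query, best_hits = None, 0
    for q in candidate_queries:
        hits = hit_count(q)
        if hits > best_hits:
            best_query, best_hits = q, hits
    return best_query, best_hits

# Simulated responses for a few very broad queries:
fake_counts = {"a": 120_000_000, "the": 95_000_000, "1": 150_000_000}
q, n = estimate_size(lambda q: fake_counts.get(q, 0), ["a", "the", "1"])
print(q, n)  # 1 150000000
```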

Maarten De Rijke - One of the best experts on this subject based on the ideXlab platform.

  • Characterizing and predicting downloads in Academic Search
    Information Processing & Management, 2019
    Co-Authors: Maarten De Rijke
    Abstract:

    Numerous studies have been conducted on the information interaction behavior of search engine users. Few studies have considered information interactions in the domain of academic search. We focus on conversion behavior in this domain. Conversions have been widely studied in the e-commerce domain, e.g., for online shopping and hotel booking, but little is known about conversions in academic search. We start with a description of a unique dataset of a particular type of conversion in academic search, viz. users' downloads of scientific papers. Then we move to an observational analysis of users' download actions. We first characterize user actions and show their statistics in sessions. Then we focus on behavioral and topical aspects of downloads, revealing behavioral correlations across download sessions. We discover unique properties that differ from other conversion settings such as online shopping. Using insights gained from these observations, we consider the task of predicting the next download. In particular, we focus on predicting the time until the next download session, and on predicting the number of downloads. We cast these as time series prediction problems and model them using LSTMs. We develop a specialized model built on user segmentations that achieves significant improvements over the state-of-the-art.
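The target quantity of the prediction task, time until the next download session, can be derived from a raw event log roughly as below (a toy sketch, not the paper's pipeline; the 30-minute session break is an assumed threshold):

```python
# Toy sketch: from timestamped download events, segment each user's
# activity into sessions and compute the gaps between session starts --
# the time series a predictive model would be trained on.

from collections import defaultdict

def session_gaps(events, session_break=1800):
    """events: iterable of (user, unix_timestamp) download events.
    Returns {user: [seconds between consecutive session starts]}."""
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)
    gaps = {}
    for user, times in by_user.items():
        times.sort()
        session_starts = [times[0]]
        for prev, cur in zip(times, times[1:]):
            if cur - prev > session_break:  # idle gap starts a new session
                session_starts.append(cur)
        gaps[user] = [b - a for a, b in zip(session_starts, session_starts[1:])]
    return gaps

print(session_gaps([("u1", 0), ("u1", 100), ("u1", 5000), ("u1", 9000)]))
# {'u1': [5000, 4000]}
```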

  • Do Topic Shift and Query Reformulation Patterns Correlate in Academic Search?
    European Conference on Information Retrieval, 2017
    Co-Authors: Maarten De Rijke
    Abstract:

    While it is known that academic searchers differ from typical web searchers, little is known about the search behavior of academic searchers over longer periods of time. In this study we take a look at academic searchers through a large-scale log analysis on a major academic search engine. We focus on two aspects: query reformulation patterns and topic shifts in queries. We first analyze how each of these aspects evolves over time. We identify important query reformulation patterns: revisiting and issuing new queries tend to happen more often over time. We also find that there are two distinct types of users: one type becomes increasingly focused on the topics they search for as time goes by, while the other becomes increasingly diverse. After analyzing these two aspects separately, we investigate whether, and to what degree, there is a correlation between topic shifts and query reformulations. Surprisingly, users' preferences for query reformulations correlate little with their topic shift tendency. However, certain reformulations may help predict the magnitude of the topic shift in the immediately following timespan. Our results shed light on academic searchers' information seeking behavior and may benefit search personalization.
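A log analysis of this kind needs to label consecutive queries in a session with a reformulation type. The categories below (revisit, refinement, new) are an illustrative simplification, not the paper's exact taxonomy:

```python
# Minimal sketch: classify the transition between two consecutive queries.

def reformulation_type(prev, cur, history):
    """history: set of lowercased queries issued earlier in the session."""
    prev_t, cur_t = set(prev.lower().split()), set(cur.lower().split())
    if cur.lower() in history:
        return "revisit"                  # exact query seen before
    if prev_t <= cur_t or cur_t <= prev_t:
        return "refinement"               # terms added or removed
    if prev_t & cur_t:
        return "reformulation"            # partial term overlap
    return "new"                          # no overlap: new information need

session = ["deep learning", "deep learning survey", "graph models", "deep learning"]
labels = [reformulation_type(a, b, set(q.lower() for q in session[:i + 1]))
          for i, (a, b) in enumerate(zip(session, session[1:]))]
print(labels)  # ['refinement', 'new', 'revisit']
```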

  • Investigating queries and search failures in academic search
    Information Processing & Management, 2017
    Co-Authors: Bob J. A. Schijvenaars, Maarten De Rijke
    Abstract:

    Academic search concerns the retrieval and profiling of information objects in the domain of academic research. In this paper we present important observations about academic search queries, and provide an algorithmic solution to one type of failure during search sessions: null queries. We start with a general characterization of academic search queries, analyzing a large-scale transaction log of a leading academic search engine. Unlike previous small-scale analyses of academic search queries, we find important differences from query characteristics known from web search; e.g., in academic search there is a substantially bigger proportion of entity queries, and a heavier tail in the query length distribution. We then focus on search failures and, in particular, on null queries that lead to an empty search engine result page, on null sessions that contain such null queries, and on users who are prone to issue null queries. In academic search, approximately 1 in 10 queries is a null query, and 25% of sessions contain a null query. They appear in different types of search sessions and prevent users from achieving their search goal. To address the high rate of null queries in academic search, we consider the task of providing query suggestions. Specifically, we focus on a highly frequent query type: non-Boolean informational queries. To this end we need to overcome query sparsity and make effective use of session information. We find that using entities helps to surface more relevant query suggestions in the face of query sparsity. We also find that query suggestions are more effective when conditioned on the type of session in which they are offered. After casting the session classification problem as a multi-label classification problem, we generate session-conditional query suggestions based on predicted session type. We find that this session-conditional method leads to significant improvements over a generic query suggestion method. Personalization yields very little further improvement over session-conditional query suggestions.
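The two headline statistics (share of null queries, share of sessions containing one) are straightforward to compute from a log; a toy version with made-up sessions:

```python
# Toy computation of null-query and null-session rates from a query log.

def null_rates(sessions):
    """sessions: list of sessions, each a list of (query, n_results) pairs.
    Returns (fraction of null queries, fraction of sessions with a null query)."""
    n_queries = sum(len(s) for s in sessions)
    n_null = sum(1 for s in sessions for _, n in s if n == 0)
    null_sessions = sum(1 for s in sessions if any(n == 0 for _, n in s))
    return n_null / n_queries, null_sessions / len(sessions)

sessions = [
    [("crispr", 120), ("crispr AND zebrafsh", 0)],  # typo -> empty result page
    [("graphene", 4000)],
    [("p53 pathway", 85), ("p53 review", 12)],
]
q_rate, s_rate = null_rates(sessions)
print(q_rate, s_rate)  # 1 of 5 queries is null; 1 of 3 sessions is affected
```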

Jose Luis Ortega - One of the best experts on this subject based on the ideXlab platform.

  • Influence of Co-Authorship Networks in the Research Impact: Ego Network Analyses from Microsoft Academic Search
    Journal of Informetrics, 2014
    Co-Authors: Jose Luis Ortega
    Abstract:

    The main objective of this study is to analyze the relationship between research impact and the structural properties of co-author networks. A new bibliographic source, Microsoft Academic Search, is introduced to test its suitability for bibliometric analyses. Citation counts and 500 one-step ego networks were extracted from this engine. Results show that tiny and sparse networks – characterized by high betweenness centrality and a high average path length – achieved more citations per document than dense and compact networks, described by a high clustering coefficient and a high average degree. As for disciplinary differences, Mathematics, Social Sciences, and Economics & Business are the disciplines with the sparsest and smallest networks, while Physics, Engineering, and Geosciences are characterized by dense and crowded networks. This suggests that in sparse ego networks the central author has more control over their collaborators, being more selective in recruitment, and that this behaviour has positive implications for research impact.
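Two of the structural properties the study correlates with citations can be computed directly from an adjacency map; a small made-up ego network for illustration:

```python
# Illustrative computation of average degree and the ego's local
# clustering coefficient on a tiny co-authorship network.

from itertools import combinations

def avg_degree(adj):
    """Mean number of co-authors per node in an undirected adjacency map."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def local_clustering(adj, node):
    """Fraction of the node's neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

# Ego "A" with three collaborators; B and C have also co-authored together.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(avg_degree(adj))             # 2.0
print(local_clustering(adj, "A"))  # 1 of 3 neighbour pairs linked
```

A sparse ego network in the study's sense is one where this clustering coefficient is low and average path length is high.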

  • Microsoft Academic Search and Google Scholar Citations: Comparative Analysis of Author Profiles
    Association for Information Science and Technology, 2014
    Co-Authors: Jose Luis Ortega, Isidro F Aguillo
    Abstract:

    This article offers a comparative analysis of the personal profiling capabilities of the two most important free citation-based academic search engines, namely Microsoft Academic Search (MAS) and Google Scholar Citations (GSC). Author profiles can be useful for evaluation purposes once the advantages and the shortcomings of these services are described and taken into consideration. In total, 771 personal profiles appearing in both the MAS and the GSC databases were analyzed. Results show that the GSC profiles include more documents and citations than those in MAS, but with a strong bias toward the information and computing sciences, whereas the MAS profiles are better balanced across disciplines. MAS shows technical problems, such as a higher number of duplicated profiles and a lower updating rate than GSC. It is concluded that both services could be used for evaluation purposes only if they are applied along with other citation indices as a way to supplement that information.

  • Academic Search Engines: A Quantitative Outlook
    2014
    Co-Authors: Jose Luis Ortega
    Abstract:

    Academic Search Engines surveys the current panorama of academic search engines through a quantitative approach that analyses the reliability and consistency of these services. The objective is to describe the main characteristics of these engines, to highlight their advantages and drawbacks, and to discuss the implications of these new products for the future of scientific communication and their impact on research measurement and evaluation. In short, the book presents a summary view of the new challenges that the Web sets for scientific activity through the most novel and innovative searching services available on the Web. It is the first attempt to analyze search engines addressed exclusively to the research community in an integrative handbook; the novelty, expectations, and usefulness of many of these services justify their analysis. The book is not merely a description of the web functionalities of these services: it is a scientific review of the most outstanding characteristics of each platform, discussing their significance for scholarly communication and research evaluation. It also introduces an original methodology based on a quantitative analysis of the covered data, through the extensive use of crawlers and harvesters, which makes it possible to examine in depth how these engines work. Besides this, it provides a detailed descriptive review of their functionalities and a critical discussion of their use by the scientific community.

  • Other Academic Search Engines
    Academic Search Engines: A Quantitative Outlook, 2014
    Co-Authors: Jose Luis Ortega
    Abstract:

    This chapter examines other important academic search engines which, due to their functionality and sources, do not have their own chapter. First, we look at BASE, a search engine specialized in open sources and built on open protocols that allow it to harvest mainly institutional repositories and digital libraries. Next, Q-Sensei Scholar is examined because it is a promising filtering tool, although it is based on only five sources. Finally, WorldWideScience is studied as an example of a federated search engine fuelled by bibliographic databases from scientific agencies all over the world; this model, however, produces a large number of duplicated results and is very time-consuming.
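The duplicate problem of federated search can be sketched in a few lines: the same record arrives from several sources with cosmetic title differences, and a normalised key collapses them (a toy illustration with invented records, not how WorldWideScience actually de-duplicates):

```python
# Tiny sketch: collapse near-duplicate records by a normalised title key.

import re

def title_key(title):
    """Lowercase and strip punctuation so cosmetic variants share a key."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

records = [
    ("SourceA", "Deep Learning: A Survey"),
    ("SourceB", "Deep learning -- a survey"),   # same paper, different source
    ("SourceC", "Graph Neural Networks"),
]
seen, unique = set(), []
for source, title in records:
    key = title_key(title)
    if key not in seen:
        seen.add(key)
        unique.append(title)
print(unique)  # ['Deep Learning: A Survey', 'Graph Neural Networks']
```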

Vani Mandava - One of the best experts on this subject based on the ideXlab platform.

  • EDBT - ALIAS: Author Disambiguation in Microsoft Academic Search Engine Dataset
    2014
    Co-Authors: Michael Pitts, Swapna Savvana, Senjuti Basu Roy, Vani Mandava
    Abstract:

    We present a system called ALIAS, designed to search for duplicate authors in the Microsoft Academic Search Engine dataset. Author ambiguity is a prevalent problem in this dataset, as many authors publish under several variations of their own name, or different authors share a similar or the same name. ALIAS takes an author name as input (one who may or may not exist in the corpus) and outputs a set of author names from the database that are determined to be duplicates of the input author, along with a confidence score for each output. Additionally, ALIAS can find a top-k list of similar authors for a given input author name. The underlying techniques rely heavily on a mix of learning, mining, and efficient search techniques, including partitioning, clustering, supervised learning using ensemble algorithms, and indexing, to enable fast response for near-real-time user interaction. While the system is designed using Academic Search Engine data, the proposed solution is generic and could be extended to other problems in the category of entity disambiguation. In this demonstration paper, we describe the different components of ALIAS and the intelligent algorithms associated with each of them to perform author name disambiguation and similar-author finding.
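A crude baseline for the top-k similar-author lookup can be sketched as follows; this is a hedged illustration (token-set Jaccard similarity), not the ALIAS implementation, which layers clustering and supervised learning on top of such signals:

```python
# Sketch: score candidate duplicate author names by Jaccard similarity
# over normalised name tokens, returning the top-k matches with scores.

def name_tokens(name):
    return set(name.lower().replace(".", " ").split())

def top_k_duplicates(query, candidates, k=3):
    q = name_tokens(query)
    scored = []
    for cand in candidates:
        c = name_tokens(cand)
        score = len(q & c) / len(q | c)   # Jaccard similarity in [0, 1]
        scored.append((cand, round(score, 2)))
    scored.sort(key=lambda x: -x[1])
    return scored[:k]

authors = ["J. Smith", "John Smith", "John A. Smith", "Jane Doe"]
print(top_k_duplicates("john smith", authors, k=2))
# [('John Smith', 1.0), ('John A. Smith', 0.67)]
```

A real system would block candidates first (e.g. by surname) so the pairwise scoring stays tractable at corpus scale.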

  • The Microsoft Academic Search Challenges at KDD Cup 2013
    International Conference on Big Data, 2013
    Co-Authors: Martine De Cock, Swapna Savvana, Senjuti Basu Roy, Vani Mandava, Brian Dalessandro, Claudia Perlich, William Cukierski, Ben Hamner
    Abstract:

    Microsoft Academic Search is a free search engine specific to scholarly material. It currently covers more than 50 million publications and over 19 million authors across a variety of domains. One of the main challenges in correctly indexing this material is author name ambiguity and the resulting noise in author profiles. KDD Cup 2013 invited participants to tackle this problem in two ways: (1) by automatically determining which papers in an author profile are truly written by a given author, and (2) by identifying which author profiles need to be merged because they belong to the same author. This paper presents a brief account of the contest and the lessons learned.

  • The Microsoft Academic Search Dataset and KDD Cup 2013
    Knowledge Discovery and Data Mining, 2013
    Co-Authors: Senjuti Basu Roy, Vani Mandava, Martine De Cock, Brian Dalessandro, Claudia Perlich, William Cukierski, Swapna Savanna, Ben Hamner
    Abstract:

    KDD Cup 2013 challenged participants to tackle the problem of author name ambiguity in a digital library of scientific publications. The competition consisted of two tracks, which were based on large-scale datasets from a snapshot of Microsoft Academic Search taken in January 2013, including 250K authors and 2.5M papers. Participants were asked to determine which papers in an author profile are truly written by a given author (track 1), as well as to identify duplicate author profiles (track 2). Tracks 1 and 2 were launched on April 18 and April 20, 2013, respectively, with a common final submission deadline of June 12, 2013. For track 1, a training dataset with correct labels was disclosed at the start of the competition. This track was the most popular one, attracting submissions from 561 different teams. Track 2, which was formulated as an unsupervised learning task, received submissions from 241 participants. This paper presents details of the problem definitions, the datasets, the evaluation metrics, and the results.
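Track 1 amounts to ranking the papers in a profile so the truly authored ones come first; an average-precision-style score for such a ranking can be sketched as below (the exact contest metric is assumed here, not quoted from the paper):

```python
# Sketch: average precision of a ranked paper list against the set of
# papers truly written by the author.

def average_precision(ranked, relevant):
    """ranked: paper IDs in predicted order; relevant: truly authored IDs."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for i, paper in enumerate(ranked, start=1):
        if paper in relevant:
            hits += 1
            score += hits / i   # precision at each relevant position
    return score / len(relevant) if relevant else 0.0

# Papers 10 and 30 are truly by the author; the ranker puts them 1st and 3rd:
print(average_precision([10, 20, 30], [10, 30]))  # (1/1 + 2/3) / 2 = 0.833...
```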
