Transaction Log

The experts below are selected from a list of 3,645 experts worldwide, ranked by the ideXlab platform.

Bernard J. Jansen - One of the best experts on this subject based on the ideXlab platform.

  • Time series analysis of a Web search engine transaction log
    Information Processing and Management, 2009
    Co-Authors: Ying Zhang, Bernard J. Jansen, Amanda Spink
    Abstract:

    In this paper, we use time series analysis to evaluate predictive scenarios using search engine transaction logs. Our goal is to develop models for the analysis of searchers' behaviors over time and to investigate whether time series analysis is a valid method for predicting relationships between searcher actions. Time series analysis is a method often used to understand the underlying characteristics of temporal data in order to make forecasts. In this study, we used a Web search engine transaction log and time series analysis to investigate users' actions. We conducted our analysis in two phases. In the initial phase, we employed a basic analysis and found that 10% of searchers clicked on sponsored links. However, from 22:00 to 24:00, searchers almost exclusively clicked on the organic links, with almost no clicks on sponsored links. In the second and more extensive phase, we used a one-step prediction time series analysis method along with a transfer function method. The period rarely affects navigational and transactional queries, while rates for transactional queries vary during different periods. Our results show that the average length of a searcher session is approximately 2.9 interactions and that this average is consistent across time periods. Most importantly, our findings show that searchers who submit the shortest queries (i.e., in number of terms) click on the highest-ranked results. We discuss implications, including predictive value, and future research.
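
As a rough illustration of the kind of log-to-time-series analysis described above, the sketch below buckets query timestamps into hourly counts and makes a simple one-step-ahead AR(1) forecast. The record layout and the AR(1) model are assumptions for illustration only, not the study's actual data or method.

```python
# Illustrative sketch only: hourly query counts from log timestamps plus a
# one-step-ahead AR(1) forecast. Field names and the model are assumptions.
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Bucket ISO-format timestamps into query counts per (date, hour)."""
    buckets = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        buckets[(t.date(), t.hour)] += 1
    return [buckets[k] for k in sorted(buckets)]  # chronological order

def ar1_one_step(series):
    """Fit y[t] = a + b*y[t-1] by least squares and predict the next value."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a + b * series[-1]

# In practice the series would come from hourly_counts() over a full log;
# a short synthetic series is used here so the example runs on its own.
counts = [120, 95, 80, 60, 55, 70, 110, 150]
print(round(ar1_one_step(counts), 1))  # forecast for the next hour
```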

  • Research and methodological foundations of transaction log analysis
    Handbook of Research on Web Log Analysis, 2008
    Co-Authors: Bernard J. Jansen, Isak Taksa, Amanda Spink
    Abstract:

    This chapter outlines and discusses theoretical and methodological foundations for transaction log analysis. We first address the fundamentals of transaction log analysis from a research viewpoint and the concept of transaction logs as a data collection technique from the perspective of behaviorism. From this research foundation, we move to the methodological aspects of transaction log analysis and examine the strengths and limitations of transaction logs as trace data. We then review the conceptualization of transaction log analysis as an unobtrusive approach to research, and present the power and deficiencies of the unobtrusive methodological concept, including the benefits and risks of transaction log analysis specifically from the perspective of an unobtrusive method. Some of the ethical questions concerning the collection of data via transaction log applications are also discussed.

  • Determining the user intent of Web search engine queries
    The Web Conference, 2007
    Co-Authors: Bernard J. Jansen, Danielle L Booth, Amanda Spink
    Abstract:

    Determining the user intent of Web searches is a difficult problem due to the sparse data available concerning the searcher. In this paper, we examine a method to determine the user intent underlying Web search engine queries. We qualitatively analyze samples of queries from seven transaction logs from three different Web search engines containing more than five million queries. From this analysis, we identified characteristics of user queries based on three broad classifications of user intent. The classifications of informational, navigational, and transactional represent the type of content destination the searcher desired as expressed by the query. We implemented our classification algorithm and automatically classified a separate Web search engine transaction log of over a million queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the classification to the results from our algorithm. This comparison showed that our automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is generally vague or multi-faceted, pointing to the need for probabilistic classification. We illustrate how knowledge of searcher intent might be used to enhance future Web search engines.
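
The paper's classification rules are not reproduced here; the sketch below is a hypothetical, minimal rule-based classifier for the informational/navigational/transactional split, with illustrative cue terms rather than the authors' actual criteria.

```python
# Illustrative sketch only: rule-based intent classification with made-up cues.
import re

NAV_CUES = re.compile(r"(www\.|\.com\b|\.org\b|\.net\b|\bhomepage\b|\blogin\b)", re.I)
TRANS_CUES = {"download", "buy", "price", "lyrics", "software", "free", "movies"}

def classify_query(query: str) -> str:
    """Assign one of the three broad intent classes to a raw query string."""
    terms = query.lower().split()
    if NAV_CUES.search(query):
        return "navigational"      # query looks like a request for a specific site
    if TRANS_CUES.intersection(terms):
        return "transactional"     # query suggests obtaining a product or resource
    return "informational"         # default: the largest class reported in the study

for q in ["www.nasa.gov", "buy cheap flights", "history of transaction log analysis"]:
    print(q, "->", classify_query(q))
```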

  • Preserving the collective expressions of the human consciousness
    2007
    Co-Authors: Bernard J. Jansen
    Abstract:

    Search engines use transaction log files to record a copious number of interactions that occur between the user, the Web search engine, and Web content. Search engine companies use these records of interactions to improve system design and online marketing. In order to address privacy concerns, some question whether it is wise for search engine companies to preserve these query logs. However, not preserving the query logs from Web search engines would be (and is) a critical loss of a temporal record of the expression of the collective human consciousness. In this paper, an outline of an action plan to preserve these records is proposed in order to generate discussion of such a course of action.

  • Web searching on the Vivisimo search engine
    Journal of the Association for Information Science and Technology, 2006
    Co-Authors: Sherry Koshman, Amanda Spink, Bernard J. Jansen
    Abstract:

    The application of clustering to Web search engine technology is a novel approach that offers structure to the information deluge often faced by Web searchers. Clustering methods have been well studied in research labs; however, real user searching with clustering systems in operational Web environments is not well understood. This article reports on results from a transaction log analysis of Vivisimo.com, a Web meta-search engine that dynamically clusters users' search results. A transaction log analysis was conducted on two weeks' worth of data, collected from March 28 to April 4 and April 25 to May 2, 2004, representing 100% of site traffic during these periods and 2,029,734 queries overall. The results show that the highest percentage of queries contained two terms. The highest percentage of search sessions contained one query and was less than 1 minute in duration. Almost half of user interactions with clusters consisted of displaying a cluster's result set, and a small percentage of interactions showed cluster tree expansion. Findings show that 11.1% of search sessions were multitasking searches, and there is a broad variety of search topics in multitasking search sessions. Other searching interactions and statistics on repeat users of the search engine are reported. These results provide insights into search characteristics with a cluster-based Web search engine and extend research into Web searching trends. © 2006 Wiley Periodicals, Inc.
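
A minimal sketch of how session-level descriptive statistics of this kind (queries per session, terms per query, session duration) can be computed from transaction log records; the (session_id, timestamp, query) layout is an assumption, not Vivisimo's actual log format.

```python
# Illustrative sketch only: per-session statistics from made-up log records.
from collections import defaultdict
from datetime import datetime
from statistics import mean

records = [  # (session_id, ISO timestamp, query) -- hypothetical layout
    ("s1", "2004-03-28T10:00:00", "clustering search engines"),
    ("s1", "2004-03-28T10:00:40", "vivisimo clustering"),
    ("s2", "2004-03-28T11:15:00", "weather"),
]

sessions = defaultdict(list)
for sid, ts, query in records:
    sessions[sid].append((datetime.fromisoformat(ts), query))

queries_per_session = [len(v) for v in sessions.values()]
terms_per_query = [len(q.split()) for v in sessions.values() for _, q in v]
session_seconds = [(max(t for t, _ in v) - min(t for t, _ in v)).total_seconds()
                   for v in sessions.values()]

print(mean(queries_per_session), mean(terms_per_query), mean(session_seconds))
```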

Amanda Spink - One of the best experts on this subject based on the ideXlab platform.

  • Time series analysis of a Web search engine transaction log
    Information Processing and Management, 2009
    Co-Authors: Ying Zhang, Bernard J. Jansen, Amanda Spink
    Abstract:

    In this paper, we use time series analysis to evaluate predictive scenarios using search engine transaction logs. Our goal is to develop models for the analysis of searchers' behaviors over time and to investigate whether time series analysis is a valid method for predicting relationships between searcher actions. Time series analysis is a method often used to understand the underlying characteristics of temporal data in order to make forecasts. In this study, we used a Web search engine transaction log and time series analysis to investigate users' actions. We conducted our analysis in two phases. In the initial phase, we employed a basic analysis and found that 10% of searchers clicked on sponsored links. However, from 22:00 to 24:00, searchers almost exclusively clicked on the organic links, with almost no clicks on sponsored links. In the second and more extensive phase, we used a one-step prediction time series analysis method along with a transfer function method. The period rarely affects navigational and transactional queries, while rates for transactional queries vary during different periods. Our results show that the average length of a searcher session is approximately 2.9 interactions and that this average is consistent across time periods. Most importantly, our findings show that searchers who submit the shortest queries (i.e., in number of terms) click on the highest-ranked results. We discuss implications, including predictive value, and future research.

  • Research and methodological foundations of transaction log analysis
    Handbook of Research on Web Log Analysis, 2008
    Co-Authors: Bernard J. Jansen, Isak Taksa, Amanda Spink
    Abstract:

    This chapter outlines and discusses theoretical and methodological foundations for transaction log analysis. We first address the fundamentals of transaction log analysis from a research viewpoint and the concept of transaction logs as a data collection technique from the perspective of behaviorism. From this research foundation, we move to the methodological aspects of transaction log analysis and examine the strengths and limitations of transaction logs as trace data. We then review the conceptualization of transaction log analysis as an unobtrusive approach to research, and present the power and deficiencies of the unobtrusive methodological concept, including the benefits and risks of transaction log analysis specifically from the perspective of an unobtrusive method. Some of the ethical questions concerning the collection of data via transaction log applications are also discussed.

  • Determining the user intent of Web search engine queries
    The Web Conference, 2007
    Co-Authors: Bernard J. Jansen, Danielle L Booth, Amanda Spink
    Abstract:

    Determining the user intent of Web searches is a difficult problem due to the sparse data available concerning the searcher. In this paper, we examine a method to determine the user intent underlying Web search engine queries. We qualitatively analyze samples of queries from seven transaction logs from three different Web search engines containing more than five million queries. From this analysis, we identified characteristics of user queries based on three broad classifications of user intent. The classifications of informational, navigational, and transactional represent the type of content destination the searcher desired as expressed by the query. We implemented our classification algorithm and automatically classified a separate Web search engine transaction log of over a million queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the classification to the results from our algorithm. This comparison showed that our automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is generally vague or multi-faceted, pointing to the need for probabilistic classification. We illustrate how knowledge of searcher intent might be used to enhance future Web search engines.

  • Web searching on the Vivisimo search engine
    Journal of the Association for Information Science and Technology, 2006
    Co-Authors: Sherry Koshman, Amanda Spink, Bernard J. Jansen
    Abstract:

    The application of clustering to Web search engine technology is a novel approach that offers structure to the information deluge often faced by Web searchers. Clustering methods have been well studied in research labs; however, real user searching with clustering systems in operational Web environments is not well understood. This article reports on results from a transaction log analysis of Vivisimo.com, a Web meta-search engine that dynamically clusters users' search results. A transaction log analysis was conducted on two weeks' worth of data, collected from March 28 to April 4 and April 25 to May 2, 2004, representing 100% of site traffic during these periods and 2,029,734 queries overall. The results show that the highest percentage of queries contained two terms. The highest percentage of search sessions contained one query and was less than 1 minute in duration. Almost half of user interactions with clusters consisted of displaying a cluster's result set, and a small percentage of interactions showed cluster tree expansion. Findings show that 11.1% of search sessions were multitasking searches, and there is a broad variety of search topics in multitasking search sessions. Other searching interactions and statistics on repeat users of the search engine are reported. These results provide insights into search characteristics with a cluster-based Web search engine and extend research into Web searching trends. © 2006 Wiley Periodicals, Inc.

  • A temporal comparison of AltaVista Web searching
    Journal of the Association for Information Science and Technology, 2005
    Co-Authors: Bernard J. Jansen, Amanda Spink, Jan O Pedersen
    Abstract:

    Major Web search engines, such as AltaVista, are essential tools in the quest to locate online information. This article reports research that used transaction log analysis to examine the characteristics and changes in AltaVista Web searching that occurred from 1998 to 2002. The research questions we examined are (1) What are the changes in AltaVista Web searching from 1998 to 2002? (2) What are the current characteristics of AltaVista searching, including the duration and frequency of search sessions? (3) What changes in the information needs of AltaVista users occurred between 1998 and 2002? The results of our research show (1) a move toward more interactivity, with increases in session and query length, (2) with 70% of session durations at 5 minutes or less, the frequency of interaction is increasing, but it is happening very quickly, and (3) a broadening range of Web searchers' information needs, with the most frequent terms accounting for less than 1% of total term usage. We discuss the implications of these findings for the development of Web search engines.
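
A minimal sketch of the term-concentration measure mentioned above (the share of total term usage accounted for by the most frequent terms); the query sample and the choice of top_n are illustrative assumptions.

```python
# Illustrative sketch only: share of term usage taken by the most frequent terms.
from collections import Counter

def top_term_share(queries, top_n=100):
    """Fraction of all term occurrences accounted for by the top_n terms."""
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(top_n))
    return top / total if total else 0.0

queries = ["britney spears", "free mp3 download", "free games", "weather london"]
print(top_term_share(queries, top_n=2))  # tiny sample, purely to show the calculation
```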

Jan O Pedersen - One of the best experts on this subject based on the ideXlab platform.

  • A temporal comparison of AltaVista Web searching
    Journal of the Association for Information Science and Technology, 2005
    Co-Authors: Bernard J. Jansen, Amanda Spink, Jan O Pedersen
    Abstract:

    Major Web search engines, such as AltaVista, are essential tools in the quest to locate online information. This article reports research that used transaction log analysis to examine the characteristics and changes in AltaVista Web searching that occurred from 1998 to 2002. The research questions we examined are (1) What are the changes in AltaVista Web searching from 1998 to 2002? (2) What are the current characteristics of AltaVista searching, including the duration and frequency of search sessions? (3) What changes in the information needs of AltaVista users occurred between 1998 and 2002? The results of our research show (1) a move toward more interactivity, with increases in session and query length, (2) with 70% of session durations at 5 minutes or less, the frequency of interaction is increasing, but it is happening very quickly, and (3) a broadening range of Web searchers' information needs, with the most frequent terms accounting for less than 1% of total term usage. We discuss the implications of these findings for the development of Web search engines.

David Nicholas - One of the best experts on this subject based on the ideXlab platform.

  • Characterising and evaluating information seeking behaviour in a digital environment: spotlight on the 'bouncer'
    Information Processing and Management, 2007
    Co-Authors: David Nicholas, Paul Huntington, Hamid R Jamali, Tom Dobrowolski
    Abstract:

    The paper delineates and explains an emerging, but significant, form of digital information seeking behaviour among information consumers, which the authors have called 'bouncing'. The evidence for this behaviour has emerged from five years of deep log analysis studies - an advanced form of transaction log analysis - of a wide range of users of digital information resources. Much of the evidence and discussion provided comes from the scholarly communication field. Two main bouncing metrics were applied in the log studies: site penetration, which is the number of items or pages viewed in a session, and return visits. The evidence shows that (1) a high proportion of people view just a few items or pages during a visit to a site, and (2) a high proportion of visitors either do not come back to the site or do so infrequently. Typically, those who penetrated a site least tended to return least frequently. These people are termed 'bouncers': they bounce into the site and then bounce out again, presumably to another site, as a high proportion of them do not appear to come back again. Possible explanations, negative and positive, for this form of behaviour are discussed.
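
A minimal sketch of the two bouncing metrics defined above, site penetration and return visits, computed from hypothetical (user_id, visit_id, page) records; the record layout and the bouncer rule used here are illustrative assumptions, not the authors' exact operationalisation.

```python
# Illustrative sketch only: site penetration and return visits from made-up records.
from collections import defaultdict

records = [  # (user_id, visit_id, page) -- hypothetical layout
    ("u1", "v1", "/abstract/123"),
    ("u1", "v1", "/fulltext/123"),
    ("u2", "v2", "/abstract/456"),
    ("u2", "v3", "/abstract/789"),
    ("u3", "v4", "/abstract/999"),
]

pages_per_visit = defaultdict(set)
visits_per_user = defaultdict(set)
for user, visit, page in records:
    pages_per_visit[visit].add(page)
    visits_per_user[user].add(visit)

penetration = {v: len(p) for v, p in pages_per_visit.items()}   # items viewed per visit
returns = {u: len(v) - 1 for u, v in visits_per_user.items()}   # repeat visits per user
# One possible 'bouncer' rule: a single one-page visit and no return.
bouncers = [u for u, v in visits_per_user.items()
            if returns[u] == 0 and all(penetration[x] <= 1 for x in v)]
print(penetration, returns, bouncers)  # -> u3 is the only bouncer here
```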

  • The information seeking behaviour of the users of digital scholarly journals
    Information Processing and Management, 2006
    Co-Authors: David Nicholas, Paul Huntington, Hamid R Jamali, Anthony Watkinson
    Abstract:

    The article employs deep log analysis (DLA) techniques, a more sophisticated form of transaction log analysis, to demonstrate what usage data can disclose about the information seeking behaviour of virtual scholars - academics and researchers. DLA works with the raw server log data, not the processed, pre-defined and selective data provided by journal publishers. It can generate types of analysis that are not generally available via proprietary web logging software, because that software filters out relevant data and makes unhelpful assumptions about the meaning of the data. DLA also enables usage data to be associated with search/navigational and/or user demographic data, hence the name 'deep'. In this connection, the usage of two digital journal libraries, those of Emerald Insight and Blackwell Synergy, is investigated. The information seeking behaviour of nearly three million users is analysed with respect to the extent to which they penetrate the site, the number of visits made, and the type of items and content they view. The users are broken down by occupation, place of work, type of subscriber ("Big Deal", non-subscriber, etc.), geographical location, type of university (old and new), referrer link used, and number of items viewed in a session.
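
A minimal sketch of the 'deep' step described above: associating raw log entries with user demographic data and breaking usage down by occupation. The identifiers, fields and data are illustrative assumptions, not the publishers' actual log or subscriber schemas.

```python
# Illustrative sketch only: joining log entries with demographic data.
from collections import Counter

log_entries = [  # hypothetical raw server-log rows, already sessionised
    {"user_id": "u1", "item": "/article/10.1108/foo"},
    {"user_id": "u2", "item": "/article/10.1108/bar"},
    {"user_id": "u1", "item": "/toc/journal-x"},
]
demographics = {  # hypothetical subscriber data keyed by user id
    "u1": {"occupation": "researcher", "country": "UK"},
    "u2": {"occupation": "student", "country": "US"},
}

views_by_occupation = Counter(
    demographics.get(e["user_id"], {}).get("occupation", "unknown")
    for e in log_entries
)
print(views_by_occupation)  # Counter({'researcher': 2, 'student': 1})
```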

  • Digital visibility and its impact upon online usage: Case study of a health Web site
    Libri, 2004
    Co-Authors: Paul Huntington, David Nicholas, Dominic Warren
    Abstract:

    Digital visibility, a term coined by the Ciber research team at UCL, holds that use or consumption in the digital environment is not simply a function of need; it is also a function of visibility or prominence. It is a concept that describes and explains the impact of menu and topic prominence on transaction log (usage) statistics. This study repeats Ciber's digital visibility study, which was concerned with a Digital Interactive Television (DiTV) health information service, but this time the subject is a UK consumer health Web site, MedicDirect, a site attracting around a thousand users and 10,000 page views a day. Specifically, the paper examines the change in use that results from increasing the prominence of two health pages on the Web site: the cancer menu and the cardiopulmonary resuscitation page. The study was conducted over a three-month period during which the links to the two health topics were placed on the home page and then moved and changed three times. Transaction log analysis was used to monitor the changes that occurred, the key metrics employed being the number of users, number of page views, number of pages viewed in a session and number of visits made. It was found that, as predicted on the basis of our earlier results, use did indeed increase as a result of improving topic visibility. The results add further support to the findings of the initial digital visibility study.
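
A minimal sketch of the before/after comparison that underlies a visibility study of this kind: counting page views for a topic page in the periods before and after its link was promoted. The paths, dates and change point are illustrative assumptions, not the MedicDirect data.

```python
# Illustrative sketch only: page views for a topic before and after a link move.
from datetime import date

def views_in_period(entries, path, start, end):
    """Count views of one page path within an inclusive date range."""
    return sum(1 for d, p in entries if p == path and start <= d <= end)

entries = [  # hypothetical (date, page) view records
    (date(2003, 6, 2), "/cancer"), (date(2003, 6, 20), "/cancer"),
    (date(2003, 7, 5), "/cancer"), (date(2003, 7, 6), "/cancer"),
    (date(2003, 7, 9), "/cancer"),
]
change_date = date(2003, 7, 1)  # assumed date the link was promoted to the home page

before = views_in_period(entries, "/cancer", date(2003, 6, 1), date(2003, 6, 30))
after = views_in_period(entries, "/cancer", change_date, date(2003, 7, 31))
print(before, after)  # compare use in equal-length periods either side of the change
```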

  • Assessing used content across five digital health information services using transaction log files
    Journal of Information Science, 2003
    Co-Authors: David Nicholas, Paul Huntington, Janet Homewood
    Abstract:

    A digital service, like a web site, may contain a lot of information, but we often do not know whether it is used, relevant or valuable. Transaction log files generated by digital information services do record the pages (topics or content) viewed by users, and this is perhaps the most interesting aspect of the logs. However, analysing these pages poses plenty of problems for researchers, especially when comparing the content coverage of related services. It is quite normal, even for digital services of the same organization, to adopt different page naming conventions for each service, and this is even truer of digital services run by different organizations. What all this means is that there is no easy way to compare topic use as revealed by access behaviour. This paper looks at the problems of describing and comparing the content usage of digital information services, covering three digital platforms operating in the health field. It discusses the problems posed in making health content comparisons base...
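
A minimal sketch of one way to make content use comparable across services with different page naming conventions: map each service's paths onto a shared topic vocabulary before counting. The mappings and paths are illustrative assumptions, not the services' actual structures.

```python
# Illustrative sketch only: mapping service-specific paths onto shared topics.
from collections import Counter

TOPIC_MAP = {  # hypothetical path-to-topic mappings for two services
    "service_a": {"/conditions/diabetes.html": "diabetes", "/az/asthma": "asthma"},
    "service_b": {"/health/diabetes-overview": "diabetes", "/topics/asthma.htm": "asthma"},
}

def topic_usage(service, viewed_paths):
    """Count views per shared topic, flagging pages with no mapping."""
    mapping = TOPIC_MAP[service]
    return Counter(mapping.get(p, "unmapped") for p in viewed_paths)

print(topic_usage("service_a", ["/conditions/diabetes.html", "/az/asthma"]))
print(topic_usage("service_b", ["/health/diabetes-overview", "/unknown-page"]))
```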

  • Micro-mining and segmented log file analysis: a method for enriching the data yield from Internet log files
    Journal of Information Science, 2003
    Co-Authors: David Nicholas, Paul Huntington
    Abstract:

    The authors propose improved ways of analysing web server log files. Traditionally, web site statistics focus on giving a big (and shallow) picture based on all transaction log entries. That picture is, however, distorted by the problems associated with resolving Internet protocol (IP) numbers to a single user and by cross-border IP registration. The authors argue that analysing extracted sub-groups and categories presents a more accurate picture of the data, and that the analysis of the online behaviour of selected individuals (rather than of very large groups) can add much to our understanding of how people use web sites and, indeed, any digital information source. The analysis is labelled 'micro' to distinguish it from traditional macro, big-picture transaction log analysis. The methods are illustrated with recourse to the logs of the Surgery Door (www.surgerydoor.co.uk) consumer health web site. It was found that use attributed to academic users gave a better approximation of the sites'...
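
A minimal sketch of the segmentation idea described above: splitting log entries into sub-groups by resolved hostname (here, academic domains) and pulling out a single visitor's trail for 'micro' analysis. Hostnames, pages and the academic-domain rule are illustrative assumptions.

```python
# Illustrative sketch only: segmenting log entries by resolved hostname.
ACADEMIC_SUFFIXES = (".ac.uk", ".edu")

log = [  # hypothetical resolved-host log entries
    {"host": "pc1.ucl.ac.uk", "page": "/advice/flu"},
    {"host": "dsl-42.example.net", "page": "/advice/flu"},
    {"host": "pc1.ucl.ac.uk", "page": "/advice/travel"},
]

# Macro view of one segment: entries attributable to academic networks.
academic = [e for e in log if e["host"].endswith(ACADEMIC_SUFFIXES)]
# Micro view: the page trail of a single (apparent) visitor.
one_visitor = [e["page"] for e in log if e["host"] == "pc1.ucl.ac.uk"]
print(len(academic), one_visitor)
```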

Paul Huntington - One of the best experts on this subject based on the ideXlab platform.

  • Characterising and evaluating information seeking behaviour in a digital environment: spotlight on the 'bouncer'
    Information Processing and Management, 2007
    Co-Authors: David Nicholas, Paul Huntington, Hamid R Jamali, Tom Dobrowolski
    Abstract:

    The paper delineates and explains an emerging, but significant, form of digital information seeking behaviour among information consumers, which the authors have called 'bouncing'. The evidence for this behaviour has emerged from five years of deep log analysis studies - an advanced form of transaction log analysis - of a wide range of users of digital information resources. Much of the evidence and discussion provided comes from the scholarly communication field. Two main bouncing metrics were applied in the log studies: site penetration, which is the number of items or pages viewed in a session, and return visits. The evidence shows that (1) a high proportion of people view just a few items or pages during a visit to a site, and (2) a high proportion of visitors either do not come back to the site or do so infrequently. Typically, those who penetrated a site least tended to return least frequently. These people are termed 'bouncers': they bounce into the site and then bounce out again, presumably to another site, as a high proportion of them do not appear to come back again. Possible explanations, negative and positive, for this form of behaviour are discussed.

  • The information seeking behaviour of the users of digital scholarly journals
    Information Processing and Management, 2006
    Co-Authors: David Nicholas, Paul Huntington, Hamid R Jamali, Anthony Watkinson
    Abstract:

    The article employs deep log analysis (DLA) techniques, a more sophisticated form of transaction log analysis, to demonstrate what usage data can disclose about the information seeking behaviour of virtual scholars - academics and researchers. DLA works with the raw server log data, not the processed, pre-defined and selective data provided by journal publishers. It can generate types of analysis that are not generally available via proprietary web logging software, because that software filters out relevant data and makes unhelpful assumptions about the meaning of the data. DLA also enables usage data to be associated with search/navigational and/or user demographic data, hence the name 'deep'. In this connection, the usage of two digital journal libraries, those of Emerald Insight and Blackwell Synergy, is investigated. The information seeking behaviour of nearly three million users is analysed with respect to the extent to which they penetrate the site, the number of visits made, and the type of items and content they view. The users are broken down by occupation, place of work, type of subscriber ("Big Deal", non-subscriber, etc.), geographical location, type of university (old and new), referrer link used, and number of items viewed in a session.

  • Digital visibility and its impact upon online usage: Case study of a health Web site
    Libri, 2004
    Co-Authors: Paul Huntington, David Nicholas, Dominic Warren
    Abstract:

    Digital visibility, a term coined by the Ciber research team at UCL, holds that use or consumption in the digital environment is not simply a function of need; it is also a function of visibility or prominence. It is a concept that describes and explains the impact of menu and topic prominence on transaction log (usage) statistics. This study repeats Ciber's digital visibility study, which was concerned with a Digital Interactive Television (DiTV) health information service, but this time the subject is a UK consumer health Web site, MedicDirect, a site attracting around a thousand users and 10,000 page views a day. Specifically, the paper examines the change in use that results from increasing the prominence of two health pages on the Web site: the cancer menu and the cardiopulmonary resuscitation page. The study was conducted over a three-month period during which the links to the two health topics were placed on the home page and then moved and changed three times. Transaction log analysis was used to monitor the changes that occurred, the key metrics employed being the number of users, number of page views, number of pages viewed in a session and number of visits made. It was found that, as predicted on the basis of our earlier results, use did indeed increase as a result of improving topic visibility. The results add further support to the findings of the initial digital visibility study.

  • Assessing used content across five digital health information services using transaction log files
    Journal of Information Science, 2003
    Co-Authors: David Nicholas, Paul Huntington, Janet Homewood
    Abstract:

    A digital service, like a web site, may contain a lot of information, but we often do not know whether it is used, relevant or valuable. Transaction log files generated by digital information services do record the pages (topics or content) viewed by users, and this is perhaps the most interesting aspect of the logs. However, analysing these pages poses plenty of problems for researchers, especially when comparing the content coverage of related services. It is quite normal, even for digital services of the same organization, to adopt different page naming conventions for each service, and this is even truer of digital services run by different organizations. What all this means is that there is no easy way to compare topic use as revealed by access behaviour. This paper looks at the problems of describing and comparing the content usage of digital information services, covering three digital platforms operating in the health field. It discusses the problems posed in making health content comparisons base...

  • Micro-mining and segmented log file analysis: a method for enriching the data yield from Internet log files
    Journal of Information Science, 2003
    Co-Authors: David Nicholas, Paul Huntington
    Abstract:

    The authors propose improved ways of analysing web server log files. Traditionally, web site statistics focus on giving a big (and shallow) picture based on all transaction log entries. That picture is, however, distorted by the problems associated with resolving Internet protocol (IP) numbers to a single user and by cross-border IP registration. The authors argue that analysing extracted sub-groups and categories presents a more accurate picture of the data, and that the analysis of the online behaviour of selected individuals (rather than of very large groups) can add much to our understanding of how people use web sites and, indeed, any digital information source. The analysis is labelled 'micro' to distinguish it from traditional macro, big-picture transaction log analysis. The methods are illustrated with recourse to the logs of the Surgery Door (www.surgerydoor.co.uk) consumer health web site. It was found that use attributed to academic users gave a better approximation of the sites'...