Session Identification

The Experts below are selected from a list of 10509 Experts worldwide ranked by ideXlab platform

Michal Munk - One of the best experts on this subject based on the ideXlab platform.

  • User Identification in the Process of Web Usage Data Preprocessing
    International Journal of Emerging Technologies in Learning (ijet), 2019
    Co-Authors: Jozef Kapusta, Michal Munk, Dominik Halvoník, Martin Drlik
    Abstract:

    When discussing user behavior analytics, we have to understand what the main sources of valuable information are. One of these sources is the web server. The necessary data can be extracted from several places, most commonly the access log, error log, and custom log files of the web server, the proxy server log file, the web browser log, and browser cookies. A web server log in its default form is known as a Common Log File (W3C, 1995) and keeps information about the IP address, the date and time of the visit, and the accessed and referenced resource. There are standardized methodologies containing several steps that lead to the extraction of new knowledge from the provided data. Usually, the first step in each of them is to identify users, users’ sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to take an unprocessed web server log file as input and produce meaningful representations that can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the STT with user/session identification using cookies. The comparison was performed with respect to the quality of the generated sequential rules, i.e., in terms of generating useful, trivial, and inexplicable rules.

  • Unconventional Usage of Entropy in the Field of Web Usage Data Preprocessing and Machine Translation Evaluation
    Lecture Notes in Electrical Engineering, 2017
    Co-Authors: Michal Munk, Ľubomír Benko
    Abstract:

    This paper focuses on an unconventional usage of entropy. On the one hand, it deals with the preprocessing phase, especially session identification using the Reference Length method. Here, entropy offers an alternative way of determining the ratio of auxiliary pages, which is important for this method. With the approach introduced in this paper, a sitemap is no longer needed. On the other hand, the paper looks at entropy in the reliability analysis of machine translation metrics, where it also offers an alternative means of validating the metrics.

  • Quality of Extracted Sequential Rules by Session Identification Using STT and Cookies
    2017 European Conference on Electrical Engineering and Computer Science (EECS), 2017
    Co-Authors: Jozef Kapusta, Michal Munk, Dominik Halvoník
    Abstract:

    The data source of web usage mining is a web server access log file. This paper focuses on user session identification, which is an important step of data pre-processing. There are two useful methods for identifying users and sessions: Session Time Thresholds (STT) and cookie-based identification. We identified users/sessions with both the cookie and STT methods and created different pre-processed data files. We compare user/session identification across these levels of data pre-processing, focusing on the quality of the extracted sequential rules. Sequence rules are an important output of the web usage mining process. The sequential rules generated from the pre-processed files are classified into one of the following groups: useful, trivial, or inexplicable. The aim of our paper is to compare user/session identification using cookies with user/session identification using the STT. The comparison was performed with respect to the quality of the generated sequential rules, i.e., in terms of generating useful, trivial, and inexplicable rules.
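
As a rough illustration of the STT idea described in the abstract above, the following Python sketch starts a new session whenever the gap between two consecutive requests of one user exceeds a time threshold. The function name `split_sessions_stt`, the 30-minute default, and the sample timestamps are assumptions made for this example; the abstract does not state which threshold value the authors used.

```python
from typing import List


def split_sessions_stt(timestamps: List[float],
                       threshold_s: float = 1800.0) -> List[List[float]]:
    """Split one user's request timestamps (in seconds) into sessions.

    A new session starts whenever the gap to the previous request
    exceeds threshold_s (hypothetical default: 30 minutes).
    """
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= threshold_s:
            sessions[-1].append(ts)   # gap is small: same session
        else:
            sessions.append([ts])     # first request or long gap: new session
    return sessions


# Hypothetical request times of a single user, in seconds.
requests = [0, 60, 300, 4000, 4100, 9000]
sessions = split_sessions_stt(requests)
print(sessions)
```

With cookie-based identification, the grouping key would come from the cookie value instead of a time gap, so no threshold has to be chosen.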

  • Quantitative and Qualitative Evaluation of Sequence Patterns Found by Application of Different Educational Data Preprocessing Techniques
    IEEE Access, 2017
    Co-Authors: Michal Munk, Martin Drlik, Lubomir Benko, Jaroslav Reichel
    Abstract:

    Preprocessing educational data from log files is a time-consuming phase of the knowledge discovery process. It consists of data cleaning, user identification, session identification, and path completion. This paper attempts to identify which phases are necessary when preprocessing educational data for the further application of learning analytics methods. Since sequential pattern analysis is considered suitable for assessing the discovered knowledge, the paper asks which of these preprocessing phases has a significant impact on the discovered knowledge in general, as well as on the quality and quantity of the sequence patterns found. Therefore, several data preprocessing techniques for session identification and path completion were applied to prepare log files with different levels of data preprocessing. The results showed that the session identification technique using the reference length, calculated from the sitemap, had a significant impact on the quality of the extracted sequence rules. The path completion technique had a significant impact only on the quantity of the extracted sequence rules. Together with the results of previous systematic research in educational data preprocessing, these findings can improve the automation of the educational data preprocessing phase and can contribute to the development of learning analytics tools suitable for different groups of stakeholders engaged in educational data mining research.

  • Improving the Session Identification Using the Ratio of Auxiliary Pages Estimate
    Lecture Notes in Electrical Engineering, 2016
    Co-Authors: Michal Munk, Ľubomír Benko
    Abstract:

    Data pre-processing is an important part of web log mining. This paper focuses on one phase of data pre-processing: session identification. The cutoff time is an important parameter of session identification using the Reference Length method. The aim of this paper is to compare the influence of subjective and sitemap-based estimates of the auxiliary pages ratio on the calculation of the cutoff time. The auxiliary pages ratios obtained from the sitemap and from subjective estimation were compared. The ratio of auxiliary pages affects only the quantity of extracted rules in the files with path completion.
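
To make the role of the auxiliary pages ratio concrete: a commonly cited formulation of the Reference Length cutoff assumes exponentially distributed page times and sets C = -ln(1 - p) / λ, where p is the ratio of auxiliary pages and λ is the reciprocal of the mean observed time per page. The abstract above does not spell out this formula, so the Python sketch below should be read as an assumption; the function name `cutoff_time`, the sample times, and the two ratio values are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def cutoff_time(page_times_s: Sequence[float], aux_ratio: float) -> float:
    """Cutoff C = -ln(1 - p) / lambda, with lambda = 1 / mean page time.

    aux_ratio (p) is the assumed share of auxiliary (navigation) pages.
    """
    if not 0.0 < aux_ratio < 1.0:
        raise ValueError("aux_ratio must lie strictly between 0 and 1")
    lam = 1.0 / mean(page_times_s)
    return -math.log(1.0 - aux_ratio) / lam


# Hypothetical times (seconds) spent on individual pages.
times = [5, 8, 12, 20, 45, 90, 30, 10]

# Two hypothetical estimates of the auxiliary pages ratio.
c_sitemap = cutoff_time(times, 0.5)      # e.g. ratio derived from a sitemap
c_subjective = cutoff_time(times, 0.7)   # e.g. ratio estimated subjectively
print(round(c_sitemap, 2), round(c_subjective, 2))
```

Under this formulation a larger estimated ratio of auxiliary pages yields a longer cutoff time, which is why the way the ratio is estimated matters for the resulting sessions.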

Paul A. Crook - One of the best experts on this subject based on the ideXlab platform.

  • Impact of Domain and User's Learning Phase on Task and Session Identification in Smart Speaker Intelligent Assistants
    Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018
    Co-Authors: Seyyed Hadi Hashemi, Kyle Williams, Ahmed El Kholy, Imed Zitouni, Paul A. Crook
    Abstract:

    Task and session identification is a key element of system evaluation and user behavior modeling in Intelligent Assistant (IA) systems. However, identifying tasks and sessions for IAs is challenging due to the multi-task nature of IAs and the differences in the ways they are used on different platforms, such as smartphones, cars, and smart speakers. Furthermore, usage behavior may differ among users depending on their expertise with the system and the tasks they are interested in performing. In this study, we investigate how to identify tasks and sessions in IAs given these differences. To do this, we analyze data based on the interaction logs of two IAs integrated with smart speakers. We fit Gaussian Mixture Models to estimate task and session boundaries and show that a model with 3 components models user inter-activity time better than a model with 2 components. We then show how session boundaries differ for users depending on whether they are in a learning phase or not. Finally, we study how user inter-activity times differ depending on the task that the user is trying to perform. Our findings show that there is no single task or session boundary that can be used for IA evaluation. Instead, these boundaries are influenced by the experience of the user and the task they are trying to perform. Our findings have implications for the study and evaluation of Intelligent Agent Systems.
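
The comparison of a 2-component and a 3-component mixture can be sketched as follows. This is a minimal pure-Python EM fit for a one-dimensional Gaussian mixture, not the authors' implementation: the data are synthetic log inter-activity times, the chunk-based initialisation is a simplification, and the use of BIC to decide which model fits "better" is an assumption, since the abstract does not name the criterion.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance)


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data: List[float], k: int, iters: int = 200) -> Tuple[List[Component], float]:
    """Fit a 1-D Gaussian mixture with EM; return (components, log-likelihood)."""
    xs = sorted(data)
    n = len(xs)
    # Initialise each component from an equal-sized chunk of the sorted data.
    comps: List[Component] = []
    for i in range(k):
        chunk = xs[i * n // k:(i + 1) * n // k]
        mu = sum(chunk) / len(chunk)
        var = max(sum((x - mu) ** 2 for x in chunk) / len(chunk), 1e-6)
        comps.append((len(chunk) / n, mu, var))
    for _ in range(iters):
        # E-step: responsibility of every component for every point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, mu, var) for w, mu, var in comps]
            total = max(sum(dens), 1e-300)  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance (with a variance floor).
        updated: List[Component] = []
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = max(sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
            updated.append((nj / n, mu, var))
        comps = updated
    ll = sum(
        math.log(max(sum(w * normal_pdf(x, mu, var) for w, mu, var in comps), 1e-300))
        for x in xs
    )
    return comps, ll


def bic(log_likelihood: float, k: int, n: int) -> float:
    """BIC with 3k - 1 free parameters (k means, k variances, k - 1 weights)."""
    return (3 * k - 1) * math.log(n) - 2.0 * log_likelihood


# Synthetic log inter-activity times with three well-separated groups.
log_times = [0.8, 0.9, 1.0, 1.1, 1.2,
             4.8, 4.9, 5.0, 5.1, 5.2,
             8.8, 8.9, 9.0, 9.1, 9.2]

comps2, ll2 = fit_gmm_1d(log_times, 2)
comps3, ll3 = fit_gmm_1d(log_times, 3)
print("BIC, 2 components:", round(bic(ll2, 2, len(log_times)), 1))
print("BIC, 3 components:", round(bic(ll3, 3, len(log_times)), 1))
```

On this synthetic sample the 3-component model obtains the lower BIC. On real logs, a library implementation with several random restarts would be a safer choice than this hand-written EM loop.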

Zhang Changshui - One of the best experts on this subject based on the ideXlab platform.

  • Session Identification based on time intervals in Web log mining
    Journal of Tsinghua University, 2005
    Co-Authors: Zhang Changshui
    Abstract:

    This paper presents a method for session identification based on an analysis of the intervals in user access logs. The method separates the access logs into distinct sessions at points where the access interval exceeds some threshold. The threshold for a specific IP is defined by the statistics of its frequency vector. Tests show that the frequency vectors of proxy IPs and single-user IPs differ: for a proxy IP, the frequency vector often follows a power-law distribution, whereas for a single-user IP it approximates a Gaussian distribution. A method based on the Gaussian hypothesis is proposed for computing a separate threshold for each single-user IP. Compared with the traditional approach, which experimentally defines a uniform threshold for all IP addresses, the presented method is more reasonable and effective.
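
A per-IP threshold of this kind might be sketched as below. The abstract does not give the exact statistic, so the rule used here (mean plus k standard deviations of an IP's own access intervals, in the spirit of a Gaussian assumption) is an assumption, as are the function names, the value k = 2, and the sample IP addresses and timestamps.

```python
from statistics import mean, pstdev
from typing import Dict, List


def intervals(timestamps: List[float]) -> List[float]:
    """Gaps between consecutive requests, after sorting by time."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def ip_threshold(gaps: List[float], k: float = 2.0) -> float:
    """Hypothetical rule: gaps above mean + k * std count as session breaks."""
    return mean(gaps) + k * pstdev(gaps)


def split_by_ip(log: Dict[str, List[float]], k: float = 2.0) -> Dict[str, List[List[float]]]:
    """Split each IP's requests into sessions using that IP's own threshold."""
    result: Dict[str, List[List[float]]] = {}
    for ip, stamps in log.items():
        ts = sorted(stamps)
        gaps = intervals(ts)
        limit = ip_threshold(gaps, k) if gaps else float("inf")
        sessions: List[List[float]] = [[ts[0]]] if ts else []
        for prev, cur in zip(ts, ts[1:]):
            if cur - prev > limit:
                sessions.append([cur])    # unusually long gap for this IP
            else:
                sessions[-1].append(cur)
        result[ip] = sessions
    return result


# Hypothetical access times (seconds) for two IP addresses.
access_log = {
    "10.0.0.1": [0, 10, 22, 33, 42, 642, 652, 663],  # one long pause
    "10.0.0.2": [0, 30, 60, 90],                     # perfectly regular
}
sessions_by_ip = split_by_ip(access_log)
print(sessions_by_ip)
```

Unlike a uniform threshold, the limit here adapts to each address: the first IP is split at its 600-second pause, while the regular second IP stays in one session.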

  • Session Identification based on time interval in web log mining
    International Conference on Intelligent Information Processing, 2004
    Co-Authors: Zhuang Like, Kou Zhongbao, Zhang Changshui
    Abstract:

    In this paper, we calculate the time intervals between page views and analyze them to obtain a threshold, which is then used to break the web logs into sessions. Based on the time intervals, the frequency of each interval is counted and a frequency vector is obtained for each IP. Some IPs whose frequency distributions have special features can be deemed single users. For these IPs, we can define a threshold for each individual IP and separate sessions at the points of long access time intervals.

Zidrina Pabarskaite, Aistis Raudys - One of the best experts on this subject based on the ideXlab platform.

  • A process of knowledge discovery from web log data: Systematization and critical review
    Journal of Intelligent Information Systems, 2007
    Co-Authors: Zidrina Pabarskaite, Aistis Raudys
    Abstract:

    This paper presents a comprehensive survey of web log/usage mining based on over 100 research papers. This is the first survey dedicated exclusively to web log/usage mining. The paper identifies several web log mining sub-topics, including specific ones such as data cleaning and user and session identification. Each sub-topic is explained, its weaknesses and strong points are discussed, and possible solutions are presented. The paper describes examples of web log mining and lists some major web log mining software packages.

Jozef Kapusta - One of the best experts on this subject based on the ideXlab platform.

  • User Identification in the Process of Web Usage Data Preprocessing
    International Journal of Emerging Technologies in Learning (ijet), 2019
    Co-Authors: Jozef Kapusta, Michal Munk, Dominik Halvoník, Martin Drlik
    Abstract:

    When discussing user behavior analytics, we have to understand what the main sources of valuable information are. One of these sources is the web server. The necessary data can be extracted from several places, most commonly the access log, error log, and custom log files of the web server, the proxy server log file, the web browser log, and browser cookies. A web server log in its default form is known as a Common Log File (W3C, 1995) and keeps information about the IP address, the date and time of the visit, and the accessed and referenced resource. There are standardized methodologies containing several steps that lead to the extraction of new knowledge from the provided data. Usually, the first step in each of them is to identify users, users’ sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to take an unprocessed web server log file as input and produce meaningful representations that can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the STT with user/session identification using cookies. The comparison was performed with respect to the quality of the generated sequential rules, i.e., in terms of generating useful, trivial, and inexplicable rules.

  • Quality of Extracted Sequential Rules by Session Identification Using STT and Cookies
    2017 European Conference on Electrical Engineering and Computer Science (EECS), 2017
    Co-Authors: Jozef Kapusta, Michal Munk, Dominik Halvoník
    Abstract:

    The data source of web usage mining is a web server access log file. This paper focuses on user session identification, which is an important step of data pre-processing. There are two useful methods for identifying users and sessions: Session Time Thresholds (STT) and cookie-based identification. We identified users/sessions with both the cookie and STT methods and created different pre-processed data files. We compare user/session identification across these levels of data pre-processing, focusing on the quality of the extracted sequential rules. Sequence rules are an important output of the web usage mining process. The sequential rules generated from the pre-processed files are classified into one of the following groups: useful, trivial, or inexplicable. The aim of our paper is to compare user/session identification using cookies with user/session identification using the STT. The comparison was performed with respect to the quality of the generated sequential rules, i.e., in terms of generating useful, trivial, and inexplicable rules.

  • Experimental Verification of the Dependence Between the Expected and Observed Visit Rate of Web Pages
    Lecture Notes in Computer Science, 2015
    Co-Authors: Jozef Kapusta, Michal Munk, Martin Drlik
    Abstract:

    This paper focuses on the use of web usage mining and web structure mining methods. We tried to answer the question of whether the expected visit rate of individual web pages correlates with the observed visit rate of the same pages. We used web server log files as the data source and applied several log file pre-processing methods to identify user sessions at different levels of granularity. We found that the quality of the acquired knowledge about users’ behaviour depends on the session identification method. We experimentally demonstrated a stronger dependence between the observed and expected visit rates of the examined web pages in well-prepared files with identified user sessions. We found statistically significant differences between PageRank and the real visit rate in the files prepared with more advanced session identification methods.

  • Determining the Time Window Threshold to Identify User Sessions of Stakeholders of a Commercial Bank Portal
    Procedia Computer Science, 2014
    Co-Authors: Jozef Kapusta, Michal Munk, Peter Svec, Anna Pilková
    Abstract:

    In this paper, we focus on finding a suitable value of the time threshold used in time-based user session identification. To determine its value, we used the Length variable, which represents the time a user spent on a particular site. We compared two values of the time threshold with experimental methods of user session identification based on the structure of the web: Reference Length and H-ref. Comparing the usefulness of the rules extracted using all four methods, we showed that the time threshold calculated from the quartile range is the most appropriate method for identifying sessions for web usage mining.
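
A threshold derived from the quartile range could look like the sketch below. The abstract only says that the threshold is "calculated from the quartile range"; the specific rule used here (upper quartile plus 1.5 times the interquartile range, a common outlier fence) is an assumption, and the function name and sample values are hypothetical.

```python
from statistics import quantiles
from typing import Sequence


def quartile_threshold(lengths_s: Sequence[float], k: float = 1.5) -> float:
    """Hypothetical time threshold: Q3 + k * (Q3 - Q1) of the Length variable."""
    q1, _median, q3 = quantiles(lengths_s, n=4)
    return q3 + k * (q3 - q1)


# Hypothetical values of the Length variable (seconds spent per page).
lengths = [5, 10, 15, 20, 25, 30, 35]
threshold = quartile_threshold(lengths)
print(threshold)
```

A gap longer than this value would then be treated as a session boundary, in the same way as a fixed time threshold.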

  • Cut-off time calculation for user Session Identification by reference length
    2012 6th International Conference on Application of Information and Communication Technologies (AICT), 2012
    Co-Authors: Jozef Kapusta, Michal Munk, Martin Drlík
    Abstract:

    One of the tasks of web log mining is discovering the behavior patterns of website visitors. Based on the discovered behavior patterns, which are represented by sequence rules, the organization's website can be modified and improved. Data for the analysis are obtained from the web server log file. Because these data are anonymous, uniquely identifying the website visitor is a problem. The paper deals with less commonly used navigation-driven methods of user session identification. These methods assume that during a visit the user passes through several navigation pages until finding the content page with the required information. The content page is a page where the user spends considerably more time than on navigation pages, and it is considered the end of the session. Searching for the next content page through navigation pages constitutes a new user session. The division of pages into content and navigation pages is based on the calculation of the cut-off time C. Verifying that the variable representing the time a user spent on a particular page follows an exponential distribution is essential. We prepared an experiment with data obtained from the log file of a university web server. We tested whether the time spent on web pages follows an exponential distribution and estimated the value of the cut-off time. The results confirm our assumption that navigation-oriented methods can be used for proper user session identification.
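
The navigation-driven splitting described in the abstract above can be sketched in a few lines of Python: a page viewed for longer than the cut-off time C is treated as a content page and closes the current session. The function name `split_by_content_pages`, the URLs, the viewing times, and the value C = 30 seconds are hypothetical; in the paper, C is estimated from the data rather than chosen by hand.

```python
from typing import List, Tuple

PageVisit = Tuple[str, float]  # (url, seconds spent on the page)


def split_by_content_pages(visits: List[PageVisit], cutoff_s: float) -> List[List[str]]:
    """Close a session at every page viewed longer than cutoff_s."""
    sessions: List[List[str]] = []
    current: List[str] = []
    for url, seconds in visits:
        current.append(url)
        if seconds > cutoff_s:        # content page: ends the session
            sessions.append(current)
            current = []
    if current:                       # trailing navigation pages, no content page
        sessions.append(current)
    return sessions


# Hypothetical click sequence of one visitor on a university website.
visits = [
    ("/", 4), ("/study", 6), ("/study/programs", 95),
    ("/", 3), ("/contact", 120),
    ("/news", 5),
]
nav_sessions = split_by_content_pages(visits, cutoff_s=30.0)
print(nav_sessions)
```

Each resulting session is a run of navigation pages ending in one content page, except possibly the last, which may contain only navigation pages if the visitor left before reaching content.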