Sequential Pattern

The Experts below are selected from a list of 33723 Experts worldwide ranked by the ideXlab platform.

Zhu Jianqiu - One of the best experts on this subject based on the ideXlab platform.

  • Time-enriched Sequential Pattern Mining Algorithm TESP
    Computer Engineering, 2004
    Co-Authors: Zhu Jianqiu
    Abstract:

    In this paper, the time-enriched Sequential Pattern concept is introduced, and a novel mining algorithm, called TESP (time-enriched Sequential Pattern mining), is developed. TESP also enables users to issue many time-focused constraints, enhancing the flexibility and usefulness of Sequential Pattern mining.
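
The abstract does not list the time constraints TESP supports. The sketch below is a hedged illustration of what a time-focused constraint can mean. It checks whether a timestamped sequence contains a pattern when each gap between consecutive matched events must lie in `[min_gap, max_gap]`. The function name, its parameters, and the single-item-per-event layout are hypothetical and are not TESP's interface.

```python
from typing import Hashable, Optional, Sequence, Tuple

Event = Tuple[float, Hashable]  # (timestamp, item); events assumed sorted by time


def contains_with_gaps(seq: Sequence[Event], pattern: Sequence[Hashable],
                       min_gap: float = 0.0, max_gap: float = float("inf")) -> bool:
    """True if `pattern` occurs in `seq` in order, with every gap between
    consecutive matched events inside [min_gap, max_gap]."""

    def match(p_idx: int, start: int, prev_t: Optional[float]) -> bool:
        if p_idx == len(pattern):
            return True
        for i in range(start, len(seq)):
            t, item = seq[i]
            if prev_t is not None:
                gap = t - prev_t
                if gap > max_gap:
                    break  # sorted timestamps: later events are even farther away
                if gap < min_gap:
                    continue
            # backtrack: a later match of this element may satisfy the next gap
            if item == pattern[p_idx] and match(p_idx + 1, i + 1, t):
                return True
        return False

    return match(0, 0, None)
```

Greedy leftmost matching stops working once a maximum gap is imposed. For `[(0, "a"), (5, "a"), (6, "b")]` with `max_gap=2`, only the second `a` can start a match, so the sketch backtracks. The search is exponential in the worst case and is meant only to show the constraint semantics.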

Jia Wei Han - One of the best experts on this subject based on the ideXlab platform.

  • Sequential Pattern mining
    Frequent Pattern Mining, 2014
    Co-Authors: Wei Shen, Jianyong Wang, Jia Wei Han
    Abstract:

    Sequential Pattern mining, which discovers frequent subsequences as Patterns in a sequence database, has been a central theme in data mining research for over a decade. This problem has broad applications, such as mining customer purchase Patterns and Web access Patterns. However, it is also a challenging problem, since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. A large body of literature has been dedicated to this research, and tremendous progress has been made so far. This chapter presents a thorough overview and analysis of the main approaches to Sequential Pattern mining.
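
A pattern's support is the number of database sequences that contain it as a subsequence. The subsequence may have gaps but must preserve order. The minimal Python sketch below makes this concrete. It assumes each sequence element is a single item rather than an itemset, which the general problem allows. The toy database is made up for illustration.

```python
from typing import Hashable, Iterable, Sequence


def is_subsequence(pattern: Sequence[Hashable], seq: Sequence[Hashable]) -> bool:
    """Order-preserving containment with gaps allowed (greedy leftmost match)."""
    it = iter(seq)
    # `any` consumes the iterator up to each match, so later items must come later
    return all(any(x == p for x in it) for p in pattern)


def support(pattern: Sequence[Hashable], db: Iterable[Sequence[Hashable]]) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(1 for s in db if is_subsequence(pattern, s))


db = [list("abcb"), list("acb"), list("bca"), list("ab")]
print(support(["a", "b"], db))  # 3: every sequence except "bca"
print(support(["b", "a"], db))  # 1: only "bca"
```

A miner then reports the patterns whose support reaches a user-given threshold. The difficulty the abstract describes is that the number of candidate subsequences grows combinatorially.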

  • Stream Sequential Pattern mining with precise error bounds
    Proceedings - IEEE International Conference on Data Mining ICDM, 2008
    Co-Authors: Luiz F. Mendes, Bolin Ding, Jia Wei Han
    Abstract:

    Sequential Pattern mining is an interesting data mining problem with many real-world applications. This problem has been studied extensively in static databases. However, in recent years, emerging applications have introduced a new form of data called data streams. In a data stream, new elements are generated continuously. This poses additional constraints on the methods used for mining such data: memory usage is restricted, the infinitely flowing original dataset cannot be scanned multiple times, and current results should be available on demand. This paper introduces two effective methods for mining Sequential Patterns from data streams: the SS-BE method and the SS-MB method. The proposed methods break the stream into batches and process each batch only once. The two methods use different pruning strategies that restrict memory usage but still guarantee that all true Sequential Patterns are output at the end of any batch. Both algorithms scale linearly in execution time as the number of sequences grows, making them effective methods for Sequential Pattern mining in data streams. The experimental results also show that our methods are very accurate, in that only a small fraction of the Patterns that are output are false positives. Even for these false positives, SS-BE guarantees that their true support is above a pre-defined threshold.
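
The abstract does not spell out the SS-BE or SS-MB pruning rules. As a hedged illustration of the batch-once, bounded-memory idea only, the sketch below applies the well-known lossy counting scheme of Manku and Motwani to batches of sequences:

- Each batch holds `ceil(1/eps)` sequences.
- Counts are pruned at batch boundaries.
- Every pattern whose true support reaches `min_sup` is reported.

The function names and the `eps` and `max_len` parameters are assumptions, not the paper's method. So are the single-item elements and the cap on pattern length.

```python
import math
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Pattern = Tuple[Hashable, ...]


def subsequences(seq: Sequence[Hashable], max_len: int) -> set:
    """Distinct order-preserving subsequences of `seq` with length 1..max_len."""
    out = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            out.add(tuple(seq[i] for i in idx))
    return out


def batch_lossy_patterns(stream: Iterable[Sequence[Hashable]], min_sup: float,
                         eps: float, max_len: int = 2) -> Dict[Pattern, int]:
    width = math.ceil(1 / eps)  # sequences per batch
    counts: Dict[Pattern, List[int]] = {}  # pattern -> [observed count f, max undercount delta]
    n, batch_id, batch = 0, 1, []

    def process(batch: List[Sequence[Hashable]], batch_id: int, prune: bool) -> None:
        for seq in batch:
            for p in subsequences(seq, max_len):  # a sequence supports p at most once
                if p in counts:
                    counts[p][0] += 1
                else:
                    counts[p] = [1, batch_id - 1]  # at most batch_id - 1 earlier hits missed
        if prune:  # only at full-batch boundaries, as in lossy counting
            for p in [p for p, (f, d) in counts.items() if f + d <= batch_id]:
                del counts[p]

    for seq in stream:  # each sequence is seen exactly once
        batch.append(seq)
        n += 1
        if len(batch) == width:
            process(batch, batch_id, prune=True)
            batch, batch_id = [], batch_id + 1
    if batch:
        process(batch, batch_id, prune=False)
    # f >= true support - eps * n, so this threshold keeps every truly frequent pattern
    return {p: f for p, (f, d) in counts.items() if f >= (min_sup - eps) * n}
```

A pattern seen once in the first batch, such as `("x",)` in a stream dominated by `("a",)`, is deleted at the first boundary. Memory therefore tracks only patterns that could still be frequent. Patterns with support between `(min_sup - eps) * n` and `min_sup * n` can still appear as false positives.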

  • Constraint-based Sequential Pattern mining: The Pattern-growth methods
    Journal of Intelligent Information Systems, 2007
    Co-Authors: Jian Pei, Jia Wei Han, Wei Wang
    Abstract:

    Constraints are essential for many Sequential Pattern mining applications. However, there is no systematic study on constraint-based Sequential Pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-Pattern mining does not fit our mission well. An extended framework is developed based on a Sequential Pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the Sequential Pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured Pattern mining as well.
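
"Pushing a constraint deep" into Pattern growth can be illustrated with a PrefixSpan-style recursion that checks the constraint before growing a prefix any further. The sketch below is an assumption-laden illustration, not the paper's framework:

- It handles only single-item elements.
- It handles only anti-monotone constraints. If a pattern violates the constraint, every extension does too, so the whole branch can be skipped.

The `allowed` predicate and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Pattern = Tuple[Hashable, ...]


def prefixspan(db: List[Sequence[Hashable]], min_count: int,
               allowed: Callable[[Pattern], bool]) -> Dict[Pattern, int]:
    """Pattern growth over single-item sequences with an anti-monotone constraint."""
    results: Dict[Pattern, int] = {}

    def grow(prefix: Pattern, projected: List[List[Hashable]]) -> None:
        counts: Dict[Hashable, int] = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):  # count each item once per projected suffix
                counts[item] += 1
        for item, c in counts.items():
            new = prefix + (item,)
            if c < min_count or not allowed(new):
                continue  # prune: no extension can become frequent or satisfy the constraint
            results[new] = c
            # projected database: the part of each suffix after the first occurrence of item
            next_proj = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(new, next_proj)

    grow((), [list(s) for s in db])
    return results


db = [list("abc"), list("abd"), list("ab")]
print(prefixspan(db, 2, lambda p: len(p) <= 2))  # max-length constraint
print(prefixspan(db, 2, lambda p: "a" not in p))  # item-exclusion constraint
```

Both example constraints are anti-monotone, so the check can run before the projected database is even built. Monotone or non-monotone constraints need the extra machinery that the paper's extended framework addresses.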

  • Sequential Pattern Mining by Pattern-Growth: Principles and Extensions
    2005
    Co-Authors: Jia Wei Han, Jian Pei, Xifeng Yan
    Abstract:

    Sequential Pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem, since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of Sequential Pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP [30], a horizontal format-based Sequential Pattern mining method, and (ii) SPADE [36], a vertical format-based method; and (2) a Sequential Pattern growth method, represented by PrefixSpan [26] and its further extensions, such as CloSpan for mining closed Sequential Patterns [35]. In this study, we systematically present the Pattern-growth methodology and study its principles and extensions. We first introduce two Pattern-growth algorithms, FreeSpan [11] and PrefixSpan [26], for efficient Sequential Pattern mining. Then we introduce CloSpan for mining closed Sequential Patterns. Their relative performance in large sequence databases is presented and analyzed. Various extensions of these methods are also discussed, including (1) mining constraint-based Sequential Patterns, (2) mining multi-level, multi-dimensional Sequential Patterns, (3) mining top-k closed Sequential Patterns, and (4) applications in bio-sequence Pattern analysis and sequence clustering.
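
A closed Sequential Pattern is one that has no proper super-pattern with the same support. The quadratic post-filter below illustrates only that definition. CloSpan itself avoids generating non-closed patterns during the search, and this code does not reproduce that. Patterns are assumed to be tuples of single items.

```python
from typing import Dict, Hashable, Tuple

Pattern = Tuple[Hashable, ...]


def is_subsequence(small: Pattern, big: Pattern) -> bool:
    it = iter(big)
    return all(any(x == s for x in it) for s in small)


def closed_only(frequent: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep patterns that have no proper super-pattern with the same support."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(len(q) > len(p) and cq == c and is_subsequence(p, q)
                   for q, cq in frequent.items())
    }


print(closed_only({("a",): 3, ("b",): 4, ("a", "b"): 3}))
# {('b',): 4, ('a', 'b'): 3}: ("a",) is absorbed by ("a", "b") with equal support
```

The closed set loses no support information. Any frequent pattern's support equals the largest support among the closed patterns that contain it.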

  • From Sequential Pattern mining to structured Pattern mining: A Pattern-growth approach
    Journal of Computer Science and Technology, 2004
    Co-Authors: Jia Wei Han, Jian Pei, Xifeng Yan
    Abstract:

    Sequential Pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of Sequential Pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based Sequential Pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a Pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured Patterns. In this study, we perform a systematic introduction and presentation of the Pattern-growth methodology and study its principles and extensions. We first introduce two interesting Pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient Sequential Pattern mining. Then we introduce gSpan for mining structured Patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional Patterns and mining constraint-based Patterns.
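
The contrast between horizontal and vertical formats can be made concrete. In a vertical layout, each item maps to an id-list of `(sequence id, position)` pairs. The support of an extended pattern then comes from joining id-lists instead of rescanning sequences. The sketch below is a simplified, SPADE-style temporal join that assumes single-item elements. SPADE's actual joins over itemset elements and equivalence classes are more involved, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

IdList = List[Tuple[int, int]]  # (sequence id, position of the pattern's last item)


def vertical(db: List[Sequence[Hashable]]) -> Dict[Hashable, IdList]:
    """Convert the horizontal database into item -> id-list (vertical format)."""
    idl: Dict[Hashable, IdList] = defaultdict(list)
    for sid, seq in enumerate(db):
        for eid, item in enumerate(seq):
            idl[item].append((sid, eid))
    return idl


def temporal_join(prefix: IdList, item: IdList) -> IdList:
    """Id-list of `prefix` followed (later, gaps allowed) by `item`."""
    first_end: Dict[int, int] = {}  # earliest end of the prefix in each sequence
    for sid, eid in prefix:
        first_end[sid] = min(eid, first_end.get(sid, eid))
    return [(sid, eid) for sid, eid in item if sid in first_end and eid > first_end[sid]]


def support(idlist: IdList) -> int:
    return len({sid for sid, _ in idlist})


db = [list("abc"), list("acb"), list("ba")]
v = vertical(db)
ab = temporal_join(v["a"], v["b"])
print(ab, support(ab))  # [(0, 1), (1, 2)] 2
```

Only the earliest end of the prefix in each sequence matters for a later item. Each join is therefore a single pass over the two id-lists.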

Jurgen Adamy - One of the best experts on this subject based on the ideXlab platform.

  • Sequential Pattern recognition employing recurrent fuzzy systems
    Fuzzy Sets and Systems, 2004
    Co-Authors: Roland Kempf, Jurgen Adamy
    Abstract:

    Sequential Pattern-recognition systems check whether data strings, e.g., time series, exhibit certain Pattern primitives in a specified order. As in the case of most other Pattern-recognition methods, either conventional methods or fuzzy systems may be used here. This paper presents a Sequential Pattern-recognition system employing recurrent fuzzy systems that is employed as a monitoring system on continuous-casting systems in the steel industry worldwide. Taking that application as a starting point, a general method for Sequential Pattern recognition in time series that uses recurrent fuzzy systems is described.
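
The abstract does not specify the recurrent fuzzy systems themselves. The code below is a hypothetical sketch of the underlying idea, checking that Pattern primitives occur in a specified order. It runs a max-min fuzzy recurrence over a series of slope values. Stage k is reached to the degree that stage k-1 was reached at an earlier step and primitive k matches the current sample. The membership functions `rising` and `falling` and the recurrence itself are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Membership = Callable[[float], float]  # maps a feature value to a degree in [0, 1]


def clamp(v: float) -> float:
    return max(0.0, min(1.0, v))


def rising(slope: float) -> float:  # hypothetical primitive
    return clamp(slope)


def falling(slope: float) -> float:  # hypothetical primitive
    return clamp(-slope)


def ordered_pattern_degree(xs: Sequence[float], primitives: List[Membership]) -> float:
    """Degree (0..1) to which the primitives appear in `xs` in the given order.
    state[k] = degree that the first k primitives have been seen in order so far."""
    state = [1.0] + [0.0] * len(primitives)
    for x in xs:
        prev = state[:]  # last step's state, so one sample advances at most one stage
        for k, mu in enumerate(primitives, start=1):
            state[k] = max(prev[k], min(prev[k - 1], mu(x)))
    return state[-1]


slopes = [0.2, 0.9, -0.1, -0.8]
print(ordered_pattern_degree(slopes, [rising, falling]))  # 0.8: a rise, then a fall
print(ordered_pattern_degree(slopes, [falling, rising]))  # 0.0: no rise after the fall
```

The state vector is the recurrent part. Each new sample only updates the stored degrees, so the check runs online over a time series without buffering it.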

Ming-syan Chen - One of the best experts on this subject based on the ideXlab platform.

  • Distributed and scalable Sequential Pattern mining through stream processing
    Knowledge and Information Systems, 2017
    Co-Authors: Chun-chieh Chen, Hong-han Shuai, Ming-syan Chen
    Abstract:

    Scalability is a primary issue for existing Sequential Pattern mining algorithms when dealing with large amounts of data. Previous work, namely Sequential Pattern mining on the cloud (SPAMC), has already addressed the scalability problem. It supports the MapReduce cloud computing architecture for mining frequent Sequential Patterns on large datasets. However, this existing algorithm does not address the iterative mining problem, in which reloading data incurs additional costs. Furthermore, it does not study the load balancing problem. To remedy these problems, we devised a powerful Sequential Pattern mining algorithm, the Sequential Pattern mining in the cloud-uniform distributed lexical sequence tree algorithm (SPAMC-UDLT), exploiting MapReduce and streaming processes. SPAMC-UDLT dramatically improves overall performance without launching multiple MapReduce rounds and provides perfect load balancing across machines in the cloud. The results show that SPAMC-UDLT significantly reduces execution time, achieves extremely high scalability, and provides much better load balancing than existing algorithms in the cloud.
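
The abstract does not describe how SPAMC-UDLT distributes the lexical sequence tree. As a generic stand-in, the sketch below uses the longest-processing-time-first heuristic. Each subtree task, largest first, goes to the currently least-loaded machine. The task names and cost estimates are hypothetical, and this heuristic does not by itself guarantee the perfect balance the paper reports.

```python
import heapq
from typing import Dict, List, Tuple


def assign_tasks(costs: Dict[str, float], n_workers: int) -> List[List[str]]:
    """Greedy longest-processing-time-first assignment of tasks to workers."""
    heap: List[Tuple[float, int]] = [(0.0, w) for w in range(n_workers)]  # (load, worker)
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(n_workers)]
    for task, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        plan[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    return plan


# hypothetical estimated costs of subtrees rooted at each first item
costs = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
plan = assign_tasks(costs, 2)
print(plan, [sum(costs[t] for t in p) for p in plan])  # loads 8 and 8
```

Sorting largest-first matters. Placing small tasks last lets them fill gaps left by the big ones, which keeps the maximum load close to the average.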

  • Highly scalable Sequential Pattern mining based on MapReduce model on the cloud
    International Congress on Big Data, 2013
    Co-Authors: Chun-chieh Chen, Chi-yao Tseng, Ming-syan Chen
    Abstract:

    Sequential Pattern mining is an essential data mining technique that has been widely applied to many real-world applications. However, traditional algorithms generally suffer from the scalability problem when dealing with big data. In this paper, we aim to significantly upgrade the scale and propose a Sequential Pattern Mining algorithm based on the MapReduce model on the Cloud (abbreviated as SPAMC). Building on the prior SPAM algorithm, we design an iterative MapReduce framework to efficiently generate and prune candidate Patterns when constructing the lexical sequence tree. This framework not only distributes the sub-tasks of tree construction to independent mappers in parallel, but also enables the parallel processing of support counting. We conduct extensive experiments in a cloud environment of 32 virtual machines with up to 12.8 million transactional sequences. Experimental results show that SPAMC can significantly reduce mining time with big data, achieve extremely high scalability, and provide perfect load balancing on the cloud cluster.
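
SPAM, on which SPAMC builds, counts support with vertical bitmaps. A sequence extension transforms the prefix's bitmap so that only positions after its first occurrence stay set, then ANDs the result with the new item's bitmap. The sketch below shows that step with one Python `int` per sequence. It assumes single-item elements. Its partition loop stands in for independent mappers whose partial counts a reducer sums. This is an illustration, not SPAMC's actual job layout, and the function names are hypothetical.

```python
from typing import Hashable, List, Sequence

Bitmaps = List[int]  # one int per sequence; bit j set <=> pattern can end at position j


def item_bitmaps(db: List[Sequence[Hashable]], item: Hashable) -> Bitmaps:
    return [sum(1 << j for j, x in enumerate(seq) if x == item) for seq in db]


def s_step(prefix: Bitmaps, item: Bitmaps) -> Bitmaps:
    """Sequence extension: item must occur strictly after the prefix's first end."""
    out = []
    for p, i in zip(prefix, item):
        if p == 0:
            out.append(0)
            continue
        first = (p & -p).bit_length() - 1  # index of the lowest set bit
        after = ~((1 << (first + 1)) - 1)  # mask of all positions > first
        out.append(i & after)
    return out


def support(bitmaps: Bitmaps) -> int:
    return sum(1 for b in bitmaps if b)


def partitioned_support(db: List[Sequence[Hashable]], pattern: Sequence[Hashable],
                        n_parts: int) -> int:
    total = 0
    for k in range(n_parts):  # each partition could be counted by a separate mapper
        part = db[k::n_parts]
        bm = item_bitmaps(part, pattern[0])
        for item in pattern[1:]:
            bm = s_step(bm, item_bitmaps(part, item))
        total += support(bm)  # reducer: sum the partial counts
    return total


db = [list("abc"), list("acb"), list("ba"), list("ab")]
print(partitioned_support(db, ["a", "b"], 2))  # 3
```

Support is a per-sequence count, so disjoint partitions can be counted independently and summed. That property is what makes the parallel support counting in the abstract possible.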

  • DPSP: Distributed progressive Sequential Pattern mining on the cloud
    Knowledge Discovery and Data Mining, 2010
    Co-Authors: Jen-wei Huang, Suchen Lin, Ming-syan Chen
    Abstract:

    The progressive Sequential Pattern mining problem has been discussed in previous research works. With the increasing amount of data, single processors struggle to scale up, and traditional algorithms running on a single machine may have scalability troubles. Therefore, mining progressive Sequential Patterns intrinsically suffers from the scalability problem. In view of this, we design a distributed mining algorithm to address the scalability problem of mining progressive Sequential Patterns. The proposed algorithm, DPSP (Distributed Progressive Sequential Pattern mining algorithm), is implemented on top of the Hadoop platform, which realizes the cloud computing environment. We propose Map/Reduce jobs in DPSP to delete obsolete itemsets, update current candidate Sequential Patterns, and report up-to-date frequent Sequential Patterns within each period of interest (POI). The experimental results show that DPSP possesses great scalability and consequently increases the performance and practicability of mining algorithms.

  • A general model for Sequential Pattern mining with a progressive database
    IEEE Transactions on Knowledge and Data Engineering, 2008
    Co-Authors: Jen-wei Huang, Chi-yao Tseng, Ming-syan Chen
    Abstract:

    Although there have been many recent studies on the mining of Sequential Patterns in a static database and in a database with increasing data, these works, in general, do not fully explore the effect of deleting old data from the sequences in the database. When Sequential Patterns are generated, the newly arriving Patterns may not be identified as frequent Sequential Patterns due to the existence of old data and sequences. Even worse, obsolete Sequential Patterns that are no longer frequent may stay in the reported results. In practice, users are usually more interested in recent data than in old data. To capture the dynamic nature of data addition and deletion, we propose a general model of Sequential Pattern mining with a progressive database, in which the data may be static, inserted, or deleted. In addition, we present a progressive algorithm, Pisa (progressive mining of Sequential Patterns), to progressively discover Sequential Patterns in a defined time period of interest (POI). The POI is a sliding window that continuously advances as time goes by. Pisa utilizes a progressive Sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date Sequential Patterns, and delete obsolete data and Patterns accordingly. The height of the proposed Sequential Pattern tree is bounded by the length of the POI, which limits the memory required by Pisa to significantly less than that needed by the alternative method, direct appending (DirApp). Note that Sequential Pattern mining with a static database and with an incremental database are special cases of progressive Sequential Pattern mining. By changing the start time and end time of the POI, Pisa can easily deal with a static database or an incremental database as well. The complexity of the proposed algorithms is analyzed. The experimental results show that Pisa not only significantly outperforms the prior methods in execution time by orders of magnitude but also possesses graceful scalability.
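
The POI idea can be illustrated without the progressive Sequential tree. Keep only elements whose timestamps fall inside the current window, delete older ones as time advances, and count Patterns over what remains. The class below is a hypothetical sketch of that behavior, not Pisa's data structure. It assumes at most one new element per sequence per time step and counts Patterns up to length `max_len`.

```python
from collections import defaultdict, deque
from itertools import combinations
from typing import Deque, Dict, Hashable, Tuple


class ProgressiveWindow:
    """Keeps only elements inside the POI [now - poi_len + 1, now]."""

    def __init__(self, poi_len: int):
        self.poi_len = poi_len
        self.seqs: Dict[Hashable, Deque[Tuple[int, Hashable]]] = defaultdict(deque)

    def advance(self, now: int, arrivals: Dict[Hashable, Hashable]) -> None:
        for sid, item in arrivals.items():  # at most one new element per sequence
            self.seqs[sid].append((now, item))
        start = now - self.poi_len + 1
        for sid in list(self.seqs):
            q = self.seqs[sid]
            while q and q[0][0] < start:  # delete obsolete data
                q.popleft()
            if not q:
                del self.seqs[sid]  # drop sequences with no data left in the POI

    def frequent(self, min_count: int, max_len: int = 2) -> Dict[Tuple[Hashable, ...], int]:
        counts: Dict[Tuple[Hashable, ...], int] = defaultdict(int)
        for q in self.seqs.values():
            items = [item for _, item in q]
            found = set()  # each sequence supports a pattern at most once
            for k in range(1, max_len + 1):
                for idx in combinations(range(len(items)), k):
                    found.add(tuple(items[i] for i in idx))
            for p in found:
                counts[p] += 1
        return {p: c for p, c in counts.items() if c >= min_count}


w = ProgressiveWindow(poi_len=2)
w.advance(1, {"s1": "a", "s2": "a"})
w.advance(2, {"s1": "b", "s2": "b"})
print(w.frequent(2))  # ('a',), ('b',) and ('a', 'b') each with count 2
w.advance(3, {"s1": "c"})
print(w.frequent(2))  # {('b',): 2}: the obsolete 'a' elements were deleted
```

Each sequence holds at most `poi_len` elements, which mirrors the memory bound described above. Choosing a `poi_len` that covers every timestamp turns the window into the static or incremental special cases.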

  • CIKM - On progressive Sequential Pattern mining
    Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06, 2006
    Co-Authors: Jen-wei Huang, Chi-yao Tseng, Ming-syan Chen
    Abstract:

    When Sequential Patterns are generated, the newly arriving Patterns may not be identified as frequent Sequential Patterns due to the existence of old data and sequences. In practice, users are usually more interested in recent data than in old data. To capture the dynamic nature of data addition and deletion, we propose a general model of Sequential Pattern mining with a progressive database. In addition, we present a progressive concept to progressively discover Sequential Patterns in a recent time period of interest.

Pinar Karagoz - One of the best experts on this subject based on the ideXlab platform.

  • CRoM and HuspExt: Improving efficiency of high utility Sequential Pattern extraction
    International Conference on Data Engineering, 2016
    Co-Authors: Oznur Alkan, Pinar Karagoz
    Abstract:

    This paper presents efficient data structures and a pruning technique to improve the efficiency of high utility Sequential Pattern mining. The CRoM (Cumulated Rest of Match) based upper bound, a tight upper bound on the utility of the candidates, is proposed in order to perform more conservative pruning before candidate Pattern generation than existing techniques. In addition, an efficient algorithm, HuspExt (High Utility Sequential Pattern Extraction), is presented, which calculates the utilities of child Patterns based on those of their parents. Substantial experiments on both synthetic and real datasets from different domains show that the solution efficiently discovers high utility Sequential Patterns under low thresholds.
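
Computing a child's utility from its parent's can be illustrated with a small dynamic program. For each position, store the best utility of a parent occurrence ending there. The child's utility at position j is the item's utility plus the best parent value at any earlier position. The sketch below shows that general idea under simplifying assumptions: single-item elements, and a Pattern's utility taken as the maximum over its occurrences. It does not reproduce HuspExt's data structures, and the function names are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

QSeq = Sequence[Tuple[Hashable, float]]  # (item, utility) at each position
EndUtil = List[Optional[float]]  # best utility of an occurrence ending at position j


def single_item(seq: QSeq, item: Hashable) -> EndUtil:
    return [u if x == item else None for x, u in seq]


def extend(seq: QSeq, parent: EndUtil, item: Hashable) -> EndUtil:
    """End utilities of (parent followed by `item`), reusing the parent's values."""
    child: EndUtil = []
    best_before: Optional[float] = None  # max parent utility at positions < j
    for j, (x, u) in enumerate(seq):
        child.append(best_before + u if x == item and best_before is not None else None)
        if parent[j] is not None:
            best_before = parent[j] if best_before is None else max(best_before, parent[j])
    return child


def pattern_utility(end: EndUtil) -> float:
    vals = [v for v in end if v is not None]
    return max(vals) if vals else 0.0


seq = [("a", 2), ("b", 1), ("a", 5), ("b", 3)]
ab = extend(seq, single_item(seq, "a"), "b")
print(ab, pattern_utility(ab))  # [None, 3, None, 8] 8
```

Each extension costs one pass over the sequence, with no need to re-enumerate the parent's occurrences.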

  • CRoM and HuspExt: Improving efficiency of high utility Sequential Pattern extraction
    IEEE Transactions on Knowledge and Data Engineering, 2015
    Co-Authors: Oznur Alkan, Pinar Karagoz
    Abstract:

    High utility Sequential Pattern mining has been considered an important research problem, and a number of relevant algorithms have been proposed for this topic. The main challenge of high utility Sequential Pattern mining is that the search space is large, and the efficiency of the solutions is directly affected by the degree to which they can eliminate candidate Patterns. Therefore, the efficiency of any high utility Sequential Pattern mining solution depends on its ability to reduce this large search space and, as a result, lower the computational complexity of calculating the utilities of the candidate Patterns. In this paper, we propose efficient data structures and a pruning technique based on the Cumulated Rest of Match (CRoM) upper bound. By defining a tighter upper bound on the utility of the candidates, CRoM allows more conservative pruning before candidate Pattern generation than existing techniques. In addition, we have developed an efficient algorithm, High Utility Sequential Pattern Extraction (HuspExt), which calculates the utilities of child Patterns based on those of their parents. Substantial experiments on both synthetic and real datasets from different domains show that the proposed solution efficiently discovers high utility Sequential Patterns from large-scale datasets with different data characteristics under low utility thresholds.
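
Upper-bound pruning works because a candidate can be discarded once even an optimistic estimate of its utility, and of all its extensions, falls below the threshold. The sketch below uses the classic sequence-weighted utility (SWU) bound: the total utility of every sequence that contains the Pattern. SWU is a looser bound than CRoM and is shown only to illustrate the pruning principle. The single-item elements and the function names are assumptions.

```python
from typing import Hashable, List, Sequence, Tuple

QSeq = Sequence[Tuple[Hashable, float]]  # (item, utility) at each position


def contains(seq: QSeq, pattern: Sequence[Hashable]) -> bool:
    it = iter([x for x, _ in seq])
    return all(any(x == p for x in it) for p in pattern)


def swu(db: List[QSeq], pattern: Sequence[Hashable]) -> float:
    """Sum of total sequence utilities over sequences containing `pattern`.
    Upper-bounds the utility of the pattern and of every extension of it."""
    return sum(sum(u for _, u in seq) for seq in db if contains(seq, pattern))


def can_prune(db: List[QSeq], pattern: Sequence[Hashable], min_util: float) -> bool:
    return swu(db, pattern) < min_util


db = [[("a", 2), ("b", 1)], [("a", 5), ("c", 3)], [("b", 4)]]
print(swu(db, ["a"]), swu(db, ["a", "b"]))  # 11 3
print(can_prune(db, ["a", "b"], 5))  # True: no extension of a, b can reach 5
```

The bound is safe because a Pattern's utility in a sequence cannot exceed that sequence's total utility, and any extension occurs only in sequences that already contain the Pattern. A tighter bound such as CRoM prunes more candidates at the same threshold.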