Pattern Mining

14,000,000 Leading Edge Experts on the ideXlab platform

The experts below are selected from a list of 42,081 experts worldwide, ranked by the ideXlab platform.

Jia Wei Han - One of the best experts on this subject based on the ideXlab platform.

  • Advanced Pattern Mining
    Data Mining, 2012
    Co-Authors: Jia Wei Han, Micheline Kamber, Jian Pei
    Abstract:

    This chapter discusses the advanced methods of frequent pattern mining, which mine more complex forms of frequent patterns and consider user preferences or constraints to speed up the mining process. Frequent pattern mining has reached far beyond the basics due to substantial research, numerous extensions of the problem scope, and broad application studies. An in-depth coverage of methods for mining many kinds of patterns is included, elaborating on multilevel patterns, multidimensional patterns, patterns in continuous data, rare patterns, negative patterns, constrained frequent patterns, frequent patterns in high-dimensional data, colossal patterns, and compressed and approximate patterns. Other pattern mining themes, including mining sequential and structured patterns and mining patterns from spatiotemporal, multimedia, and stream data, are considered more advanced. Pattern mining is a more general term than frequent pattern mining, since the former covers rare and negative patterns as well. However, when there is no ambiguity, the two terms are used interchangeably. In addition to mining basic frequent itemsets and associations, advanced forms of patterns can be mined, such as multilevel and multidimensional associations, quantitative association rules, rare patterns, and negative patterns. Users can also mine high-dimensional patterns and compressed or approximate patterns. Frequent pattern mining has many diverse applications, ranging from pattern-based data cleaning to pattern-based classification, clustering, and outlier or exception analysis.

  • Frequent Pattern Mining: Current status and future directions
    Data Mining and Knowledge Discovery, 2007
    Co-Authors: Jia Wei Han, Dong Xin, Hong Cheng, Xifeng Yan
    Abstract:

    Frequent pattern mining has been a focused theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications. In this article, we provide a brief overview of the current status of frequent pattern mining and discuss a few promising research directions. We believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have a deep impact on data mining methodologies and applications in the long run. However, there are still some challenging research issues that need to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.

  • Constraint-based sequential Pattern Mining: The Pattern-growth methods
    Journal of Intelligent Information Systems, 2007
    Co-Authors: Jian Pei, Jia Wei Han, Wei Wang
    Abstract:

    Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well.

  • From sequential Pattern Mining to structured Pattern Mining: A Pattern-growth approach
    Journal of Computer Science and Technology, 2004
    Co-Authors: Jia Wei Han, Jian Pei, Xifeng Yan
    Abstract:

    Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem, since mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level and multi-dimensional patterns and mining constraint-based patterns.
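
    The pattern-growth idea behind PrefixSpan can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation. It assumes every sequence element is a single item, whereas PrefixSpan proper handles itemset elements. It also copies suffixes instead of using pseudo-projection. The `max_len` parameter is a hypothetical example of a constraint pushed into the search. Each frequent prefix is grown only inside its own projected database, so candidate subsequences are never generated and then tested.

```python
from typing import Dict, List, Tuple


def prefixspan(db: List[List[str]], min_count: int,
               max_len: int = 10) -> Dict[Tuple[str, ...], int]:
    """Simplified PrefixSpan for sequences of single items.

    Returns every sequential pattern supported by at least `min_count`
    sequences, mapped to its support count.
    """
    results: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], projected: List[List[str]]) -> None:
        # Hypothetical length constraint pushed into the search.
        if len(prefix) >= max_len:
            return
        # Count each item at most once per projected suffix.
        counts: Dict[str, int] = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = count
            # Project: keep only what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(new_prefix, new_projected)

    grow((), db)
    return results


db = [["a", "b", "c", "b"], ["a", "c", "b"], ["b", "c", "a"], ["a", "b", "b"]]
print(prefixspan(db, min_count=3))
# {('a',): 4, ('a', 'b'): 3, ('b',): 4, ('c',): 3}
```

    Growing the prefix `a` only scans the suffixes that follow `a`. In those suffixes just `b` is frequent, which yields `<a b>`. The search then stops, because nothing is frequent after `<a b>`.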

  • gSpan: Graph-Based Substructure Pattern Mining
    International Conference on Data Mining, 2002
    Co-Authors: Xifeng Yan, Jia Wei Han
    Abstract:

    We investigate new approaches for frequent graph-based pattern mining in graph datasets and propose a novel algorithm called gSpan (graph-based substructure pattern mining), which discovers frequent substructures without candidate generation. gSpan builds a new lexicographic order among graphs and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts the depth-first search strategy to mine frequent connected subgraphs efficiently. Our performance study shows that gSpan substantially outperforms previous algorithms, sometimes by an order of magnitude.

Charu C. Aggarwal - One of the best experts on this subject based on the ideXlab platform.

  • Association Pattern Mining
    Data Mining, 2015
    Co-Authors: Charu C. Aggarwal
    Abstract:

    The classical problem of association pattern mining is defined in the context of supermarket data containing sets of items bought by customers, which are referred to as transactions. The goal is to determine associations between groups of items bought by customers, which can intuitively be viewed as k-way correlations between items. The most popular model for association pattern mining uses the frequencies of sets of items as the quantification of the level of association.
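
    In this frequency-based model, the level of association is usually quantified by the support of an itemset (the fraction of transactions that contain it). For rules of the form A => C, it is also quantified by confidence. The Python sketch below computes both on a small basket dataset. The data and function names are illustrative assumptions, not taken from the chapter.

```python
from typing import List, Set

# Hypothetical toy market-basket data: one set of items per customer transaction.
transactions: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support_count(itemset: Set[str], db: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def support(itemset: Set[str], db: List[Set[str]]) -> float:
    """Relative support: the fraction of transactions containing `itemset`."""
    return support_count(itemset, db) / len(db)


def confidence(antecedent: Set[str], consequent: Set[str], db: List[Set[str]]) -> float:
    """Confidence of antecedent => consequent, i.e. sup(A | C) / sup(A)."""
    return support_count(antecedent | consequent, db) / support_count(antecedent, db)


print(support({"diapers", "beer"}, transactions))      # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # 0.75
```

    Confidence is computed from integer counts rather than from two rounded fractions, so the ratio 3/4 comes out exact.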

  • Frequent Pattern Mining - Frequent Pattern Mining Algorithms: A Survey
    Frequent Pattern Mining, 2014
    Co-Authors: Charu C. Aggarwal, Mansurul Bhuiyan, Mohammad Al Hasan
    Abstract:

    This chapter will provide a detailed survey of frequent pattern mining algorithms. A wide variety of algorithms will be covered, starting from Apriori. Many algorithms, such as Eclat, TreeProjection, and FP-growth, will be discussed. In addition, a discussion of several maximal and closed frequent pattern mining algorithms will be provided. Thus, this chapter will provide one of the most detailed surveys of frequent pattern mining algorithms available in the literature.
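
    Apriori, the survey's starting point, is a level-wise search. Candidate k-itemsets are joined from frequent (k-1)-itemsets and pruned by downward closure: every subset of a frequent itemset must itself be frequent. Survivors are then counted in one database scan. The Python sketch below is a compact illustration that assumes absolute support counts and toy data. It is not the chapter's code and omits the optimized candidate-counting structures used in practice.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(db: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori: every itemset contained in at least `min_count` transactions."""
    # Level 1: count individual items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            # Prune step (downward closure): all (k-1)-subsets must be frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One database scan counts the surviving candidates.
        counts = {cand: 0 for cand in candidates}
        for t in db:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(transactions, min_count=3)
print(len(frequent_itemsets))  # 8
```

    With `min_count=3`, four single items (bread, milk, diapers, beer) are frequent, and so are four of their pairs. The only candidate triple, {bread, milk, diapers}, survives pruning but appears in just two transactions, so the search ends at level 3.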

  • Frequent Pattern Mining - Applications of Frequent Pattern Mining
    Frequent Pattern Mining, 2014
    Co-Authors: Charu C. Aggarwal
    Abstract:

    Frequent pattern mining has broad applications, which encompass clustering, classification, software bug detection, recommendations, and a wide variety of other problems. In fact, the greatest utility of frequent pattern mining (unlike other major data mining problems such as outlier analysis and classification) is as an intermediate tool to provide pattern-centered insights for a variety of problems. In this chapter, we will study a wide variety of applications of frequent pattern mining. The purpose of this chapter is not to provide a detailed description of every possible application, but to give the reader an overview of what is possible with methods such as frequent pattern mining.

  • Frequent Pattern Mining - An Introduction to Frequent Pattern Mining
    Frequent Pattern Mining, 2014
    Co-Authors: Charu C. Aggarwal
    Abstract:

    The problem of frequent pattern mining has been widely studied in the literature because of its numerous applications to a variety of data mining problems, such as clustering and classification. In addition, frequent pattern mining has numerous applications in diverse domains, such as spatiotemporal data, software bug detection, and biological data. The algorithmic aspects of frequent pattern mining have been explored very widely. This chapter provides an overview of these methods as they relate to the organization of this book.

  • On dense Pattern Mining in graph streams
    Proceedings of the VLDB Endowment, 2010
    Co-Authors: Charu C. Aggarwal, Philip S Yu, Yao Li, Ruoming Jin
    Abstract:

    Many massive web and communication network applications create data which can be represented as a massive sequential stream of edges. For example, conversations in a telecommunication network or messages in a social network can be represented as a massive stream of edges. Such streams are typically very large because of the large amount of underlying activity in such networks. An important application in these domains is to determine frequently occurring dense structures in the underlying graph stream. In general, we would like to determine frequent and dense patterns in the underlying interactions. We introduce a model for dense pattern mining and propose probabilistic algorithms for determining such structural patterns effectively and efficiently. The purpose of the probabilistic approach is to create a summarization of the graph stream, which can be used for further pattern mining. We show that this summarization approach leads to effective and efficient results for stream pattern mining over a number of real and synthetic data sets.

Jian Pei - One of the best experts on this subject based on the ideXlab platform.

  • Advanced Pattern Mining
    Data Mining, 2012
    Co-Authors: Jia Wei Han, Micheline Kamber, Jian Pei

  • Constraint-based sequential Pattern Mining: The Pattern-growth methods
    Journal of Intelligent Information Systems, 2007
    Co-Authors: Jian Pei, Jia Wei Han, Wei Wang

  • Preference-Based Frequent Pattern Mining
    International Journal of Data Warehousing and Mining, 2005
    Co-Authors: Moonjung Cho, Jian Pei, Haixun Wang, Wei Wang
    Abstract:

    Frequent pattern mining is an important data mining problem with broad applications. Although there are many in-depth studies on efficient frequent pattern mining algorithms and constraint-pushing techniques, the effectiveness of frequent pattern mining remains a serious concern: it is non-trivial and often tricky to specify appropriate support thresholds and proper constraints. In this paper, we propose a novel theme of preference-based frequent pattern mining. A user can simply specify a preference instead of setting detailed parameters in constraints. We identify the problem of preference-based frequent pattern mining and formulate the preferences for mining. We develop an efficient framework to mine frequent patterns with preferences. Interestingly, many preferences can be pushed deep into the mining by properly employing existing efficient frequent pattern mining techniques. We conduct an extensive performance study to examine our method. The results indicate that preference-based frequent pattern mining is effective and efficient. Furthermore, we extend our discussion from pattern-based frequent pattern mining to preference-based data mining in principle and draw a general framework.

  • From sequential Pattern Mining to structured Pattern Mining: A Pattern-growth approach
    Journal of Computer Science and Technology, 2004
    Co-Authors: Jia Wei Han, Jian Pei, Xifeng Yan

  • Constrained frequent Pattern Mining: a Pattern-growth view
    ACM SIGKDD Explorations Newsletter, 2002
    Co-Authors: Jian Pei, Jia Wei Han
    Abstract:

    It has been well recognized that frequent pattern mining plays an essential role in many important data mining tasks. However, frequent pattern mining often generates a very large number of patterns and rules, which reduces not only the efficiency but also the effectiveness of mining. Recent work has highlighted the importance of the constraint-based mining paradigm in the context of mining frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. Recently, we developed efficient pattern-growth methods for frequent pattern mining. Interestingly, pattern-growth methods are not only efficient but also effective in mining with various constraints. Many tough constraints that cannot be handled by previous methods can be pushed deep into the pattern-growth mining process. In this paper, we overview the principles of pattern-growth methods for constrained frequent pattern mining and sequential pattern mining. Moreover, we explore the power of pattern-growth methods for mining with tough constraints and highlight some interesting open problems.
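
    A toy example shows how a constraint is pushed into pattern growth. The Python code below is a hypothetical illustration, not the paper's FP-tree-based method. It grows itemsets depth first over plain projected transaction lists. It pushes an anti-monotone constraint into the search: the total item price must be at most a budget (the prices and budget are made-up parameters). Any branch that violates the constraint is never expanded, because with non-negative prices no superset can satisfy it again.

```python
from typing import Dict, FrozenSet, List, Set


def constrained_growth(db: List[Set[str]], prices: Dict[str, int],
                       min_count: int, budget: int) -> Dict[FrozenSet[str], int]:
    """Depth-first pattern growth that pushes the constraint sum(price) <= budget.

    With non-negative prices the constraint is anti-monotone: once an itemset
    exceeds the budget, every superset does too, so that branch is pruned.
    """
    # A fixed global item order ensures each itemset is generated exactly once.
    rank = {item: r for r, item in enumerate(sorted({i for t in db for i in t}))}
    results: Dict[FrozenSet[str], int] = {}

    def grow(prefix: FrozenSet[str], cost: int, projected: List[List[str]]) -> None:
        counts: Dict[str, int] = {}
        for t in projected:
            for item in t:
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts, key=rank.__getitem__):
            new_cost = cost + prices[item]
            # Prune infrequent extensions and extensions that break the budget.
            if counts[item] < min_count or new_cost > budget:
                continue
            new_prefix = prefix | {item}
            results[new_prefix] = counts[item]
            # Projected database: transactions containing `item`,
            # restricted to items that come later in the global order.
            new_projected = [[x for x in t if rank[x] > rank[item]]
                             for t in projected if item in t]
            grow(new_prefix, new_cost, new_projected)

    grow(frozenset(), 0, [list(t) for t in db])
    return results


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
prices = {"a": 5, "b": 3, "c": 4}
for itemset, count in constrained_growth(db, prices, min_count=2, budget=8).items():
    print(sorted(itemset), count)
```

    Here {a, c} (cost 9) and {a, b, c} (cost 12) are frequent, but their branches are cut during the search instead of being filtered out afterwards. This is the efficiency gain that pushing constraints "deep" into mining aims for.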

Xifeng Yan - One of the best experts on this subject based on the ideXlab platform.

  • Frequent Pattern Mining: Current status and future directions
    Data Mining and Knowledge Discovery, 2007
    Co-Authors: Jia Wei Han, Dong Xin, Hong Cheng, Xifeng Yan

  • From sequential Pattern Mining to structured Pattern Mining: A Pattern-growth approach
    Journal of Computer Science and Technology, 2004
    Co-Authors: Jia Wei Han, Jian Pei, Xifeng Yan

  • gSpan: Graph-Based Substructure Pattern Mining
    International Conference on Data Mining, 2002
    Co-Authors: Xifeng Yan, Jia Wei Han

G Raju - One of the best experts on this subject based on the ideXlab platform.

  • Web access Pattern Mining – a new method
    International Journal of Web Science, 2014
    Co-Authors: Achuthan Nair Rajimol, G Raju
    Abstract:

    An efficient web access pattern mining algorithm, FOL-mine, is presented in this paper. The FOL-mine algorithm is based on the projected database of each frequent event and eliminates the need to construct a pattern tree. A data structure, the first occurrence list (FOL), is introduced in the proposed algorithm for efficient handling of suffix building. Rebuilding of projected databases is completely eliminated in the new method. Experimental analysis reveals a significant performance gain over other web access pattern mining algorithms.

  • A Novel Weighted Support Method for Access Pattern Mining
    2014
    Co-Authors: G Raju, Achuthan Nair Rajimol
    Abstract:

    Sequential pattern mining is an important data mining technique that finds all frequent sequential patterns in a sequence database. Applications in a wide range of important domains make sequential pattern mining an interesting area of research. The conventional approach to sequential pattern mining treats every item in a sequence as equally important and thus fails to reflect the individual significance of items. Weighted sequential pattern mining instead assigns different weights to different items in the sequences so as to reflect the importance of each item. The weighted method therefore models real-life sequence databases better, and more efficiently, than conventional sequential pattern mining. Weighted sequential pattern mining can be used to mine web access patterns more efficiently from web log data. This paper proposes a new weighted access pattern mining algorithm to mine weighted access patterns in a web log database. The proposed method uses the frequency of user visits to assign weights to web pages during the mining process. Extensive experimental evaluation shows the algorithm to be promising.
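
    The idea of weighting pages by visit frequency can be illustrated with a short Python sketch. The paper's exact formulas are not given here, so both definitions below are assumptions for illustration. First, a page's weight is its visit count divided by that of the most visited page. Second, a pattern's weighted support is the mean weight of its pages multiplied by its plain support. Plain support is the fraction of sessions that contain the pattern as an ordered subsequence.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the pages of `pattern` occur in `seq` in the same order (gaps allowed)."""
    remaining = iter(seq)
    return all(page in remaining for page in pattern)


def page_weights(db: List[List[str]]) -> Dict[str, float]:
    """Hypothetical weighting: visits to each page divided by visits to the busiest page."""
    visits: Dict[str, int] = {}
    for session in db:
        for page in session:
            visits[page] = visits.get(page, 0) + 1
    top = max(visits.values())
    return {page: v / top for page, v in visits.items()}


def weighted_support(pattern: List[str], db: List[List[str]],
                     weights: Dict[str, float]) -> float:
    """Assumed definition: mean page weight of the pattern times its plain support."""
    support = sum(is_subsequence(pattern, s) for s in db) / len(db)
    mean_weight = sum(weights[p] for p in pattern) / len(pattern)
    return mean_weight * support


log = [["home", "news", "sports"], ["home", "sports"],
       ["home", "news"], ["news", "sports", "home"]]
w = page_weights(log)
print(weighted_support(["home", "sports"], log, w))  # 0.4375
print(weighted_support(["news", "sports"], log, w))  # 0.375
```

    Both patterns occur in half of the sessions, so plain support cannot tell them apart. Under this weighting, the pattern that involves the heavily visited `home` page ranks higher.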

  • ICDEM - Web access Pattern Mining – a survey
    Lecture Notes in Computer Science, 2010
    Co-Authors: Achuthan Nair Rajimol, G Raju
    Abstract:

    This article provides a survey of different Web Access Pattern Tree (WAP-tree) based methods for web access pattern mining. Web access pattern mining mines the complete set of patterns that satisfy a given support threshold from a given web access sequence database. A brief discussion of the basic theory and terminology related to web access pattern mining is presented. A comparison of the different methods is also given.
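
    The structure shared by these methods is a prefix tree of access sequences. The Python sketch below shows simplified WAP-tree construction. Events occurring in fewer than `min_count` sequences are dropped. Each filtered sequence is then inserted so that sequences with a common prefix share nodes, and every node keeps a count. This is an assumption-laden illustration, not code from the surveyed papers. A plain list per event label stands in for the linked event-node queues of the original WAP-tree, mining of the tree is omitted, and the sample sequences (one character per page event) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WapNode:
    label: Optional[str]  # None marks the root
    count: int = 0
    children: Dict[str, "WapNode"] = field(default_factory=dict)


def build_wap_tree(db: List[str], min_count: int) -> Tuple[WapNode, Dict[str, List[WapNode]]]:
    """Build a simplified WAP-tree from web access sequences (one event per character)."""
    # Event support = number of sequences in which the event occurs.
    support: Dict[str, int] = {}
    for seq in db:
        for event in set(seq):
            support[event] = support.get(event, 0) + 1
    frequent = {e for e, c in support.items() if c >= min_count}

    root = WapNode(label=None)
    # Header table: for each frequent event, every tree node carrying that label.
    header: Dict[str, List[WapNode]] = {e: [] for e in sorted(frequent)}
    for seq in db:
        node = root
        # Insert the sequence with infrequent events removed; shared prefixes share nodes.
        for event in (e for e in seq if e in frequent):
            child = node.children.get(event)
            if child is None:
                child = WapNode(label=event)
                node.children[event] = child
                header[event].append(child)
            child.count += 1
            node = child
    return root, header


root, header = build_wap_tree(["abdac", "eaebcac", "babfaec", "afbacfc"], min_count=3)
print(root.children["a"].count)  # 3
print(sorted(header))            # ['a', 'b', 'c']
```

    After filtering, the sequences become abac, abcac, babac, and abacc. Three of them share the prefix `ab`, which is why the root's `a` child carries a count of 3. Following all nodes in `header["a"]` recovers every occurrence of event `a` in the filtered database.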