Frequent Patterns

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 90141 Experts worldwide ranked by ideXlab platform

Unil Yun - One of the best experts on this subject based on the ideXlab platform.

  • a new efficient approach for mining uncertain Frequent Patterns using minimum data structure without false positives
    Future Generation Computer Systems, 2017
    Co-Authors: Gangin Lee, Unil Yun
    Abstract:

    The concept of uncertain pattern mining was recently proposed to fulfill the demand for processing databases with uncertain data, and various relevant methods have been devised. However, previous approaches have the following limitations. State-of-the-art methods based on tree structures can cause fatal problems in terms of runtime and memory usage, depending on the characteristics of uncertain databases and threshold settings, because their tree data structures can become excessively large and complicated during mining. Various approximation approaches have been suggested to overcome such problems; however, they improve mining performance at the cost of accuracy of the mining results. To solve these problems, we propose an exact, efficient algorithm for mining uncertain Frequent Patterns based on novel data structures and mining techniques, which also guarantees the correctness of the mining results without any false positives. The newly proposed list-based data structures and pruning techniques allow a complete set of uncertain Frequent Patterns to be mined more efficiently without pattern losses. We also demonstrate that the proposed algorithm outperforms previous state-of-the-art approaches in both theoretical and empirical aspects. In particular, we provide analytical results of performance evaluation for various types of datasets to show the efficiency of our method in runtime, memory usage, and scalability.

  • Mining top-k Frequent Patterns with combination reducing techniques
    Applied Intelligence, 2014
    Co-Authors: Gwangbum Pyun, Unil Yun
    Abstract:

    Top-k Frequent pattern mining finds interesting Patterns from the highest support to the k-th support. The approach can be effectively applied in numerous fields such as marketing, finance, bio-data analysis, and so on since it does not need constraints by a minimum support threshold. Top-k mining methods use the support of the k-th pattern, not a user-specified minimum support. Thus, the methods conduct mining operations based on very low supports until the k-th pattern is detected. When a low support is used in the mining process, single-paths with numerous items are generated, where the top-k mining algorithm extracts valid Patterns by combining the items for each single-path. Therefore, the bigger the number of combinations is, the larger the increase in time and memory consumption is. In this paper, in order to mine top-k Frequent Patterns more efficiently, we consider converting Patterns obtained from single-paths into composite Patterns during the mining process and recovering them as the original Patterns when the top-k Frequent Patterns are extracted. For this, we define a new concept, the composite pattern, and propose novel techniques for reducing pattern combinations in the single-path. Two algorithms are introduced in this paper, where the former is CRM (Combination Reducing method), applying our reduction manner, and the latter is CRMN (Combination Reducing method for N-itemset), considering N-itemset, i.e., Patterns' lengths. A performance evaluation shows that CRM and CRMN algorithms can efficiently reduce pattern combinations in single-paths compared to state-of-the-art algorithms. The experimental results also illustrate that our approaches have outstanding performance in terms of runtime, memory, and scalability.

  • mining maximal Frequent Patterns by considering weight conditions over data streams
    Knowledge-Based Systems, 2014
    Co-Authors: Unil Yun, Gangin Lee, Keun Ho Ryu
    Abstract:

    Frequent pattern mining over data streams is currently one of the most interesting fields in data mining. Current databases have needed more immediate processes since enormous amounts of data are being accumulated and updated in real time. However, existing traditional approaches have not been entirely suitable for a data stream environment since they operate with more than two database scans. Moreover, Frequent pattern mining over data streams mostly generates an enormous number of Frequent Patterns, thereby causing a significant amount of overheads. In addition, as weight conditions are very useful factors in reflecting importance for each object in the real world, it is necessary to apply them to the mining process in order to obtain more practical, meaningful Patterns. To consider and solve these problems, we propose a novel method for mining Weighted Maximal Frequent Patterns (WMFPs) over data streams, called MWS (Maximal Frequent pattern mining with Weight conditions over data Streams). MWS guarantees efficient mining performance in the data stream environment by scanning stream databases only once, and prevents overheads of pattern extractions with an abbreviated notation: a maximal Frequent pattern form instead of the general one. Furthermore, MWS contributes to enhanced reliability of the mining results by applying weight conditions to each element of the data streams. Extensive experiments report that MWS has outstanding performance in comparison to previous algorithms.

  • An efficient mining algorithm for maximal weighted Frequent Patterns in transactional databases
    Knowledge-Based Systems, 2012
    Co-Authors: Unil Yun, Hyeon-il Shin, Keun Ho Ryu, Eunchul Yoon
    Abstract:

    In the field of data mining, there have been many studies on mining Frequent Patterns owing to its broad applications in mining association rules, correlations, sequential Patterns, constraint-based Frequent Patterns, graph Patterns, emerging Patterns, and many other data mining tasks. We present a new algorithm for mining maximal weighted Frequent Patterns from a transactional database. Our mining paradigm prunes unimportant Patterns and reduces the size of the search space. However, maintaining the anti-monotone property without loss of information should be considered, and thus our algorithm prunes weighted infrequent patterns and uses a prefix-tree with weight-descending order. In comparison, a previous algorithm, MAFIA, scales exponentially with the longest pattern length. Our algorithm outperformed MAFIA in a thorough experimental analysis on real data. In addition, our algorithm is more efficient and scalable.

  • approximate weighted Frequent pattern mining with/without noisy environments
    Knowledge-Based Systems, 2011
    Co-Authors: Unil Yun, Keun Ho Ryu
    Abstract:

    In the data mining area, weighted Frequent pattern mining has been suggested to find important Frequent Patterns by considering the weights of Patterns. Further extensions with weight constraints have been proposed, such as mining weighted association rules, weighted sequential Patterns, weighted closed Patterns, Frequent Patterns with dynamic weights, weighted graphs, and weighted sub-trees or sub-structures. In previous approaches to weighted Frequent pattern mining, weighted supports of Patterns were exactly matched to prune weighted infrequent patterns. However, in a noisy environment, a small change in the weights or supports of items can seriously affect the result sets. This may make the weighted Frequent Patterns less useful in a noisy environment. In this paper, we propose the robust concept of mining approximate weighted Frequent Patterns. Based on the framework of weight-based pattern mining, an approximate factor is defined to relax the requirement for exact equality between weighted supports of Patterns and a minimum threshold. After that, we address the concept of mining approximate weighted Frequent Patterns to find important Patterns with/without noisy data. We analyze characteristics of approximate weighted Frequent Patterns and run extensive performance tests.
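
The relaxation idea in the abstract above (an approximate factor that loosens the comparison between a weighted support and the minimum threshold) can be illustrated with a small sketch. Everything here is an assumption made for illustration: weighted support is taken as the average item weight times the support count, the approximate factor is modelled as a tolerance `epsilon` that lowers the threshold to `min_wsup * (1 - epsilon)`, and candidates are enumerated exhaustively rather than with the authors' algorithm. The names `weighted_support` and `approximate_wfps` are hypothetical.

```python
from itertools import combinations


def weighted_support(pattern, transactions, weights):
    """Assumed definition: average item weight of the pattern times its support count."""
    support = sum(1 for t in transactions if pattern <= t)
    avg_weight = sum(weights[i] for i in pattern) / len(pattern)
    return avg_weight * support


def approximate_wfps(transactions, weights, min_wsup, epsilon=0.0, max_len=2):
    """Return patterns whose weighted support reaches min_wsup * (1 - epsilon).

    epsilon = 0 gives the exact comparison; a larger epsilon tolerates
    small perturbations in weights or supports (an assumed reading of
    the "approximate factor").
    """
    items = sorted({i for t in transactions for i in t})
    relaxed_threshold = min_wsup * (1.0 - epsilon)
    result = {}
    # Brute-force enumeration, only suitable for tiny examples.
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            ws = weighted_support(frozenset(combo), transactions, weights)
            if ws >= relaxed_threshold:
                result[combo] = ws
    return result


transactions = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
weights = {"a": 0.9, "b": 0.6, "c": 0.5}

exact = approximate_wfps(transactions, weights, min_wsup=1.6)
relaxed = approximate_wfps(transactions, weights, min_wsup=1.6, epsilon=0.1)
print(sorted(exact))
print(sorted(relaxed))
```

With these toy weights the pattern `("a", "b")` has weighted support 1.5, so it misses the exact threshold of 1.6 but passes the relaxed one of 1.44.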

Md Rezaul Karim - One of the best experts on this subject based on the ideXlab platform.

  • mining maximal Frequent Patterns in transactional databases and dynamic data streams: a spark-based approach
    Information Sciences, 2018
    Co-Authors: Chowdhury Farhan Ahmed, Md Rezaul Karim, Michael Cochez, Oya Beyan, Stefan Decker
    Abstract:

    Mining maximal Frequent Patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is substantially important for business intelligence. MFPs, as the smallest set of Patterns, help to reveal customers’ purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory-based Apriori or FP-growth algorithms. Therefore, these approaches are not only unscalable but also lack parallelism. Consequently, the requirements of ever-increasing big data sources cannot be met. In addition, mining performance in some existing approaches degrades drastically due to the presence of null transactions. We therefore proposed an efficient way of mining MFPs with Apache Spark to overcome these issues. For faster computation and efficient utilization of memory, we utilized a prime-number-based data transformation technique, in which the values of individual transactions are preserved. After removing null transactions and infrequent items, the resulting transformed dataset becomes denser compared to the original distributions. We tested our proposed algorithms on both real static TDBs and DDSs. Experimental results and performance analysis show that our approach is efficient and scalable to large dataset sizes.

  • efficient mining regularly Frequent Patterns in transactional databases
    Lecture Notes in Computer Science, 2012
    Co-Authors: Md Mamunur Rashid, Byeong-soo Jeong, Md Rezaul Karim, Hojin Choi
    Abstract:

    Finding interesting Patterns plays an important role in several data mining applications, such as market basket analysis, medical data analysis, and others. The occurrence frequency of Patterns has been regarded as an important criterion for measuring interestingness of a pattern in several applications. However, temporal regularity of Patterns can be considered as another important measure for some applications. In this paper, we propose an efficient approach for mining regularly Frequent Patterns. As the temporal regularity measure, we use the variance of the interval time between pattern occurrences. To find regularly Frequent Patterns, we utilize a pattern-growth approach according to user-given min_support and max_variance thresholds. An extensive performance study shows that our approach is time and memory efficient in finding regularly Frequent Patterns.

  • an efficient approach to mining maximal contiguous Frequent Patterns from large dna sequence databases
    Genomics & Informatics, 2012
    Co-Authors: Md Rezaul Karim, Byeong-soo Jeong, Md Mamunur Rashid, Hojin Choi
    Abstract:

    Mining interesting Patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous Frequent Patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding Frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous Frequent Patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous Frequent Patterns within a reasonable time.

  • a mapreduce framework for mining maximal contiguous Frequent Patterns in large dna sequence datasets
    IETE Technical Review, 2012
    Co-Authors: Md Rezaul Karim, Byeong-soo Jeong, Md Azam Hossain, Md Mamunur Rashid, Hojin Choi
    Abstract:

    Current DNA sequence datasets have become extremely large, making it a great challenge for single-processor and main-memory-based computing systems to mine interesting Patterns. Such limited hardware resources make the performance of most Apriori-like algorithms inefficient. However, the recent implementation of a MapReduce framework has overcome these limitations. Furthermore, mining with maximal contiguous Frequent Patterns to express the function and structure of DNA sequences is a useful technique, capable of capturing the common data characteristics among related sequences. In this paper, we proposed an efficient approach for mining maximal contiguous Frequent Patterns in large DNA sequence data using a MapReduce framework, which can handle massive DNA sequence datasets with a large number of nodes on a Hadoop platform. Our extensive experimental results show that the proposed approach can mine the complete set of maximal contiguous Frequent Patterns very efficiently.
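
The map/reduce division of labour described in the abstract above can be sketched in a single process. This is not the authors' Hadoop implementation: it is a minimal in-memory simulation, assuming that the support of a contiguous pattern is the number of sequences containing it and that a pattern is maximal when it is not a substring of any longer frequent pattern. The function names (`map_phase`, `reduce_phase`, `frequent_contiguous`, `maximal_contiguous`) are hypothetical.

```python
from collections import defaultdict


def map_phase(seq_id, sequence, k):
    """Map step: emit (k-mer, seq_id) for every contiguous window of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], seq_id


def reduce_phase(kmer, seq_ids):
    """Reduce step: support = number of distinct sequences containing the k-mer."""
    return kmer, len(set(seq_ids))


def frequent_contiguous(sequences, k, min_support):
    """Run one simulated map/shuffle/reduce round for patterns of length k."""
    groups = defaultdict(list)  # stands in for the shuffle step
    for seq_id, seq in enumerate(sequences):
        for kmer, sid in map_phase(seq_id, seq, k):
            groups[kmer].append(sid)
    counted = (reduce_phase(kmer, ids) for kmer, ids in groups.items())
    return {kmer: sup for kmer, sup in counted if sup >= min_support}


def maximal_contiguous(sequences, min_support):
    """Grow k until no frequent pattern remains, then keep only maximal ones."""
    frequent = {}
    k = 1
    while True:
        level = frequent_contiguous(sequences, k, min_support)
        if not level:  # every substring of a frequent pattern is frequent,
            break      # so an empty level means no longer pattern exists
        frequent.update(level)
        k += 1
    return {p: s for p, s in frequent.items()
            if not any(p != q and p in q for q in frequent)}


sequences = ["ACGTAC", "TACGTT", "GGACGT"]
result = maximal_contiguous(sequences, min_support=3)
print(result)
```

No pattern length has to be given in advance: the loop stops at the first length with no frequent pattern, which is safe because contiguous frequency is downward closed.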

Hojin Choi - One of the best experts on this subject based on the ideXlab platform.

  • single pass incremental and interactive mining for weighted Frequent Patterns
    Expert Systems With Applications, 2012
    Co-Authors: Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-soo Jeong, Youngkoo Lee, Hojin Choi
    Abstract:

    Weighted Frequent pattern (WFP) mining is more practical than Frequent pattern mining because it can consider different semantic significance (weight) of the items. For this reason, WFP mining becomes an important research issue in data mining and knowledge discovery. However, existing algorithms cannot be applied for incremental and interactive WFP mining and also for stream data mining because they are based on a static database and require multiple database scans. In this paper, we present two novel tree structures IWFPT_WA (Incremental WFP tree based on weight ascending order) and IWFPT_FD (Incremental WFP tree based on frequency descending order), and two new algorithms IWFP_WA and IWFP_FD for incremental and interactive WFP mining using a single database scan. They are effective for incremental and interactive mining to utilize the current tree structure and to use the previous mining results when a database is updated or a minimum support threshold is changed. IWFP_WA gets advantage in candidate pattern generation by obtaining the highest weighted item in the bottom of IWFPT_WA. IWFP_FD ensures that any non-candidate item cannot appear before candidate items in any branch of IWFPT_FD and thus speeds up the prefix tree and conditional tree creation time during mining operation. IWFPT_FD also achieves the highly compact incremental tree to save memory space. To our knowledge, this is the first research work to perform single-pass incremental and interactive mining for weighted Frequent Patterns. Extensive performance analyses show that our tree structures and algorithms are very efficient and scalable for single-pass incremental and interactive WFP mining.

  • efficient mining regularly Frequent Patterns in transactional databases
    Lecture Notes in Computer Science, 2012
    Co-Authors: Md Mamunur Rashid, Byeong-soo Jeong, Md Rezaul Karim, Hojin Choi
    Abstract:

    Finding interesting Patterns plays an important role in several data mining applications, such as market basket analysis, medical data analysis, and others. The occurrence frequency of Patterns has been regarded as an important criterion for measuring interestingness of a pattern in several applications. However, temporal regularity of Patterns can be considered as another important measure for some applications. In this paper, we propose an efficient approach for mining regularly Frequent Patterns. As the temporal regularity measure, we use the variance of the interval time between pattern occurrences. To find regularly Frequent Patterns, we utilize a pattern-growth approach according to user-given min_support and max_variance thresholds. An extensive performance study shows that our approach is time and memory efficient in finding regularly Frequent Patterns.

  • an efficient approach to mining maximal contiguous Frequent Patterns from large dna sequence databases
    Genomics & Informatics, 2012
    Co-Authors: Md Rezaul Karim, Byeong-soo Jeong, Md Mamunur Rashid, Hojin Choi
    Abstract:

    Mining interesting Patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous Frequent Patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding Frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous Frequent Patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous Frequent Patterns within a reasonable time.

  • a mapreduce framework for mining maximal contiguous Frequent Patterns in large dna sequence datasets
    IETE Technical Review, 2012
    Co-Authors: Md Rezaul Karim, Byeong-soo Jeong, Md Azam Hossain, Md Mamunur Rashid, Hojin Choi
    Abstract:

    Current DNA sequence datasets have become extremely large, making it a great challenge for single-processor and main-memory-based computing systems to mine interesting Patterns. Such limited hardware resources make the performance of most Apriori-like algorithms inefficient. However, the recent implementation of a MapReduce framework has overcome these limitations. Furthermore, mining with maximal contiguous Frequent Patterns to express the function and structure of DNA sequences is a useful technique, capable of capturing the common data characteristics among related sequences. In this paper, we proposed an efficient approach for mining maximal contiguous Frequent Patterns in large DNA sequence data using a MapReduce framework, which can handle massive DNA sequence datasets with a large number of nodes on a Hadoop platform. Our extensive experimental results show that the proposed approach can mine the complete set of maximal contiguous Frequent Patterns very efficiently.
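
The regularity measure named in the regularly Frequent Patterns abstract in this list (the variance of the intervals between pattern occurrences, checked against min_support and max_variance) can be made concrete with a short sketch. Several details are assumptions, since the abstract does not fix them: transaction ids stand in for timestamps, only the gaps between consecutive occurrences are counted (not the gaps to the start or end of the database), and population variance is used. The helper names are hypothetical, and the pattern-growth search itself is not reproduced.

```python
from statistics import pvariance


def occurrence_ids(pattern, transactions):
    """1-based ids of the transactions that contain every item of the pattern."""
    return [tid for tid, t in enumerate(transactions, start=1) if pattern <= t]


def interval_variance(tids):
    """Population variance of the gaps between consecutive occurrences."""
    gaps = [b - a for a, b in zip(tids, tids[1:])]
    return pvariance(gaps) if gaps else 0.0


def is_regularly_frequent(pattern, transactions, min_support, max_variance):
    """A pattern qualifies if it is frequent enough and its gaps are even enough."""
    tids = occurrence_ids(pattern, transactions)
    return len(tids) >= min_support and interval_variance(tids) <= max_variance


transactions = [
    frozenset("abc"), frozenset("c"), frozenset("ab"), frozenset("d"),
    frozenset("ab"), frozenset("d"), frozenset("ab"), frozenset("c"),
]

# {a, b} occurs at ids 1, 3, 5, 7: gaps 2, 2, 2.
ab_regular = is_regularly_frequent(frozenset("ab"), transactions, 3, 1.0)
# {c} occurs at ids 1, 2, 8: gaps 1, 6.
c_regular = is_regularly_frequent(frozenset("c"), transactions, 3, 1.0)
print(ab_regular, c_regular)
```

Both patterns reach the support threshold of 3, but only `{a, b}` recurs at even intervals, which is the distinction a pure frequency measure cannot make.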

Byeong-soo Jeong - One of the best experts on this subject based on the ideXlab platform.

  • single pass incremental and interactive mining for weighted Frequent Patterns
    Expert Systems With Applications, 2012
    Co-Authors: Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-soo Jeong, Youngkoo Lee, Hojin Choi
    Abstract:

    Weighted Frequent pattern (WFP) mining is more practical than Frequent pattern mining because it can consider different semantic significance (weight) of the items. For this reason, WFP mining becomes an important research issue in data mining and knowledge discovery. However, existing algorithms cannot be applied for incremental and interactive WFP mining and also for stream data mining because they are based on a static database and require multiple database scans. In this paper, we present two novel tree structures IWFPT_WA (Incremental WFP tree based on weight ascending order) and IWFPT_FD (Incremental WFP tree based on frequency descending order), and two new algorithms IWFP_WA and IWFP_FD for incremental and interactive WFP mining using a single database scan. They are effective for incremental and interactive mining to utilize the current tree structure and to use the previous mining results when a database is updated or a minimum support threshold is changed. IWFP_WA gets advantage in candidate pattern generation by obtaining the highest weighted item in the bottom of IWFPT_WA. IWFP_FD ensures that any non-candidate item cannot appear before candidate items in any branch of IWFPT_FD and thus speeds up the prefix tree and conditional tree creation time during mining operation. IWFPT_FD also achieves the highly compact incremental tree to save memory space. To our knowledge, this is the first research work to perform single-pass incremental and interactive mining for weighted Frequent Patterns. Extensive performance analyses show that our tree structures and algorithms are very efficient and scalable for single-pass incremental and interactive WFP mining.

  • efficient mining regularly Frequent Patterns in transactional databases
    Lecture Notes in Computer Science, 2012
    Co-Authors: Md Mamunur Rashid, Byeong-soo Jeong, Md Rezaul Karim, Hojin Choi
    Abstract:

    Finding interesting Patterns plays an important role in several data mining applications, such as market basket analysis, medical data analysis, and others. The occurrence frequency of Patterns has been regarded as an important criterion for measuring interestingness of a pattern in several applications. However, temporal regularity of Patterns can be considered as another important measure for some applications. In this paper, we propose an efficient approach for mining regularly Frequent Patterns. As the temporal regularity measure, we use the variance of the interval time between pattern occurrences. To find regularly Frequent Patterns, we utilize a pattern-growth approach according to user-given min_support and max_variance thresholds. An extensive performance study shows that our approach is time and memory efficient in finding regularly Frequent Patterns.

  • an efficient approach to mining maximal contiguous Frequent Patterns from large dna sequence databases
    Genomics & Informatics, 2012
    Co-Authors: Md Rezaul Karim, Byeong-soo Jeong, Md Mamunur Rashid, Hojin Choi
    Abstract:

    Mining interesting Patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous Frequent Patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding Frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous Frequent Patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous Frequent Patterns within a reasonable time.

  • a mapreduce framework for mining maximal contiguous Frequent Patterns in large dna sequence datasets
    IETE Technical Review, 2012
    Co-Authors: Md Rezaul Karim, Byeong-soo Jeong, Md Azam Hossain, Md Mamunur Rashid, Hojin Choi
    Abstract:

    Current DNA sequence datasets have become extremely large, making it a great challenge for single-processor and main-memory-based computing systems to mine interesting Patterns. Such limited hardware resources make the performance of most Apriori-like algorithms inefficient. However, the recent implementation of a MapReduce framework has overcome these limitations. Furthermore, mining with maximal contiguous Frequent Patterns to express the function and structure of DNA sequences is a useful technique, capable of capturing the common data characteristics among related sequences. In this paper, we proposed an efficient approach for mining maximal contiguous Frequent Patterns in large DNA sequence data using a MapReduce framework, which can handle massive DNA sequence datasets with a large number of nodes on a Hadoop platform. Our extensive experimental results show that the proposed approach can mine the complete set of maximal contiguous Frequent Patterns very efficiently.

  • sliding window based Frequent pattern mining over data streams
    Information Sciences, 2009
    Co-Authors: Syed Khairuzzaman Tanbeer, Chowdhury Farhan Ahmed, Byeong-soo Jeong, Youngkoo Lee
    Abstract:

    Finding Frequent Patterns in a continuous stream of transactions is critical for many applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. Even though numerous Frequent pattern mining algorithms have been developed over the past decade, new solutions for handling stream data are still required due to the continuous, unbounded, and ordered sequence of data elements generated at a rapid rate in a data stream. Therefore, extracting Frequent Patterns from more recent data can enhance the analysis of stream data. In this paper, we propose an efficient technique to discover the complete set of recent Frequent Patterns from a high-speed data stream over a sliding window. We develop a Compact Pattern Stream tree (CPS-tree) to capture the recent stream data content and efficiently remove the obsolete, old stream data content. We also introduce the concept of dynamic tree restructuring in our CPS-tree to produce a highly compact frequency-descending tree structure at runtime. The complete set of recent Frequent Patterns is obtained from the CPS-tree of the current window using an FP-growth mining technique. Extensive experimental analyses show that our CPS-tree is highly efficient in terms of memory and time complexity when finding recent Frequent Patterns from a high-speed data stream.
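
The sliding-window idea in the abstract above (capture recent stream content, drop obsolete content, mine the current window) can be sketched without the CPS-tree. The class below is a deliberately simpler stand-in, not the authors' structure: it keeps the window in a deque and maintains plain counts of every itemset up to a small length, adding counts when a transaction arrives and subtracting them when it expires. The class name `SlidingWindowMiner` and the `max_len` cap are assumptions for illustration.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMiner:
    """Count itemsets over the most recent `window_size` transactions."""

    def __init__(self, window_size, max_len=2):
        self.window = deque()
        self.window_size = window_size
        self.max_len = max_len      # cap on itemset length to bound the work
        self.counts = Counter()

    def _patterns(self, transaction):
        """Yield every itemset of length 1..max_len as a sorted tuple."""
        items = sorted(transaction)
        for k in range(1, min(self.max_len, len(items)) + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        """Insert a new transaction and evict the oldest one if the window is full."""
        transaction = frozenset(transaction)
        self.window.append(transaction)
        self.counts.update(self._patterns(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(self._patterns(expired))

    def frequent(self, min_support):
        """Itemsets whose count in the current window reaches min_support (>= 1)."""
        return {p: c for p, c in self.counts.items() if c >= min_support}


miner = SlidingWindowMiner(window_size=3)
for t in ["ab", "abc", "ac", "bc", "b"]:
    miner.add(t)
print(miner.frequent(min_support=2))
```

After the five arrivals the window holds only `ac`, `bc`, and `b`, so `a` is no longer frequent even though it appeared three times in the full stream.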

Keun Ho Ryu - One of the best experts on this subject based on the ideXlab platform.

  • mining maximal Frequent Patterns by considering weight conditions over data streams
    Knowledge-Based Systems, 2014
    Co-Authors: Unil Yun, Gangin Lee, Keun Ho Ryu
    Abstract:

    Frequent pattern mining over data streams is currently one of the most interesting fields in data mining. Current databases have needed more immediate processes since enormous amounts of data are being accumulated and updated in real time. However, existing traditional approaches have not been entirely suitable for a data stream environment since they operate with more than two database scans. Moreover, Frequent pattern mining over data streams mostly generates an enormous number of Frequent Patterns, thereby causing a significant amount of overheads. In addition, as weight conditions are very useful factors in reflecting importance for each object in the real world, it is necessary to apply them to the mining process in order to obtain more practical, meaningful Patterns. To consider and solve these problems, we propose a novel method for mining Weighted Maximal Frequent Patterns (WMFPs) over data streams, called MWS (Maximal Frequent pattern mining with Weight conditions over data Streams). MWS guarantees efficient mining performance in the data stream environment by scanning stream databases only once, and prevents overheads of pattern extractions with an abbreviated notation: a maximal Frequent pattern form instead of the general one. Furthermore, MWS contributes to enhanced reliability of the mining results by applying weight conditions to each element of the data streams. Extensive experiments report that MWS has outstanding performance in comparison to previous algorithms.

  • An efficient mining algorithm for maximal weighted Frequent Patterns in transactional databases
    Knowledge-Based Systems, 2012
    Co-Authors: Unil Yun, Hyeon-il Shin, Keun Ho Ryu, Eunchul Yoon
    Abstract:

    In the field of data mining, there have been many studies on mining Frequent Patterns owing to its broad applications in mining association rules, correlations, sequential Patterns, constraint-based Frequent Patterns, graph Patterns, emerging Patterns, and many other data mining tasks. We present a new algorithm for mining maximal weighted Frequent Patterns from a transactional database. Our mining paradigm prunes unimportant Patterns and reduces the size of the search space. However, maintaining the anti-monotone property without loss of information should be considered, and thus our algorithm prunes weighted infrequent patterns and uses a prefix-tree with weight-descending order. In comparison, a previous algorithm, MAFIA, scales exponentially with the longest pattern length. Our algorithm outperformed MAFIA in a thorough experimental analysis on real data. In addition, our algorithm is more efficient and scalable.

  • approximate weighted Frequent pattern mining with/without noisy environments
    Knowledge-Based Systems, 2011
    Co-Authors: Unil Yun, Keun Ho Ryu
    Abstract:

    In the data mining area, weighted Frequent pattern mining has been suggested to find important Frequent Patterns by considering the weights of Patterns. Further extensions with weight constraints have been proposed, such as mining weighted association rules, weighted sequential Patterns, weighted closed Patterns, Frequent Patterns with dynamic weights, weighted graphs, and weighted sub-trees or sub-structures. In previous approaches to weighted Frequent pattern mining, weighted supports of Patterns were exactly matched to prune weighted infrequent patterns. However, in a noisy environment, a small change in the weights or supports of items can seriously affect the result sets. This may make the weighted Frequent Patterns less useful in a noisy environment. In this paper, we propose the robust concept of mining approximate weighted Frequent Patterns. Based on the framework of weight-based pattern mining, an approximate factor is defined to relax the requirement for exact equality between weighted supports of Patterns and a minimum threshold. After that, we address the concept of mining approximate weighted Frequent Patterns to find important Patterns with/without noisy data. We analyze characteristics of approximate weighted Frequent Patterns and run extensive performance tests.
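
The maximal form mentioned in the abstracts in this list (reporting a maximal Frequent pattern in place of all of its Frequent sub-patterns) can be shown with a small post-processing sketch. It assumes the Frequent Patterns and their supports are already available as a dictionary keyed by `frozenset`, and it does not reproduce MWS or the prefix-tree algorithms themselves. The function name `maximal_patterns` is hypothetical.

```python
def maximal_patterns(frequent):
    """Keep only patterns that have no proper superset among the frequent patterns.

    Patterns are visited from longest to shortest. Any frequent proper
    superset of a pattern is itself maximal or contained in a maximal
    pattern that was visited earlier, so comparing against the maximal
    patterns found so far is sufficient.
    """
    maximal = []
    for pattern in sorted(frequent, key=len, reverse=True):
        if not any(pattern < kept for kept in maximal):
            maximal.append(pattern)
    return {p: frequent[p] for p in maximal}


frequent = {
    frozenset("a"): 3,
    frozenset("b"): 3,
    frozenset("c"): 2,
    frozenset("d"): 2,
    frozenset("ab"): 2,
    frozenset("ac"): 2,
}

result = maximal_patterns(frequent)
print(sorted("".join(sorted(p)) for p in result))
```

Six Frequent Patterns collapse to three maximal ones here. The compact form lists every Frequent itemset implicitly (each is a subset of some maximal pattern), although the individual supports of the dropped sub-patterns are no longer stored.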