Frequent Itemsets

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 7062 Experts worldwide ranked by ideXlab platform

Suh Yin Lee - One of the best experts on this subject based on the ideXlab platform.

  • incremental updates of closed Frequent Itemsets over continuous data streams
    Expert Systems With Applications, 2009
    Co-Authors: Suh Yin Lee
    Abstract:

    Online mining of closed Frequent Itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose an efficient one-pass algorithm, NewMoment to maintain the set of closed Frequent Itemsets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. Experiments show that the proposed algorithm not only attain highly accurate mining results, but also run significant faster and consume less memory than existing algorithm Moment for mining closed Frequent Itemsets over recent data streams.

  • Mining Frequent Itemsets over data streams using efficient window sliding techniques
    Expert Systems with Applications, 2009
    Co-Authors: Hua-fu Li, Suh Yin Lee
    Abstract:

    Online mining of Frequent Itemsets over a stream sliding window is one of the most important problems in stream data mining with broad applications. It is also a difficult issue since the streaming data possess some challenging characteristics, such as unknown or unbound size, possibly a very fast arrival rate, inability to backtrack over previously arrived transactions, and a lack of system control over the order in which the data arrive. In this paper, we propose an effective bit-sequence based, one-pass algorithm, called MFI-TransSW (Mining Frequent Itemsets within a Transaction-sensitive Sliding Window), to mine the set of Frequent Itemsets from data streams within a transaction-sensitive sliding window which consists of a fixed number of transactions. The proposed MFI-TransSW algorithm consists of three phases: window initialization, window sliding and pattern generation. First, every item of each transaction is encoded in an effective bit-sequence representation in the window initialization phase. The proposed bit-sequence representation of item is used to reduce the time and memory needed to slide the windows in the following phases. Second, MFI-TransSW uses the left bit-shift technique to slide the windows efficiently in the window sliding phase. Finally, the complete set of Frequent Itemsets within the current sliding window is generated by a level-wise method in the pattern generation phase. Experimental studies show that the proposed algorithm not only attain highly accurate mining results, but also run significant faster and consume less memory than do existing algorithms for mining Frequent Itemsets over data streams with a sliding window. Furthermore, based on the MFI-TransSW framework, an extended single-pass algorithm, called MFI-TimeSW (Mining Frequent Itemsets within a Time-sensitive Sliding Window) is presented to mine the set of Frequent Itemsets efficiently over time-sensitive sliding windows.

  • dsm fi an efficient algorithm for mining Frequent Itemsets in data streams
    Knowledge and Information Systems, 2008
    Co-Authors: Mankwan Shan, Suh Yin Lee
    Abstract:

    Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for Frequent Itemsets), for online incremental mining of Frequent Itemsets over a continuous stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary Frequent itemset forest) for maintaining the set of all Frequent Itemsets embedded in the transaction data stream generated so far. Finally, the set of all Frequent Itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms the existing algorithms of one-pass mining of Frequent Itemsets.

  • online mining recently maximal Frequent Itemsets over data streams
    International Workshop on Research Issues in Data Engineering, 2005
    Co-Authors: Suh Yin Lee, Mankwan Shan
    Abstract:

    A data stream is a massive, open-ended sequence of data elements continuously generated at a rapid rate. Mining data streams is more difficult than mining static databases because the huge, high-speed and continuous characteristics of streaming data. In this paper, we propose a new one-pass algorithm called DSM-MFI (stands for Data Stream Mining for Maximal Frequent Itemsets), which mines the set of all maximal Frequent Itemsets in landmark windows over data streams. A new summary data structure called summary Frequent itemset forest (abbreviated as SFI-forest) is developed for incremental maintaining the essential information about maximal Frequent Itemsets embedded in the stream so far. Theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable for mining the set of all maximal Frequent Itemsets over the entire history of the data streams.

Mankwan Shan - One of the best experts on this subject based on the ideXlab platform.

  • dsm fi an efficient algorithm for mining Frequent Itemsets in data streams
    Knowledge and Information Systems, 2008
    Co-Authors: Mankwan Shan, Suh Yin Lee
    Abstract:

    Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for Frequent Itemsets), for online incremental mining of Frequent Itemsets over a continuous stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary Frequent itemset forest) for maintaining the set of all Frequent Itemsets embedded in the transaction data stream generated so far. Finally, the set of all Frequent Itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms the existing algorithms of one-pass mining of Frequent Itemsets.

  • efficient maintenance and mining of Frequent Itemsets over online data streams with a sliding window
    Systems Man and Cybernetics, 2006
    Co-Authors: Mankwan Shan, Sunyin Lee
    Abstract:

    Online mining of streaming data is one of the most important issues in data mining. In this paper, we proposed an efficient one-pass algorithm, called MFI-TransSW (mining Frequent Itemsets over a transaction-sensitive sliding window), to mine the set of all Frequent Itemsets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. The experiments show that the proposed algorithm not only attain highly accurate mining results, but also run significant faster and consume less memory than existing algorithms for mining Frequent Itemsets over recent data streams.

  • online mining recently maximal Frequent Itemsets over data streams
    International Workshop on Research Issues in Data Engineering, 2005
    Co-Authors: Suh Yin Lee, Mankwan Shan
    Abstract:

    A data stream is a massive, open-ended sequence of data elements continuously generated at a rapid rate. Mining data streams is more difficult than mining static databases because the huge, high-speed and continuous characteristics of streaming data. In this paper, we propose a new one-pass algorithm called DSM-MFI (stands for Data Stream Mining for Maximal Frequent Itemsets), which mines the set of all maximal Frequent Itemsets in landmark windows over data streams. A new summary data structure called summary Frequent itemset forest (abbreviated as SFI-forest) is developed for incremental maintaining the essential information about maximal Frequent Itemsets embedded in the stream so far. Theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable for mining the set of all maximal Frequent Itemsets over the entire history of the data streams.

Tzungpei Hong - One of the best experts on this subject based on the ideXlab platform.

  • Mining Frequent Itemsets using the N-list and subsume concepts
    International Journal of Machine Learning and Cybernetics, 2016
    Co-Authors: Frans Coenen, Tzungpei Hong
    Abstract:

    Frequent itemset mining is a fundamental element with respect to many data mining problems directed at finding interesting patterns in data. Recently the PrePost algorithm, a new algorithm for mining Frequent Itemsets based on the idea of N-lists, which in most cases outperforms other current state-of-the-art algorithms, has been presented. This paper proposes an improved version of PrePost, the N-list and Subsume-based algorithm for mining Frequent Itemsets (NSFI) algorithm that uses a hash table to enhance the process of creating the N-lists associated with 1-Itemsets and an improved N-list intersection algorithm. Furthermore, two new theorems are proposed for determining the “subsume index” of Frequent 1-Itemsets based on the N-list concept. Using the subsume index, NSFI can identify groups of Frequent Itemsets without determining the N-list associated with them. The experimental results show that NSFI outperforms PrePost in terms of runtime and memory usage and outperforms dEclat in terms of runtime.

  • rwfim recent weighted Frequent Itemsets mining
    Engineering Applications of Artificial Intelligence, 2015
    Co-Authors: Jerry Chunwei Lin, Wensheng Gan, Philippe Fournierviger, Tzungpei Hong
    Abstract:

    Abstract In recent years, weighted Frequent Itemsets mining (WFIM) has become a critical issue of data mining, which can be used to discover more useful and interesting patterns in real-world applications instead of the traditional Frequent Itemsets mining. Many algorithms have been developed to find weighted Frequent Itemsets (WFIs) without time-sensitive consideration. The discovered out-of-date information may, however, be meaningless and useless in decision making. In this paper, a novel framework, namely recent weighted-Frequent Itemsets mining (RWFIM) is proposed to concern both the weight and time-sensitive constraints. A projected-based RWFIM-P algorithm is first proposed for mining the designed recent weighted-Frequent Itemsets (RWFIs) with weight and time-sensitive consideration. It uses the projection-and-test mechanism to discover RWFIs in a recursive way. Based on the developed RWFIM-P algorithm, the entire database can be projected and divided into several sub-databases according to the currently processed itemset, thus reducing the computational costs and memory requirements. The second RWFIM-PE algorithm is also proposed to improve the performance of the first RWFIM-P algorithm based on the developed Estimated Weight of 2-itemset Pruning (EW2P) strategy to mine the RWFIs without generating the unpromising candidates, thus avoiding the computations of the projection mechanism compared to the first RWFIM-P algorithm. Experiments are conducted to evaluate the performance of the proposed two algorithms compared to the traditional WFIM in terms of execution time, number of generated RWFIs and scalability under varied two minimum thresholds in several real-world and synthetic datasets.

  • mining fuzzy Frequent Itemsets based on ubffp trees
    Journal of Intelligent and Fuzzy Systems, 2014
    Co-Authors: Chunwei Lin, Tzungpei Hong
    Abstract:

    In the past, most algorithms proposed for mining association rules handle items with binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. Fuzzy mining algorithms for inducing rules from quantitative databases have thus been developed, most of which are based on the Apriori algorithm. Fuzzy mining is seldom based on Frequent pattern (FP) trees because fuzzy-set processing from an FP tree is much more complicated than crisp-set processing. In this paper, a two-phase fuzzy mining approach based on a tree structure to obtain fuzzy Frequent Itemsets from a quantitative database is proposed. A simple tree structure called the upper-bound fuzzy Frequent-pattern (UBFFP) tree is designed. The two- phase fuzzy mining approach can easily derive the upper-bound fuzzy counts of Itemsets using the tree. It prunes unpromising Itemsets in the first phase, and then finds the actual fuzzy Frequent Itemsets in the second phase. Experimental results show that the proposed approach has good performance.

Carson K Leung - One of the best experts on this subject based on the ideXlab platform.

  • approximation to expected support of Frequent Itemsets in mining probabilistic sets of uncertain data
    Procedia Computer Science, 2015
    Co-Authors: Alfredo Cuzzocrea, Carson K Leung, Richard Kyle Mackinnon
    Abstract:

    Abstract Knowledge discovery and data mining generally discovers implicit, previously unknown, and useful knowledge from data. As one of the popular knowledge discovery and data mining tasks, Frequent itemset mining, in particular, discovers knowledge in the form of sets of Frequently co-occurring items, events, or objects. On the one hand, in many real-life applications, users mine Frequent patterns from traditional databases of precise data, in which users know certainly the presence of items in transactions. On the other hand, in many other real-life applications, users mine Frequent Itemsets from probabilistic sets of uncertain data, in which users are uncertain about the likelihood of the presence of items in transactions. Each item in these probabilistic sets of uncertain data is often associated with an existential probability expressing the likelihood of its presence in that transaction. To mine Frequent Itemsets from these probabilistic datasets, many existing algorithms capture lots of information to compute expected support. To reduce the amount of space required, algorithms capture some but not all information in computing or approximating expected support. The tradeoff is that the upper bounds to expected support may not be tight. In this paper, we examine several upper bounds and recommend to the user which ones consume less space while providing good approximation to expected support of Frequent Itemsets in mining probabilistic sets of uncertain data.

  • mining constrained Frequent Itemsets from distributed uncertain data
    Future Generation Computer Systems, 2014
    Co-Authors: Alfredo Cuzzocrea, Carson K Leung, Richard Kyle Mackinnon
    Abstract:

    Nowadays, high volumes of massive data can be generated from various sources (e.g.,sensor data from environmental surveillance). Many existing distributed Frequent itemset mining algorithms do not allow users to express the Itemsets to be mined according to their intention via the use of constraints. Consequently, these unconstrained mining algorithms can yield numerous Itemsets that are not interesting to users. Moreover, due to inherited measurement inaccuracies and/or network latencies, the data are often riddled with uncertainty. These call for both constrained mining and uncertain data mining. In this journal article, we propose a data-intensive computer system for tree-based mining of Frequent Itemsets that satisfy user-defined constraints from a distributed environment such as a wireless sensor network of uncertain data. We proposed a system for tree-based distributed uncertain Frequent itemset mining.Our system allows users to specify constraints for expressing their interests.It finds Frequent Itemsets that satisfy succinct constraints from distributed uncertain data.It also handles non-succinct (e.g.,inductive succinct, anti-monotone) constraints.

  • fast tree based mining of Frequent Itemsets from uncertain data
    Database Systems for Advanced Applications, 2012
    Co-Authors: Carson K Leung, Syed Khairuzzaman Tanbeer
    Abstract:

    Over the past two decades, numerous algorithms have been proposed for mining Frequent Itemsets from precise data. However, there are situations in which data are uncertain. In recent years, tree-based algorithms have been proposed to mine Frequent Itemsets from uncertain data. While the key success of tree-based algorithms for mining precise data is due to the compactness of a tree structure in capturing precise data, the corresponding tree structure in capturing uncertain data may not be so compact. In this paper, we propose a novel tree structure for capturing uncertain data such that it is as compact as the tree for capturing precise data. Moreover, we also propose two fast algorithms that use this compact tree structure to mine Frequent Itemsets. Experimental results showed the compactness of our tree and the effectiveness of our algorithms in mining Frequent Itemsets from uncertain data.

  • equivalence class transformation based mining of Frequent Itemsets from uncertain data
    ACM Symposium on Applied Computing, 2011
    Co-Authors: Carson K Leung, Lijing Sun
    Abstract:

    Numerous Frequent itemset mining algorithms have been proposed over the past two decades. Most of them mine traditional databases of precise data. However, there are many real-life applications for which data are uncertain. This leads to the mining of uncertain data. In this paper, we propose an equivalence class transformation based algorithm---called UV-Eclat---which transforms probabilistic databases of uncertain data from their usual horizontal format into a vertical format, from which Frequent Itemsets are mined.

  • mining of Frequent Itemsets from streams of uncertain data
    International Conference on Data Engineering, 2009
    Co-Authors: Carson K Leung, Boyu Hao
    Abstract:

    Frequent itemset mining plays an essential role in the mining of various patterns and is in demand in many real-life applications. Hence, mining of Frequent Itemsets has been the subject of numerous studies since its introduction. Generally, most of these studies find Frequent Itemsets from traditional transaction databases, in which the content of each transaction--namely, items--is definitely known and precise. However, there are many real-life situations in which ones are uncertain about the content of transactions. This calls for the mining of uncertain data. Moreover, due to advances in technology, a flood of precise or uncertain data can be produced in many situations. This calls for the mining of data streams. To deal with these situations, we propose two tree-based mining algorithms to efficiently find Frequent Itemsets from streams of uncertain data, where each item in the transactions in the streams is associated with an existential probability. Experimental results show the effectiveness of our algorithms in mining Frequent Itemsets from streams of uncertain data.

Zhihong Deng - One of the best experts on this subject based on the ideXlab platform.

  • diffnodesets an efficient structure for fast mining Frequent Itemsets
    Applied Soft Computing, 2016
    Co-Authors: Zhihong Deng
    Abstract:

    Abstract Mining Frequent Itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed, which have shown to be very efficient for mining Frequent Itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation, for mining Frequent Itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, to mining Frequent Itemsets. To achieve high efficiency, dFIN finds Frequent Itemsets using a set-enumeration tree with a hybrid search strategy and directly enumerates Frequent Itemsets without candidate generation under some case. For evaluating the performance of dFIN, we have conduct extensive experiments to compare it against with existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.

  • prepost an efficient n lists based algorithm for mining Frequent Itemsets via children parent equivalence pruning
    Expert Systems With Applications, 2015
    Co-Authors: Zhihong Deng
    Abstract:

    Abstract N-list is a novel data structure proposed in recent years. It has been proven to be very efficient for mining Frequent Itemsets. In this paper, we present PrePost + , a high-performance algorithm for mining Frequent Itemsets. It employs N-list to represent Itemsets and directly discovers Frequent Itemsets using a set-enumeration search tree. Especially, it employs an efficient pruning strategy named Children–Parent Equivalence pruning to greatly reduce the search space. We have conducted extensive experiments to evaluate PrePost + against three state-of-the-art algorithms, which are PrePost, FIN, and FP-growth ∗ , on six various real datasets. The experimental results show that PrePost + is always the fastest one on all datasets. Moreover, PrePost + also demonstrates good performance in terms of memory consumption since it use only a litter more memory than FP-growth ∗ and less memory than PrePost and FIN.

  • fast mining Frequent Itemsets using nodesets
    Expert Systems With Applications, 2014
    Co-Authors: Zhihong Deng
    Abstract:

    Node-list and N-list, two novel data structure proposed in recent years, have been proven to be very efficient for mining Frequent Itemsets. The main problem of these structures is that they both need to encode each node of a PPC-tree with pre-order and post-order code. This causes that they are memory-consuming and inconvenient to mine Frequent Itemsets. In this paper, we propose Nodeset, a more efficient data structure, for mining Frequent Itemsets. Nodesets require only the pre-order (or post-order code) of each node, which makes it saves half of memory compared with N-lists and Node-lists. Based on Nodesets, we present an efficient algorithm called FIN to mining Frequent Itemsets. For evaluating the performance of FIN, we have conduct experiments to compare it with PrePost and FP-growth∗, two state-of-the-art algorithms, on a variety of real and synthetic datasets. The experimental results show that FIN is high performance on both running time and memory usage.

  • a new algorithm for fast mining Frequent Itemsets using n lists
    Science in China Series F: Information Sciences, 2012
    Co-Authors: Zhihong Deng, Zhonghui Wang, Jiajian Jiang
    Abstract:

    Mining Frequent Itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about Frequent Itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all Frequent Itemsets. Efficiency of PrePost is achieved by the following three reasons. First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of Itemsets’ supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find Frequent Itemsets without generating candidate Itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining Frequent Itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.