Association Rules

The Experts below are selected from a list of 360 Experts worldwide ranked by the ideXlab platform.

Jiawei Han - One of the best experts on this subject based on the ideXlab platform.

  • efficient mining of intertransaction Association Rules
    IEEE Transactions on Knowledge and Data Engineering, 2003
    Co-Authors: Anthony K H Tung, Jiawei Han, Ling Feng
    Abstract:

    Most of the previous studies on mining Association Rules are on mining intratransaction Associations, i.e., the Associations among items within the same transaction. We extend the scope to include multidimensional, intertransaction Associations. In a database of stock price information, an example of such an Association is "if (company) A's stock goes up on day one, B's stock will go down on day two but go up on day four": whether we treat company or day as the unit of transaction, the items belong to different transactions. Moreover, such an intertransaction Association can be extended to associate multiple properties in the same rule, so that multidimensional intertransaction Associations can also be defined and discovered. Mining intertransaction Associations poses more challenges on efficient processing than mining intratransaction Associations because the number of potential Association Rules is extremely large. We introduce the notion of intertransaction Association rule and develop an efficient algorithm, FITI (first intra then inter), for mining intertransaction Associations, which adopts two major ideas: 1) an intertransaction frequent itemset contains only the frequent itemsets of its corresponding intratransaction counterpart; and 2) a special data structure is built among intratransaction frequent itemsets for efficient mining of intertransaction frequent itemsets.

  • a template model for multidimensional inter transactional Association Rules
    Very Large Data Bases, 2002
    Co-Authors: Ling Feng, Jiawei Han
    Abstract:

    Multidimensional inter-transactional Association Rules extend the traditional Association Rules to describe more general Associations among items with multiple properties across transactions. “After McDonald and Burger King open branches, KFC will open a branch two months later and one mile away” is an example of such Rules. Since the number of potential inter-transactional Association Rules tends to be extremely large, mining inter-transactional Associations poses more challenges on efficient processing than mining traditional intra-transactional Associations. In order to make such Association rule mining truly practical and computationally tractable, in this study we present a template model to help users declare the interesting multidimensional inter-transactional Associations to be mined. With the guidance of templates, several optimization techniques, i.e., joining, converging, and speeding, are devised to speed up the discovery of inter-transactional Association Rules. We show, through a series of experiments on both synthetic and real-life data sets, that these optimization techniques can yield significant performance benefits.

  • mining multiple level Association Rules in large databases
    IEEE Transactions on Knowledge and Data Engineering, 1999
    Co-Authors: Jiawei Han
    Abstract:

    A top-down progressive deepening method is developed for efficient mining of multiple-level Association Rules from large transaction databases based on the Apriori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, with the relative performance tested and analyzed. The enforcement of different interestingness measurements to find more interesting Rules, and the relaxation of rule conditions for finding "level-crossing" Association Rules, are also investigated. The study shows that efficient algorithms can be developed for the discovery of interesting and strong multiple-level Association Rules from large databases.

  • a fast distributed algorithm for mining Association Rules
    International Conference on Parallel and Distributed Information Systems, 1996
    Co-Authors: David W Cheung, Jiawei Han
    Abstract:

    With the existence of many large transaction databases, the huge amounts of data, the high scalability of distributed systems, and the easy partitioning and distribution of a centralized database, it is important to investigate efficient methods for distributed mining of Association Rules. The study discloses some interesting relationships between locally large and globally large item sets and proposes an interesting distributed Association rule mining algorithm, FDM (fast distributed mining of Association Rules), which generates a small number of candidate sets and substantially reduces the number of messages to be passed when mining Association Rules. A performance study shows that FDM has a superior performance over the direct application of a typical sequential algorithm. Further performance enhancement leads to a few variations of the algorithm.

  • maintenance of discovered Association Rules in large databases an incremental updating technique
    International Conference on Data Engineering, 1996
    Co-Authors: David W Cheung, Jiawei Han, C Y Wong
    Abstract:

    An incremental updating technique is developed for maintenance of the Association Rules discovered by database mining. There have been many studies on efficient discovery of Association Rules in large databases. However, it is nontrivial to maintain such discovered Rules in large databases because a database may allow frequent or occasional updates and such updates may not only invalidate some existing strong Association Rules but also turn some weak Rules into strong ones. An incremental updating technique is proposed for efficient maintenance of discovered Association Rules when new transaction data are added to a transaction database.
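
The pruning idea behind incremental maintenance can be illustrated on single items. The sketch below is a simplification written for this listing, not the algorithm from the paper: the function names, the toy data, and the restriction to 1-itemsets are all assumptions. It relies on one observation: an item that was infrequent in the old database and is also infrequent in the increment cannot be frequent in the combined database, so the old data needs rescanning only for items that are frequent within the increment.

```python
from collections import Counter


def count_items(transactions):
    """Count, for each item, the number of transactions containing it."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts


def update_frequent_items(old_db, old_frequent_counts, increment, min_support):
    """Return {item: count} for items frequent in old_db + increment.

    old_frequent_counts holds the counts (in old_db) of the items that were
    frequent before the update; min_support is a fraction in (0, 1].
    """
    inc_counts = count_items(increment)
    threshold = min_support * (len(old_db) + len(increment))
    updated = {}

    # Previously frequent items: only the increment has to be scanned.
    for item, old_count in old_frequent_counts.items():
        new_count = old_count + inc_counts.get(item, 0)
        if new_count >= threshold:
            updated[item] = new_count

    # Previously infrequent items can only become frequent if they are
    # frequent within the increment itself.
    candidates = [
        item
        for item, count in inc_counts.items()
        if item not in old_frequent_counts
        and count >= min_support * len(increment)
    ]
    if candidates:
        old_counts = count_items(old_db)  # one rescan of the old data
        for item in candidates:
            new_count = old_counts.get(item, 0) + inc_counts[item]
            if new_count >= threshold:
                updated[item] = new_count
    return updated


# Hypothetical toy data: "b" stops being frequent, "c" becomes frequent.
old_db = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
old_frequent_counts = {"a": 3, "b": 2}  # frequent at min_support = 0.5
increment = [{"c"}, {"a", "c"}]
updated = update_frequent_items(old_db, old_frequent_counts, increment, 0.5)
print(updated)
```

In this toy run the update both invalidates a previously frequent item and promotes a previously infrequent one, which is the situation the abstract describes for strong and weak Rules.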

Rajeev Motwani - One of the best experts on this subject based on the ideXlab platform.

  • beyond market baskets generalizing Association Rules to dependence Rules
    Data Mining and Knowledge Discovery, 1998
    Co-Authors: Craig Silverstein, Sergey Brin, Rajeev Motwani
    Abstract:

    One of the more well-studied problems in data mining is the search for Association Rules in market basket data. Association Rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of Association Rules, we develop the notion of dependence Rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence Rules. We demonstrate our algorithm's effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.

  • beyond market baskets generalizing Association Rules to correlations
    International Conference on Management of Data, 1997
    Co-Authors: Sergey Brin, Rajeev Motwani, Craig Silverstein
    Abstract:

    One of the most well-studied problems in data mining is mining for Association Rules in market basket data. Association Rules, whose significance is measured via support and confidence, are intended to identify Rules of the type, “A customer purchasing item A often also purchases item B.” Motivated by the goal of generalizing beyond market baskets and the Association Rules used with them, we develop the notion of mining Rules that identify correlations (generalizing Associations), and we consider both the absence and presence of items as a basis for generating Rules. We propose measuring significance of Associations via the chi-squared test for correlation from classical statistics. This leads to a measure that is upward closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between correlated and uncorrelated itemsets in the lattice. We develop pruning strategies and devise an efficient algorithm for the resulting problem. We demonstrate its effectiveness by testing it on census data and finding term dependence in a corpus of text documents, as well as on synthetic data.
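
To make the contrast between support/confidence and the chi-squared measure concrete, here is a minimal sketch for a single pair of items. The function names and the toy baskets are assumptions made for illustration, and the code only computes the standard 2x2 chi-squared statistic over presence and absence; it is not the lattice-search algorithm the abstracts describe.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def chi_squared_pair(transactions, a, b):
    """Chi-squared statistic of the 2x2 presence/absence table for a and b."""
    n = len(transactions)
    observed = {(x, y): 0 for x in (True, False) for y in (True, False)}
    for t in transactions:
        observed[(a in t, b in t)] += 1
    n_a = observed[(True, True)] + observed[(True, False)]
    n_b = observed[(True, True)] + observed[(False, True)]
    statistic = 0.0
    for has_a in (True, False):
        for has_b in (True, False):
            row = n_a if has_a else n - n_a
            col = n_b if has_b else n - n_b
            expected = row * col / n
            if expected == 0:
                continue  # item present in all or none of the transactions
            statistic += (observed[(has_a, has_b)] - expected) ** 2 / expected
    return statistic


# Hypothetical baskets: 3 with A and B, 1 with A only, 1 with B only, 3 with neither.
baskets = [{"A", "B"}] * 3 + [{"A"}, {"B"}] + [{"C"}] * 3
print(support(baskets, {"A", "B"}))       # 0.375
print(confidence(baskets, {"A"}, {"B"}))  # 0.75
print(chi_squared_pair(baskets, "A", "B"))  # 2.0
```

Here the rule "A implies B" has a confidence of 0.75, yet the statistic of 2.0 falls below 3.84, the usual 95% cutoff for one degree of freedom (a conventional choice assumed here, not one stated in the abstract), so the dependence would not be judged significant on this sample.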

Lotfi Lakhal - One of the best experts on this subject based on the ideXlab platform.

  • Generating a Condensed Representation for Association Rules
    Journal of Intelligent Information Systems, 2005
    Co-Authors: Nicolas Pasquier, Rafik Taouil, Yves Bastide, Gerd Stumme, Lotfi Lakhal
    Abstract:

    Association rule extraction from operational datasets often produces several tens of thousands, and even millions, of Association Rules. Moreover, many of these Rules are redundant and thus useless. Using a semantics based on the closure of the Galois connection, we define a condensed representation for Association Rules. This representation is characterized by frequent closed itemsets and their generators. It contains the non-redundant Association Rules having minimal antecedent and maximal consequent, called min-max Association Rules. We think that these Rules are the most relevant since they are the most general non-redundant Association Rules. Furthermore, this representation is a basis, i.e., a generating set for all Association Rules, their supports and their confidences, and all of them can be retrieved without accessing the data. We introduce algorithms for extracting this basis and for reconstructing all Association Rules. Results of experiments carried out on real datasets show the usefulness of this approach. In order to generate this basis when an algorithm for extracting frequent itemsets—such as Apriori for instance—is used, we also present an algorithm for deriving frequent closed itemsets and their generators from frequent itemsets without using the dataset.

  • Intelligent Structuring and Reducing of Association Rules with Formal Concept Analysis
    2001
    Co-Authors: Gerd Stumme, Nicolas Pasquier, Rafik Taouil, Yves Bastide, Lotfi Lakhal
    Abstract:

    Association Rules are used to investigate large databases. The analyst is usually confronted with large lists of such Rules and has to find the most relevant ones for his purpose. Based on results about knowledge representation within the theoretical framework of Formal Concept Analysis, we present relatively small bases for Association Rules from which all Rules can be deduced. We also provide algorithms for their calculation.

  • Mining minimal non-redundant Association Rules using frequent closed itemsets
    2000
    Co-Authors: Yves Bastide, Nicolas Pasquier, Rafik Taouil, Gerd Stumme, Lotfi Lakhal
    Abstract:

    The problem of the relevance and the usefulness of extracted Association Rules is of primary importance because, in the majority of cases, real-life databases lead to several thousand Association Rules with high confidence, among which are many redundancies. Using the closure of the Galois connection, we define two new bases for Association Rules whose union is a generating set for all valid Association Rules with support and confidence. These bases are characterized using frequent closed itemsets and their generators; they consist of the non-redundant exact and approximate Association Rules having minimal antecedents and maximal consequents, i.e. the most relevant Association Rules. Algorithms for extracting these bases are presented and results of experiments carried out on real-life databases show that the proposed bases are useful, and that their generation is not time consuming.
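
The notions of closed itemset and generator used in these abstracts can be illustrated with a brute-force sketch. It assumes the usual definitions (the closure of an itemset is the intersection of all transactions containing it, and a generator is a minimal itemset with a given closure); the function names and the five-transaction dataset are illustrative assumptions, and the exhaustive enumeration is only suitable for tiny inputs, unlike the algorithms the papers present.

```python
from itertools import combinations


def closure(transactions, itemset):
    """Intersection of all transactions containing itemset (None if none do)."""
    itemset = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if itemset <= frozenset(t)]
    return frozenset.intersection(*covering) if covering else None


def generators_by_closure(transactions, min_count):
    """Map each frequent closed itemset to its minimal generators."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    groups = {}
    # Enumerate by increasing size so smaller generators are seen first.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count < min_count:
                continue
            gens = groups.setdefault(closure(transactions, candidate), [])
            # Keep the candidate only if no smaller generator with the
            # same closure is contained in it.
            if not any(g < candidate for g in gens):
                gens.append(candidate)
    return groups


def exact_rules(groups):
    """Rules generator -> (closure minus generator), which hold with confidence 1."""
    return [
        (gen, closed - gen)
        for closed, gens in groups.items()
        for gen in gens
        if gen != closed
    ]


# Hypothetical toy dataset.
data = [
    {"a", "c", "d"},
    {"b", "c", "e"},
    {"a", "b", "c", "e"},
    {"b", "e"},
    {"a", "b", "c", "e"},
]
groups = generators_by_closure(data, min_count=2)
rules = exact_rules(groups)
for antecedent, consequent in rules:
    print(sorted(antecedent), "->", sorted(consequent))
```

Each printed rule pairs a minimal antecedent with the largest consequent it determines, which is the shape of rule the abstracts call min-max; approximate rules between different closed itemsets are left out of this sketch.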

Craig Silverstein - One of the best experts on this subject based on the ideXlab platform.

  • beyond market baskets generalizing Association Rules to dependence Rules
    Data Mining and Knowledge Discovery, 1998
    Co-Authors: Craig Silverstein, Sergey Brin, Rajeev Motwani
    Abstract:

    One of the more well-studied problems in data mining is the search for Association Rules in market basket data. Association Rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of Association Rules, we develop the notion of dependence Rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence Rules. We demonstrate our algorithm's effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.

  • beyond market baskets generalizing Association Rules to correlations
    International Conference on Management of Data, 1997
    Co-Authors: Sergey Brin, Rajeev Motwani, Craig Silverstein
    Abstract:

    One of the most well-studied problems in data mining is mining for Association Rules in market basket data. Association Rules, whose significance is measured via support and confidence, are intended to identify Rules of the type, “A customer purchasing item A often also purchases item B.” Motivated by the goal of generalizing beyond market baskets and the Association Rules used with them, we develop the notion of mining Rules that identify correlations (generalizing Associations), and we consider both the absence and presence of items as a basis for generating Rules. We propose measuring significance of Associations via the chi-squared test for correlation from classical statistics. This leads to a measure that is upward closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between correlated and uncorrelated itemsets in the lattice. We develop pruning strategies and devise an efficient algorithm for the resulting problem. We demonstrate its effectiveness by testing it on census data and finding term dependence in a corpus of text documents, as well as on synthetic data.

Ramakrishnan Srikant - One of the best experts on this subject based on the ideXlab platform.

  • privacy preserving mining of Association Rules
    Knowledge Discovery and Data Mining, 2002
    Co-Authors: Alexandre V Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, Johannes Gehrke
    Abstract:

    We present a framework for mining Association Rules from transactions consisting of categorical items where the data has been randomized to preserve privacy of individual transactions. While it is feasible to recover Association Rules and preserve privacy using a straightforward "uniform" randomization, the discovered Rules can unfortunately be exploited to find privacy breaches. We analyze the nature of privacy breaches and propose a class of randomization operators that are much more effective than uniform randomization in limiting the breaches. We derive formulae for an unbiased support estimator and its variance, which allow us to recover itemset supports from randomized datasets, and show how to incorporate these formulae into mining algorithms. Finally, we present experimental results that validate the algorithm by applying it on real datasets.

  • mining Association Rules with item constraints
    Knowledge Discovery and Data Mining, 1997
    Co-Authors: Ramakrishnan Srikant, Rakesh Agrawal
    Abstract:

    The problem of discovering Association Rules has received considerable research attention and several fast algorithms for mining Association Rules have been developed. In practice, users are often interested in a subset of Association Rules. For example, they may only want Rules that contain a specific item or Rules that contain children of a specific item in a hierarchy. While such constraints can be applied as a post-processing step, integrating them into the mining algorithm can dramatically reduce the execution time. We consider the problem of integrating constraints that are Boolean expressions over the presence or absence of items into the Association discovery algorithm. We present three integrated algorithms for mining Association Rules with item constraints and discuss their tradeoffs.

  • Mining generalized Association Rules
    Future Generation Computer Systems, 1997
    Co-Authors: Ramakrishnan Srikant, Rakesh Agrawal
    Abstract:

    We introduce the problem of mining generalized Association Rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (is-a hierarchy) on the items, we find Associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets is-a outerwear is-a clothes, we may infer a rule that "people who buy outerwear tend to buy shoes". This rule may hold even if Rules that "people who buy jackets tend to buy shoes", and "people who buy clothes tend to buy shoes" do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction, and then run any of the algorithms for mining Association Rules on these "extended transactions". However, this "Basic" algorithm is not very fast; we present two algorithms, Cumulate and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one real-life dataset). We also present a new interest-measure for Rules which uses the information in the taxonomy. Given a user-specified "minimum-interest-level", this measure prunes a large number of redundant Rules; 40% to 60% of all the Rules were pruned on two real-life datasets.

  • mining quantitative Association Rules in large relational tables
    International Conference on Management of Data, 1996
    Co-Authors: Ramakrishnan Srikant, Rakesh Agrawal
    Abstract:

    We introduce the problem of mining Association Rules in large relational tables containing both quantitative and categorical attributes. An example of such an Association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary. We introduce measures of partial completeness which quantify the information lost due to partitioning. A direct application of this technique can generate too many similar Rules. We tackle this problem by using a "greater-than-expected-value" interest measure to identify the interesting Rules in the output. We give an algorithm for mining such quantitative Association Rules. Finally, we describe the results of using this approach on a real-life dataset.

  • mining generalized Association Rules
    Very Large Data Bases, 1995
    Co-Authors: Ramakrishnan Srikant, Rakesh Agrawal
    Abstract:

    We introduce the problem of mining generalized Association Rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (is-a hierarchy) on the items, we find Associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets is-a outerwear is-a clothes, we may infer a rule that “people who buy outerwear tend to buy shoes”. This rule may hold even if Rules that “people who buy jackets tend to buy shoes”, and “people who buy clothes tend to buy shoes” do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction, and then run any of the algorithms for mining Association Rules on these “extended transactions”. However, this “Basic” algorithm is not very fast; we present two algorithms, Cumulate and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one real-life dataset). We also present a new interest-measure for Rules which uses the information in the taxonomy. Given a user-specified “minimum-interest-level”, this measure prunes a large number of redundant Rules; 40% to 60% of all the Rules were pruned on two real-life datasets.
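
The "extended transactions" step described in the abstract can be sketched in a few lines. The taxonomy, the transactions, and the function names below are illustrative assumptions; the code covers only the ancestor-adding preprocessing and a support count, not Cumulate, EstMerge, or the interest measure.

```python
# Hypothetical is-a hierarchy: child -> parent.
parent_of = {
    "jackets": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirts": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}


def ancestors(item, parent_of):
    """All ancestors of item, nearest first, following child -> parent links."""
    found = []
    while item in parent_of:
        item = parent_of[item]
        found.append(item)
    return found


def extend_transaction(transaction, parent_of):
    """Return the transaction plus every ancestor of each of its items."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent_of))
    return extended


def support_count(transactions, itemset):
    """Number of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


# Hypothetical purchases.
purchases = [
    {"shirts"},
    {"jackets", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shoes"},
    {"shoes"},
    {"jackets"},
]
extended = [extend_transaction(t, parent_of) for t in purchases]

print(support_count(extended, {"jackets", "hiking boots"}))    # 1
print(support_count(extended, {"outerwear", "hiking boots"}))  # 2
```

With a minimum support count of 2, the pair involving "outerwear" qualifies while the pair involving "jackets" does not, which mirrors the abstract's point that a rule can hold at one level of the taxonomy without holding at another.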