Data Mining Query

The Experts below are selected from a list of 11274 Experts worldwide ranked by the ideXlab platform.

Maciej Zakrzewicz - One of the best experts on this subject based on the ideXlab platform.

  • on multiple Query optimization in Data Mining
    Knowledge Discovery and Data Mining, 2005
    Co-Authors: Marek Wojciechowski, Maciej Zakrzewicz
    Abstract:

    Traditional multiple Query optimization methods focus on identifying common subexpressions in sets of relational queries and on constructing their global execution plans. In this paper we consider the problem of optimizing sets of Data Mining queries submitted to a Knowledge Discovery Management System. We describe the problem of Data Mining Query scheduling and we introduce a new algorithm called CCAgglomerative to schedule Data Mining queries for frequent itemset discovery.

  • PAKDD - On multiple Query optimization in Data Mining
    Advances in Knowledge Discovery and Data Mining, 2005
    Co-Authors: Marek Wojciechowski, Maciej Zakrzewicz
    Abstract:

    Traditional multiple Query optimization methods focus on identifying common subexpressions in sets of relational queries and on constructing their global execution plans. In this paper we consider the problem of optimizing sets of Data Mining queries submitted to a Knowledge Discovery Management System. We describe the problem of Data Mining Query scheduling and we introduce a new algorithm called CCAgglomerative to schedule Data Mining queries for frequent itemset discovery.

  • a study on answering a Data Mining Query using a materialized view
    International Symposium on Computer and Information Sciences, 2004
    Co-Authors: Maciej Zakrzewicz, Mikolaj Morzy, Marek Wojciechowski
    Abstract:

    One of the classic Data Mining problems is the discovery of frequent itemsets. This problem particularly attracts the Database community, as it resembles traditional Database Querying. In this paper we consider a Data Mining system which supports storing previous Query results in the form of materialized Data Mining views. While numerous works have shown that reusing results of previous frequent itemset queries can significantly improve the performance of Data Mining Query processing, a thorough study of possible differences between the current Query and a materialized view has not been presented yet. In this paper we classify possible differences into six classes, provide I/O cost analysis for all the classes, and experimentally evaluate the most promising one.

  • Data Mining Query Scheduling for Apriori Common Counting
    2004
    Co-Authors: Marek Wojciechowski, Maciej Zakrzewicz
    Abstract:

    In this paper we consider concurrent execution of multiple Data Mining queries. If such Data Mining queries operate on similar parts of the Database, then their overall I/O cost can be reduced by integrating their Data retrieval operations. The integration requires that many Data Mining queries are present in memory at the same time. If the memory size is not sufficient to hold all the Data Mining queries, then the queries must be scheduled into multiple phases of loading and processing. We discuss the problem of Data Mining Query scheduling and propose a heuristic algorithm to efficiently schedule the Data Mining queries into phases.

  • evaluation of the mine merge method for Data Mining Query processing
    ADBIS (Local Proceedings), 2004
    Co-Authors: Marek Wojciechowski, Maciej Zakrzewicz
    Abstract:

    In this paper we consider concurrent execution of multiple Data Mining queries in the context of discovery of frequent itemsets. If such Data Mining queries operate on similar parts of the Database, then their overall I/O cost can be reduced by transforming the set of Data Mining queries into another set of non-overlapping queries, whose results can be used to efficiently answer the original queries. We discuss the problem of multiple Data Mining Query optimization and experimentally evaluate the Mine Merge algorithm to efficiently execute sets of Data Mining queries.
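
The idea of turning overlapping queries into non-overlapping ones, mentioned in the Mine Merge abstract above, can be illustrated with a minimal sketch. It assumes that each query selects a contiguous, half-open range of transaction IDs; the function name `split_into_partitions` and the query names are hypothetical, and the sketch covers only the partitioning step, not the published algorithm.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open [start, end) range of transaction IDs (assumption)


def split_into_partitions(queries: Dict[str, Range]) -> List[Tuple[Range, List[str]]]:
    """Split overlapping query ranges into non-overlapping partitions.

    Returns (partition, names of the queries covering it) pairs and skips
    gaps that no query covers.
    """
    # Every range boundary is a potential cut point between partitions.
    cuts = sorted({bound for rng in queries.values() for bound in rng})
    partitions = []
    for start, end in zip(cuts, cuts[1:]):
        # A query covers the partition if the partition lies inside its range.
        covering = sorted(
            name
            for name, (q_start, q_end) in queries.items()
            if q_start <= start and end <= q_end
        )
        if covering:
            partitions.append(((start, end), covering))
    return partitions


if __name__ == "__main__":
    queries = {"q1": (0, 60), "q2": (40, 100)}
    for partition, covering in split_into_partitions(queries):
        print(partition, covering)
```

Each partition would then be read once, and its results shared by every query listed next to it, which is where the I/O saving described in the abstract would come from.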

Céline Robardet - One of the best experts on this subject based on the ideXlab platform.

  • An Inductive Database System Based on Virtual Mining Views
    Data Mining and Knowledge Discovery, 2012
    Co-Authors: Hendrik Blockeel, Toon Calders, Elisa Fromont, Bart Goethals, Adriana Prado, Céline Robardet
    Abstract:

    Inductive Databases integrate Database Querying with Database Mining. In this article, we present an inductive Database system that does not rely on a new Data Mining Query language, but on plain SQL. We propose an intuitive and elegant framework based on virtual Mining views, which are relational tables that virtually contain the complete output of Data Mining algorithms executed over a given Data table. We show that several types of patterns and models that are implicitly present in the Data, such as itemsets, association rules, and decision trees, can be represented and queried with SQL using a unifying framework. As a proof of concept, we illustrate a complete Data Mining scenario with SQL queries over the Mining views, which is executed in our system.

  • A Practical Comparative Study Of Data Mining Query Languages
    2010
    Co-Authors: Hendrik Blockeel, Toon Calders, Elisa Fromont, Bart Goethals, Adriana Prado, Céline Robardet
    Abstract:

    An important motivation for the development of inductive Databases and Query languages for Data Mining is that such an approach will increase the flexibility with which Data Mining can be performed. By integrating Data Mining more closely into a Database Querying framework, separate steps such as Data preprocessing, Data Mining, and postprocessing of the results, can all be handled using one Query language. In this chapter, we compare 6 existing Data Mining Query languages, all extensions of the standard relational Query language SQL, from this point of view: how flexible are they with respect to the tasks they can be used for, and how easily can those tasks be performed? We verify whether and how these languages can be used to perform four prototypical Data Mining tasks in the domain of itemset and association rule Mining, and summarize their stronger and weaker points. Besides offering a comparative evaluation of different Data Mining Query languages, this chapter also provides a motivation for the next chapter, where a deeper integration of Data Mining into Databases is proposed, one that does not rely on the development of a new Query language, but where the structure of the Database itself is extended.

  • Inductive Querying with Virtual Mining Views
    2010
    Co-Authors: Hendrik Blockeel, Toon Calders, Elisa Fromont, Bart Goethals, Adriana Prado, Céline Robardet
    Abstract:

    An important motivation for the development of inductive Databases and Query languages for Data Mining is that such an approach will increase the flexibility with which Data Mining can be performed. By integrating Data Mining more closely into a Database Querying framework, separate steps such as Data preprocessing, Data Mining, and postprocessing of the results, can all be handled using one Query language. In this chapter, we compare 6 existing Data Mining Query languages, all extensions of the standard relational Query language SQL, from this point of view: how flexible are they with respect to the tasks they can be used for, and how easily can those tasks be performed? We verify whether and how these languages can be used to perform four prototypical Data Mining tasks in the domain of itemset and association rule Mining, and summarize their stronger and weaker points. Besides offering a comparative evaluation of different Data Mining Query languages, this chapter also provides a motivation for the next chapter, where a deeper integration of Data Mining into Databases is proposed, one that does not rely on the development of a new Query language, but where the structure of the Database itself is extended.

  • Inductive Databases and Constraint-Based Data Mining - A Practical Comparative Study Of Data Mining Query Languages
    Inductive Databases and Constraint-Based Data Mining, 2010
    Co-Authors: Hendrik Blockeel, Toon Calders, Elisa Fromont, Bart Goethals, Adriana Prado, Céline Robardet
    Abstract:

    An important motivation for the development of inductive Databases and Query languages for Data Mining is that such an approach will increase the flexibility with which Data Mining can be performed. By integrating Data Mining more closely into a Database Querying framework, separate steps such as Data preprocessing, Data Mining, and postprocessing of the results, can all be handled using one Query language. In this chapter, we compare six existing Data Mining Query languages, all extensions of the standard relational Query language SQL, from this point of view: how flexible are they with respect to the tasks they can be used for, and how easily can those tasks be performed? We verify whether and how these languages can be used to perform four prototypical Data Mining tasks in the domain of itemset and association rule Mining, and summarize their stronger and weaker points. Besides offering a comparative evaluation of different Data Mining Query languages, this chapter also provides a motivation for a following chapter, where a deeper integration of Data Mining into Databases is proposed, one that does not rely on the development of a new Query language, but where the structure of the Database itself is extended.

  • a practical comparative study of Data Mining Query languages
    Inductive Databases and Constraint-Based Data Mining (Sašo Džeroski et al., eds.), 2010
    Co-Authors: Hendrik Blockeel, Toon Calders, Elisa Fromont, Bart Goethals, Adriana Prado, Céline Robardet
    Abstract:

    An important motivation for the development of inductive Databases and Query languages for Data Mining is that such an approach will increase the flexibility with which Data Mining can be performed. By integrating Data Mining more closely into a Database Querying framework, separate steps such as Data preprocessing, Data Mining, and postprocessing of the results, can all be handled using one Query language. In this chapter, we compare six existing Data Mining Query languages, all extensions of the standard relational Query language SQL, from this point of view: how flexible are they with respect to the tasks they can be used for, and how easily can those tasks be performed? We verify whether and how these languages can be used to perform four prototypical Data Mining tasks in the domain of itemset and association rule Mining, and summarize their stronger and weaker points. Besides offering a comparative evaluation of different Data Mining Query languages, this chapter also provides a motivation for a following chapter, where a deeper integration of Data Mining into Databases is proposed, one that does not rely on the development of a new Query language, but where the structure of the Database itself is extended.
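
The notion of querying mining results with plain SQL, described in the virtual Mining views abstract above, can be sketched as follows. This is an illustration only: the table `itemsets` and its columns are hypothetical and are not the schema used by the authors, and the table here is filled eagerly in Python, whereas the views in the abstract are virtual.

```python
import sqlite3
from itertools import combinations

# Toy transaction data (assumption, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def count_itemsets(transactions, max_size=2):
    """Count how many transactions contain each itemset of up to max_size items."""
    counts = {}
    for transaction in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(transaction), size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return counts


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE itemsets (items TEXT PRIMARY KEY, size INTEGER, support INTEGER)"
)
conn.executemany(
    "INSERT INTO itemsets VALUES (?, ?, ?)",
    [(",".join(items), len(items), n) for items, n in count_itemsets(transactions).items()],
)

# The mining question "which pairs occur in at least two transactions?"
# becomes an ordinary SQL query over the itemsets table.
rows = conn.execute(
    "SELECT items, support FROM itemsets "
    "WHERE size = 2 AND support >= 2 ORDER BY items"
).fetchall()
print(rows)
```

Once patterns live in a relational table, constraints such as minimum support or itemset size are expressed as ordinary `WHERE` clauses rather than as constructs of a dedicated mining language.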

Marek Wojciechowski - One of the best experts on this subject based on the ideXlab platform.

  • on multiple Query optimization in Data Mining
    Knowledge Discovery and Data Mining, 2005
    Co-Authors: Marek Wojciechowski, Maciej Zakrzewicz
    Abstract:

    Traditional multiple Query optimization methods focus on identifying common subexpressions in sets of relational queries and on constructing their global execution plans. In this paper we consider the problem of optimizing sets of Data Mining queries submitted to a Knowledge Discovery Management System. We describe the problem of Data Mining Query scheduling and we introduce a new algorithm called CCAgglomerative to schedule Data Mining queries for frequent itemset discovery.

  • PAKDD - On multiple Query optimization in Data Mining
    Advances in Knowledge Discovery and Data Mining, 2005
    Co-Authors: Marek Wojciechowski, Maciej Zakrzewicz
    Abstract:

    Traditional multiple Query optimization methods focus on identifying common subexpressions in sets of relational queries and on constructing their global execution plans. In this paper we consider the problem of optimizing sets of Data Mining queries submitted to a Knowledge Discovery Management System. We describe the problem of Data Mining Query scheduling and we introduce a new algorithm called CCAgglomerative to schedule Data Mining queries for frequent itemset discovery.

  • a study on answering a Data Mining Query using a materialized view
    International Symposium on Computer and Information Sciences, 2004
    Co-Authors: Maciej Zakrzewicz, Mikolaj Morzy, Marek Wojciechowski
    Abstract:

    One of the classic Data Mining problems is the discovery of frequent itemsets. This problem particularly attracts the Database community, as it resembles traditional Database Querying. In this paper we consider a Data Mining system which supports storing previous Query results in the form of materialized Data Mining views. While numerous works have shown that reusing results of previous frequent itemset queries can significantly improve the performance of Data Mining Query processing, a thorough study of possible differences between the current Query and a materialized view has not been presented yet. In this paper we classify possible differences into six classes, provide I/O cost analysis for all the classes, and experimentally evaluate the most promising one.

  • Data Mining Query Scheduling for Apriori Common Counting
    2004
    Co-Authors: Marek Wojciechowski, Maciej Zakrzewicz
    Abstract:

    In this paper we consider concurrent execution of multiple Data Mining queries. If such Data Mining queries operate on similar parts of the Database, then their overall I/O cost can be reduced by integrating their Data retrieval operations. The integration requires that many Data Mining queries are present in memory at the same time. If the memory size is not sufficient to hold all the Data Mining queries, then the queries must be scheduled into multiple phases of loading and processing. We discuss the problem of Data Mining Query scheduling and propose a heuristic algorithm to efficiently schedule the Data Mining queries into phases.

  • evaluation of the mine merge method for Data Mining Query processing
    ADBIS (Local Proceedings), 2004
    Co-Authors: Marek Wojciechowski, Maciej Zakrzewicz
    Abstract:

    In this paper we consider concurrent execution of multiple Data Mining queries in the context of discovery of frequent itemsets. If such Data Mining queries operate on similar parts of the Database, then their overall I/O cost can be reduced by transforming the set of Data Mining queries into another set of non-overlapping queries, whose results can be used to efficiently answer the original queries. We discuss the problem of multiple Data Mining Query optimization and experimentally evaluate the Mine Merge algorithm to efficiently execute sets of Data Mining queries.

Cyrille Masson - One of the best experts on this subject based on the ideXlab platform.

  • The Data Mining and Knowledge Discovery Handbook - Data Mining Query Languages
    Data Mining and Knowledge Discovery Handbook, 2009
    Co-Authors: Jean-François Boulicaut, Cyrille Masson
    Abstract:

    Many Data Mining algorithms make it possible to extract different types of patterns from Data (e.g., local patterns like itemsets and association rules, models like classifiers). To support the whole knowledge discovery process, we need integrated systems which can deal with both patterns and Data. The inductive Database approach has emerged as a unifying framework for such systems. Following this Database perspective, knowledge discovery processes become Querying processes for which Query languages have to be designed. In the prolific field of association rule Mining, different proposals of Query languages have been made to support the more or less declarative specification of both Data and pattern manipulations. In this chapter, we survey some of these proposals. The survey makes it possible to identify current shortcomings and to point out some promising directions of research in this area.

  • Data Mining Query languages
    The Data Mining and Knowledge Discovery Handbook, 2005
    Co-Authors: Jean-François Boulicaut, Cyrille Masson
    Abstract:

    Many Data Mining algorithms make it possible to extract different types of patterns from Data (e.g., local patterns like itemsets and association rules, models like classifiers). To support the whole knowledge discovery process, we need integrated systems which can deal with both patterns and Data. The inductive Database approach has emerged as a unifying framework for such systems. Following this Database perspective, knowledge discovery processes become Querying processes for which Query languages have to be designed. In the prolific field of association rule Mining, different proposals of Query languages have been made to support the more or less declarative specification of both Data and pattern manipulations. In this chapter, we survey some of these proposals. The survey makes it possible to identify current shortcomings and to point out some promising directions of research in this area.

  • Optimizing subset queries: a step towards SQL-based inductive Databases for itemsets
    2004
    Co-Authors: Cyrille Masson, Céline Robardet, Jean-François Boulicaut
    Abstract:

    Storing sets and Querying them (e.g., subset queries that provide all supersets of a given set) is known to be difficult within relational Databases. We consider that being able to Query efficiently both transactional Data and materialized collections of sets by means of a standard Query language is an important step towards practical inductive Databases. Indeed, Data Mining Query languages like MINE RULE extract collections of association rules, whose components are sets, into relational tables. Post-processing phases often make extensive use of subset queries, which cannot be processed efficiently by SQL servers. In this paper, we propose a new way to handle sets from relational Databases. It is based on a Data structure that partially encodes the inclusion relationship between sets. It is an extension of the hash group bitmap key proposed by Morzy et al. [8]. Our experiments show an interesting improvement for these useful subset queries.

  • SAC - Optimizing subset queries: a step towards SQL-based inductive Databases for itemsets
    Proceedings of the 2004 ACM symposium on Applied computing - SAC '04, 2004
    Co-Authors: Cyrille Masson, Céline Robardet, Jean-François Boulicaut
    Abstract:

    Storing sets and Querying them (e.g., subset queries that provide all supersets of a given set) is known to be difficult within relational Databases. We consider that being able to Query efficiently both transactional Data and materialized collections of sets by means of a standard Query language is an important step towards practical inductive Databases. Indeed, Data Mining Query languages like MINE RULE extract collections of association rules, whose components are sets, into relational tables. Post-processing phases often make extensive use of subset queries, which cannot be processed efficiently by SQL servers. In this paper, we propose a new way to handle sets from relational Databases. It is based on a Data structure that partially encodes the inclusion relationship between sets. It is an extension of the hash group bitmap key proposed by Morzy et al. [8]. Our experiments show an interesting improvement for these useful subset queries.
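
The subset queries discussed in the abstract above (find all stored supersets of a given set) are often accelerated with hashed bitmap signatures. The sketch below shows that general idea under stated assumptions: items are integers, the signature width `SIGNATURE_BITS` is arbitrary, and all names are hypothetical. It is not the extended data structure proposed in the paper.

```python
from typing import Iterable, List, Set, Tuple

SIGNATURE_BITS = 16  # assumed signature width; real systems would tune this


def signature(items: Iterable[int], bits: int = SIGNATURE_BITS) -> int:
    """Hash each item to one bit position and OR the bits together."""
    sig = 0
    for item in items:
        sig |= 1 << (item % bits)
    return sig


def build_index(stored: List[Set[int]]) -> List[Tuple[int, Set[int]]]:
    """Precompute the signature of every stored set once."""
    return [(signature(s), s) for s in stored]


def supersets_of(query: Set[int], index: List[Tuple[int, Set[int]]]) -> List[Set[int]]:
    """Return all stored sets that contain every item of the query."""
    q_sig = signature(query)
    result = []
    for sig, candidate in index:
        # Cheap filter: a superset must have every signature bit of the query.
        if sig & q_sig != q_sig:
            continue
        # Hash collisions can let non-supersets through, so verify exactly.
        if query <= candidate:
            result.append(candidate)
    return result


if __name__ == "__main__":
    index = build_index([{1, 2, 3}, {2, 3}, {17, 2}, {1, 18}])
    print(supersets_of({1, 2}, index))
```

The bitwise test never rejects a true superset, but it can accept false positives: with 16 bits, the sets `{17, 2}` and `{1, 18}` share the signature of `{1, 2}`, which is why the exact check is still needed.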

Jean-François Boulicaut - One of the best experts on this subject based on the ideXlab platform.

  • The Data Mining and Knowledge Discovery Handbook - Data Mining Query Languages
    Data Mining and Knowledge Discovery Handbook, 2009
    Co-Authors: Jean-François Boulicaut, Cyrille Masson
    Abstract:

    Many Data Mining algorithms make it possible to extract different types of patterns from Data (e.g., local patterns like itemsets and association rules, models like classifiers). To support the whole knowledge discovery process, we need integrated systems which can deal with both patterns and Data. The inductive Database approach has emerged as a unifying framework for such systems. Following this Database perspective, knowledge discovery processes become Querying processes for which Query languages have to be designed. In the prolific field of association rule Mining, different proposals of Query languages have been made to support the more or less declarative specification of both Data and pattern manipulations. In this chapter, we survey some of these proposals. The survey makes it possible to identify current shortcomings and to point out some promising directions of research in this area.

  • Data Mining Query languages
    The Data Mining and Knowledge Discovery Handbook, 2005
    Co-Authors: Jean-François Boulicaut, Cyrille Masson
    Abstract:

    Many Data Mining algorithms make it possible to extract different types of patterns from Data (e.g., local patterns like itemsets and association rules, models like classifiers). To support the whole knowledge discovery process, we need integrated systems which can deal with both patterns and Data. The inductive Database approach has emerged as a unifying framework for such systems. Following this Database perspective, knowledge discovery processes become Querying processes for which Query languages have to be designed. In the prolific field of association rule Mining, different proposals of Query languages have been made to support the more or less declarative specification of both Data and pattern manipulations. In this chapter, we survey some of these proposals. The survey makes it possible to identify current shortcomings and to point out some promising directions of research in this area.

  • Optimizing subset queries: a step towards SQL-based inductive Databases for itemsets
    2004
    Co-Authors: Cyrille Masson, Céline Robardet, Jean-François Boulicaut
    Abstract:

    Storing sets and Querying them (e.g., subset queries that provide all supersets of a given set) is known to be difficult within relational Databases. We consider that being able to Query efficiently both transactional Data and materialized collections of sets by means of a standard Query language is an important step towards practical inductive Databases. Indeed, Data Mining Query languages like MINE RULE extract collections of association rules, whose components are sets, into relational tables. Post-processing phases often make extensive use of subset queries, which cannot be processed efficiently by SQL servers. In this paper, we propose a new way to handle sets from relational Databases. It is based on a Data structure that partially encodes the inclusion relationship between sets. It is an extension of the hash group bitmap key proposed by Morzy et al. [8]. Our experiments show an interesting improvement for these useful subset queries.

  • SAC - Optimizing subset queries: a step towards SQL-based inductive Databases for itemsets
    Proceedings of the 2004 ACM symposium on Applied computing - SAC '04, 2004
    Co-Authors: Cyrille Masson, Céline Robardet, Jean-François Boulicaut
    Abstract:

    Storing sets and Querying them (e.g., subset queries that provide all supersets of a given set) is known to be difficult within relational Databases. We consider that being able to Query efficiently both transactional Data and materialized collections of sets by means of a standard Query language is an important step towards practical inductive Databases. Indeed, Data Mining Query languages like MINE RULE extract collections of association rules, whose components are sets, into relational tables. Post-processing phases often make extensive use of subset queries, which cannot be processed efficiently by SQL servers. In this paper, we propose a new way to handle sets from relational Databases. It is based on a Data structure that partially encodes the inclusion relationship between sets. It is an extension of the hash group bitmap key proposed by Morzy et al. [8]. Our experiments show an interesting improvement for these useful subset queries.