Mining Process

The Experts below are selected from a list of 145815 Experts worldwide ranked by ideXlab platform

Kwekumuata Oseibryson - One of the best experts on this subject based on the ideXlab platform.

  • Profiling internet banking users: a knowledge discovery in data mining process model based approach
    Information Systems Frontiers, 2015
    Co-Authors: Gunjan Mansingh, Kwekumuata Oseibryson, Lila Rao, Annette Mills
    Abstract:

    Analysing datasets using data mining techniques can enhance decision making in organizations. However, to ensure that the full potential of these techniques is realised, it is important that decision makers understand that there are Knowledge Discovery and Data Mining (KDDM) processes that are mature enough to be adopted. This paper demonstrates the benefits of using a KDDM process to evaluate survey data for internet banking users in Jamaica, which includes demographic as well as attitudinal and behavioral variables. The major benefits of following this process include the selection of a set of models, rather than a single model, which are more relevant to the business/research objectives, and the use of a more targeted knowledge discovery process, as the data mining analyst is now directed to consider the effects the decisions in each phase will have on subsequent phases. This leads to more relevant knowledge being extracted from the data mining process.

  • Toward an integrated knowledge discovery and data mining process model
    Knowledge Engineering Review, 2010
    Co-Authors: Sumana Sharma, Kwekumuata Oseibryson
    Abstract:

    The knowledge discovery and data mining (KDDM) process models describe the various phases (e.g. business understanding, data understanding, data preparation, modeling, evaluation and deployment) of the KDDM process. They act as a roadmap for implementation of the KDDM process by presenting a list of tasks for executing the various phases. The checklist approach of describing the tasks is not adequately supported by appropriate tools, which specify ‘how’ the particular task can be implemented. This may result in tasks not being implemented. Another disadvantage is that the long checklist does not capture or leverage the dependencies that exist among the various tasks of the same and different phases. This not only makes the process cumbersome to implement, but also hinders possibilities for semi-automation of certain tasks. Given that each task in the process model serves an important goal and even affects the execution of related tasks due to the dependencies, these limitations are likely to negatively affect the efficiency and effectiveness of KDDM projects. This paper proposes an improved KDDM process model that overcomes these shortcomings by prescribing tools for supporting each task as well as identifying and leveraging dependencies among tasks for semi-automation of tasks, wherever possible.

  • Towards supporting expert evaluation of clustering results using a data mining process model
    Information Sciences, 2010
    Co-Authors: Kwekumuata Oseibryson
    Abstract:

    Clustering is a popular non-directed learning data mining technique for partitioning a dataset into a set of clusters (i.e. a segmentation). Although there are many clustering algorithms, none is superior on all datasets, and so it is never clear which algorithm and which parameter settings are the most appropriate for a given dataset. This suggests that an appropriate approach to clustering should involve the application of multiple clustering algorithms with different parameter settings and a non-taxing approach for comparing the various segmentations that would be generated by these algorithms. In this paper we are concerned with the situation where a domain expert has to evaluate several segmentations in order to determine the most appropriate segmentation (set of clusters) based on his/her specified objective(s). We illustrate how a data mining process model could be applied to address this problem.

Kyuseok Shim - One of the best experts on this subject based on the ideXlab platform.

  • Mining sequential patterns with regular expression constraints
    IEEE Transactions on Knowledge and Data Engineering, 2002
    Co-Authors: Minos Garofalakis, Rajeev Rastogi, Kyuseok Shim
    Abstract:

    Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional sequential pattern mining systems provide users with only a very restricted mechanism (based on minimum support) for specifying patterns of interest. As a consequence, the pattern mining process is typically characterized by lack of focus and users often end up paying inordinate computational costs just to be inundated with an overwhelming number of useless results. We propose the use of Regular Expressions (REs) as a flexible constraint specification tool that enables user-controlled focus to be incorporated into the pattern mining process. We develop a family of novel algorithms (termed SPIRIT, for Sequential Pattern mIning with Regular expressIon consTraints) for mining frequent sequential patterns that also satisfy user-specified RE constraints. The main distinguishing factor among the proposed schemes is the degree to which the RE constraints are enforced to prune the search space of patterns during computation. Our solutions provide valuable insights into the trade-offs that arise when constraints that do not subscribe to nice properties (like anti-monotonicity) are integrated into the mining process.

  • SPIRIT: sequential pattern mining with regular expression constraints
    Very Large Data Bases, 1999
    Co-Authors: Minos Garofalakis, Rajeev Rastogi, Kyuseok Shim
    Abstract:

    Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional mining systems provide users with only a very restricted mechanism (based on minimum support) for specifying patterns of interest. In this paper, we propose the use of Regular Expressions (REs) as a flexible constraint specification tool that enables user-controlled focus to be incorporated into the pattern mining process. We develop a family of novel algorithms (termed SPIRIT ‐ Sequential Pattern mIning with Regular expressIon consTraints) for mining frequent sequential patterns that also satisfy user-specified RE constraints. The main distinguishing factor among the proposed schemes is the degree to which the RE constraints are enforced to prune the search space of patterns during computation. Our solutions provide valuable insights into the tradeoffs that arise when constraints that do not subscribe to nice properties (like anti-monotonicity) are integrated into the mining process. A quantitative exploration of these tradeoffs is conducted through an extensive experimental study on synthetic and real-life data sets.
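
    The task described in the two abstracts above (frequent sequential patterns that also satisfy a regular expression) can be illustrated with a deliberately naive sketch. It is not one of the SPIRIT algorithms: it prunes only on minimum support during the level-wise search and applies the RE constraint as a final filter, which is the unfocused baseline the authors set out to improve on. The function names, the single-character item encoding, and the `max_len` cap are assumptions made for this illustration.

```python
import re


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_with_regex(database, min_support, regex, max_len=4):
    """Level-wise search for frequent sequential patterns, then RE post-filter.

    Each sequence is a string whose characters are items (an assumption of
    this sketch). Support is anti-monotone, so only frequent patterns are
    extended; the RE constraint is NOT pushed into the search here.
    """
    alphabet = sorted({item for seq in database for item in seq})
    constraint = re.compile(regex)
    frequent = []
    level = [""]  # start from the empty pattern
    for _ in range(max_len):
        next_level = []
        for prefix in level:
            for item in alphabet:
                candidate = prefix + item
                if support(candidate, database) >= min_support:
                    next_level.append(candidate)
        if not next_level:
            break
        frequent.extend(next_level)
        level = next_level
    # Keep only the frequent patterns that match the whole regular expression.
    return [p for p in frequent if constraint.fullmatch(p)]


if __name__ == "__main__":
    db = ["abcd", "acbd", "abd", "bcd"]
    print(mine_with_regex(db, min_support=3, regex="a.*d"))
```

    Because the constraint is checked only at the end, every frequent pattern is still enumerated; the degree to which the RE is enforced earlier in the search is exactly the design axis the abstracts describe.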

Richi Nayak - One of the best experts on this subject based on the ideXlab platform.

  • A user driven data mining process model and learning system
    Database Systems for Advanced Applications, 2008
    Co-Authors: Richi Nayak
    Abstract:

    This paper deals with the problem of using data mining models in a real-world situation where the user cannot provide all the inputs with which the predictive model is built. A learning system framework, Query Based Learning System (QBLS), is developed for improving the performance of the predictive models in practice where not all inputs are available for querying to the system. The automatic feature selection algorithm called Query Based Feature Selection (QBFS) is developed for selecting features to obtain a balance between the relative minimum subset of features and the relative maximum classification accuracy. Performance of the QBLS system and the QBFS algorithm is successfully demonstrated with a real-world application.

Petr Musilek - One of the best experts on this subject based on the ideXlab platform.

  • A survey of knowledge discovery and data mining process models
    Knowledge Engineering Review, 2006
    Co-Authors: Lukasz Kurgan, Petr Musilek
    Abstract:

    Knowledge Discovery and Data Mining is a very dynamic research and development area that is reaching maturity. As such, it requires stable and well-defined foundations, which are well understood and popularized throughout the community. This survey presents a historical overview, description and future directions concerning a standard for a Knowledge Discovery and Data Mining process model. It presents a motivation for use and a comprehensive comparison of several leading process models, and discusses their applications to both academic and industrial problems. The main goal of this review is the consolidation of the research in this area. The survey also proposes to enhance existing models by embedding other current standards to enable automation and interoperability of the entire process.

Jia Wei Han - One of the best experts on this subject based on the ideXlab platform.

  • Direct discriminative pattern mining for effective classification
    Proceedings - International Conference on Data Engineering, 2008
    Co-Authors: Hong Cheng, Xifeng Yan, Jia Wei Han
    Abstract:

    The application of frequent patterns in classification has demonstrated its power in recent studies. It often adopts a two-step approach: frequent pattern (or classification rule) mining followed by feature selection (or rule ranking). However, this two-step process could be computationally expensive, especially when the problem scale is large or the minimum support is low. It was observed that frequent pattern mining usually produces a huge number of "patterns" that could not only slow down the mining process but also make feature selection hard to complete. In this paper, we propose a direct discriminative pattern mining approach, DDPMine, to tackle the efficiency issue arising from the two-step approach. DDPMine performs a branch-and-bound search for directly mining discriminative patterns without generating the complete pattern set. Instead of selecting best patterns in a batch, we introduce a "feature-centered" mining approach that generates discriminative patterns sequentially on a progressively shrinking FP-tree by incrementally eliminating training instances. The instance elimination effectively reduces the problem size iteratively and expedites the mining process. Empirical results show that DDPMine achieves orders of magnitude speedup without any downgrade of classification accuracy. It outperforms the state-of-the-art associative classification methods in terms of both accuracy and efficiency.

  • gPrune: a constraint pushing framework for graph pattern mining
    Knowledge Discovery and Data Mining, 2007
    Co-Authors: Feida Zhu, Xifeng Yan, Jia Wei Han
    Abstract:

    In graph mining applications, there has been an increasingly strong urge for imposing user-specified constraints on the mining results. However, unlike most traditional itemset constraints, structural constraints, such as density and diameter of a graph, are very hard to be pushed deep into the mining process. In this paper, we give the first comprehensive study on the pruning properties of both traditional and structural constraints aiming to reduce not only the pattern search space but the data search space as well. A new general framework, called gPrune, is proposed to incorporate all the constraints in such a way that they recursively reinforce each other through the entire mining process. A new concept, Pattern-inseparable Data-antimonotonicity, is proposed to handle the structural constraints unique in the context of graph, which, combined with known pruning properties, provides a comprehensive and unified classification framework for structural constraints. The exploration of these antimonotonicities in the context of graph pattern mining is a significant extension to the known classification of constraints, and deepens our understanding of the pruning properties of structural graph constraints.

  • H-mine: fast and space-preserving frequent pattern mining in large databases
    IIE Transactions, 2007
    Co-Authors: Jian Pei, Jia Wei Han, Shojiro Nishio, Shiwei Tang, Dongqing Yang
    Abstract:

    In this study, we propose a simple and novel data structure using hyper-links, H-struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has a very limited and precisely predictable main memory cost and runs very quickly in memory-based settings. Moreover, it can be scaled up to very large databases using database partitioning. When the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine has an excellent performance for various kinds of data, outperforms currently available algorithms in different settings, and is highly scalable to mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have a major impact on the future development of efficient and scalable data mining methods.

  • AC-Close: efficiently mining approximate closed itemsets by core pattern recovery
    International Conference on Data Mining, 2006
    Co-Authors: Hong Cheng, Jia Wei Han
    Abstract:

    Recent studies have proposed methods to discover approximate frequent itemsets in the presence of random noise. By relaxing the rigid requirement of exact frequent pattern mining, some interesting patterns, which would previously be fragmented by exact pattern mining methods due to the random noise or measurement error, are successfully recovered. Unfortunately, a large number of "uninteresting" candidates are explored as well during the mining process, as a result of the relaxed pattern mining methodology. This severely slows down the mining process. Even worse, it is hard for an end user to distinguish the recovered interesting patterns from these uninteresting ones. In this paper, we propose an efficient algorithm, AC-Close, to recover the approximate closed itemsets from "core patterns". By focusing on the so-called core patterns, integrated with top-down mining and several effective pruning strategies, the algorithm narrows down the search space to those potentially interesting ones. Experimental results show that AC-Close substantially outperforms the previously proposed method in terms of efficiency, while delivering a similar set of interesting recovered patterns.

  • H-mine: hyper-structure mining of frequent patterns in large databases
    International Conference on Data Mining, 2001
    Co-Authors: Jian Pei, Jia Wei Han, Shojiro Nishio, Shiwei Tang, Dongqing Yang
    Abstract:

    Methods for efficient mining of frequent patterns have been studied extensively by many researchers. However, the previously proposed methods still encounter some performance bottlenecks when mining databases with different data characteristics, such as dense vs. sparse, long vs. short patterns, memory-based vs. disk-based, etc. In this study, we propose a simple and novel hyper-linked data structure, H-struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has very limited and precisely predictable space overhead and runs really fast in memory-based settings. Moreover, it can be scaled up to very large databases by database partitioning, and when the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine has high performance in various kinds of data, outperforms the previously developed algorithms in different settings, and is highly scalable in mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have strong impact in the future development of efficient and scalable data mining methods.
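
    For readers unfamiliar with the underlying task, the frequent pattern mining problem that the abstracts in this section address can be sketched as a plain level-wise (Apriori-style) search under a minimum support threshold. This is only a baseline illustration: it does not use H-struct, FP-trees, or any of the optimizations described above, and the function name and toy transactions are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support.

    Plain level-wise search: count candidates of size k, keep the frequent
    ones, and join them into candidates of size k + 1.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    result = {}
    while level:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_support}
        result.update(kept)
        kept_sets = list(kept)
        k = len(kept_sets[0]) + 1 if kept_sets else 0
        # Join step: merge pairs of frequent itemsets that differ by one item.
        candidates = {a | b for a, b in combinations(kept_sets, 2)
                      if len(a | b) == k}
        # Prune step: every (k - 1)-subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return result


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(frequent_itemsets(toy, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

    The repeated scans over all transactions in the counting step are the kind of cost that specialized structures such as those named in the abstracts are designed to reduce.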