Splitting Attribute

The Experts below are selected from a list of 198 Experts worldwide ranked by ideXlab platform

Sunil Prabhakar - One of the best experts on this subject based on the ideXlab platform.

  • A Rule-Based Classification Algorithm for Uncertain Data
    International Conference on Data Engineering, 2009
    Co-Authors: Biao Qin, Yuni Xia, Sunil Prabhakar, Yicheng Tu
    Abstract:

    Data uncertainty is common in real-world applications due to various causes, including imprecise measurement, network latency, outdated sources and sampling errors. Such uncertainty has to be handled cautiously, or else the mining results could be unreliable or even wrong. In this paper, we propose a new rule-based classification and prediction algorithm called uRule for classifying uncertain data. This algorithm introduces new measures for generating, pruning and optimizing rules. These new measures are computed by taking into account the uncertain data intervals and probability distribution functions. Based on the new measures, the optimal splitting attribute and splitting value can be identified and used for classification and prediction. The proposed uRule algorithm can process uncertainty in both numerical and categorical data. Our experimental results show that uRule has excellent performance even when data is highly uncertain.

  • ICDE - A Rule-Based Classification Algorithm for Uncertain Data
    2009 IEEE 25th International Conference on Data Engineering, 2009
    Co-Authors: Biao Qin, Yuni Xia, Sunil Prabhakar
    Abstract:

    Data uncertainty is common in real-world applications due to various causes, including imprecise measurement, network latency, outdated sources and sampling errors. Such uncertainty has to be handled cautiously, or else the mining results could be unreliable or even wrong. In this paper, we propose a new rule-based classification and prediction algorithm called uRule for classifying uncertain data. This algorithm introduces new measures for generating, pruning and optimizing rules. These new measures are computed by taking into account the uncertain data intervals and probability distribution functions. Based on the new measures, the optimal splitting attribute and splitting value can be identified and used for classification and prediction. The proposed uRule algorithm can process uncertainty in both numerical and categorical data. Our experimental results show that uRule has excellent performance even when data is highly uncertain.
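The abstract above does not spell out uRule's measures, so the following is only a minimal sketch of the general idea of evaluating a candidate splitting value over interval-valued data. It assumes, purely for illustration, that each uncertain numeric value is uniformly distributed over its interval; the names `prob_at_most` and `expected_class_counts` are hypothetical and are not taken from the paper.

```python
def prob_at_most(interval, split):
    """P(value <= split) for a value assumed uniform over interval = (low, high)."""
    low, high = interval
    if split >= high:
        return 1.0
    if split <= low:
        return 0.0
    return (split - low) / (high - low)


def expected_class_counts(intervals, labels, split):
    """Expected number of instances of each class on either side of `split`.

    Each instance contributes a fraction p to the left side and 1 - p to the
    right side, instead of being assigned wholly to one side.
    """
    left, right = {}, {}
    for interval, label in zip(intervals, labels):
        p = prob_at_most(interval, split)
        left[label] = left.get(label, 0.0) + p
        right[label] = right.get(label, 0.0) + (1.0 - p)
    return left, right


if __name__ == "__main__":
    # Two uncertain readings, one per class (illustrative data only).
    intervals = [(0.0, 2.0), (1.0, 3.0)]
    labels = ["yes", "no"]
    print(expected_class_counts(intervals, labels, split=1.5))
```

Fractional counts of this kind could then feed any impurity measure in place of integer counts. Whether uRule does exactly this is not stated in the abstract.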

Xue Z Wang - One of the best experts on this subject based on the ideXlab platform.

  • Inductive data mining: automatic generation of decision trees from data for QSAR modelling and process historical data analysis
    International Journal of Modelling Identification and Control, 2011
    Co-Authors: Chao Y, Frances V Buontempo, Xue Z Wang
    Abstract:

    A new inductive data mining method for automatic generation of decision trees from data (GPTree) is presented. Other decision tree induction techniques are based upon recursive partitioning, employing greedy searches to choose the best splitting attribute and value at each node, and will therefore necessarily miss regions of the search space; GPTree can overcome this problem. In addition, the approach is extended to a new method (YAdapt) that models the original continuous endpoint by adaptively finding suitable ranges to describe the endpoints during the tree induction process, removing the need for discretisation prior to tree induction and allowing the ordinal nature of the endpoint to be taken into account in the models built. A strategy for further improving the predictive performance for previously unseen data is investigated that uses multiple decision trees, i.e., a decision forest, and a majority voting strategy to give predictions (GPForest). The methods were applied to QSAR (quantit...

  • Inductive data mining: automatic generation of decision trees from data for QSAR modelling and process historical data analysis
    Computer-aided chemical engineering, 2011
    Co-Authors: Chao Y, Frances V Buontempo, Xue Z Wang
    Abstract:

    A new inductive data mining method for automatic generation of decision trees from data (GPTree) is presented. Other decision tree induction techniques are based upon recursive partitioning, employing greedy searches to choose the best splitting attribute and value at each node, and will therefore necessarily miss regions of the search space; GPTree can overcome this problem. In addition, the approach is extended to a new method (YAdapt) that models the original continuous endpoint by adaptively finding suitable ranges to describe the endpoints during the tree induction process, removing the need for discretisation prior to tree induction and allowing the ordinal nature of the endpoint to be taken into account in the models built. A strategy for further improving the predictive performance for previously unseen data is investigated that uses multiple decision trees, i.e., a decision forest, and a majority voting strategy to give predictions (GPForest). The methods were applied to QSAR (quantitative structure-activity relationships) modelling for eco-toxicity prediction of chemicals and to the analysis of a historical database for a wastewater treatment plant.
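The majority voting step mentioned for the decision forest can be sketched as follows. This is a generic illustration, not the GPForest implementation: each tree is assumed to be any callable that maps a sample to a class label, and `majority_vote` is a hypothetical helper name.

```python
from collections import Counter


def majority_vote(trees, sample):
    """Return the class label predicted by the largest number of trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Three hand-written stand-ins for induced trees (illustrative only).
    trees = [
        lambda s: "toxic" if s[0] > 1.0 else "non-toxic",
        lambda s: "toxic" if s[1] > 0.5 else "non-toxic",
        lambda s: "non-toxic",
    ]
    print(majority_vote(trees, (2.0, 0.9)))
```

Using an odd number of trees avoids ties in two-class problems. With ties, the result depends on the order in which `Counter` encountered the labels.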

  • Induction of decision trees using genetic programming for modelling ecotoxicity data: adaptive discretization of real-valued endpoints.
    SAR and QSAR in environmental research, 2006
    Co-Authors: Xue Z Wang, Frances V Buontempo, A. Young, Daniel Osborn
    Abstract:

    Recent literature has demonstrated the applicability of genetic programming to induction of decision trees for modelling toxicity endpoints. Other decision tree induction techniques are based upon recursive partitioning, employing greedy searches to choose the best splitting attribute and value at each node, and will necessarily miss regions of the search space; the genetic-programming-based approach can overcome this problem. However, the method still requires the discretization of the often continuous-valued toxicity endpoints prior to the tree induction. A novel extension of this method, YAdapt, is introduced in this work, which models the original continuous endpoint by adaptively finding suitable ranges to describe the endpoints during the tree induction process, removing the need for discretization prior to tree induction and allowing the ordinal nature of the endpoint to be taken into account in the models built.

  • Induction of decision trees using genetic programming for the development of SAR toxicity models
    2005
    Co-Authors: Xue Z Wang, Frances V Buontempo, Mulaisho Mwense, A. Young, Daniel Osborn
    Abstract:

    Automatic induction of decision trees and production rules from data to develop structure-activity relationship (SAR) models for toxicity prediction of chemicals has recently received much attention, and the majority of methodologies reported in the literature are based upon recursive partitioning employing greedy searches to choose the best splitting attribute and value at each node. These approaches can be successful; however, the greedy search will necessarily miss regions of the search space. Recent literature has demonstrated the applicability of genetic programming to decision tree induction to overcome this problem. This paper presents a variant of this novel approach, using fewer mutation options and a simpler fitness function, demonstrating its utility in inducing decision trees for ecotoxicity data, via a case study of two datasets giving improved accuracy and generalisation ability over a popular decision tree inducer.

  • Genetic Programming for the Induction of Decision Trees to Model Ecotoxicity Data
    Journal of chemical information and modeling, 2005
    Co-Authors: Frances V Buontempo, Xue Z Wang, Mulaisho Mwense, Nigel Horan, Anita Young, Daniel Osborn
    Abstract:

    Automatic induction of decision trees and production rules from data to develop structure-activity models for toxicity prediction has recently received much attention, and the majority of methodologies reported in the literature are based upon recursive partitioning employing greedy searches to choose the best splitting attribute and value at each node. These approaches can be successful; however, the greedy search will necessarily miss regions of the search space. Recent literature has demonstrated the applicability of genetic programming to decision tree induction to overcome this problem. This paper presents a variant of this novel approach, using fewer mutation options and a simpler fitness function, demonstrating its utility in inducing decision trees for ecotoxicity data, via a case study of two data sets giving improved accuracy and generalization ability over a popular decision tree inducer.
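The greedy choice of a splitting attribute and value that these abstracts refer to can be sketched for a single node as below. The abstracts do not name a split criterion, so information gain over numeric attributes is an assumption here, and `entropy` and `best_split` are hypothetical helper names rather than anything from the cited papers.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels):
    """Greedily pick the (attribute index, threshold) with the highest information gain.

    rows: list of equal-length numeric feature lists; labels: list of class labels.
    Returns (attribute, threshold, gain), or (None, None, 0.0) if no split helps.
    """
    base = entropy(labels)
    best = (None, None, 0.0)
    for attr in range(len(rows[0])):
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[attr] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[attr] > threshold]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - weighted
            if gain > best[2]:
                best = (attr, threshold, gain)
    return best


if __name__ == "__main__":
    rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]]
    labels = ["a", "a", "b", "b"]
    print(best_split(rows, labels))
```

A recursive partitioner would call `best_split` again on each resulting subset. Because each call commits to the locally best split, combinations of individually weaker splits are never explored, which is the limitation the genetic programming approach is said to address.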