Decision Trees

The experts below are selected from a list of 58,431 experts worldwide, ranked by the ideXlab platform.

Margo Seltzer - One of the best experts on this subject based on the ideXlab platform.

  • Generalized and Scalable Optimal Sparse Decision Trees
    arXiv: Learning, 2020
    Co-Authors: Chudi Zhong, Cynthia Rudin, Margo Seltzer
    Abstract:

    Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that allow practical algorithms to find optimal decision trees. These new techniques have the potential to trigger a paradigm shift in which it is possible to construct sparse decision trees that efficiently optimize a variety of objective functions without relying on the greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution of this work is a general framework for decision tree optimization that addresses two significant open problems in the area: the treatment of imbalanced data and full optimization over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives, including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state of the art.
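
    The regularized objective these methods optimize can be made concrete with a small sketch. The example below is an illustration, not the authors' algorithm: scikit-learn's greedy CART stands in for the exact search, and the penalty weight lam is an arbitrary illustrative value; the point is only the quantity being minimized, misclassification loss plus a per-leaf sparsity penalty.

```python
# Sketch: the sparse-tree objective loss(tree) + lam * (#leaves).
# Greedy CART is used only as a stand-in model family; GOSDT-style
# methods minimize this objective exactly over the space of trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
lam = 0.01  # per-leaf penalty (illustrative value, not from the paper)

def regularized_objective(tree, X, y, lam):
    """Misclassification loss plus lam times the number of leaves."""
    loss = np.mean(tree.predict(X) != y)
    return loss + lam * tree.get_n_leaves()

# Deeper trees lower the loss but pay for every extra leaf.
for depth in range(1, 6):
    t = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    print(depth, t.get_n_leaves(), round(regularized_objective(t, X, y, lam), 4))
```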

  • Optimal Sparse Decision Trees
    arXiv: Learning, 2019
    Co-Authors: Cynthia Rudin, Margo Seltzer
    Abstract:

    Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic and sometimes produce unquestionably suboptimal models. The hardness of decision tree optimization is both a theoretical and a practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees over binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including specialized data structures and a custom bit-vector library. Our experiments highlight advantages in scalability, speed, and proof of optimality.
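
    The bit-vector idea can be illustrated in a few lines. The sketch below is an assumption-laden stand-in, not the paper's custom library, but it shows why bitsets help: leaf supports and label counts reduce to bitwise AND plus popcount over machine words.

```python
# Sketch: sample sets as bitsets. Each binary feature (and each leaf)
# maps to an integer whose bits mark the samples it captures.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 5))   # binary feature matrix
y = rng.integers(0, 2, size=1000)        # binary labels

def to_bits(mask):
    """Pack a boolean sample mask into one Python int used as a bitset."""
    return int.from_bytes(np.packbits(mask.astype(np.uint8)).tobytes(), "big")

feature_bits = [to_bits(X[:, j] == 1) for j in range(X.shape[1])]
label_bits = to_bits(y == 1)

# Leaf "feature 0 == 1 AND feature 3 == 1": intersection is a single AND.
leaf = feature_bits[0] & feature_bits[3]
support = bin(leaf).count("1")                 # samples reaching the leaf
positives = bin(leaf & label_bits).count("1")  # positive labels among them
print(support, positives)
```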

  • Optimal Sparse Decision Trees
    Neural Information Processing Systems, 2019
    Co-Authors: Cynthia Rudin, Margo Seltzer
    Abstract:

    Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic and sometimes produce unquestionably suboptimal models. The hardness of decision tree optimization is both a theoretical and a practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees over binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including specialized data structures and a custom bit-vector library. We highlight possible steps toward improving the scalability and speed of future generations of this algorithm, based on insights from our theory and experiments.

Jennifer Alaine Blue - One of the best experts on this subject based on the ideXlab platform.

  • A Support Vector Machine Approach to Decision Trees
    International Joint Conference on Neural Networks, 1998
    Co-Authors: K Bennett, Jennifer Alaine Blue
    Abstract:

    Key ideas from statistical learning theory and support vector machines are generalized to decision trees: a support vector machine makes the decision at each node of the tree. The "optimal" decision tree is characterized, and both a primal- and a dual-space formulation for constructing the tree are proposed. The result is a method for generating logically simple decision trees with multivariate linear or nonlinear decisions. By varying the kernel function, the decisions may consist of linear threshold units, polynomials, sigmoidal neural networks, or radial basis function networks. Preliminary results indicate that the method produces simple trees that generalize well relative to other decision tree algorithms and to single support vector machines.
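
    A simplified reconstruction of the core idea, with a support vector machine making the decision at each internal node, is sketched below. This is not the authors' primal/dual formulation: the class SVMTreeNode and its stopping rules are illustrative choices, but swapping the kernel argument changes the node type exactly as the abstract describes.

```python
# Sketch: a decision tree whose internal decisions are each an SVM.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

class SVMTreeNode:
    def __init__(self, depth=0, max_depth=2, kernel="rbf"):
        self.depth, self.max_depth, self.kernel = depth, max_depth, kernel
        self.svm, self.children, self.label = None, None, None

    def fit(self, X, y):
        # Leaf on purity or depth limit; leaves predict the majority class.
        if self.depth >= self.max_depth or len(np.unique(y)) == 1:
            self.label = int(np.bincount(y).argmax())
            return self
        self.svm = SVC(kernel=self.kernel).fit(X, y)  # the node's decision
        side = self.svm.predict(X)
        if len(np.unique(side)) == 1:  # SVM failed to split: make a leaf
            self.label = int(np.bincount(y).argmax())
            return self
        self.children = [
            SVMTreeNode(self.depth + 1, self.max_depth, self.kernel)
            .fit(X[side == s], y[side == s]) for s in (0, 1)]
        return self

    def predict_one(self, x):
        if self.label is not None:
            return self.label
        return self.children[int(self.svm.predict(x[None, :])[0])].predict_one(x)

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
tree = SVMTreeNode(max_depth=2, kernel="rbf").fit(X, y)
preds = np.array([tree.predict_one(x) for x in X])
print("training accuracy:", np.mean(preds == y))
```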

Xue Z Wang - One of the best experts on this subject based on the ideXlab platform.

  • Inductive Data Mining: Automatic Generation of Decision Trees from Data for QSAR Modelling and Process Historical Data Analysis
    Computer-Aided Chemical Engineering, 2011
    Co-Authors: Chao Y Ma, Frances V Buontempo, Xue Z Wang
    Abstract:

    A new inductive data mining method for the automatic generation of decision trees from data (GPTree) is presented. Other decision tree induction techniques are based on recursive partitioning, employing greedy searches to choose the best splitting attribute and value at each node, and therefore necessarily miss regions of the search space; GPTree overcomes this problem. In addition, the approach is extended to a new method (YAdapt) that models the original continuous endpoint by adaptively finding suitable ranges to describe the endpoints during the tree induction process, removing the need for discretisation prior to tree induction and allowing the ordinal nature of the endpoint to be taken into account in the models built. A strategy for further improving predictive performance on previously unseen data is investigated that uses multiple decision trees, i.e., a decision forest, with a majority voting strategy to give predictions (GPForest). The methods were applied to QSAR (quantitative structure-activity relationships) modelling for eco-toxicity prediction of chemicals and to the analysis of a historical database for a wastewater treatment plant.
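
    The decision-forest prediction strategy, with many trees combined by majority vote, can be sketched as follows. For brevity this illustration diversifies the trees with random seeds and feature subsampling rather than the genetic programming search used by GPForest; only the voting step mirrors the described method.

```python
# Sketch: a decision forest with majority voting over per-tree predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Diversify trees via random feature subsampling (stand-in for GP search).
forest = [DecisionTreeClassifier(max_features="sqrt", random_state=seed)
          .fit(X_tr, y_tr) for seed in range(15)]

votes = np.stack([t.predict(X_te) for t in forest])  # (n_trees, n_samples)
majority = (votes.mean(axis=0) > 0.5).astype(int)    # odd tree count: no ties
print("forest accuracy:", np.mean(majority == y_te))
```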

  • Inductive Data Mining: Automatic Generation of Decision Trees from Data for QSAR Modelling and Process Historical Data Analysis
    International Journal of Modelling, Identification and Control, 2011
    Co-Authors: Chao Y Ma, Frances V Buontempo, Xue Z Wang
    Abstract:

    A new inductive data mining method for the automatic generation of decision trees from data (GPTree) is presented. Other decision tree induction techniques are based on recursive partitioning, employing greedy searches to choose the best splitting attribute and value at each node, and therefore necessarily miss regions of the search space; GPTree overcomes this problem. In addition, the approach is extended to a new method (YAdapt) that models the original continuous endpoint by adaptively finding suitable ranges to describe the endpoints during the tree induction process, removing the need for discretisation prior to tree induction and allowing the ordinal nature of the endpoint to be taken into account in the models built. A strategy for further improving predictive performance on previously unseen data is investigated that uses multiple decision trees, i.e., a decision forest, with a majority voting strategy to give predictions (GPForest). The methods were applied to QSAR (quantitative structure-activity relationships) modelling for eco-toxicity prediction of chemicals and to the analysis of a historical database for a wastewater treatment plant.

  • Inductive Data Mining Based on Genetic Programming: Automatic Generation of Decision Trees from Data for Process Historical Data Analysis
    Computers & Chemical Engineering, 2009
    Co-Authors: Chao Y Ma, Xue Z Wang
    Abstract:

    An inductive data mining algorithm based on genetic programming, GPForest, is introduced for the automatic construction of decision trees and applied to the analysis of process historical data. GPForest not only outperforms traditional decision tree generation methods, which are based on a greedy search strategy and therefore necessarily miss regions of the search space, but, more importantly, generates multiple trees in each experimental run. In addition, by varying the initial values of parameters, more decision trees can be generated in new experiments. From the multiple decision trees generated, those with high fitness values are selected to form a decision forest. For prediction, the decision forest is used instead of a single tree, with a voting strategy that combines the predictions of all trees in the forest to produce the final prediction. It was demonstrated that, in comparison with decision tree methods in the literature, GPForest gives much improved performance.
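
    A toy version of the evolutionary loop described here, generating candidate trees, scoring them by fitness, and keeping the fittest as the forest, is sketched below. The candidates are single-split stumps and only thresholds are mutated, a deliberate simplification of full genetic programming over tree structures.

```python
# Sketch: evolve a population of one-split "trees", select by fitness,
# and keep the fittest individuals as the decision forest.
import numpy as np
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=6, random_state=2)

def random_stump():
    """A candidate tree: one (feature, threshold) split."""
    j = int(rng.integers(X.shape[1]))
    return [j, float(rng.uniform(X[:, j].min(), X[:, j].max()))]

def predict(stump):
    # Each side of the split predicts its majority training class
    # (labels recomputed from the training set for brevity).
    j, t = stump
    left = X[:, j] <= t
    out = np.empty(len(X), dtype=int)
    for side in (left, ~left):
        out[side] = np.bincount(y[side], minlength=2).argmax() if side.any() else 0
    return out

def fitness(stump):
    return np.mean(predict(stump) == y)

population = [random_stump() for _ in range(200)]
for _ in range(10):  # a few generations
    population.sort(key=fitness, reverse=True)
    survivors = population[:50]                                        # selection
    children = [[j, t + rng.normal(scale=0.1)] for j, t in survivors]  # mutation
    population = survivors + children + [random_stump() for _ in range(100)]

forest = sorted(population, key=fitness, reverse=True)[:15]  # fittest trees
votes = np.stack([predict(s) for s in forest])
print("forest accuracy:", np.mean((votes.mean(0) > 0.5).astype(int) == y))
```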

James S Thorp - One of the best experts on this subject based on the ideXlab platform.

  • Decision Trees for Real-Time Transient Stability Prediction
    IEEE Transactions on Power Systems, 1994
    Co-Authors: S Rovnyak, S Kretsinger, James S Thorp
    Abstract:

    The ability to rapidly acquire synchronized phasor measurements from around the system opens up new possibilities for power system protection and control. This paper demonstrates how decision trees can be constructed offline and then used online to predict transient stability in real time. Primary features of the method include building a single tree for all fault locations, using a short window of realistic-precision post-fault phasor measurements for the prediction, and testing robustness to variations in the operating point. Several candidate decision trees are tested on 40,800 faults from 50 randomly generated operating points on the New England 39-bus test system.
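
    The offline-train, online-predict scheme can be sketched with synthetic stand-in data (not the New England 39-bus cases): a tree is built offline from simulated fault windows and then classifies a fresh post-fault measurement window immediately.

```python
# Sketch: train a decision tree offline on simulated post-fault phasor
# windows, then classify new windows online. All data here are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Offline: each row flattens a short window of post-fault measurements
# from several buses; the stability label follows a made-up severity rule.
n_cases, n_buses, n_steps = 2000, 4, 5
windows = rng.normal(size=(n_cases, n_buses * n_steps))
severity = windows[:, :n_steps].sum(axis=1)   # synthetic severity proxy
labels = (severity > 0.5).astype(int)         # 1 = unstable (synthetic rule)

X_tr, X_te, y_tr, y_te = train_test_split(windows, labels, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print("offline test accuracy:", tree.score(X_te, y_te))

# Online: classify a single incoming window right after a fault.
new_window = rng.normal(size=(1, n_buses * n_steps))
print("unstable" if tree.predict(new_window)[0] else "stable")
```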

Cynthia Rudin - One of the best experts on this subject based on the ideXlab platform.

  • Generalized and Scalable Optimal Sparse Decision Trees
    arXiv: Learning, 2020
    Co-Authors: Chudi Zhong, Cynthia Rudin, Margo Seltzer
    Abstract:

    Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that allow practical algorithms to find optimal decision trees. These new techniques have the potential to trigger a paradigm shift in which it is possible to construct sparse decision trees that efficiently optimize a variety of objective functions without relying on the greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution of this work is a general framework for decision tree optimization that addresses two significant open problems in the area: the treatment of imbalanced data and full optimization over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives, including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state of the art.

  • Optimal Sparse Decision Trees
    arXiv: Learning, 2019
    Co-Authors: Cynthia Rudin, Margo Seltzer
    Abstract:

    Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic and sometimes produce unquestionably suboptimal models. The hardness of decision tree optimization is both a theoretical and a practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees over binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including specialized data structures and a custom bit-vector library. Our experiments highlight advantages in scalability, speed, and proof of optimality.

  • Optimal Sparse Decision Trees
    Neural Information Processing Systems, 2019
    Co-Authors: Cynthia Rudin, Margo Seltzer
    Abstract:

    Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic and sometimes produce unquestionably suboptimal models. The hardness of decision tree optimization is both a theoretical and a practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees over binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including specialized data structures and a custom bit-vector library. We highlight possible steps toward improving the scalability and speed of future generations of this algorithm, based on insights from our theory and experiments.