Decision Tree Induction

The experts below are selected from a list of 5037 experts worldwide ranked by the ideXlab platform.

Rodrigo C Barros - One of the best experts on this subject based on the ideXlab platform.

  • Decision Tree Induction
    2015
    Co-Authors: Rodrigo C Barros, André C. P. L. F. De Carvalho, Alex A Freitas
    Abstract:

    Decision-tree induction algorithms are widely used in a variety of domains for knowledge discovery and pattern recognition. They have the advantage of producing a comprehensible classification/regression model with satisfactory accuracy in several application domains, such as medical diagnosis and credit risk assessment. In this chapter, we present in detail the most common approach to decision-tree induction: top-down induction (Sect. 2.3). Furthermore, we briefly comment on some alternative strategies for inducing decision trees (Sect. 2.4). Our goal is to summarize the main design options one has to face when building decision-tree induction algorithms. These design choices will be especially relevant when designing an evolutionary algorithm that evolves decision-tree induction algorithms.
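
    The top-down approach grows a tree greedily: pick the split that most reduces impurity, partition the data, and recurse until a stopping rule fires. The Python sketch below assumes numeric attributes, binary tests of the form x[f] <= t, the Gini index as split criterion, and a depth limit plus minimum node size as stopping rules. These are illustrative choices (closer to CART than to C4.5, which uses gain ratio), and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, weighted_gini) of the best binary split, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty children
            imp = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or imp < best[2]:
                best = (f, t, imp)
    return best


def induce(rows, labels, depth=0, max_depth=3, min_size=2):
    """Greedy top-down induction: choose the best split, partition, recurse."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rules: pure node, too few examples, or depth limit reached.
    if len(set(labels)) == 1 or len(labels) < min_size or depth >= max_depth:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):
        return {"leaf": majority}  # no split reduces impurity
    f, t, _ = split
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]

    def child(idx):
        return induce([rows[i] for i in idx], [labels[i] for i in idx],
                      depth + 1, max_depth, min_size)

    return {"feature": f, "threshold": t,
            "left": child(left_idx), "right": child(right_idx)}


def predict(tree, row):
    """Follow the tests from the root down to a leaf."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]
```

    For example, induce([[1.0], [2.0], [10.0], [11.0]], ["a", "a", "b", "b"]) returns a single test on feature 0 at threshold 2.0 with pure leaves "a" and "b".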

  • Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets
    IEEE Transactions on Evolutionary Computation, 2014
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, Alex A Freitas, A C P L F De Carvalho
    Abstract:

    Decision-tree induction algorithms are widely used in machine learning applications in which the goal is to extract knowledge from data and present it in a graphically intuitive way. The most successful strategy for inducing decision trees is the greedy top-down recursive approach, which researchers have continuously improved over the past 40 years. In this paper, we propose a paradigm shift in decision-tree research: instead of proposing a new manually designed method for inducing decision trees, we propose automatically designing decision-tree induction algorithms tailored to a specific type of classification data set (or application domain). Following recent breakthroughs in the automatic design of machine learning algorithms, we propose the hyper-heuristic evolutionary algorithm for designing decision-tree algorithms (HEAD-DT), which evolves the design components of top-down decision-tree induction algorithms. By the end of the evolution, we expect HEAD-DT to generate a new and possibly better decision-tree algorithm for a given application domain. We perform extensive experiments on 35 real-world microarray gene expression data sets to assess the performance of HEAD-DT, and compare it with well-known decision-tree algorithms such as C4.5, CART, and REPTree. Results show that HEAD-DT is capable of generating algorithms that significantly outperform the baseline manually designed decision-tree algorithms regarding predictive accuracy and F-measure.
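
    HEAD-DT evolves design components of top-down induction algorithms rather than individual trees. The sketch below shows the general shape of such a search as a plain genetic algorithm. The component names and options in DESIGN_SPACE, the operators, and the `fitness` callable are hypothetical stand-ins, not HEAD-DT's actual encoding or fitness function; in practice `fitness` would build trees with the encoded algorithm and score them (for example by F-measure) on data sets from the target domain.

```python
import random

# Hypothetical design space: each gene selects one option for one component of
# a top-down induction algorithm. This is NOT HEAD-DT's actual encoding.
DESIGN_SPACE = {
    "split_criterion": ["gini", "entropy", "gain_ratio"],
    "stopping_rule": ["pure_node", "min_size_5", "max_depth_10"],
    "pruning": ["none", "error_based", "cost_complexity"],
    "missing_values": ["ignore", "impute_mode"],
}


def random_genome(rng):
    return {gene: rng.choice(opts) for gene, opts in DESIGN_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each component is inherited from either parent.
    return {gene: (a if rng.random() < 0.5 else b)[gene] for gene in DESIGN_SPACE}


def mutate(genome, rng, rate=0.25):
    # Each component is resampled with probability `rate`.
    child = dict(genome)
    for gene, opts in DESIGN_SPACE.items():
        if rng.random() < rate:
            child[gene] = rng.choice(opts)
    return child


def tournament(pop, scores, rng, k=2):
    # Pick k individuals at random and return the fittest of them.
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: scores[i])]


def evolve(fitness, pop_size=20, generations=30, seed=0):
    """Evolve algorithm designs; a higher `fitness(genome)` is better."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite]  # elitism: never lose the best design found so far
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return max(pop, key=fitness)
```

    Because each individual encodes a complete algorithm design, the best genome can be decoded into an induction algorithm and then applied to new data sets from the same domain.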

  • Automatic design of Decision Tree Induction algorithms tailored to flexible-receptor docking data
    BMC Bioinformatics, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Duncan D Ruiz, Ana T Winck, Karina S Machado, Osmar Norberto De Souza
    Abstract:

    This paper addresses the prediction of the free energy of binding of a drug candidate to the enzyme InhA associated with Mycobacterium tuberculosis. This problem arises in rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding but also to provide a comprehensible model that a domain specialist can validate. Decision-tree induction algorithms have been successfully used in drug-design applications, especially since decision trees are simple to understand, interpret, and validate. Several decision-tree induction algorithms are available for general use, but each has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision-tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for predicting the free energy of binding of a drug candidate to a flexible receptor.
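
    The free energy of binding is a continuous quantity. To illustrate how a tree can split on such a target (a regression-style split, as in CART's least-squares regression trees), the sketch below picks the threshold on one numeric attribute that minimizes the total squared error of the two child nodes. This is not the method proposed in the paper; the function names and the toy values in the test are made up.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Best threshold on one numeric attribute for a continuous target.

    Returns (threshold, left_mean, right_mean) minimizing the children's
    total SSE, or None when no valid split exists.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical attribute values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    return None if best is None else best[1:]
```

    Each leaf of such a tree predicts the mean target value of its training examples, which keeps the model as readable as a classification tree.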

  • A hyper-heuristic evolutionary algorithm for automatically designing Decision Tree algorithms
    Genetic and Evolutionary Computation Conference, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas
    Abstract:

    Decision-tree induction is one of the most widely used methods for extracting knowledge from data, since the resulting knowledge representation is very intuitive and easily understood by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. Following recent breakthroughs in the automatic design of machine learning algorithms, this work proposes HEAD-DT, a hyper-heuristic evolutionary algorithm for automatically generating decision-tree induction algorithms. We perform extensive experiments on 20 public data sets to assess the performance of HEAD-DT, and we compare it to traditional decision-tree algorithms such as C4.5 and CART. Results show that HEAD-DT can generate algorithms that significantly outperform C4.5 and CART regarding predictive accuracy and F-measure.
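
    The HEAD-DT results are reported in terms of predictive accuracy and F-measure. For a single class, the F-measure is the harmonic mean of precision P and recall R, F = 2PR / (P + R). The abstracts do not say how per-class values are averaged on multi-class data, so the sketch below assumes an unweighted (macro) average; the function names are illustrative.

```python
def f_measure(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of the per-class F1 values (an assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f_measure(y_true, y_pred, c) for c in classes) / len(classes)
```

    Unlike accuracy, the macro average gives a rare class the same weight as a frequent one, which matters on imbalanced data sets.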

  • A survey of evolutionary algorithms for Decision Tree Induction
    Systems, Man, and Cybernetics, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas
    Abstract:

    This paper presents a survey of evolutionary algorithms designed for decision-tree induction. Most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that use evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy covering both works that evolve decision trees and works that design decision-tree components using evolutionary algorithms. Finally, it provides a number of references describing applications of evolutionary algorithms for decision-tree induction in different domains. At the end of the paper, we address some important issues and open questions that can be the subject of future research.
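
    The survey's main subject, evolving whole decision trees instead of growing them top-down, can be illustrated with a minimal evolutionary loop over tree individuals. This is a simplified sketch, not a method taken from the survey: it assumes attributes scaled to [0, 1], represents a tree as nested tuples, uses training accuracy as fitness with truncation selection, and applies a mutation that walks down a random path and replaces the subtree it reaches. All names are hypothetical.

```python
import random


def random_tree(rng, n_features, classes, depth):
    """Grow a random tree: ("leaf", label) or ("split", f, t, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return ("leaf", rng.choice(classes))
    f = rng.randrange(n_features)
    t = rng.uniform(0.0, 1.0)  # assumes attributes are scaled to [0, 1]
    return ("split", f, t,
            random_tree(rng, n_features, classes, depth - 1),
            random_tree(rng, n_features, classes, depth - 1))


def classify(tree, row):
    while tree[0] == "split":
        _, f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree[1]


def accuracy(tree, rows, labels):
    return sum(classify(tree, r) == y for r, y in zip(rows, labels)) / len(rows)


def mutate(tree, rng, n_features, classes, depth=2):
    """Walk down a random path and replace the subtree reached with a random one."""
    if tree[0] == "leaf" or rng.random() < 0.3:
        return random_tree(rng, n_features, classes, depth)
    _, f, t, left, right = tree
    if rng.random() < 0.5:
        return ("split", f, t, mutate(left, rng, n_features, classes, depth), right)
    return ("split", f, t, left, mutate(right, rng, n_features, classes, depth))


def evolve_tree(rows, labels, pop_size=30, generations=40, seed=1):
    rng = random.Random(seed)
    classes = sorted(set(labels))
    n_features = len(rows[0])

    def fitness(tree):
        return accuracy(tree, rows, labels)

    pop = [random_tree(rng, n_features, classes, 3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the top half
        offspring = [mutate(rng.choice(parents), rng, n_features, classes)
                     for _ in range(pop_size - len(parents))]
        pop = parents + offspring
    return max(pop, key=fitness)
```

    Because the search evaluates complete trees, it is not bound to the locally optimal split choices of greedy top-down induction, at the price of many more fitness evaluations.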

Alex A Freitas - One of the best experts on this subject based on the ideXlab platform.

  • Decision Tree Induction
    2015
    Co-Authors: Rodrigo C Barros, André C. P. L. F. De Carvalho, Alex A Freitas

  • Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets
    IEEE Transactions on Evolutionary Computation, 2014
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, Alex A Freitas, A C P L F De Carvalho

  • A hyper-heuristic evolutionary algorithm for automatically designing Decision Tree algorithms
    Genetic and Evolutionary Computation Conference, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

  • A survey of evolutionary algorithms for Decision Tree Induction
    Systems, Man, and Cybernetics, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

  • Towards the automatic design of Decision Tree Induction algorithms
    Genetic and Evolutionary Computation Conference, 2011
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Alex A Freitas
    Abstract:

    Decision-tree induction is one of the most widely used methods for extracting knowledge from data, since the resulting knowledge representation is very intuitive and easily understood by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. Following recent breakthroughs in the automatic design of machine learning algorithms, this work proposes two different approaches for automatically generating generic decision-tree induction algorithms. Both approaches are based on the evolutionary algorithms paradigm, which improves solutions using metaphors of biological processes. We also propose guidelines for designing suitable fitness functions for these evolutionary algorithms, taking into account the requirements and needs of the end user.
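
    The idea of fitness functions that reflect end-user requirements can be made concrete with a weighted score. The function below is a hypothetical example, not the guideline proposed in the paper: it blends predictive accuracy with a simplicity term derived from tree size, and `weight`, `max_nodes`, and the function name are assumptions chosen for illustration.

```python
def design_fitness(accuracy, n_nodes, max_nodes=100, weight=0.8):
    """Hypothetical end-user-aware fitness for an evolved induction algorithm.

    accuracy: predictive accuracy in [0, 1] of the trees the algorithm induces.
    n_nodes:  (average) size of those trees; smaller means more comprehensible.
    weight:   how much the user values accuracy over comprehensibility.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    # Trees at or above max_nodes get no simplicity credit.
    simplicity = 1.0 - min(n_nodes, max_nodes) / max_nodes
    return weight * accuracy + (1.0 - weight) * simplicity
```

    With weight=0.5, a design that reaches 88% accuracy with 10-node trees scores higher than one that reaches 92% with 100-node trees; a user who only cares about accuracy would set weight=1.0.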

Marcio P Basgalupp - One of the best experts on this subject based on the ideXlab platform.

  • Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets
    IEEE Transactions on Evolutionary Computation, 2014
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, Alex A Freitas, A C P L F De Carvalho

  • Automatic design of Decision Tree Induction algorithms tailored to flexible-receptor docking data
    BMC Bioinformatics, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Duncan D Ruiz, Ana T Winck, Karina S Machado, Osmar Norberto De Souza

  • A hyper-heuristic evolutionary algorithm for automatically designing Decision Tree algorithms
    Genetic and Evolutionary Computation Conference, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

  • A survey of evolutionary algorithms for Decision Tree Induction
    Systems, Man, and Cybernetics, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

  • Towards the automatic design of Decision Tree Induction algorithms
    Genetic and Evolutionary Computation Conference, 2011
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Alex A Freitas

H Hogeveen - One of the best experts on this subject based on the ideXlab platform.

  • Detection of clinical mastitis with sensor data from automatic milking systems is improved by using Decision Tree Induction
    Journal of Dairy Science, 2010
    Co-Authors: C Kamphuis, H Mollenhorst, H Hogeveen, J A P Heesterbeek
    Abstract:

    The objective was to develop and validate a clinical mastitis (CM) detection model by means of decision-tree induction. For farmers milking with an automatic milking system (AMS), it is desirable that the detection model has a high sensitivity (Se), especially for more severe cases of CM, at a very high specificity (Sp). In addition, an alert for CM should preferably be generated at the quarter milking (QM) at which the CM infection is visible for the first time. Data were collected from 9 Dutch dairy herds milking automatically during a 2.5-yr period. Data included sensor data (electrical conductivity, color, and yield) at the QM level and visual observations of quarters with CM recorded by the farmers. Visual observations of quarters with CM were combined with sensor data of the most recent automatic milking recorded for that same quarter, within a 24-h time window before the visual assessment time. Sensor data of 3.5 million QM were collected, of which 348 QM were combined with a CM observation. Data were divided into a training set, including two-thirds of all data, and a test set. Cows in the training set were not included in the test set and vice versa. A decision-tree model was trained using only clear examples of healthy (n = 24,717) or diseased (n = 243) QM. The model was tested on 105 QM with CM and a random sample of 50,000 QM without CM. While keeping the Se at a level comparable to that of models currently used by AMS, the decision-tree model was able to decrease the number of false-positive alerts by more than 50%. At an Sp of 99%, 40% of the CM cases were detected. Sixty-four percent of the severe CM cases were detected, but only 12.5% of the CM cases that were scored as watery milk. The Se increased considerably, from 40% to 66.7%, when the time window was widened from less than 24 h before the CM observation to a window from 24 h before to 24 h after the CM observation. Even at very wide time windows, however, it was impossible to reach an Se of 100%. This indicates that not all CM cases can be detected based on sensor data alone. Sensitivity levels varied widely when the decision tree was validated per herd. This trend was confirmed when decision trees were trained using data from 8 herds and tested on data from the ninth herd. This indicates that, when the decision tree is used as a generic CM detection model in practice, some herds will continue to have difficulty detecting CM using mastitis alert lists, whereas others will perform well.
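
    A result such as "at an Sp of 99%, 40% of the CM cases were detected" comes from placing the alert threshold on the model's output so that the specificity target is met, then measuring sensitivity at that threshold. A minimal sketch, assuming the model outputs one score per quarter milking and that an alert fires when the score is strictly above the threshold; the function name and data are hypothetical.

```python
def sensitivity_at_specificity(pos_scores, neg_scores, target_sp=0.99):
    """Return (threshold, Se, Sp) for the lowest threshold with Sp >= target_sp.

    An alert is raised when a score is strictly above the threshold, so Sp is
    the share of healthy milkings scored at or below it.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    for thr in candidates:  # ascending: the first hit gives the highest Se
        sp = sum(s <= thr for s in neg_scores) / len(neg_scores)
        if sp >= target_sp:
            se = sum(s > thr for s in pos_scores) / len(pos_scores)
            return thr, se, sp
    raise ValueError("no threshold reaches the target specificity")
```

    Raising target_sp moves the threshold up, which cuts false-positive alerts but can only lower Se; this is the trade-off the abstract describes.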

  • Decision Tree Induction to detect clinical mastitis with automatic milking
    Computers and Electronics in Agriculture, 2010
    Co-Authors: C Kamphuis, H Mollenhorst, Ad Feelders, D Pietersma, H Hogeveen
    Abstract:

    This study explored the potential of using decision-tree induction to develop models for the detection of clinical mastitis with automatic milking. Sensor data (including electrical conductivity and colour) of over 711,000 quarter milkings were collected from December 2006 until August 2007 at six Dutch dairy herds milking automatically. Farmer recordings of quarter milkings with visible signs of mastitis were considered gold-standard positive cases (n=97), and quarter milkings recorded as visually normal were considered gold-standard negatives (n=339). Randomly chosen quarter milkings that were not visually checked, that were outside a 2-week range before or after a gold-standard positive case, and that were not manually or automatically separated were added to end up with 3000 gold-standard negatives. Decision trees, with varying confidence factors and cost matrices to study their effect on performance characteristics, were developed with the probability of having clinical mastitis for each quarter milking as output. The detection performance of the decision trees was estimated using 10-fold cross-validation. The evaluated performance characteristics were sensitivity and specificity, both calculated at a threshold value of 0.50 for the probability estimate for clinical mastitis. The transformed partial area under the curve was used to summarise the diagnostic ability of the decision trees within a specified range of interest (specificity ≥ 97%). Receiver operating characteristic curves visualised all combinations of sensitivity and specificity of the decision trees within this range. Results showed that decision trees are easy to interpret when visualised. The lower the confidence factor, the smaller the decision tree: a cost-insensitive decision tree with a confidence factor of 0.05 needed only eleven test nodes to classify all 3097 records with a sensitivity of 23.7% and a specificity of 99.2%. The decision tree with default parameter settings showed a transformed partial area under the curve value of 0.6420. Introducing costs for false-negative classifications increased this value to 0.6476. At a specificity level of 99%, the decision tree with the highest transformed partial area under the curve value showed a sensitivity of 29.8%. The detection performance of the different decision trees was comparable with that of models currently used by automatic milking systems. As it was possible to achieve these results with a rather simple decision-tree algorithm, we believe that decision-tree induction shows potential for detecting clinical mastitis with automatic milking.
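
    Sensitivity and specificity at the 0.50 threshold can be computed directly from the model's probability outputs. The abstract does not say how the cost matrices were applied; one common way to make a probabilistic classifier cost-sensitive is to move the threshold to cost_FP / (cost_FP + cost_FN), the cut-off that minimizes expected cost when the probabilities are well calibrated. The sketch below assumes that approach; the function names, costs, and data are illustrative.

```python
def se_sp(probs, labels, threshold=0.5):
    """Sensitivity and specificity when CM is predicted for prob >= threshold.

    `labels` are booleans (True = clinical mastitis); both classes must occur.
    """
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if y and p >= threshold)
    fn = sum(1 for p, y in pairs if y and p < threshold)
    tn = sum(1 for p, y in pairs if not y and p < threshold)
    fp = sum(1 for p, y in pairs if not y and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def cost_threshold(cost_fp, cost_fn):
    """Flag CM when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)
```

    With equal costs the cut-off is 0.50; making a missed case four times as costly as a false alert lowers it to 0.20, which raises sensitivity at the expense of specificity.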

André C. P. L. F. De Carvalho - One of the best experts on this subject based on the ideXlab platform.

  • Hyper-parameter tuning of a Decision Tree Induction algorithm
    Brazilian Conference on Intelligent Systems, 2016
    Co-Authors: Rafael Gomes Mantovani, Ricardo Cerri, Tomas Horvath, Joaquin Vanschoren, André C. P. L. F. De Carvalho
    Abstract:

    Supervised classification is the most studied task in machine learning. Among the many algorithms used for this task, decision-tree algorithms are a popular choice, since they are robust and efficient to construct. Moreover, they have the advantage of producing comprehensible models with satisfactory accuracy in several application domains. Like most machine learning methods, these algorithms have hyper-parameters whose values directly affect the performance of the induced models. Because of the large number of possible hyper-parameter values, several studies use optimization techniques to find good settings that produce classifiers with good predictive performance. This study investigates how sensitive decision trees are to a hyper-parameter optimization process. Four different tuning techniques were explored to adjust the hyper-parameters of the J48 decision-tree algorithm. In total, experiments on 102 heterogeneous datasets analyzed the tuning effect on the induced models. The experimental results show that, although the average improvement over all datasets is low, in most cases the improvement is statistically significant.
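
    Random search is one simple way to tune J48. In Weka, J48 exposes the pruning confidence factor (-C, default 0.25) and the minimum number of instances per leaf (-M, default 2). The sketch below samples these two hyper-parameters at random and keeps the best-scoring setting. The search ranges, trial budget, and the `evaluate` callable (which would run J48 under cross-validation) are assumptions, and random search is not necessarily one of the four techniques compared in the study.

```python
import random


def random_search(evaluate, n_trials=50, seed=42):
    """Random search over two J48 hyper-parameters.

    `evaluate(C, M)` is a hypothetical user-supplied callable returning a
    validation score (higher is better) for J48 run with confidence factor C
    and minimum instances per leaf M.
    """
    rng = random.Random(seed)
    best_params = (0.25, 2)  # Weka's defaults serve as the baseline
    best_score = evaluate(*best_params)
    for _ in range(n_trials):
        # Illustrative ranges, not taken from the study.
        params = (round(rng.uniform(0.001, 0.5), 3), rng.randint(1, 50))
        score = evaluate(*params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

    Starting from the defaults guarantees that the tuned setting never scores below the baseline on the validation data, although, as the study reports, the gain over the defaults may be small.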

  • Decision Tree Induction
    2015
    Co-Authors: Rodrigo C Barros, André C. P. L. F. De Carvalho, Alex A Freitas

  • Automatic design of Decision Tree Induction algorithms tailored to flexible-receptor docking data
    BMC Bioinformatics, 2012
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Duncan D Ruiz, Ana T Winck, Karina S Machado, Osmar Norberto De Souza

  • Towards the automatic design of Decision Tree Induction algorithms
    Genetic and Evolutionary Computation Conference, 2011
    Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Alex A Freitas

  • A bottom-up oblique Decision Tree Induction algorithm
    2011 11th International Conference on Intelligent Systems Design and Applications, 2011
    Co-Authors: Rodrigo C Barros, Ricardo Cerri, Pablo A. Jaskowiak, André C. P. L. F. De Carvalho
    Abstract:

    Decision-tree induction algorithms are widely used in knowledge discovery and data mining, especially in scenarios where model comprehensibility is desired. A variation of the traditional univariate approach is the so-called oblique decision tree, which allows multivariate tests in its non-terminal nodes. Oblique decision trees can model decision boundaries that are oblique to the attribute axes, whereas univariate trees can only perform axis-parallel splits. Most oblique and univariate decision-tree induction algorithms follow a top-down strategy for growing the tree, relying on an impurity-based measure for splitting nodes. In this paper, we propose a novel bottom-up algorithm for inducing oblique trees, named BUTIA. It does not require an impurity measure for dividing nodes, since the data resulting from each split are known a priori. To generate the splitting hyperplanes, our algorithm uses a support vector machine (SVM) solution, and a clustering algorithm is used to generate the initial leaves. We compare BUTIA to traditional univariate and oblique decision-tree algorithms (C4.5, CART, OC1, and FT), as well as to a standard SVM implementation, using real gene expression benchmark data. Experimental results show the effectiveness of the proposed approach in several cases.
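
    A rough sketch of the bottom-up idea: start from leaves produced by a clustering step and repeatedly merge two nodes, giving each new internal node an oblique test w·x + b > 0 that separates its two children. Because both children are known before the test is built, no impurity measure is needed. BUTIA fits the hyperplane with an SVM; to stay self-contained, this sketch substitutes the perpendicular bisector between the two child centroids, assumes the clustering has already been done, merges the pair with the closest centroids (an assumed merge order), and uses hypothetical function names.

```python
from collections import Counter


def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def leaf(points, labels):
    # A leaf predicts the majority class of its cluster.
    return {"points": points, "centroid": centroid(points),
            "label": Counter(labels).most_common(1)[0][0]}


def merge(a, b):
    """Parent whose oblique test w.x + bias > 0 routes x toward child a.

    The test is true exactly when x is closer to a's centroid than to b's
    (perpendicular bisector; BUTIA would fit an SVM hyperplane instead).
    """
    ca, cb = a["centroid"], b["centroid"]
    w = [x - y for x, y in zip(ca, cb)]
    bias = -(sum(x * x for x in ca) - sum(y * y for y in cb)) / 2
    points = a["points"] + b["points"]
    return {"points": points, "centroid": centroid(points),
            "w": w, "b": bias, "left": a, "right": b}


def build_bottom_up(clusters):
    """clusters: list of (points, labels) pairs from a prior clustering step."""
    nodes = [leaf(p, y) for p, y in clusters]
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        i, j = min(pairs, key=lambda ij: sq_dist(nodes[ij[0]]["centroid"],
                                                 nodes[ij[1]]["centroid"]))
        parent = merge(nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0]


def predict(node, x):
    while "w" in node:
        score = sum(wi * xi for wi, xi in zip(node["w"], x)) + node["b"]
        node = node["left"] if score > 0 else node["right"]
    return node["label"]
```

    Each internal node here tests a linear combination of all attributes, which is what lets an oblique tree represent boundaries that are not parallel to the attribute axes.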