Decision Tree Induction

The Experts below are selected from a list of 5037 Experts worldwide ranked by ideXlab platform

Rodrigo C Barros - One of the best experts on this subject based on the ideXlab platform.

Decision Tree Induction

2015

Co-Authors: Rodrigo C Barros, André C. P. L. F. De Carvalho, Alex A Freitas

Abstract:

Decision-Tree Induction algorithms are widely used in a variety of domains for knowledge discovery and pattern recognition. They have the advantage of producing a comprehensible classification/regression model and satisfactory accuracy levels in several application domains, such as medical diagnosis and credit risk assessment. In this chapter, we present in detail the most common approach for Decision-Tree Induction: top-down Induction (Sect. 2.3). Furthermore, we briefly comment on some alternative strategies for Induction of Decision Trees (Sect. 2.4). Our goal is to summarize the main design options one has to face when building Decision-Tree Induction algorithms. These design choices will be especially interesting when designing an evolutionary algorithm for evolving Decision-Tree Induction algorithms.
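
As a rough illustration of the top-down strategy this abstract refers to, the sketch below grows a tree greedily by choosing, at each node, the categorical attribute with the highest information gain. It is an ID3-style simplification written for this listing, not the chapter's own algorithm: the function names, the tuple-based tree representation, and the toy weather data are all assumptions made for the example, and pruning, numeric attributes, and missing values are left out.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning the data on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def induce_tree(rows, labels, attrs):
    """Greedy top-down induction.

    Returns a class label (leaf) or a tuple (attribute, {value: subtree}).
    """
    # Stop when the node is pure or no attributes remain; predict the majority class.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        children[value] = induce_tree(
            [r for r, _ in pairs], [y for _, y in pairs], remaining
        )
    return (best, children)


def predict(tree, row):
    """Follow the tests from the root down to a leaf (assumes seen attribute values)."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree


# Hypothetical toy data, invented for this example only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "stay"]

tree = induce_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(predict(tree, {"outlook": "rainy", "windy": "yes"}))
```

The split criterion, the stopping rule, and the leaf-labelling rule used here are each one choice among many, which is the kind of design option the chapter sets out to summarize.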

Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets

IEEE Transactions on Evolutionary Computation, 2014

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, Alex A Freitas, A C P L F De Carvalho

Abstract:

Decision-Tree Induction algorithms are widely used in machine learning applications in which the goal is to extract knowledge from data and present it in a graphically intuitive way. The most successful strategy for inducing Decision Trees is the greedy top-down recursive approach, which has been continuously improved by researchers over the past 40 years. In this paper, we propose a paradigm shift in the research of Decision Trees: instead of proposing a new manually designed method for inducing Decision Trees, we propose automatically designing Decision-Tree Induction algorithms tailored to a specific type of classification data set (or application domain). Following recent breakthroughs in the automatic design of machine learning algorithms, we propose a hyper-heuristic evolutionary algorithm for designing Decision-Tree algorithms (HEAD-DT) that evolves design components of top-down Decision-Tree Induction algorithms. By the end of the evolution, we expect HEAD-DT to generate a new and possibly better Decision-Tree algorithm for a given application domain. We perform extensive experiments on 35 real-world microarray gene expression data sets to assess the performance of HEAD-DT, and compare it with well-known Decision-Tree algorithms such as C4.5, CART, and REPTree. Results show that HEAD-DT is capable of generating algorithms that significantly outperform the baseline manually designed Decision-Tree algorithms regarding predictive accuracy and F-measure.

Automatic design of Decision Tree Induction algorithms tailored to flexible receptor docking data

BMC Bioinformatics, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Duncan D Ruiz, Ana T Winck, Karina S Machado, Osmar Norberto De Souza

Abstract:

This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-Tree Induction algorithms have been successfully used in drug-design-related applications, especially considering that Decision Trees are simple to understand, interpret, and validate. There are several Decision-Tree Induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of Decision-Tree Induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to Decision Tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating Decision-Tree Induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a Decision-Tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.

A hyper-heuristic evolutionary algorithm for automatically designing Decision Tree algorithms

Genetic and Evolutionary Computation Conference, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

Abstract:

Decision Tree Induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing Decision Trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes a hyper-heuristic evolutionary algorithm for automatically generating Decision-Tree Induction algorithms, named HEAD-DT. We perform extensive experiments in 20 public data sets to assess the performance of HEAD-DT, and we compare it to traditional Decision-Tree algorithms such as C4.5 and CART. Results show that HEAD-DT can generate algorithms that significantly outperform C4.5 and CART regarding predictive accuracy and F-Measure.

A survey of evolutionary algorithms for Decision Tree Induction

Systems Man and Cybernetics, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

Abstract:

This paper presents a survey of evolutionary algorithms that are designed for Decision-Tree Induction. In this context, most of the paper focuses on approaches that evolve Decision Trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of Decision-Tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and Decision Trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve Decision Trees and works that design Decision-Tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for Decision-Tree Induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.

Alex A Freitas - One of the best experts on this subject based on the ideXlab platform.

Decision Tree Induction

2015

Co-Authors: Rodrigo C Barros, André C. P. L. F. De Carvalho, Alex A Freitas

Abstract:

Decision-Tree Induction algorithms are widely used in a variety of domains for knowledge discovery and pattern recognition. They have the advantage of producing a comprehensible classification/regression model and satisfactory accuracy levels in several application domains, such as medical diagnosis and credit risk assessment. In this chapter, we present in detail the most common approach for Decision-Tree Induction: top-down Induction (Sect. 2.3). Furthermore, we briefly comment on some alternative strategies for Induction of Decision Trees (Sect. 2.4). Our goal is to summarize the main design options one has to face when building Decision-Tree Induction algorithms. These design choices will be especially interesting when designing an evolutionary algorithm for evolving Decision-Tree Induction algorithms.

Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets

IEEE Transactions on Evolutionary Computation, 2014

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, Alex A Freitas, A C P L F De Carvalho

Abstract:

Decision-Tree Induction algorithms are widely used in machine learning applications in which the goal is to extract knowledge from data and present it in a graphically intuitive way. The most successful strategy for inducing Decision Trees is the greedy top-down recursive approach, which has been continuously improved by researchers over the past 40 years. In this paper, we propose a paradigm shift in the research of Decision Trees: instead of proposing a new manually designed method for inducing Decision Trees, we propose automatically designing Decision-Tree Induction algorithms tailored to a specific type of classification data set (or application domain). Following recent breakthroughs in the automatic design of machine learning algorithms, we propose a hyper-heuristic evolutionary algorithm for designing Decision-Tree algorithms (HEAD-DT) that evolves design components of top-down Decision-Tree Induction algorithms. By the end of the evolution, we expect HEAD-DT to generate a new and possibly better Decision-Tree algorithm for a given application domain. We perform extensive experiments on 35 real-world microarray gene expression data sets to assess the performance of HEAD-DT, and compare it with well-known Decision-Tree algorithms such as C4.5, CART, and REPTree. Results show that HEAD-DT is capable of generating algorithms that significantly outperform the baseline manually designed Decision-Tree algorithms regarding predictive accuracy and F-measure.

A hyper-heuristic evolutionary algorithm for automatically designing Decision Tree algorithms

Genetic and Evolutionary Computation Conference, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

Abstract:

Decision Tree Induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing Decision Trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes a hyper-heuristic evolutionary algorithm for automatically generating Decision-Tree Induction algorithms, named HEAD-DT. We perform extensive experiments in 20 public data sets to assess the performance of HEAD-DT, and we compare it to traditional Decision-Tree algorithms such as C4.5 and CART. Results show that HEAD-DT can generate algorithms that significantly outperform C4.5 and CART regarding predictive accuracy and F-Measure.

A survey of evolutionary algorithms for Decision Tree Induction

Systems Man and Cybernetics, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

Abstract:

This paper presents a survey of evolutionary algorithms that are designed for Decision-Tree Induction. In this context, most of the paper focuses on approaches that evolve Decision Trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of Decision-Tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and Decision Trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve Decision Trees and works that design Decision-Tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for Decision-Tree Induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.

Towards the automatic design of Decision Tree Induction algorithms

Genetic and Evolutionary Computation Conference, 2011

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Alex A Freitas

Abstract:

Decision Tree Induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing Decision Trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes two different approaches for automatically generating generic Decision Tree Induction algorithms. Both approaches are based on the evolutionary algorithms paradigm, which improves solutions based on metaphors of biological processes. We also propose guidelines to design interesting fitness functions for these evolutionary algorithms, which take into account the requirements and needs of the end-user.

Marcio P Basgalupp - One of the best experts on this subject based on the ideXlab platform.

Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets

IEEE Transactions on Evolutionary Computation, 2014

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, Alex A Freitas, A C P L F De Carvalho

Abstract:

Decision-Tree Induction algorithms are widely used in machine learning applications in which the goal is to extract knowledge from data and present it in a graphically intuitive way. The most successful strategy for inducing Decision Trees is the greedy top-down recursive approach, which has been continuously improved by researchers over the past 40 years. In this paper, we propose a paradigm shift in the research of Decision Trees: instead of proposing a new manually designed method for inducing Decision Trees, we propose automatically designing Decision-Tree Induction algorithms tailored to a specific type of classification data set (or application domain). Following recent breakthroughs in the automatic design of machine learning algorithms, we propose a hyper-heuristic evolutionary algorithm for designing Decision-Tree algorithms (HEAD-DT) that evolves design components of top-down Decision-Tree Induction algorithms. By the end of the evolution, we expect HEAD-DT to generate a new and possibly better Decision-Tree algorithm for a given application domain. We perform extensive experiments on 35 real-world microarray gene expression data sets to assess the performance of HEAD-DT, and compare it with well-known Decision-Tree algorithms such as C4.5, CART, and REPTree. Results show that HEAD-DT is capable of generating algorithms that significantly outperform the baseline manually designed Decision-Tree algorithms regarding predictive accuracy and F-measure.

Automatic design of Decision Tree Induction algorithms tailored to flexible receptor docking data

BMC Bioinformatics, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Duncan D Ruiz, Ana T Winck, Karina S Machado, Osmar Norberto De Souza

Abstract:

This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-Tree Induction algorithms have been successfully used in drug-design-related applications, especially considering that Decision Trees are simple to understand, interpret, and validate. There are several Decision-Tree Induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of Decision-Tree Induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to Decision Tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating Decision-Tree Induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a Decision-Tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.

A hyper-heuristic evolutionary algorithm for automatically designing Decision Tree algorithms

Genetic and Evolutionary Computation Conference, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

Abstract:

Decision Tree Induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing Decision Trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes a hyper-heuristic evolutionary algorithm for automatically generating Decision-Tree Induction algorithms, named HEAD-DT. We perform extensive experiments in 20 public data sets to assess the performance of HEAD-DT, and we compare it to traditional Decision-Tree algorithms such as C4.5 and CART. Results show that HEAD-DT can generate algorithms that significantly outperform C4.5 and CART regarding predictive accuracy and F-Measure.

A survey of evolutionary algorithms for Decision Tree Induction

Systems Man and Cybernetics, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, A C P L F De Carvalho, Alex A Freitas

Abstract:

This paper presents a survey of evolutionary algorithms that are designed for Decision-Tree Induction. In this context, most of the paper focuses on approaches that evolve Decision Trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of Decision-Tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and Decision Trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve Decision Trees and works that design Decision-Tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for Decision-Tree Induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.

Towards the automatic design of Decision Tree Induction algorithms

Genetic and Evolutionary Computation Conference, 2011

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Alex A Freitas

Abstract:

Decision Tree Induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing Decision Trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes two different approaches for automatically generating generic Decision Tree Induction algorithms. Both approaches are based on the evolutionary algorithms paradigm, which improves solutions based on metaphors of biological processes. We also propose guidelines to design interesting fitness functions for these evolutionary algorithms, which take into account the requirements and needs of the end-user.

H Hogeveen - One of the best experts on this subject based on the ideXlab platform.

Detection of clinical mastitis with sensor data from automatic milking systems is improved by using Decision Tree Induction

Journal of Dairy Science, 2010

Co-Authors: C Kamphuis, H Mollenhorst, H Hogeveen, J A P Heesterbeek

Abstract:

The objective was to develop and validate a clinical mastitis (CM) detection model by means of Decision-Tree Induction. For farmers milking with an automatic milking system (AMS), it is desirable that the detection model has a high level of sensitivity (Se), especially for more severe cases of CM, at a very high specificity (Sp). In addition, an alert for CM should be generated preferably at the quarter milking (QM) at which the CM infection is visible for the first time. Data were collected from 9 Dutch dairy herds milking automatically during a 2.5-yr period. Data included sensor data (electrical conductivity, color, and yield) at the QM level and visual observations of quarters with CM recorded by the farmers. Visual observations of quarters with CM were combined with sensor data of the most recent automatic milking recorded for that same quarter, within a 24-h time window before the visual assessment time. Sensor data of 3.5 million QM were collected, of which 348 QM were combined with a CM observation. Data were divided into a training set, including two-thirds of all data, and a test set. Cows in the training set were not included in the test set and vice versa. A Decision-Tree model was trained using only clear examples of healthy (n = 24,717) or diseased (n = 243) QM. The model was tested on 105 QM with CM and a random sample of 50,000 QM without CM. While keeping the Se at a level comparable to that of models currently used by AMS, the Decision-Tree model was able to decrease the number of false-positive alerts by more than 50%. At an Sp of 99%, 40% of the CM cases were detected. Sixty-four percent of the severe CM cases were detected and only 12.5% of the CM that were scored as watery milk. The Se increased considerably from 40% to 66.7% when the time window increased from less than 24 h before the CM observation, to a time window from 24 h before to 24 h after the CM observation. 
Even at very wide time windows, however, it was impossible to reach an Se of 100%. This indicates the inability to detect all CM cases based on sensor data alone. Sensitivity levels varied widely when the Decision Tree was validated per herd. This trend was confirmed when Decision Trees were trained using data from 8 herds and tested on data from the ninth herd. This indicates that when using the Decision Tree as a generic CM detection model in practice, some herds will continue to have difficulty detecting CM using mastitis alert lists, whereas others will perform well.
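
To make the sensitivity and specificity figures in this abstract more concrete, the short sketch below relates them to raw counts on the reported test set (105 QM with CM and 50,000 QM without CM). The counts of detected cases and false alerts are back-calculated here from the reported percentages and rounded to whole numbers; they are an assumption of this example and are not stated in the abstract.

```python
def sensitivity(true_pos, false_neg):
    """Se: fraction of diseased quarter milkings that raise an alert."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Sp: fraction of healthy quarter milkings that raise no alert."""
    return true_neg / (true_neg + false_pos)


# Test-set sizes reported in the abstract.
CM_CASES = 105
HEALTHY = 50_000

# Counts implied by the reported rates (assumed, rounded to whole cases).
detected_24h = 42    # about 40% of 105, window of 24 h before the observation
false_alerts = 500   # about 1% of 50,000, i.e. an Sp of 99%
detected_wide = 70   # about 66.7% of 105, window of 24 h before to 24 h after

se_24h = sensitivity(detected_24h, CM_CASES - detected_24h)
sp = specificity(HEALTHY - false_alerts, false_alerts)
se_wide = sensitivity(detected_wide, CM_CASES - detected_wide)

print(f"Se, narrow window: {se_24h:.1%}")
print(f"Sp:                {sp:.1%}")
print(f"Se, wide window:   {se_wide:.1%}")
```

Because the healthy test sample is a random subset rather than the full 3.5 million QM, these counts illustrate the definitions only and should not be read as alert volumes on a real farm.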

Decision Tree Induction to detect clinical mastitis with automatic milking

Computers and Electronics in Agriculture, 2010

Co-Authors: C Kamphuis, H Mollenhorst, Ad Feelders, D Pietersma, H Hogeveen

Abstract:

This study explored the potential of using Decision-Tree Induction to develop models for the detection of clinical mastitis with automatic milking. Sensor data (including electrical conductivity and colour) of over 711,000 quarter milkings were collected from December 2006 until August 2007 at six Dutch dairy herds milking automatically. Farmer recordings of quarter milkings with visible signs of mastitis were considered gold standard positive cases (n=97), and quarter milkings that were recorded as being visually normal were considered gold standard negatives (n=339). Randomly chosen quarter milkings that were not visually checked, that were outside a 2-week range before or after a gold standard positive case, and that were not manually or automatically separated were added to end up with 3000 gold standard negatives. Decision Trees, with varying confidence factors and cost matrices to study their effect on performance characteristics, were developed with the probability of having clinical mastitis for each quarter milking as output. Detection performance of Decision Trees was estimated using 10-fold cross-validation. Evaluated performance characteristics were the sensitivity and specificity, both calculated at a threshold value of 0.50 for the probability estimate for clinical mastitis. The transformed partial area under the curve was used to summarise the diagnostic ability of Decision Trees within a specified range of interest (specificity >=97%). Receiver operating characteristic curves visualized all combinations of sensitivity and specificity of Decision Trees within this range. Results showed that Decision Trees are easy to interpret when visualised. The lower the confidence factor, the smaller the Decision Trees: a cost-insensitive Decision Tree with a confidence factor of 0.05 needed only eleven test nodes to classify all 3097 records with a sensitivity of 23.7% and a specificity of 99.2%. 
The Decision Tree with default parameter settings showed a transformed partial area under the curve value of 0.6420. By introducing costs for false negative classifications this value increased to 0.6476. At a specificity level of 99%, the Decision Tree with the highest transformed partial area under the curve value showed a sensitivity of 29.8%. Detection performances of the different Decision Trees were comparable with those of models currently used by automatic milking systems. As it was possible to achieve these results with the use of a rather simple Decision Tree algorithm, we believe that Decision Tree Induction shows potential for detecting clinical mastitis with automatic milking.

André C. P. L. F. De Carvalho - One of the best experts on this subject based on the ideXlab platform.

Hyper-parameter tuning of a Decision Tree Induction algorithm

Brazilian Conference on Intelligent Systems, 2016

Co-Authors: Rafael Gomes Mantovani, Ricardo Cerri, Tomas Horvath, Joaquin Vanschoren, André C. P. L. F. De Carvalho

Abstract:

Supervised classification is the most studied task in Machine Learning. Among the many algorithms used in such tasks, Decision Tree algorithms are a popular choice, since they are robust and efficient to construct. Moreover, they have the advantage of producing comprehensible models and satisfactory accuracy levels in several application domains. Like most Machine Learning methods, these algorithms have some hyper-parameters whose values directly affect the performance of the induced models. Due to the high number of possibilities for these hyper-parameter values, several studies use optimization techniques to find a good set of solutions in order to produce classifiers with good predictive performance. This study investigates how sensitive Decision Trees are to a hyper-parameter optimization process. Four different tuning techniques were explored to adjust the hyper-parameters of the J48 Decision Tree algorithm. In total, experiments using 102 heterogeneous datasets analyzed the tuning effect on the induced models. The experimental results show that, although the average improvement over all datasets is low, in most cases the improvement is statistically significant.

Decision Tree Induction

2015

Co-Authors: Rodrigo C Barros, André C. P. L. F. De Carvalho, Alex A Freitas

Abstract:

Decision-Tree Induction algorithms are widely used in a variety of domains for knowledge discovery and pattern recognition. They have the advantage of producing a comprehensible classification/regression model and satisfactory accuracy levels in several application domains, such as medical diagnosis and credit risk assessment. In this chapter, we present in detail the most common approach for Decision-Tree Induction: top-down Induction (Sect. 2.3). Furthermore, we briefly comment on some alternative strategies for Induction of Decision Trees (Sect. 2.4). Our goal is to summarize the main design options one has to face when building Decision-Tree Induction algorithms. These design choices will be especially interesting when designing an evolutionary algorithm for evolving Decision-Tree Induction algorithms.

Automatic design of Decision Tree Induction algorithms tailored to flexible receptor docking data

BMC Bioinformatics, 2012

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Duncan D Ruiz, Ana T Winck, Karina S Machado, Osmar Norberto De Souza

Abstract:

This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-Tree Induction algorithms have been successfully used in drug-design-related applications, especially considering that Decision Trees are simple to understand, interpret, and validate. There are several Decision-Tree Induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of Decision-Tree Induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to Decision Tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating Decision-Tree Induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a Decision-Tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.

towards the automatic design of Decision Tree Induction algorithms

Genetic and Evolutionary Computation Conference, 2011

Co-Authors: Rodrigo C Barros, Marcio P Basgalupp, André C. P. L. F. De Carvalho, Alex A Freitas

Abstract:

Decision Tree Induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing Decision Trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes two different approaches for automatically generating generic Decision Tree Induction algorithms. Both approaches are based on the evolutionary algorithms paradigm, which improves solutions based on metaphors of biological processes. We also propose guidelines to design interesting fitness functions for these evolutionary algorithms, which take into account the requirements and needs of the end-user.
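As a rough illustration of the greedy top-down approach mentioned in the abstract above, the sketch below performs one greedy step: it scans every axis-parallel threshold and keeps the one with the lowest weighted impurity. The Gini index, the function names `gini` and `best_split`, and the toy data are assumptions made for this example only; the abstract does not specify an impurity measure or an implementation.

```python
from collections import Counter
from typing import Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a label collection (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(
    rows: Sequence[Sequence[float]], labels: Sequence[str]
) -> Tuple[int, float, float]:
    """Return (feature, threshold, weighted_impurity) of the best split.

    A feature index of -1 means no split improved on the unsplit node.
    """
    n = len(rows)
    best = (-1, 0.0, gini(labels))  # baseline: leave the node unsplit
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send data to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical toy data: the class depends only on the first attribute.
rows = [(1.0, 5.0), (2.0, 4.0), (3.0, 6.0), (4.0, 3.0)]
labels = ["a", "a", "b", "b"]
feature, threshold, impurity = best_split(rows, labels)
```

A full top-down inducer would apply `best_split` recursively to each child until a stopping criterion is met. The split measure, stopping criterion, and pruning method are the kinds of design components that the automatic-design approaches described above aim to select.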

A bottom-up oblique Decision Tree Induction algorithm

2011 11th International Conference on Intelligent Systems Design and Applications, 2011

Co-Authors: Rodrigo C Barros, Ricardo Cerri, Pablo A. Jaskowiak, André C. P. L. F. De Carvalho

Abstract:

Decision Tree Induction algorithms are widely used in knowledge discovery and data mining, especially in scenarios where model comprehensibility is desired. A variation of the traditional univariate approach is the so-called oblique Decision Tree, which allows multivariate tests in its non-terminal nodes. Oblique Decision Trees can model Decision boundaries that are oblique to the attribute axes, whereas univariate Trees can only perform axis-parallel splits. The majority of the oblique and univariate Decision Tree Induction algorithms perform a top-down strategy for growing the Tree, relying on an impurity-based measure for splitting nodes. In this paper, we propose a novel bottom-up algorithm for inducing oblique Trees named BUTIA. It does not require an impurity measure for dividing nodes, since we know a priori the data resulting from each split. For generating the splitting hyperplanes, our algorithm implements a support vector machine solution, and a clustering algorithm is used for generating the initial leaves. We compare BUTIA to traditional univariate and oblique Decision Tree algorithms, C4.5, CART, OC1 and FT, as well as to a standard SVM implementation, using real gene expression benchmark data. Experimental results show the effectiveness of the proposed approach in several cases.
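The difference between univariate and oblique node tests described in the abstract above can be made concrete with a small sketch. It is not BUTIA itself: the hyperplane weights are fixed by hand rather than obtained from a support vector machine, and the function names and the six sample points are hypothetical, chosen so that the class boundary x0 + x1 = 1 is oblique to both axes.

```python
from typing import List, Sequence


def axis_parallel_test(x: Sequence[float], feature: int, threshold: float) -> bool:
    """Univariate test: compare a single attribute with a threshold."""
    return x[feature] <= threshold


def oblique_test(x: Sequence[float], weights: Sequence[float], bias: float) -> bool:
    """Multivariate test: is x on the non-positive side of w . x + b = 0?"""
    return sum(w * v for w, v in zip(weights, x)) + bias <= 0.0


def accuracy(predicted: List[bool], actual: List[bool]) -> float:
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical points, labelled True when x0 + x1 <= 1.
points = [(0.2, 0.3), (0.9, 0.8), (0.1, 0.95), (0.8, 0.1), (0.3, 0.6), (0.6, 0.5)]
labels = [p[0] + p[1] <= 1.0 for p in points]

# One oblique test with hand-picked weights reproduces the labels.
oblique_acc = accuracy(
    [oblique_test(p, weights=(1.0, 1.0), bias=-1.0) for p in points], labels
)

# Exhaustively try every axis-parallel test, in both polarities.
best_axis_acc = 0.0
for feature in range(2):
    for threshold in sorted({p[feature] for p in points}):
        preds = [axis_parallel_test(p, feature, threshold) for p in points]
        acc = accuracy(preds, labels)
        best_axis_acc = max(best_axis_acc, acc, 1.0 - acc)
```

On this data no single axis-parallel test separates the two classes, while one oblique test does. A univariate tree would need several nested splits to approximate the same boundary.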


Rodrigo C Barros - One of the best experts on this subject based on the ideXlab platform.

Decision Tree Induction

Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets

automatic design of Decision Tree Induction algorithms tailored to flexible receptor docking data

a hyper heuristic evolutionary algorithm for automatically designing Decision Tree algorithms

a survey of evolutionary algorithms for Decision Tree Induction

Alex A Freitas - One of the best experts on this subject based on the ideXlab platform.

Decision Tree Induction

Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets

a hyper heuristic evolutionary algorithm for automatically designing Decision Tree algorithms

a survey of evolutionary algorithms for Decision Tree Induction

towards the automatic design of Decision Tree Induction algorithms

Marcio P Basgalupp - One of the best experts on this subject based on the ideXlab platform.

Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets

automatic design of Decision Tree Induction algorithms tailored to flexible receptor docking data

a hyper heuristic evolutionary algorithm for automatically designing Decision Tree algorithms

a survey of evolutionary algorithms for Decision Tree Induction

towards the automatic design of Decision Tree Induction algorithms

H Hogeveen - One of the best experts on this subject based on the ideXlab platform.

detection of clinical mastitis with sensor data from automatic milking systems is improved by using Decision Tree Induction

Decision Tree Induction to detect clinical mastitis with automatic milking

André C. P. L. F. De Carvalho - One of the best experts on this subject based on the ideXlab platform.

hyper parameter tuning of a Decision Tree Induction algorithm

Decision Tree Induction

automatic design of Decision Tree Induction algorithms tailored to flexible receptor docking data

towards the automatic design of Decision Tree Induction algorithms

A bottom-up oblique Decision Tree Induction algorithm

Decision Tree Induction
