The experts below are selected from a list of 15 experts worldwide, ranked by the ideXlab platform.
Carolin Strobl - One of the best experts on this subject based on the ideXlab platform.
-
An AUC-based permutation variable importance measure for random forests
BMC Bioinformatics, 2013. Co-Authors: Silke Janitza, Carolin Strobl, Anne-Laure Boulesteix.
Abstract: The random forest (RF) method is a commonly used tool for classification with high-dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal for strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions have been made to obtain better classification performance, based either on sampling procedures or on cost-sensitivity analyses. To our knowledge, however, the performance of the VIMs has not yet been examined for unbalanced response classes. In this paper we explore the performance of the permutation VIM in unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different levels of class imbalance using simulated and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM in unbalanced data settings, while both permutation VIMs perform very similarly for balanced classes: with increasing class imbalance, the standard permutation VIM loses its ability to discriminate between predictors associated with the response and predictors not associated with it. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees.
The code implementing our study is available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html
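The core idea of the AUC-based permutation VIM can be sketched outside of R as well: permute one predictor at a time and record the resulting drop in AUC instead of the drop in accuracy. The Python/scikit-learn sketch below is only an illustrative re-implementation of that idea on synthetic unbalanced data; the authors' actual measure is computed per tree on out-of-bag samples inside the R package party, and all names and parameters here are our own assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Unbalanced binary problem: roughly 10% positives.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=3,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
base_auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

def auc_permutation_vim(model, X, y, n_repeats=5):
    """Importance of each predictor = mean drop in AUC after permuting it."""
    base = roc_auc_score(y, model.predict_proba(X)[:, 1])
    vim = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the j-th predictor's link to y
            drops.append(base - roc_auc_score(y, model.predict_proba(Xp)[:, 1]))
        vim[j] = np.mean(drops)
    return vim

vim = auc_permutation_vim(rf, X_te, y_te)
```

Because the AUC is insensitive to the class prior, this score does not collapse when one class dominates, which is the paper's motivation for preferring it over error-rate-based permutation importance.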
-
psychotree - Recursive partitioning based on psychometric models: Version 0.12-1
2011. Co-Authors: Achim Zeileis, Florian Wickelmaier, Carolin Strobl, Julia Kopf.
Abstract: Recursive partitioning based on psychometric models, employing the general MOB algorithm (from the package party) to obtain Bradley-Terry trees and Rasch trees.
-
Why and how to use random forest variable importance measures (and how you shouldn't)
Statistik.Tu-Dortmund.De, 2007. Co-Authors: Carolin Strobl.
Abstract: Random forests are becoming increasingly popular in many scientific fields, especially in genetics and bioinformatics, for assessing the importance of predictor variables in high-dimensional settings. Advantages of random forests in these areas are that they can cope with "small n, large p" problems, complex interactions, and even highly correlated predictor variables. The talk gives a short introduction to the rationale of random forests and their variable importance measures, as well as to the two random forest implementations offered in the R system for statistical computing: randomForest in the package of the same name by Breiman et al. (2006) and cforest in the package party by Hothorn et al. (2008). Moreover, recent research issues are addressed:
• Solutions are presented for the bias of random forest variable importance measures towards, e.g., predictor variables with many categories (Strobl, Boulesteix, Zeileis, and Hothorn 2007) and correlated predictor variables (Archer and Kimes 2008).
• Currently suggested tests for random forest variable importance measures (Breiman and Cutler 2008; Rodenburg et al. 2008) are critically discussed in an outlook.
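The bias discussed in the first bullet is easy to reproduce: with an impurity-based (Gini) importance, an uninformative predictor with many distinct values tends to look more important than an uninformative binary one, simply because it offers many more candidate split points. The toy simulation below, written in Python with scikit-learn rather than the R implementations covered in the talk, is our own illustration of this effect, not code from Strobl et al.; both predictors are pure noise by construction.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 1000
y = rng.integers(0, 2, n)            # response, independent of all predictors
x_binary = rng.integers(0, 2, n)     # noise with 2 distinct values
x_many = rng.normal(size=n)          # noise with (effectively) n distinct values
X = np.column_stack([x_binary, x_many])

rf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X, y)
gini_vim = rf.feature_importances_   # impurity-based (Gini) importance
# Although neither predictor carries information about y, the continuous
# noise predictor tends to receive a much larger share of the importance.
```

The permutation-based importance in party's cforest (with subsampling instead of bootstrapping) was proposed precisely to avoid this kind of artifact.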
Annelaure Boulesteix - One of the best experts on this subject based on the ideXlab platform.
-
An AUC-based permutation variable importance measure for random forests
BMC Bioinformatics, 2013. Co-Authors: Silke Janitza, Carolin Strobl, Anne-Laure Boulesteix. Abstract identical to the entry listed under Carolin Strobl above.
Yan Jiao - One of the best experts on this subject based on the ideXlab platform.
-
Influences of gillnet fishing on lake sturgeon bycatch in Lake Erie and implications for conservation
Endangered Species Research, 2011. Co-Authors: Yan Jiao.
Abstract: Three candidate classification tree models were constructed to estimate the probability of obtaining lake sturgeon Acipenser fulvescens bycatch under specific environmental and gillnet fishing conditions in Lake Erie. This analysis was based on a fishery-independent survey, the Lake Erie Partnership Index Fishing Survey (PIS), from 1989 to 2008. The 3 classification tree models included 1 conditional-inference classification tree generated by the R package 'party' and 2 exhaustive-search-based classification trees generated by the R packages 'tree' and 'rpart', respectively. The discriminative performance of each tree was evaluated via the receiver operating characteristic (ROC) curve and the area under the curve (AUC) using a jackknife approach. Most of the lake sturgeon captured in the PIS were juveniles. The 3 tree models identified fishing basin and gear type as factors related to gillnet fishing that had important influences on lake sturgeon bycatch and implications for lake sturgeon conservation and management. Results indicated that the west basin of Lake Erie could be a hotspot for lake sturgeon bycatch in the commercial gillnet fisheries, and that the use of bottom gillnets might increase the probability of catching lake sturgeon. A model comparison indicated that the conditional-inference tree model provided unbiased predictor selection and better discriminative performance in predicting the probability of taking lake sturgeon as bycatch.
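The evaluation strategy described above, fitting a classification tree and estimating its discriminative performance by AUC under a jackknife (leave-one-out) scheme, can be sketched as follows. This Python/scikit-learn version uses synthetic stand-in data, not the Lake Erie survey, and a generic CART-style tree rather than the three specific R implementations compared in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary bycatch-style data: 4 predictors, 200 observations.
X, y = make_classification(n_samples=200, n_features=4, random_state=2)

# Jackknife: each observation is scored by a tree trained on all others,
# so the AUC is computed from out-of-sample predicted probabilities.
probs = np.empty(len(y))
for train_idx, test_idx in LeaveOneOut().split(X):
    tree = DecisionTreeClassifier(max_depth=3, random_state=2)
    tree.fit(X[train_idx], y[train_idx])
    probs[test_idx] = tree.predict_proba(X[test_idx])[:, 1]

jackknife_auc = roc_auc_score(y, probs)
```

Comparing this jackknife AUC across candidate tree models is what lets the study rank the conditional-inference tree against the exhaustive-search trees without an independent test set.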
Silke Janitza - One of the best experts on this subject based on the ideXlab platform.
-
An AUC-based permutation variable importance measure for random forests
BMC Bioinformatics, 2013. Co-Authors: Silke Janitza, Carolin Strobl, Anne-Laure Boulesteix. Abstract identical to the entry listed under Carolin Strobl above.
Kurt Hornik - One of the best experts on this subject based on the ideXlab platform.
-
Let's Have a Party! An Open-Source Toolbox for Recursive Partytioning
2007. Co-Authors: Torsten Hothorn, Achim Zeileis, Kurt Hornik.
Abstract: The package party, implemented in the R system for statistical computing, provides basic classes and methods for recursive partitioning, along with reference implementations for three recently suggested tree-based learners: conditional inference trees and forests, and model-based recursive partitioning.
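Conditional inference trees, the first of the three learners, select splits in two steps: test the association between each predictor and the response, split only on the most significant predictor, and stop when no test is significant at a chosen level. A minimal Python sketch of that variable-selection step follows, using a Pearson correlation test as a stand-in for party's permutation-test framework; it is a didactic reduction under that assumption, not the ctree API.

```python
import numpy as np
from scipy import stats

def select_split_variable(X, y, alpha=0.05):
    """Return the index of the predictor to split on, or None to stop.

    Stand-in for ctree's step 1: one association test per predictor,
    split on the most significant one only if it clears `alpha`.
    """
    pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(X.shape[1])]
    j = int(np.argmin(pvals))
    return j if pvals[j] < alpha else None

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)                              # associated with y
x2 = rng.normal(size=n)                              # pure noise
y = (x1 + 0.5 * rng.normal(size=n) > 0).astype(float)
X = np.column_stack([x1, x2])

chosen = select_split_variable(X, y)                 # picks the associated predictor
```

Separating the "which variable" decision (a hypothesis test) from the "where to cut" decision is what gives this family of trees its unbiased variable selection and a statistically motivated stopping rule.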