Selection Methods

The experts below are selected from a list of 577,293 experts worldwide ranked by the ideXlab platform.

Jean-Philippe Vert - One of the best experts on this subject based on the ideXlab platform.

  • The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
    PLoS ONE, 2011
    Co-Authors: Anne-Claire Haury, Pierre Gestraud, Jean-Philippe Vert
    Abstract:

    Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. In this study we compare feature selection methods on public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Surprisingly, complex wrapper and embedded methods generally do not outperform simple univariate feature selection methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results.

  • The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
    PLoS ONE, 2011
    Co-Authors: Anne-Claire Haury, Pierre Gestraud, Jean-Philippe Vert
    Abstract:

    Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. Results: We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/.
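
The abstract above singles out a univariate Student's t-test as the best-performing filter. As an illustration only, the following minimal Python sketch ranks features by the absolute two-sample t statistic (pooled variance) and keeps the top k. It is not the authors' code; the function names `t_statistic` and `select_top_k`, the 0/1 label encoding, and the toy data are all assumptions made for this example.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        # Degenerate case: no within-group spread at all.
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / sqrt(pooled * (1 / na + 1 / nb))


def select_top_k(X, y, k):
    """Return the indices of the k features with the largest |t|.

    X is a list of samples (rows), y a list of 0/1 class labels
    (an assumed encoding, e.g. good vs. poor prognosis).
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(t_statistic(pos, neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy expression matrix (hypothetical values): 6 samples, 3 features.
X = [
    [5.1, 0.2, 1.0],
    [4.9, 0.4, 1.2],
    [5.3, 0.1, 0.9],
    [1.0, 0.3, 1.1],
    [1.2, 0.2, 1.0],
    [0.8, 0.5, 1.6],
]
y = [1, 1, 1, 0, 0, 0]
signature = select_top_k(X, y, k=2)
```

Because each feature is scored independently, a filter of this kind needs no model fitting, which is part of why it is cheap to run on high-dimensional expression data.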

Anne-Claire Haury - One of the best experts on this subject based on the ideXlab platform.

  • The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
    PLoS ONE, 2011
    Co-Authors: Anne-Claire Haury, Pierre Gestraud, Jean-Philippe Vert
    Abstract:

    Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. In this study we compare feature selection methods on public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Surprisingly, complex wrapper and embedded methods generally do not outperform simple univariate feature selection methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results.

  • The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
    PLoS ONE, 2011
    Co-Authors: Anne-Claire Haury, Pierre Gestraud, Jean-Philippe Vert
    Abstract:

    Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. Results: We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/.
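
The abstract evaluates the stability of signatures but does not say how stability is measured. One common way to quantify it, assumed here purely for illustration, is the average pairwise Jaccard overlap between the signatures selected on different subsamples of the data. The gene names and function names in this Python sketch are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty signatures are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_stability(signatures):
    """Average Jaccard overlap over all pairs of signatures.

    Returns a value in [0, 1]; 1 means every run selected
    exactly the same features.
    """
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("need at least two signatures to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Signatures selected on three hypothetical subsamples of one dataset.
runs = [
    ["geneA", "geneB"],
    ["geneB", "geneC"],
    ["geneA", "geneB"],
]
stability = mean_pairwise_stability(runs)
```

A score near 1 indicates that the selection method returns nearly the same genes regardless of which samples it sees; a score near 0 indicates that the signature depends heavily on the particular subsample.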

Mao Hanpin - One of the best experts on this subject based on the ideXlab platform.

  • Variables selection methods in near-infrared spectroscopy
    Analytica Chimica Acta, 2010
    Co-Authors: Zou Xiaobo, Zhao Jiewen, Malcolm J W Povey, Mel Holmes, Mao Hanpin
    Abstract:

    Near-infrared (NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, such as the petrochemical, pharmaceutical, environmental, clinical, agricultural, food and biomedical sectors during the past 15 years. A NIR spectrum of a sample is typically measured by modern scanning instruments at hundreds of equally spaced wavelengths. The large number of spectral variables in most data sets encountered in NIR spectral chemometrics often renders the prediction of a dependent variable unreliable. Recently, considerable effort has been directed towards developing and evaluating different procedures that objectively identify variables which contribute useful information and/or eliminate variables containing mostly noise. This review focuses on the variable selection methods in NIR spectroscopy. Selection methods include some classical approaches, such as the manual approach (knowledge-based selection) and “Univariate” and “Sequential” selection methods; sophisticated methods such as the successive projections algorithm (SPA) and uninformative variable elimination (UVE); elaborate search-based strategies such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs); and interval-based algorithms such as interval partial least squares (iPLS), windows PLS and iterative PLS. Wavelength selection with B-spline, Kalman filtering, Fisher's weights and Bayesian approaches is also mentioned. Finally, the websites of some variable selection software and toolboxes for non-commercial use are given.
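
Among the classical approaches listed above, "univariate" selection is the simplest to illustrate. The Python sketch below scores each wavelength by the absolute Pearson correlation between its absorbance and the response, then keeps the k best. This is one plausible reading of univariate selection, not a procedure taken from the review; the function names, wavelengths and absorbance values are made up for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / sqrt(sxx * syy)


def rank_wavelengths(spectra, response, wavelengths, k):
    """Return the k wavelengths whose absorbance correlates most
    strongly (in absolute value) with the response."""
    scored = []
    for j, wl in enumerate(wavelengths):
        column = [row[j] for row in spectra]
        scored.append((abs(pearson(column, response)), wl))
    scored.sort(reverse=True)
    return [wl for _, wl in scored[:k]]


# Hypothetical data: 4 samples measured at 3 wavelengths (nm).
wavelengths = [1100, 1200, 1300]
spectra = [
    [0.5, 0.1, 0.9],
    [0.4, 0.2, 0.7],
    [0.6, 0.3, 0.8],
    [0.5, 0.4, 0.2],
]
response = [1.0, 2.0, 3.0, 4.0]
selected = rank_wavelengths(spectra, response, wavelengths, k=2)
```

Scoring each wavelength in isolation ignores the strong correlation between neighbouring NIR bands, which is one motivation for the interval-based and search-based methods the review goes on to list.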

Pierre Gestraud - One of the best experts on this subject based on the ideXlab platform.

  • The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
    PLoS ONE, 2011
    Co-Authors: Anne-Claire Haury, Pierre Gestraud, Jean-Philippe Vert
    Abstract:

    Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. In this study we compare feature selection methods on public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Surprisingly, complex wrapper and embedded methods generally do not outperform simple univariate feature selection methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results.

  • The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
    PLoS ONE, 2011
    Co-Authors: Anne-Claire Haury, Pierre Gestraud, Jean-Philippe Vert
    Abstract:

    Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. Results: We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/.

Pierre Chagnon - One of the best experts on this subject based on the ideXlab platform.

  • Comparison of selection methods of explanatory variables in PLS regression with application to manufacturing process data
    Chemometrics and Intelligent Laboratory Systems, 2001
    Co-Authors: Jean-Pierre Gauchi, Pierre Chagnon
    Abstract:

    A large number of variables are used to describe manufacturing processes in the oil, chemical and food industries. In order to pilot and optimise these processes, the manufacturer or the researcher needs both very explanatory and good predictive models of explained variables (the responses), based on reduced numbers of pertinent explanatory variables. To achieve this goal, it is therefore necessary to have access to efficient selection methods of explanatory variables. Several variable selection methods have been compared in the context of PLS regression, under the same conditions, on several real datasets of chemical manufacturing processes. Their effectiveness, evaluated on the basis of several criteria, is compared with the final PLS model for each dataset. In conclusion, we propose a stepwise variable selection based on the maximum Q²cum criterion, similar to the Stone–Geisser index, depending on the number of eliminated variables
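
The abstract describes a stepwise selection that tracks a cross-validated criterion as variables are eliminated. The Python sketch below illustrates the general idea only: `q2` computes the usual cross-validated coefficient 1 - PRESS / SS from held-out predictions, and `backward_elimination` greedily drops one variable at a time while remembering the best-scoring subset. It is not the authors' procedure and does not fit a PLS model; the `score` callback (which in practice would fit a model and return its cross-validated Q²) and the variable names are hypothetical.

```python
def q2(y_true, y_cv_pred):
    """Cross-validated Q2 = 1 - PRESS / SS_total.

    y_cv_pred must hold predictions made for each sample while that
    sample was left out of model fitting.
    """
    if len(y_true) != len(y_cv_pred):
        raise ValueError("inputs must have the same length")
    mean_y = sum(y_true) / len(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    if ss_total == 0:
        raise ValueError("response is constant; Q2 is undefined")
    return 1.0 - press / ss_total


def backward_elimination(variables, score):
    """Greedy backward elimination.

    At each step, drop the variable whose removal gives the highest
    score, and remember the best subset seen along the whole path.
    `score` maps a list of variable names to a number (higher is better).
    """
    current = list(variables)
    best_subset, best_score = list(current), score(current)
    while len(current) > 1:
        candidates = [[v for v in current if v != drop] for drop in current]
        current = max(candidates, key=score)
        s = score(current)
        if s > best_score:
            best_subset, best_score = list(current), s
    return best_subset, best_score


# Toy stand-in for a cross-validated criterion: rewards two
# informative process variables and penalises model size.
def toy_score(subset):
    informative = {"temp", "pressure"}
    return len(informative & set(subset)) - 0.1 * len(subset)


kept, kept_score = backward_elimination(
    ["temp", "pressure", "noise1", "noise2"], toy_score
)
```

Keeping the best subset along the entire elimination path, rather than stopping at the first drop in score, mirrors the idea of reading the criterion as a function of the number of eliminated variables and choosing its maximum.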