Target Variable


Kunio Takezawa - One of the best experts on this subject based on the ideXlab platform.

  • Tree Model Optimization Criterion without Using Prediction Error
    Open Journal of Statistics, 2012
    Co-Authors: Kunio Takezawa
    Abstract:

    Using prediction error to optimize the number of splitting rules in a tree model does not control the probability that a splitting rule will emerge whose predictor has no functional relationship with the Target Variable. To solve this problem, a new optimization method is proposed. Under this method, the probability that the predictors used in the splitting rules of the optimized tree model have no functional relationship with the Target Variable is confined below 0.05. This makes it fairly convincing that the tree model produced by the new method represents knowledge contained in the data.
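
    The abstract does not give the criterion itself. As a rough illustration of the underlying idea, namely accepting a split only when its improvement is unlikely to arise from a predictor unrelated to the Target Variable, a permutation-test sketch in Python might look as follows (the test formulation and all names are illustrative assumptions, not the authors' procedure):

    ```python
    import numpy as np

    def split_sse_reduction(x, y, threshold):
        """Reduction in sum of squared errors from splitting y on x <= threshold."""
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            return 0.0
        sse = lambda v: np.sum((v - v.mean()) ** 2)
        return sse(y) - sse(left) - sse(right)

    def best_split(x, y):
        """Best SSE reduction over all candidate thresholds of one predictor."""
        thresholds = np.unique(x)[:-1]          # all values but the largest
        if thresholds.size == 0:
            return 0.0
        return max(split_sse_reduction(x, y, t) for t in thresholds)

    def split_is_significant(x, y, alpha=0.05, n_perm=199, seed=0):
        """Accept the split only if the observed improvement beats what is
        typically found when x is shuffled, i.e. when x has, by construction,
        no functional relationship with y."""
        rng = np.random.default_rng(seed)
        observed = best_split(x, y)
        null = [best_split(rng.permutation(x), y) for _ in range(n_perm)]
        p_value = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
        return p_value < alpha
    ```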

  • Flexible Model Selection Criterion for Multiple Regression
    Open Journal of Statistics, 2012
    Co-Authors: Kunio Takezawa
    Abstract:

    Predictors of a multiple linear regression equation selected by GCV (Generalized Cross Validation) may include undesirable predictors that have no linear functional relationship with the Target Variable and are chosen only by accident. This is because GCV estimates prediction error but does not control the probability of selecting predictors that are irrelevant to the Target Variable. To take this possibility into account, a new statistic, “GCVf” (the “f” stands for “flexible”), is suggested. The strictness with which GCVf accepts predictors is adjustable, and GCVf is a natural generalization of GCV. For example, GCVf can be designed so that the probability of erroneously identifying a linear relationship is 5 percent when none of the predictors has a linear relationship with the Target Variable. Predictors selected for the multiple linear regression equation by this method are highly likely to have linear relationships with the Target Variable.
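
    For reference, ordinary GCV for an OLS fit with p coefficients on n observations is (RSS/n) / (1 - p/n)^2. The abstract does not state the GCVf formula; one natural way to make the criterion adjustably stricter is to scale the model-size term by a factor k, recovering GCV at k = 1. The sketch below uses that multiplicative form purely as an assumption:

    ```python
    import numpy as np

    def gcv_flexible(X, y, k=1.0):
        """GCV-style criterion for an OLS fit with an adjustable penalty factor.
        k = 1.0 gives ordinary GCV; k > 1 accepts extra predictors less readily.
        The multiplicative form of the penalty is an illustrative assumption,
        not the paper's GCVf definition. Assumes k * p < n."""
        n, p = X.shape
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        return (rss / n) / (1.0 - k * p / n) ** 2
    ```

    Candidate predictor subsets would then be compared by this score, with k calibrated, for instance by simulation with pure-noise predictors, so that an irrelevant predictor is accepted only 5 percent of the time.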

Rita P. Ribeiro - One of the best experts on this subject based on the ideXlab platform.

  • Pre-processing approaches for imbalanced distributions in regression
    Neurocomputing, 2019
    Co-Authors: Paula Branco, Luís Torgo, Rita P. Ribeiro
    Abstract:

    Imbalanced domains are an important problem that frequently arises in real-world predictive analytics. A significant body of research has addressed imbalanced distributions in classification tasks, where the Target Variable is nominal. In regression tasks, where the Target Variable is continuous, imbalanced distributions of the Target Variable also raise several challenges for learning algorithms. Imbalanced domains are characterized by: (1) a higher relevance being assigned to performance on a subset of the Target Variable values; and (2) these most relevant values being underrepresented in the available data set. Recently, some proposals were made to address the problem of imbalanced distributions in regression, but this remains a scarcely explored issue with few existing solutions. This paper describes three new approaches for tackling the problem of imbalanced distributions in regression tasks. We propose the adaptation to regression tasks of random over-sampling and the introduction of Gaussian Noise, and we present a new method called the WEighted Relevance-based Combination Strategy (WERCS). An extensive set of experiments provides empirical evidence of the advantage of the proposed strategies and, in particular, of the WERCS method. We analyze the impact of different data characteristics on the performance of the methods. A data repository with 15 imbalanced regression data sets is also provided to the research community.
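
    The WERCS details are in the paper itself. A minimal sketch of the simpler pre-processing idea, relevance-weighted over-sampling with Gaussian noise, could look like this (the relevance function, noise scale, and all names are illustrative assumptions, not the paper's exact method):

    ```python
    import numpy as np

    def oversample_with_noise(X, y, relevance, n_new, noise_sd=0.05, seed=0):
        """Draw replicas biased toward high-relevance (rare) cases and jitter
        them with Gaussian noise. A sketch of the general strategy only;
        `relevance` maps each Target value to an importance score in [0, 1]."""
        rng = np.random.default_rng(seed)
        w = np.array([relevance(v) for v in y], dtype=float)
        w /= w.sum()
        idx = rng.choice(len(y), size=n_new, p=w)     # rare cases sampled more often
        X_new = X[idx] + rng.normal(0.0, noise_sd * X.std(axis=0), (n_new, X.shape[1]))
        y_new = y[idx] + rng.normal(0.0, noise_sd * y.std(), n_new)
        return np.vstack([X, X_new]), np.concatenate([y, y_new])
    ```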

  • A Survey of Predictive Modeling on Imbalanced Domains
    ACM Computing Surveys, 2016
    Co-Authors: Paula Branco, Luís Torgo, Rita P. Ribeiro
    Abstract:

    Many real-world data-mining applications involve obtaining predictive models from datasets with strongly imbalanced distributions of the Target Variable. Frequently, the least common values of this Target Variable are associated with events that are highly relevant to end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes). Moreover, these events may have different costs and benefits, which, combined with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most existing work addresses classification tasks (nominal Target Variables), we also describe methods designed to handle similar problems in regression tasks (numeric Target Variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of particular methods, and refer to related problems within predictive modeling.

  • A Survey of Predictive Modelling under Imbalanced Distributions
    arXiv: Learning, 2015
    Co-Authors: Paula Branco, Luís Torgo, Rita P. Ribeiro
    Abstract:

    Many real world data mining applications involve obtaining predictive models using data sets with strongly imbalanced distributions of the Target Variable. Frequently, the least common values of this Target Variable are associated with events that are highly relevant for end users (e.g. fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, the events may have different costs and benefits, which when associated with the rarity of some of them on the available training data creates serious problems for predictive modelling techniques. This paper presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal Target Variables), we also describe methods designed to handle similar problems within regression tasks (numeric Target Variables). In this survey we discuss the main challenges raised by imbalanced distributions, describe the main approaches to these problems, propose a taxonomy of these methods and refer to some related problems within predictive modelling.

  • Resampling strategies for regression
    Expert Systems, 2014
    Co-Authors: Luís Torgo, Paula Branco, Rita P. Ribeiro, Bernhard Pfahringer
    Abstract:

    Several real-world prediction problems involve forecasting rare values of a Target Variable. When this Variable is nominal, we have a problem of class imbalance that has been thoroughly studied within machine learning. For regression tasks, where the Target Variable is continuous, few works exist that address this type of problem. Still, important applications involve forecasting rare extreme values of a continuous Target Variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks with resampling approaches that change the distribution of the given data set to reduce the imbalance between the rare Target cases and the most frequent ones. We present two modifications of well-known resampling strategies for classification tasks: under-sampling and the synthetic minority over-sampling technique (SMOTE). These modifications allow these strategies to be used on regression tasks where the goal is to forecast rare extreme values of the Target Variable. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals on these particular regression tasks. The proposed resampling methods can be used with any existing regression algorithm, which makes them general tools for addressing problems of forecasting rare extreme values of a continuous Target Variable.
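
    Both adaptations rely on a relevance function that scores how important each continuous Target value is. The under-sampling side might be sketched as follows (the threshold and keep fraction are illustrative assumptions, not the paper's settings):

    ```python
    import numpy as np

    def undersample_regression(X, y, relevance, thr=0.8, keep_frac=0.5, seed=0):
        """Keep every rare, high-relevance case and only a random fraction of
        the common ones. A sketch of under-sampling adapted to regression,
        not the paper's exact procedure."""
        rng = np.random.default_rng(seed)
        rel = np.array([relevance(v) for v in y])
        rare = np.flatnonzero(rel >= thr)              # rare extreme cases: keep all
        common = np.flatnonzero(rel < thr)             # frequent cases: subsample
        kept = rng.choice(common, size=int(keep_frac * len(common)), replace=False)
        idx = np.concatenate([rare, kept])
        return X[idx], y[idx]
    ```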

  • EPIA - SMOTE for Regression
    Progress in Artificial Intelligence, 2013
    Co-Authors: Luís Torgo, Rita P. Ribeiro, Bernhard Pfahringer, Paula Branco
    Abstract:

    Several real-world prediction problems involve forecasting rare values of a Target Variable. When this Variable is nominal, we have a problem of class imbalance that has already been studied thoroughly within machine learning. For regression tasks, where the Target Variable is continuous, few works exist that address this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous Target Variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks with sampling approaches that change the distribution of the given training data set to reduce the imbalance between the rare Target cases and the most frequent ones. We present a modification of the well-known Smote algorithm that allows its use on these regression tasks. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals on these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous Target Variable.
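
    The core move of the SmoteR idea is to synthesize new rare cases by interpolating a rare case with one of its nearest rare neighbours, interpolating the Target value with the same weight. A simplified sketch (the relevance thresholding and the paper's distance-weighted Target averaging are omitted):

    ```python
    import numpy as np

    def smoter_like(X_rare, y_rare, n_new, k=5, seed=0):
        """Generate synthetic rare cases by linear interpolation between a
        rare case and one of its k nearest rare neighbours; the synthetic
        Target gets the same interpolation weight. A simplified sketch of
        the SmoteR idea, assuming X_rare has more than one row."""
        rng = np.random.default_rng(seed)
        X_syn, y_syn = [], []
        for _ in range(n_new):
            i = rng.integers(len(X_rare))
            dist = np.linalg.norm(X_rare - X_rare[i], axis=1)
            nn = np.argsort(dist)[1:k + 1]            # skip the case itself
            j = rng.choice(nn)
            t = rng.random()                          # interpolation weight in [0, 1)
            X_syn.append(X_rare[i] + t * (X_rare[j] - X_rare[i]))
            y_syn.append((1 - t) * y_rare[i] + t * y_rare[j])
        return np.array(X_syn), np.array(y_syn)
    ```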

George Kapetanios - One of the best experts on this subject based on the ideXlab platform.

  • Revisiting useful approaches to data-rich macroeconomic forecasting
    Computational Statistics & Data Analysis, 2016
    Co-Authors: Jan Groen, George Kapetanios
    Abstract:

    The properties of a number of data-rich methods that are widely used in macroeconomic forecasting are analyzed. In particular, the analysis focuses on principal components (PC) and Bayesian regressions, as well as a lesser-known alternative, partial least squares (PLS) regression. In the latter method, linear, orthogonal combinations of a large number of predictor Variables are constructed such that the covariance between a Target Variable and these common components is maximized. Existing studies have focused on modeling the Target Variable as a function of a finite set of unobserved common factors that underlies a large set of predictor Variables, but here it is assumed that this Target Variable depends directly on the whole set of predictor Variables. Given this setup, it is shown theoretically that under a variety of different unobserved factor structures, PLS and Bayesian regressions provide asymptotically the best fit for the Target Variable of interest. This includes the case of an asymptotically weak factor structure for the predictor Variables, for which PC regression is known to become inconsistent. Monte Carlo experiments confirm that PLS regression is close to Bayesian regression when the data have a factor structure, and that when the factor structure becomes weak, PLS and Bayesian regressions outperform principal components. Finally, PLS, principal components, and Bayesian regressions are applied to a large panel of monthly U.S. macroeconomic data to forecast key Variables across different subperiods; PLS and Bayesian regressions usually have the best out-of-sample performance.
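
    The construction described here, building orthogonal components of the predictors that maximize covariance with the Target, is the classic PLS1 recursion. A compact NIPALS-style sketch of it follows (a textbook construction, not the authors' code; in practice sklearn.cross_decomposition.PLSRegression does the same job):

    ```python
    import numpy as np

    def pls1_components(X, y, n_components):
        """PLS1: each weight vector is the direction in predictor space with
        maximal covariance with y, computed on predictors deflated by the
        previously extracted components."""
        X = X - X.mean(axis=0)
        y = y - y.mean()
        Xk, scores = X.copy(), []
        for _ in range(n_components):
            w = Xk.T @ y                       # covariance-maximizing direction
            w /= np.linalg.norm(w)
            t = Xk @ w                         # component scores
            p = Xk.T @ t / (t @ t)             # predictor loadings
            Xk = Xk - np.outer(t, p)           # deflate before the next component
            scores.append(t)
        T = np.column_stack(scores)
        coef = np.linalg.lstsq(T, y, rcond=None)[0]   # regress the Target on the components
        return T, coef
    ```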

  • Revisiting useful approaches to data-rich macroeconomic forecasting
    Staff Reports, 2008
    Co-Authors: Jan Groen, George Kapetanios
    Abstract:

    This paper analyzes the properties of a number of data-rich methods that are widely used in macroeconomic forecasting, in particular principal components (PC) and Bayesian regressions, as well as a lesser-known alternative, partial least squares (PLS) regression. In the latter method, linear, orthogonal combinations of a large number of predictor Variables are constructed such that the covariance between a Target Variable and these common components is maximized. Existing studies have focused on modelling the Target Variable as a function of a finite set of unobserved common factors that underlies a large set of predictor Variables, but here it is assumed that this Target Variable depends directly on the whole set of predictor Variables. Given this setup, it is shown theoretically that under a variety of different unobserved factor structures, PLS and Bayesian regressions provide asymptotically the best fit for the Target Variable of interest. This includes the case of an asymptotically weak factor structure for the predictor Variables, for which PC regression is known to become inconsistent. Monte Carlo experiments confirm that PLS regression is close to Bayesian regression when the data have a factor structure, and that when the factor structure becomes weak, PLS and Bayesian regressions outperform principal components. Finally, PLS, principal components, and Bayesian regressions are applied to a large panel of monthly U.S. macroeconomic data to forecast key Variables across different subperiods; PLS and Bayesian regressions usually have the best out-of-sample performance.

Paula Branco - One of the best experts on this subject based on the ideXlab platform.

  • Pre-processing approaches for imbalanced distributions in regression
    Neurocomputing, 2019
    Co-Authors: Paula Branco, Luís Torgo, Rita P. Ribeiro
    Abstract:

    Imbalanced domains are an important problem that frequently arises in real-world predictive analytics. A significant body of research has addressed imbalanced distributions in classification tasks, where the Target Variable is nominal. In regression tasks, where the Target Variable is continuous, imbalanced distributions of the Target Variable also raise several challenges for learning algorithms. Imbalanced domains are characterized by: (1) a higher relevance being assigned to performance on a subset of the Target Variable values; and (2) these most relevant values being underrepresented in the available data set. Recently, some proposals were made to address the problem of imbalanced distributions in regression, but this remains a scarcely explored issue with few existing solutions. This paper describes three new approaches for tackling the problem of imbalanced distributions in regression tasks. We propose the adaptation to regression tasks of random over-sampling and the introduction of Gaussian Noise, and we present a new method called the WEighted Relevance-based Combination Strategy (WERCS). An extensive set of experiments provides empirical evidence of the advantage of the proposed strategies and, in particular, of the WERCS method. We analyze the impact of different data characteristics on the performance of the methods. A data repository with 15 imbalanced regression data sets is also provided to the research community.

  • A Survey of Predictive Modeling on Imbalanced Domains
    ACM Computing Surveys, 2016
    Co-Authors: Paula Branco, Luís Torgo, Rita P. Ribeiro
    Abstract:

    Many real-world data-mining applications involve obtaining predictive models from datasets with strongly imbalanced distributions of the Target Variable. Frequently, the least common values of this Target Variable are associated with events that are highly relevant to end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes). Moreover, these events may have different costs and benefits, which, combined with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most existing work addresses classification tasks (nominal Target Variables), we also describe methods designed to handle similar problems in regression tasks (numeric Target Variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of particular methods, and refer to related problems within predictive modeling.

  • A Survey of Predictive Modelling under Imbalanced Distributions
    arXiv: Learning, 2015
    Co-Authors: Paula Branco, Luís Torgo, Rita P. Ribeiro
    Abstract:

    Many real world data mining applications involve obtaining predictive models using data sets with strongly imbalanced distributions of the Target Variable. Frequently, the least common values of this Target Variable are associated with events that are highly relevant for end users (e.g. fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, the events may have different costs and benefits, which when associated with the rarity of some of them on the available training data creates serious problems for predictive modelling techniques. This paper presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal Target Variables), we also describe methods designed to handle similar problems within regression tasks (numeric Target Variables). In this survey we discuss the main challenges raised by imbalanced distributions, describe the main approaches to these problems, propose a taxonomy of these methods and refer to some related problems within predictive modelling.

  • Resampling strategies for regression
    Expert Systems, 2014
    Co-Authors: Luís Torgo, Paula Branco, Rita P. Ribeiro, Bernhard Pfahringer
    Abstract:

    Several real-world prediction problems involve forecasting rare values of a Target Variable. When this Variable is nominal, we have a problem of class imbalance that has been thoroughly studied within machine learning. For regression tasks, where the Target Variable is continuous, few works exist that address this type of problem. Still, important applications involve forecasting rare extreme values of a continuous Target Variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks with resampling approaches that change the distribution of the given data set to reduce the imbalance between the rare Target cases and the most frequent ones. We present two modifications of well-known resampling strategies for classification tasks: under-sampling and the synthetic minority over-sampling technique (SMOTE). These modifications allow these strategies to be used on regression tasks where the goal is to forecast rare extreme values of the Target Variable. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals on these particular regression tasks. The proposed resampling methods can be used with any existing regression algorithm, which makes them general tools for addressing problems of forecasting rare extreme values of a continuous Target Variable.

  • EPIA - SMOTE for Regression
    Progress in Artificial Intelligence, 2013
    Co-Authors: Luís Torgo, Rita P. Ribeiro, Bernhard Pfahringer, Paula Branco
    Abstract:

    Several real-world prediction problems involve forecasting rare values of a Target Variable. When this Variable is nominal, we have a problem of class imbalance that has already been studied thoroughly within machine learning. For regression tasks, where the Target Variable is continuous, few works exist that address this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous Target Variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks with sampling approaches that change the distribution of the given training data set to reduce the imbalance between the rare Target cases and the most frequent ones. We present a modification of the well-known Smote algorithm that allows its use on these regression tasks. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals on these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous Target Variable.

Steven M Donn - One of the best experts on this subject based on the ideXlab platform.

  • Mechanical ventilation of very low birth weight infants: is volume or pressure a better Target Variable?
    The Journal of Pediatrics, 2006
    Co-Authors: Jaideep K Singh, Sunil K Sinha, Paul Clarke, Steve Byrne, Steven M Donn
    Abstract:

    OBJECTIVE: To compare the efficacy and safety of volume-controlled (VC) ventilation to time-cycled pressure-limited (TCPL) ventilation in very low birth weight infants with respiratory distress syndrome (RDS). STUDY DESIGN: Newborns weighing between 600 and 1500 g and with a gestational age of 24 to 31 weeks who had RDS were randomized to receive either VC or TCPL ventilation and treated with a standardized protocol. The 2 modalities were compared by determining the time required to achieve a predetermined success criterion, on the basis of either the alveolar-arterial oxygen gradient <100 mm Hg or the mean airway pressure <8 cm H2O. Secondary outcomes included mortality, duration of mechanical ventilation, and complications commonly associated with ventilation. RESULTS: The mean time to reach the success criterion was 23 hours in the VC group versus 33 hours in the TCPL group (P = .15). This difference was more striking in babies weighing <1000 g (21 versus 58 hours; P = .03). Mean duration of ventilation with VC was 255 hours versus 327 hours with TCPL (P = .60). There were 5 deaths in the VC group and 10 deaths in the TCPL group (P = .10). The incidence of other complications was similar. CONCLUSION: VC ventilation is safe and efficacious in very low birth weight infants and may have advantages when compared with TCPL, especially in smaller infants.
