Additive Regression

The experts below are selected from a list of 32511 experts worldwide ranked by the ideXlab platform.

Thomas Kneib - One of the best experts on this subject based on the ideXlab platform.

  • Predicting the occurrence of wildfires with binary structured additive regression models.
    Journal of Environmental Management, 2016
    Co-Authors: Laura Ríos-Pena, Thomas Kneib, Carmen Cadarso-Suárez, M.F. Marey-Pérez
    Abstract:

    Wildfires are one of the main environmental problems facing societies today, and in the case of Galicia (north-west Spain), they are the main cause of forest destruction. This paper used binary structured additive regression (STAR) for modelling the occurrence of wildfires in Galicia. Binary STAR models are a recent contribution to the classical logistic regression and binary generalized additive models. Their main advantage lies in their flexibility for modelling non-linear effects, while simultaneously incorporating spatial and temporal variables directly, thereby making it possible to reveal relationships among the variables considered. The results showed that the occurrence of wildfires depends on many covariates which display variable behaviour across space and time, and which largely determine the likelihood of ignition of a fire. The joint possibility of working on spatial scales with a resolution of 1 × 1 km cells and mapping predictions in a colour range makes STAR models a useful tool for plotting and predicting wildfire occurrence. Lastly, this will facilitate the development of fire behaviour models, which can be invaluable when it comes to drawing up fire-prevention and firefighting plans.

  • Structured Additive Regression Models: An R Interface to BayesX
    Journal of Statistical Software, 2015
    Co-Authors: Nikolaus Umlauf, Thomas Kneib, Stefan Lang, Daniel Adler, Achim Zeileis
    Abstract:

    Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R’s formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
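
For orientation, the structured additive predictor that this abstract refers to is commonly written as below. This is a sketch in generic notation: the symbols $\eta_i$, $f_j$, $z_{ij}$, $\mathbf{x}_i$, $\boldsymbol{\gamma}$ and $h$ are assumptions chosen for illustration and do not appear in the abstract itself.

```latex
% Mean of the response linked to a structured additive predictor
\mathbb{E}(y_i \mid \mathbf{x}_i, \mathbf{z}_i) = h(\eta_i),
\qquad
\eta_i = f_1(z_{i1}) + \dots + f_q(z_{iq}) + \mathbf{x}_i^\top \boldsymbol{\gamma}.

% Special cases mentioned in the abstract:
%   q = 0 (no f_j terms)                       -> generalized linear model
%   f_j smooth functions of continuous z_{ij}  -> generalized additive model
%   f_j spatial or spatio-temporal effects     -> geographical / space-time extensions
```

Here each $f_j$ stands for one "structured" effect (smooth, spatial, random, or varying-coefficient), while $\mathbf{x}_i^\top \boldsymbol{\gamma}$ collects the usual linear effects.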

  • Multilevel structured additive regression
    Statistics and Computing, 2014
    Co-Authors: Stefan Lang, Nikolaus Umlauf, Peter Wechselberger, Kenneth Harttgen, Thomas Kneib
    Abstract:

    Models with structured additive predictor provide a very broad and rich framework for complex regression modeling. They can deal simultaneously with nonlinear covariate effects and time trends, unit- or cluster-specific heterogeneity, spatial heterogeneity and complex interactions between covariates of different type. In this paper, we propose a hierarchical or multilevel version of regression models with structured additive predictor where the regression coefficients of a particular nonlinear term may obey another regression model with structured additive predictor. In that sense, the model is composed of a hierarchy of complex structured additive regression models. The proposed model may be regarded as an extended version of a multilevel model with nonlinear covariate terms in every level of the hierarchy. The model framework is also the basis for generalized random slope modeling based on multiplicative random effects. Inference is fully Bayesian and based on Markov chain Monte Carlo simulation techniques. We provide an in-depth description of several highly efficient sampling schemes that make it possible to estimate complex models with several hierarchy levels and a large number of observations within a couple of minutes (often even seconds). We demonstrate the practicability of the approach in a complex application on childhood undernutrition with large sample size and three hierarchy levels.

  • Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models
    Journal of the American Statistical Association, 2012
    Co-Authors: Fabian Scheipl, Ludwig Fahrmeir, Thomas Kneib
    Abstract:

    Structured additive regression provides a general framework for complex Gaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that makes it possible to include or exclude single coefficients as well as blocks of coefficients representing specific model terms. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with (real) benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis. Geoadditive and additive mixed logit model applications are discussed in an extensive appendix.

Suku Nair - One of the best experts on this subject based on the ideXlab platform.

  • Hardening Email Security via Bayesian Additive Regression Trees
    2009
    Co-Authors: Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, Suku Nair
    Abstract:

    The changeable structures and variability of email attacks render current email filtering solutions useless. Consequently, new techniques to harden the protection of users' security and privacy have become a necessity. The variety of email attacks, namely spam, damages networks' infrastructure and exposes users to new attack vectors daily. Spam is unsolicited email which targets users with different types of commercial messages or advertisements. Porn-related content that contains explicit material or commercials of exploited children is a major trend in these messages as well. The waste of network bandwidth due to the large number of spam messages sent and the requirement of complex hardware, software, network resources, and human power are other problems associated with these attacks. Recently, security researchers have noticed an increase in malicious content delivered by these messages, which raises security concerns due to their attack potential. More seriously, phishing attacks have been on the rise for the past couple of years. Phishing is the act of sending a forged e-mail to a recipient, falsely mimicking a legitimate establishment in an attempt to scam the recipient into divulging private information such as credit card numbers or bank account passwords (James, 2005). Recently phishing attacks have become a major concern to financial institutions and law enforcement due to the heavy monetary losses involved. According to a survey by Gartner group, in 2006 approximately 3.25 million victims were spoofed by phishing attacks and in 2007 the number increased by almost 1.3 million victims. Furthermore, in 2007, monetary losses related to phishing attacks were estimated at $3.2 billion. All the aforementioned concerns raise the need for new detection mechanisms to subvert email attacks in their various forms.

    Despite the abundance of applications available for phishing detection, unlike spam classification, only a few studies compare machine learning techniques in predicting phishing emails (Abu-Nimeh et al., 2007). We describe a new version of Bayesian Additive Regression Trees (BART) and apply it to phishing detection. A phishing dataset is constructed from 1409 raw phishing emails and 5152 legitimate emails, where 71 features (variables) are used in classifiers' training and testing. The variables consist of both textual and structural features that are extracted from raw emails. The performance of six classifiers on this dataset is compared using the area under the curve (AUC) (Huang & Ling, 2005). The classifiers include Logistic Regression (LR), Classification and Regression Trees (CART), Bayesian Additive Regression Trees (BART), Support Vector Machines (SVM), Random...
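
The abstract above compares classifiers by the area under the ROC curve (AUC). As a minimal sketch of what that metric measures, the snippet below computes AUC as the probability that a randomly chosen positive example (here, a phishing email) receives a higher score than a randomly chosen negative one. The function name `auc` and the toy scores are hypothetical and are not taken from the paper, which does not describe its implementation.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: classifier outputs, higher means "more likely positive".
    labels: 1 for positive (e.g. phishing), 0 for negative (e.g. legitimate).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (assumed for illustration only).
    toy_scores = [0.9, 0.4, 0.6, 0.2]
    toy_labels = [1, 1, 0, 0]
    print(auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be preferable for a dataset of the size mentioned in the abstract.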

  • ICC - Distributed Phishing Detection by Applying Variable Selection Using Bayesian Additive Regression Trees
    2009 IEEE International Conference on Communications, 2009
    Co-Authors: Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, Suku Nair
    Abstract:

    Phishing continues to be one of the most drastic attacks, causing both financial institutions and customers huge monetary losses. Nowadays mobile devices are widely used to access the Internet and therefore access financial and confidential data. However, unlike PCs and wired devices, such devices lack basic defensive applications to protect against various types of attacks. In consequence, phishing has recently evolved to target mobile users in Vishing and SMishing attacks. This study presents a client-server distributed architecture to detect phishing e-mails by taking advantage of automatic variable selection in Bayesian Additive Regression Trees (BART). When combined with other classifiers, BART improves their predictive accuracy. Further, the overall architecture proves to leverage well in resource-constrained environments.

  • A distributed architecture for phishing detection using Bayesian Additive Regression Trees
    eCrime Researchers Summit eCrime 2008, 2008
    Co-Authors: Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, Suku Nair
    Abstract:

    With the variety of applications in mobile devices, such devices are no longer deemed mere calling gadgets. Various applications are used to browse the Internet, access financial data, and store sensitive personal information. In consequence, mobile devices are exposed to several types of attacks. Specifically, phishing attacks can easily take advantage of the limited or missing security and defense applications therein. Furthermore, the limited power, storage, and processing capabilities render machine learning techniques inapt to classify phishing and spam emails on such devices. The present study proposes a distributed architecture hinging on machine learning approaches to detect phishing emails in a mobile environment based on a modified version of Bayesian Additive Regression Trees (BART). BART suffers from high computational time and memory overhead; therefore, distributed algorithms are proposed to accommodate detection applications in resource-constrained wireless environments.

  • ARES - Bayesian Additive Regression Trees-Based Spam Detection for Enhanced Email Privacy
    2008 Third International Conference on Availability Reliability and Security, 2008
    Co-Authors: Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, Suku Nair
    Abstract:

    Spam is considered an invasion of privacy. Its changeable structures and variability raise the need for new spam classification techniques. The present study proposes using Bayesian additive regression trees (BART) for spam classification and evaluates its performance against other classification methods, including logistic regression, support vector machines, classification and regression trees, neural networks, random forests, and naive Bayes. BART in its original form is not designed for such problems; hence we modify BART and make it applicable to classification problems. We evaluate the classifiers using three spam datasets (Ling-Spam, PU1, and Spambase) to determine the predictive accuracy and the false positive rate.

Rodney Sparapani - One of the best experts on this subject based on the ideXlab platform.

  • Fully Nonparametric Bayesian Additive Regression Trees
    Topics in Identification Limited Dependent Variables Partial Observability Experimentation and Flexible Modeling: Part B, 2019
    Co-Authors: Edward I. George, Robert E. McCulloch, Purushottam W. Laud, Brent R. Logan, Rodney Sparapani
    Abstract:

    Bayesian additive regression trees (BART) is a fully Bayesian approach to modeling with ensembles of trees. BART can uncover complex regression functions with high-dimensional regressors in a fairly automatic way and provide Bayesian quantification of the uncertainty through the posterior. However, BART assumes independent and identically distributed (i.i.d.) normal errors. This strong parametric assumption can lead to misleading inference and uncertainty quantification. In this chapter we use the classic Dirichlet process mixture (DPM) mechanism to nonparametrically model the error distribution. A key strength of BART is that default prior settings work reasonably well in a variety of problems. The challenge in extending BART is to choose the parameters of the DPM so that the strengths of the standard BART approach are not lost when the errors are close to normal, but the DPM has the ability to adapt to non-normal errors.
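
To make the contrast in this abstract concrete, the standard BART model and a generic Dirichlet process mixture for the errors can be sketched as follows. The notation ($g$, $T_j$, $M_j$, $m$, $G$, $G_0$, $\alpha$) is an assumption following common usage; the chapter's exact parameterization and prior choices are not given in the abstract and are not reproduced here.

```latex
% Standard BART: a sum of m regression trees with i.i.d. normal errors
y_i = \sum_{j=1}^{m} g(\mathbf{x}_i; T_j, M_j) + \varepsilon_i,
\qquad
\varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2).

% Generic DPM replacement for the error distribution (illustrative form)
\varepsilon_i \mid (\mu_i, \sigma_i) \sim \mathcal{N}(\mu_i, \sigma_i^2),
\qquad
(\mu_i, \sigma_i) \mid G \sim G,
\qquad
G \sim \mathrm{DP}(\alpha, G_0).
```

Here $T_j$ denotes the structure of the $j$-th tree and $M_j$ its leaf values. When the mixture collapses to a single component with $\mu_i = 0$, the second specification reduces to the first, which is the "close to normal" case the abstract wants to preserve.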

  • Detection of Left Ventricular Hypertrophy Using Bayesian Additive Regression Trees: The MESA (Multi‐Ethnic Study of Atherosclerosis)
    Journal of the American Heart Association, 2019
    Co-Authors: Rodney Sparapani, Noura M. Dabbouseh, David D. Gutterman, Jun Zhang, Haiying Chen, David A. Bluemke, Joao A.C. Lima, Gregory L. Burke, Elsayed Z. Soliman
    Abstract:

    Background: We developed a new left ventricular hypertrophy (LVH) criterion using a machine‐learning technique called Bayesian Additive Regression Trees (BART). Methods and Results: This analysis inc...

  • Nonparametric competing risks analysis using Bayesian Additive Regression Trees.
    Statistical methods in medical research, 2019
    Co-Authors: Rodney Sparapani, Robert E. McCulloch, Brent R. Logan, Purushottam W. Laud
    Abstract:

    Many time-to-event studies are complicated by the presence of competing risks. Such data are often analyzed using Cox models for the cause-specific hazard function or Fine and Gray models for the subdistribution hazard. In practice, regression relationships in competing risks data are often complex and may include nonlinear functions of covariates, interactions, high-dimensional parameter spaces and nonproportional cause-specific, or subdistribution, hazards. Model misspecification can lead to poor predictive performance. To address these issues, we propose a novel approach: flexible prediction modeling of competing risks data using Bayesian Additive Regression Trees (BART). We study the simulation performance in two-sample scenarios as well as a complex regression setting, and benchmark its performance against standard regression techniques as well as random survival forests. We illustrate the use of the proposed method on a recently published study of patients undergoing hematopoietic stem cell transplantation.

  • Nonparametric competing risks analysis using Bayesian Additive Regression Trees (BART)
    arXiv: Methodology, 2018
    Co-Authors: Rodney Sparapani, Robert E. McCulloch, Brent R. Logan, Purushottam W. Laud
    Abstract:

    Many time-to-event studies are complicated by the presence of competing risks. Such data are often analyzed using Cox models for the cause-specific hazard function or Fine-Gray models for the subdistribution hazard. In practice, regression relationships in competing risks data with either strategy are often complex and may include nonlinear functions of covariates, interactions, high-dimensional parameter spaces and nonproportional cause-specific or subdistribution hazards. Model misspecification can lead to poor predictive performance. To address these issues, we propose a novel approach to flexible prediction modeling of competing risks data using Bayesian Additive Regression Trees (BART). We study the simulation performance in two-sample scenarios as well as a complex regression setting, and benchmark its performance against standard regression techniques as well as random survival forests. We illustrate the use of the proposed method on a recently published study of patients undergoing hematopoietic stem cell transplantation.

Ludwig Fahrmeir - One of the best experts on this subject based on the ideXlab platform.

  • Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models
    Journal of the American Statistical Association, 2012
    Co-Authors: Fabian Scheipl, Ludwig Fahrmeir, Thomas Kneib
    Abstract:

    Structured additive regression provides a general framework for complex Gaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that makes it possible to include or exclude single coefficients as well as blocks of coefficients representing specific model terms. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with (real) benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis. Geoadditive and additive mixed logit model applications are discussed in an extensive appendix.

  • High dimensional structured additive regression models: Bayesian regularization, smoothing and predictive performance
    Journal of The Royal Statistical Society Series C-applied Statistics, 2011
    Co-Authors: Thomas Kneib, Susanne Konrath, Ludwig Fahrmeir
    Abstract:

    Data structures in modern applications frequently combine the necessity of flexible regression techniques such as nonlinear and spatial effects with high-dimensional covariate vectors. While estimation of the former is typically achieved by supplementing the likelihood with a suitable smoothness penalty, the latter are usually assigned shrinkage penalties that enforce sparse models. In this paper, we consider a Bayesian unifying perspective, where conditionally Gaussian priors can be assigned to all types of regression effects. Suitable hyperprior assumptions on the variances of the Gaussian distributions then induce the desired smoothness or sparseness properties. As a major advantage, general Markov chain Monte Carlo simulation algorithms can be developed that allow for the joint estimation of smooth and spatial effects and regularised coefficient vectors. Two applications demonstrate the usefulness of the proposed procedure: a geoadditive regression model for data from the Munich rental guide and an additive probit model for the prediction of consumer credit defaults. In both cases, high-dimensional vectors of categorical covariates will be included in the regression models. The predictive ability of the resulting high-dimensional structured additive regression models compared to expert models will be of particular relevance and will be evaluated on cross-validation test data.

  • Propriety of posteriors in structured additive regression models: Theory and empirical evidence
    Journal of Statistical Planning and Inference, 2009
    Co-Authors: Ludwig Fahrmeir, Thomas Kneib
    Abstract:

    Structured additive regression comprises many semiparametric regression models such as generalized additive (mixed) models, geoadditive models, and hazard regression models within a unified framework. In a Bayesian formulation, non-parametric functions, spatial effects and further model components are specified in terms of multivariate Gaussian priors for high-dimensional vectors of regression coefficients. For several model terms, such as penalized splines or Markov random fields, these Gaussian prior distributions involve rank-deficient precision matrices, yielding partially improper priors. Moreover, hyperpriors for the variances (corresponding to inverse smoothing parameters) may also be specified as improper, e.g. corresponding to Jeffreys prior or a flat prior for the standard deviation. Hence, propriety of the joint posterior is a crucial issue for full Bayesian inference, in particular if based on Markov chain Monte Carlo simulations. We establish theoretical results providing sufficient (and sometimes necessary) conditions for propriety and provide empirical evidence through several accompanying simulation studies.
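
The "partially improper" priors mentioned in this abstract can be illustrated with the usual form of a Gaussian smoothness prior. The sketch below uses assumed notation ($\boldsymbol{\beta}_j$ for the coefficient vector of the $j$-th term, $K_j$ for its penalty or precision matrix, $\tau_j^2$ for its variance parameter); the paper's precise conditions for propriety are not stated in the abstract and are not reproduced here.

```latex
% Gaussian smoothness prior with a possibly rank-deficient precision matrix K_j
p(\boldsymbol{\beta}_j \mid \tau_j^2)
\;\propto\;
\left(\tau_j^2\right)^{-\operatorname{rk}(K_j)/2}
\exp\!\left(-\frac{1}{2\tau_j^2}\,
\boldsymbol{\beta}_j^\top K_j\, \boldsymbol{\beta}_j\right).

% If rk(K_j) < dim(beta_j), the density is constant along the null space of K_j
% and therefore does not integrate to a finite value: the prior is partially improper.
```

For example, a penalized spline with a second-order difference penalty leaves constant and linear trends in the coefficients unpenalized, so $K_j$ has a two-dimensional null space along which the prior is flat.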

  • Structured additive regression for categorical space-time data: a mixed model approach.
    Biometrics, 2005
    Co-Authors: Thomas Kneib, Ludwig Fahrmeir
    Abstract:

    Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

Elsayed Z. Soliman - One of the best experts on this subject based on the ideXlab platform.