Default Implementation

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 165 Experts worldwide ranked by ideXlab platform

Caleb E. Welton - One of the best experts on this subject based on the ideXlab platform.

  • PAKDD Workshops - Parallel Time Series Modeling - A Case Study of In-Database Big Data Analytics
    Lecture Notes in Computer Science, 2014
    Co-Authors: Hai Qian, Shengwen Yang, Rahul Iyer, Xixuan Feng, Mark Wellons, Caleb E. Welton
    Abstract:

    MADlib is an open-source library for scalable in-database analytics. In this paper, we present our parallel design of time series analysis and implementation of ARIMA modeling in MADlib’s framework. The algorithms for fitting time series models are intrinsically sequential, since any calculation for a specific time \(t\) depends on the result from the previous time step \(t-1\). Our solution parallelizes this computation by splitting the data into \(n\) chunks. Since the model fitting involves multiple iterations, we use the results from the previous iteration as the initial values for each chunk in the current iteration. Thus the computation for each chunk of data does not depend on the results from the previous chunk. We further improve performance by redistributing the original data such that each chunk can be loaded into memory, minimizing communication overhead. Experiments show that our parallel implementation has good speed-up when compared to a sequential version of the algorithm and R’s default implementation in the “stats” package.
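The chunking idea in this abstract can be illustrated with a small sketch. It is not MADlib's code: it assumes a toy residual recursion e_t = y_t - theta * e_{t-1} with a fixed coefficient `theta`, and the function names (`sequential_residuals`, `chunked_pass`, `chunked_residuals`) are hypothetical. Each chunk takes its starting value from the previous pass, so the chunks within one pass do not depend on each other and could be processed in parallel.

```python
from typing import List


def sequential_residuals(y: List[float], theta: float) -> List[float]:
    """Reference version: e_t = y_t - theta * e_{t-1}, with e_{-1} = 0."""
    resid = []
    prev = 0.0
    for value in y:
        prev = value - theta * prev
        resid.append(prev)
    return resid


def chunked_pass(y: List[float], theta: float, n_chunks: int,
                 prev_resid: List[float]) -> List[float]:
    """One pass over all chunks.

    Each chunk is seeded with the residual that the PREVIOUS pass produced
    just before the chunk's first element, so no chunk waits on another
    chunk of the same pass.
    """
    size = max(1, (len(y) + n_chunks - 1) // n_chunks)
    out = [0.0] * len(y)
    for start in range(0, len(y), size):  # iterations here are independent
        prev = prev_resid[start - 1] if start > 0 else 0.0
        for t in range(start, min(start + size, len(y))):
            prev = y[t] - theta * prev
            out[t] = prev
    return out


def chunked_residuals(y: List[float], theta: float, n_chunks: int,
                      n_passes: int) -> List[float]:
    """Repeat the chunked pass, feeding each pass the previous pass's output."""
    resid = [0.0] * len(y)
    for _ in range(n_passes):
        resid = chunked_pass(y, theta, n_chunks, resid)
    return resid
```

In this toy setting the first chunk is exact after one pass, the second after two, and so on, so `n_chunks` passes reproduce the sequential result. In the paper the passes are the iterations of the model fit itself, during which the parameters also change; the fixed `theta` here is a simplification.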

Hai Qian - One of the best experts on this subject based on the ideXlab platform.

  • PAKDD Workshops - Parallel Time Series Modeling - A Case Study of In-Database Big Data Analytics
    Lecture Notes in Computer Science, 2014
    Co-Authors: Hai Qian, Shengwen Yang, Rahul Iyer, Xixuan Feng, Mark Wellons, Caleb E. Welton
    Abstract:

    MADlib is an open-source library for scalable in-database analytics. In this paper, we present our parallel design of time series analysis and implementation of ARIMA modeling in MADlib’s framework. The algorithms for fitting time series models are intrinsically sequential, since any calculation for a specific time \(t\) depends on the result from the previous time step \(t-1\). Our solution parallelizes this computation by splitting the data into \(n\) chunks. Since the model fitting involves multiple iterations, we use the results from the previous iteration as the initial values for each chunk in the current iteration. Thus the computation for each chunk of data does not depend on the results from the previous chunk. We further improve performance by redistributing the original data such that each chunk can be loaded into memory, minimizing communication overhead. Experiments show that our parallel implementation has good speed-up when compared to a sequential version of the algorithm and R’s default implementation in the “stats” package.

Jonathan M. Samet - One of the best experts on this subject based on the ideXlab platform.

  • On the Use of Generalized Additive Models in Time-Series Studies of Air Pollution and Health
    American Journal of Epidemiology, 2002
    Co-Authors: Francesca Dominici, Aidan McDermott, Scott L. Zeger, Jonathan M. Samet
    Abstract:

    The widely used generalized additive models (GAM) method is a flexible and effective technique for conducting nonlinear regression analysis in time-series studies of the health effects of air pollution. When the data to which the GAM are being applied have two characteristics, 1) the estimated regression coefficients are small and 2) there exist confounding factors that are modeled using at least two nonparametric smooth functions, the default settings in the gam function of the S-Plus software package (version 3.4) do not assure convergence of its iterative estimation procedure and can provide biased estimates of regression coefficients and standard errors. This phenomenon has occurred in time-series analyses of contemporary data on air pollution and mortality. To evaluate the impact of the default implementation of the gam software on published analyses, the authors reanalyzed data from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) using three different methods: 1) Poisson regression with parametric nonlinear adjustments for confounding factors; 2) GAM with default convergence parameters; and 3) GAM with more stringent convergence parameters than the default settings. The authors found that pooled NMMAPS estimates were very similar under the first and third methods but were biased upward under the second method.
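To see how a loose stopping rule can bias an iterative fit, the following toy sketch may help. It is not the S-Plus gam code: it assumes a backfitting-style loop with two linear terms in place of nonparametric smooths, and the function names and data are hypothetical. The two covariates are strongly correlated and the true coefficient of interest is small (0.1), loosely mirroring the two conditions named in the abstract.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


def backfit(x1: List[float], x2: List[float], y: List[float],
            tol: float, max_iter: int = 10_000) -> Tuple[float, float, int]:
    """Alternately refit each coefficient on the other's partial residual.

    Stops when the largest coefficient change in one sweep drops below tol.
    Returns (b1, b2, sweeps_used).
    """
    b1 = b2 = 0.0
    for sweep in range(1, max_iter + 1):
        new_b1 = dot(x1, [yi - b2 * v for yi, v in zip(y, x2)]) / dot(x1, x1)
        new_b2 = dot(x2, [yi - new_b1 * u for yi, u in zip(y, x1)]) / dot(x2, x2)
        change = max(abs(new_b1 - b1), abs(new_b2 - b2))
        b1, b2 = new_b1, new_b2
        if change < tol:
            return b1, b2, sweep
    return b1, b2, max_iter


# Hypothetical data: x2 is x1 plus a small wiggle, so the two are highly correlated.
x1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x2 = [u + d for u, d in zip(x1, [0.3, -0.3, 0.3, -0.3, 0.3, -0.3])]
y = [0.1 * u + 1.0 * v for u, v in zip(x1, x2)]  # true b1 = 0.1, true b2 = 1.0

loose = backfit(x1, x2, y, tol=1e-2)    # stops early
strict = backfit(x1, x2, y, tol=1e-10)  # runs until the estimates settle
```

With the loose tolerance the loop stops while the small coefficient is still well above its true value; with the strict tolerance it recovers 0.1 at the cost of many more sweeps. The per-sweep change is small because the correlated terms trade off slowly, not because the fit has converged.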

Siddarth Saha - One of the best experts on this subject based on the ideXlab platform.

  • Network intrusion detection using string matching
    2010
    Co-Authors: Praveen Kumar Telugu, Siddarth Saha
    Abstract:

    A network intrusion detection system is a retrofit approach for providing a sense of security in existing computers and data networks, while allowing them to operate in their current open mode. The goal of a network intrusion detection system is to identify, preferably in real time, unauthorized use, misuse and abuse of computer systems by insiders as well as by outside perpetrators. At the heart of every network intrusion detection system is packet inspection, which relies on string matching. This string matching is the performance bottleneck of the whole network intrusion detection system, so the need to speed up string matching cannot be overstated. In this project, we studied some of the standard string matching algorithms and implemented them. We then compared the performance of the various algorithms with varying input sizes. The main focus of the project was the Aho-Corasick algorithm. In addition to using the default implementation of suffix trees, we used a dense hash set and a sparse hash set implementation, which are libraries from the Google code repository, and we show that these implementations perform better. They give a noticeable performance improvement as the input size increases.
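For readers unfamiliar with the Aho-Corasick algorithm named in this abstract, a minimal sketch follows. It is not the project's implementation: it stores each state's transitions in a plain Python dict rather than in suffix trees or the Google hash set libraries, and the names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(patterns: List[str]):
    """Build the trie, failure links, and output lists for the patterns."""
    goto: List[Dict[str, int]] = [{}]  # goto[state][char] -> next state
    fail: List[int] = [0]              # longest proper suffix that is also a trie path
    out: List[List[str]] = [[]]        # patterns that end at each state

    for pattern in patterns:
        if not pattern:
            continue  # empty patterns are skipped
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first traversal: a state's failure link is known before its children's.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text: str, patterns: List[str]) -> List[Tuple[int, str]]:
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

For example, `find_all("ushers", ["he", "she", "his", "hers"])` reports "she" at index 1 and both "he" and "hers" at index 2. The text is scanned once regardless of how many patterns are loaded, which is why the per-state lookup structure (dict here, hash sets in the project) dominates the running time.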

Rahul Iyer - One of the best experts on this subject based on the ideXlab platform.

  • PAKDD Workshops - Parallel Time Series Modeling - A Case Study of In-Database Big Data Analytics
    Lecture Notes in Computer Science, 2014
    Co-Authors: Hai Qian, Shengwen Yang, Rahul Iyer, Xixuan Feng, Mark Wellons, Caleb E. Welton
    Abstract:

    MADlib is an open-source library for scalable in-database analytics. In this paper, we present our parallel design of time series analysis and implementation of ARIMA modeling in MADlib’s framework. The algorithms for fitting time series models are intrinsically sequential, since any calculation for a specific time \(t\) depends on the result from the previous time step \(t-1\). Our solution parallelizes this computation by splitting the data into \(n\) chunks. Since the model fitting involves multiple iterations, we use the results from the previous iteration as the initial values for each chunk in the current iteration. Thus the computation for each chunk of data does not depend on the results from the previous chunk. We further improve performance by redistributing the original data such that each chunk can be loaded into memory, minimizing communication overhead. Experiments show that our parallel implementation has good speed-up when compared to a sequential version of the algorithm and R’s default implementation in the “stats” package.