Default Implementation

The Experts below are selected from a list of 165 Experts worldwide ranked by ideXlab platform

Caleb E. Welton - One of the best experts on this subject based on the ideXlab platform.

PAKDD Workshops - Parallel Time Series Modeling - A Case Study of In-Database Big Data Analytics

Lecture Notes in Computer Science, 2014

Co-Authors: Hai Qian, Shengwen Yang, Rahul Iyer, Xixuan Feng, Mark Wellons, Caleb E. Welton

Abstract:

MADlib is an open-source library for scalable in-database analytics. In this paper, we present our parallel design of time series analysis and implementation of ARIMA modeling in MADlib’s framework. The algorithms for fitting time series models are intrinsically sequential since any calculation for a specific time \(t\) depends on the result from the previous time step \(t-1\). Our solution parallelizes this computation by splitting the data into \(n\) chunks. Since the model fitting involves multiple iterations, we use the results from the previous iteration as the initial values for each chunk in the current iteration. Thus the computation for each chunk of data is not dependent on the results from the previous chunk. We further improve performance by redistributing the original data such that each chunk can be loaded into memory, minimizing communication overhead. Experiments show that our parallel implementation has good speed-up when compared to a sequential version of the algorithm and R’s default implementation in the “stats” package.
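The chunking idea in this abstract can be illustrated with a small sketch. It is not MADlib's code: it assumes a toy MA(1)-style residual recursion, e[t] = y[t] - theta * e[t-1], holds theta fixed across iterations (the paper's fitting also updates parameters), and uses hypothetical function names. Each chunk starts from a boundary value taken from the previous iteration, so the chunks within one iteration do not depend on each other and could be handed to separate workers.

```python
def residuals_sequential(y, theta):
    """Reference version: e[t] = y[t] - theta * e[t-1], with e[-1] = 0."""
    e, prev = [], 0.0
    for v in y:
        prev = v - theta * prev
        e.append(prev)
    return e


def chunk_residuals(chunk, theta, e_init):
    """Same recursion on one chunk, started from a supplied initial residual."""
    e, prev = [], e_init
    for v in chunk:
        prev = v - theta * prev
        e.append(prev)
    return e


def residuals_chunked(y, theta, n_chunks, n_iters):
    """Iterate over chunks; boundary values come from the previous iteration.

    Assumes n_chunks >= 1 and n_iters >= 1. Inside one iteration every chunk
    is independent, so the list comprehension could be a parallel map.
    """
    size = max(1, -(-len(y) // n_chunks))  # ceiling division
    chunks = [y[i:i + size] for i in range(0, len(y), size)]
    boundary = [0.0] * len(chunks)  # first iteration: no information yet
    results = []
    for _ in range(n_iters):
        results = [chunk_residuals(c, theta, b)
                   for c, b in zip(chunks, boundary)]
        # Chunk k of the next iteration starts from the last residual
        # that chunk k-1 produced in this iteration.
        boundary = [0.0] + [r[-1] for r in results[:-1]]
    return [x for r in results for x in r]
```

With theta fixed, the first k chunks are exact after k iterations, so running as many iterations as there are chunks reproduces the sequential result in this toy setting. In an actual fit the iterations are needed anyway, which is what makes reusing the previous iteration's boundary values attractive.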

Hai Qian - One of the best experts on this subject based on the ideXlab platform.

PAKDD Workshops - Parallel Time Series Modeling - A Case Study of In-Database Big Data Analytics

Jonathan M. Samet - One of the best experts on this subject based on the ideXlab platform.

On the Use of Generalized Additive Models in Time-Series Studies of Air Pollution and Health

American journal of epidemiology, 2002

Co-Authors: Francesca Dominici, Aidan Mcdermott, Scott L. Zeger, Jonathan M. Samet

Abstract:

The widely used generalized additive models (GAM) method is a flexible and effective technique for conducting nonlinear regression analysis in time-series studies of the health effects of air pollution. When the data to which the GAM are being applied have two characteristics--1) the estimated regression coefficients are small and 2) there exist confounding factors that are modeled using at least two nonparametric smooth functions--the default settings in the gam function of the S-Plus software package (version 3.4) do not assure convergence of its iterative estimation procedure and can provide biased estimates of regression coefficients and standard errors. This phenomenon has occurred in time-series analyses of contemporary data on air pollution and mortality. To evaluate the impact of default implementation of the gam software on published analyses, the authors reanalyzed data from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) using three different methods: 1) Poisson regression with parametric nonlinear adjustments for confounding factors; 2) GAM with default convergence parameters; and 3) GAM with more stringent convergence parameters than the default settings. The authors found that pooled NMMAPS estimates were very similar under the first and third methods but were biased upward under the second method.
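The effect of a loose stopping rule on an iterative fit can be shown with a toy example. The sketch below is not the S-Plus gam procedure and does not use NMMAPS data: it assumes a backfitting-style loop in which two plain least-squares coefficients stand in for smooth terms, two strongly correlated predictors, noiseless data, and illustrative tolerances of 1e-3 and 1e-8. All names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def backfit(x1, x2, y, tol, max_iter=100000):
    """Alternately refit each coefficient on the other's partial residuals.

    Stops when the largest coefficient change in one cycle is below tol.
    Returns (b1, b2, number_of_cycles).
    """
    b1 = b2 = 0.0
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        r1 = [yi - b2 * v for yi, v in zip(y, x2)]      # residual without x2 term
        b1_new = dot(x1, r1) / dot(x1, x1)
        r2 = [yi - b1_new * u for yi, u in zip(y, x1)]  # residual without x1 term
        b2_new = dot(x2, r2) / dot(x2, x2)
        change = max(abs(b1_new - b1), abs(b2_new - b2))
        b1, b2 = b1_new, b2_new
        if change < tol:
            break
    return b1, b2, n_iter


def demo_data(r, b1_true=0.1, b2_true=1.0):
    """Two unit-length predictors with correlation-like inner product r."""
    x1 = [1.0, 0.0]
    x2 = [r, math.sqrt(1.0 - r * r)]
    y = [b1_true * u + b2_true * v for u, v in zip(x1, x2)]
    return x1, x2, y


x1, x2, y = demo_data(0.99)
loose = backfit(x1, x2, y, tol=1e-3)   # stops early
strict = backfit(x1, x2, y, tol=1e-8)  # runs many more cycles
print("loose :", loose)
print("strict:", strict)
```

In this setup the true small coefficient is 0.1. Because the predictors are nearly collinear, each cycle moves the estimates only slightly, so the loose rule declares convergence while the small coefficient is still noticeably too large, whereas the strict rule recovers it closely. This only mirrors the kind of behavior the abstract describes; the size and direction of the error depend entirely on the assumed toy data.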

Siddarth Saha - One of the best experts on this subject based on the ideXlab platform.

Network intrusion detection using string matching

2010

Co-Authors: Praveen Kumar Telugu, Siddarth Saha

Abstract:

A network intrusion detection system is a retrofit approach for providing a sense of security in existing computers and data networks, while allowing them to operate in their current open mode. The goal of a network intrusion detection system is to identify, preferably in real time, unauthorized use, misuse and abuse of computer systems by insiders as well as by outside perpetrators. At the heart of every network intrusion detection system is packet inspection, which relies on string matching. This string matching is the performance bottleneck for the whole network intrusion detection system, so the need to increase the performance of string matching cannot be overstated. In this project, we have studied some of the standard string matching algorithms and implemented them. We have then compared the performance of the various algorithms with varying input sizes. The main focus of the project was the Aho-Corasick algorithm. In addition to using the default implementation of suffix trees, we have used a dense hash set and a sparse hash set implementation, which are libraries from the Google code repository, and we show that the performance of these implementations is better. They give a noticeable enhancement in performance when the input size increases.
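For readers unfamiliar with Aho-Corasick, a minimal sketch follows. It is not the authors' implementation and does not use the Google hash set libraries mentioned above: it assumes non-empty patterns and stores each state's transitions in a Python dict, which is a hash-based choice in the same spirit. Function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for non-empty patterns."""
    goto = [{}]   # goto[state][char] -> next state (hash-based transitions)
    fail = [0]    # failure link per state
    out = [[]]    # patterns recognized on reaching each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    s = 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            matches.append((i - len(p) + 1, p))
    return matches
```

The text is scanned once regardless of how many patterns are loaded, which is why the transition lookup (here a dict access) dominates the running time and why the choice of container for transitions matters for packet inspection workloads.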

Rahul Iyer - One of the best experts on this subject based on the ideXlab platform.

PAKDD Workshops - Parallel Time Series Modeling - A Case Study of In-Database Big Data Analytics
