Parallel Database

The experts below are selected from a list of 29,499 experts worldwide, ranked by the ideXlab platform.

Spyros Blanas - One of the best experts on this subject based on the ideXlab platform.

Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems

ACM Transactions on Database Systems, 2019

Co-Authors: Feilong Liu, Lingyan Yin, Spyros Blanas

Abstract:

The commoditization of high-performance networking has sparked research interest in the RDMA capability of this hardware. One-sided RDMA primitives, in particular, have generated substantial excitement due to the ability to directly access remote memory from within an application without involving the TCP/IP stack or the remote CPU. This article considers how to leverage RDMA to improve the analytical performance of parallel database systems. To shuffle data efficiently using RDMA, one needs to consider a complex design space that includes (1) the number of open connections, (2) the contention for the shared network interface, (3) the RDMA transport function, and (4) how much memory should be reserved to exchange data between nodes during query processing. We contribute eight designs that capture salient tradeoffs in this design space as well as an adaptive algorithm to dynamically manage RDMA-registered memory. We comprehensively evaluate how transport-layer decisions impact the query performance of a database system for different generations of InfiniBand. We find that a shuffling operator that uses the RDMA Send/Receive transport function over the Unreliable Datagram transport service can transmit data up to 4× faster than an RDMA-capable MPI implementation in a 16-node cluster. The response time of TPC-H queries improves by as much as 2×.
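To make the shuffle and its memory question (item 4) concrete, here is a minimal, transport-agnostic sketch of a hash-partitioning shuffle that batches tuples into one fixed-size send buffer per destination. It does not model RDMA, connections, or registered-memory management, and it is not the authors' operator; the function name `shuffle` and the parameter `buffer_tuples` are hypothetical, and `flush` merely stands in for one network transfer. Python's built-in `hash()` is used as the partitioning function for illustration only; it is salted per process for strings, so a real system would use a stable hash.

```python
def shuffle(tuples, key, num_nodes, buffer_tuples):
    """Hash-partition tuples across num_nodes destinations using fixed-size send buffers.

    Returns (received, messages): the tuples delivered to each destination and the
    number of buffer-sized transfers sent to each destination.
    """
    buffers = [[] for _ in range(num_nodes)]   # one send buffer per destination
    received = [[] for _ in range(num_nodes)]  # what each destination ends up with
    messages = [0] * num_nodes

    def flush(dest):
        # Stand-in for one network transfer of a full (or final, partial) buffer.
        if buffers[dest]:
            received[dest].extend(buffers[dest])
            messages[dest] += 1
            buffers[dest] = []

    for t in tuples:
        dest = hash(key(t)) % num_nodes  # partitioning function
        buffers[dest].append(t)
        if len(buffers[dest]) == buffer_tuples:
            flush(dest)
    for dest in range(num_nodes):
        flush(dest)  # drain partially filled buffers at end of input
    return received, messages
```

With ten integer-keyed tuples, four destinations, and two-tuple buffers, destinations 0 and 1 each receive three tuples in two transfers, while destinations 2 and 3 each receive two tuples in one transfer. Because every sender keeps one buffer per destination, the memory it reserves grows as `num_nodes * buffer_tuples`, which is one reason buffer sizing interacts with cluster size.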

Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems

European Conference on Computer Systems, 2017

Co-Authors: Feilong Liu, Lingyan Yin, Spyros Blanas

Abstract:

The commoditization of high-performance networking has sparked research interest in the RDMA capability of this hardware. One-sided RDMA primitives, in particular, have generated substantial excitement due to the ability to directly access remote memory from within an application without involving the TCP/IP stack or the remote CPU. This paper considers how to leverage RDMA to improve the analytical performance of parallel database systems. To shuffle data efficiently using RDMA, one needs to consider a complex design space that includes (1) the number of open connections, (2) the contention for the shared network interface, (3) the RDMA transport function, and (4) how much memory should be reserved to exchange data between nodes during query processing. We contribute six designs that capture salient trade-offs in this design space. We comprehensively evaluate how transport-layer decisions impact the query performance of a database system for different generations of InfiniBand. We find that a shuffling operator that uses the RDMA Send/Receive transport function over the Unreliable Datagram transport service can transmit data up to 4× faster than an RDMA-capable MPI implementation in a 16-node cluster. The response time of TPC-H queries improves by as much as 2×.
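Design-space item (1), the number of open connections, depends on the transport service. In InfiniBand, a Reliable Connected (RC) queue pair is bound to exactly one remote queue pair, whereas an Unreliable Datagram (UD) queue pair can send to any other UD queue pair. The arithmetic sketch below assumes, as a simplification that may not match any of the paper's designs, that each worker thread needs its own RC queue pair to every remote node but only a single UD queue pair in total; the function name is hypothetical.

```python
def queue_pairs_per_node(num_nodes, threads_per_node, transport):
    """Count the queue pairs one node needs so that every thread can reach every other node.

    Assumption: with connected transport each thread holds one queue pair per remote
    node; with datagram transport each thread holds a single queue pair.
    """
    peers = num_nodes - 1
    if transport == "RC":  # Reliable Connected: one queue pair per (local thread, remote node)
        return threads_per_node * peers
    if transport == "UD":  # Unreliable Datagram: one queue pair per thread can address all peers
        return threads_per_node
    raise ValueError(f"unknown transport: {transport!r}")
```

Under these assumptions, a 16-node cluster with 8 threads per node needs 120 RC queue pairs per node (1,920 cluster-wide) but only 8 UD queue pairs per node (128 cluster-wide), and the RC count grows linearly with cluster size.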

David Taniar - One of the best experts on this subject based on the ideXlab platform.

APPT - Multi-scheduler Concurrency Control for Parallel Database Systems

Lecture Notes in Computer Science, 2003

Co-Authors: Sushant Goel, Hema Sharda, David Taniar

Abstract:

The increase in the amount of stored data and the requirement for fast response times have motivated research in parallel database systems (PDS). Data correctness remains one of the major issues. Concurrency control algorithms used by PDS follow a single-scheduler approach, which has inherent weaknesses such as very large lock tables, an overloaded centralized scheduler, and a larger number of messages in the system. In this paper, we investigate the possibility of using multiple schedulers and conclude that single-scheduler algorithms cannot be migrated in their present form to a multi-scheduler environment. Next, we propose a Multi-Scheduler Concurrency Control algorithm for PDS that distributes the scheduling responsibilities to the respective processing elements. The correctness of the proposed algorithm is then discussed using a different serializability criterion: Parallel Database Quasi-Serializability.
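The idea of distributing scheduling responsibilities can be illustrated with a small sketch in which each processing element (PE) keeps a lock table only for the data items it owns, so no single lock table or scheduler sees every request. This is not the paper's algorithm and does not implement Parallel Database Quasi-Serializability; the class names, the hash placement of items onto PEs, and the restriction to shared ("S") and exclusive ("X") locks without wait queues are all assumptions.

```python
class Scheduler:
    """Lock table for the data items owned by one processing element (PE)."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of holding transaction ids)

    def request(self, txn, item, mode):
        """Grant a shared ("S") or exclusive ("X") lock if compatible; return True if granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        if holders == {txn}:  # the sole holder may re-request or upgrade its own lock
            self.locks[item] = ("X" if "X" in (mode, held_mode) else "S", holders)
            return True
        return False  # conflict: the caller must wait or abort

    def release_all(self, txn):
        for item in list(self.locks):
            _, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]


class MultiScheduler:
    """Routes each lock request to the scheduler of the PE that owns the item."""

    def __init__(self, num_pes):
        self.schedulers = [Scheduler() for _ in range(num_pes)]

    def owner(self, item):
        return hash(item) % len(self.schedulers)  # assumed hash placement of data items

    def request(self, txn, item, mode):
        return self.schedulers[self.owner(item)].request(txn, item, mode)

    def release_all(self, txn):
        for scheduler in self.schedulers:
            scheduler.release_all(txn)
```

Each `Scheduler` holds entries only for its own items, which reflects the motivation of smaller lock tables and no central bottleneck; what the sketch leaves out is exactly the hard part the paper addresses, namely guaranteeing a global correctness criterion when the schedulers decide independently.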

Global B+ tree indexing in Parallel Database systems

Lecture Notes in Computer Science, 2003

Co-Authors: David Taniar, Wenny Rahayu

Abstract:

In this paper, we propose a global B+ indexing tree structure for parallel database systems, where the index tree is partitioned across multiple processors with a possible overlap. We also present algorithms for maintaining global B+ indexing trees (e.g., insertion and deletion of nodes) and describe algorithms for operations (e.g., search and join) on tables that are indexed using global B+ indexing trees.
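A global index partitioned across processors can be sketched by range-partitioning keys and routing every operation through a small set of separator keys. In this hypothetical sketch (class and method names are illustrative), sorted Python lists stand in for each processor's local B+ subtree, and the "possible overlap" is modeled only as the separator array replicated on every processor; the paper's structure and maintenance algorithms may differ.

```python
import bisect


class GlobalIndex:
    """Range-partitioned global index: replicated separator keys route each key to one processor."""

    def __init__(self, separators):
        # separators[i] is the smallest key stored on processor i + 1 (copied to every processor).
        self.separators = list(separators)
        # One sorted list of (key, rid) pairs per processor, standing in for a local B+ subtree.
        self.partitions = [[] for _ in range(len(self.separators) + 1)]

    def processor_for(self, key):
        return bisect.bisect_right(self.separators, key)

    def insert(self, key, rid):
        bisect.insort(self.partitions[self.processor_for(key)], (key, rid))

    def delete(self, key, rid):
        part = self.partitions[self.processor_for(key)]
        i = bisect.bisect_left(part, (key, rid))
        if i < len(part) and part[i] == (key, rid):
            part.pop(i)

    def search(self, key):
        """Exact-match search touches only the one processor that owns the key."""
        part = self.partitions[self.processor_for(key)]
        i = bisect.bisect_left(part, (key,))
        rids = []
        while i < len(part) and part[i][0] == key:
            rids.append(part[i][1])
            i += 1
        return rids

    def range_search(self, lo, hi):
        """Visit only the processors whose key ranges intersect [lo, hi]."""
        out = []
        for p in range(self.processor_for(lo), self.processor_for(hi) + 1):
            part = self.partitions[p]
            i = bisect.bisect_left(part, (lo,))
            while i < len(part) and part[i][0] <= hi:
                out.append(part[i])
                i += 1
        return out
```

For example, with separators `[100, 200]`, key 100 is routed to processor 1 and a range search over [99, 150] visits only processors 0 and 1.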

IDEAL - Global B+ tree indexing in Parallel Database systems

Intelligent Data Engineering and Automated Learning, 2003

Co-Authors: David Taniar, Wenny Rahayu

Abstract:

In this paper, we propose a global B+ indexing tree structure for parallel database systems, where the index tree is partitioned across multiple processors with a possible overlap. We also present algorithms for maintaining global B+ indexing trees (e.g., insertion and deletion of nodes) and describe algorithms for operations (e.g., search and join) on tables that are indexed using global B+ indexing trees.

Parallel Database sorting

Information Sciences, 2002

Co-Authors: David Taniar, Wenny Rahayu

Abstract:

Sorting is frequently required in database processing, for example by the ORDER BY and DISTINCT clauses in SQL. Sorting is also well known in the broader computer science community. Sorting in general covers internal and external sorting. Past published work has focused extensively on external sorting on uni-processors (serial external sorting) and on internal sorting on multi-processors (parallel internal sorting). External sorting on multi-processors (parallel external sorting) has received surprisingly little attention; furthermore, the way current parallel database systems sort is far from optimal in many scenarios. In this paper, we present a taxonomy of parallel sorting in parallel database systems that covers five sorting methods: parallel merge-all sort, parallel binary-merge sort, parallel redistribution binary-merge sort, parallel redistribution merge-all sort, and parallel partitioned sort. The first two methods are previously proposed approaches to parallel external sorting that have been adopted as the status quo of parallel database sorting, whereas the latter three, which are based on redistribution and repartitioning, are new methods that have not been discussed in the parallel external sorting literature. The performance of these five methods is investigated, and the results are reported.
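The contrast between two of the five methods can be sketched with in-memory lists standing in for the processors' partitions. Roughly, parallel merge-all sort sorts locally and then merges every sorted run on a single host, while parallel partitioned sort first redistributes records by key range and then sorts each range independently, so concatenating the ranges gives the global order without a final merge. This is a sequential, internal-memory simulation; the paper's methods are external sorts, the splitter values are assumed to be supplied by the caller, and the function names are hypothetical.

```python
import bisect
import heapq


def local_sorts(partitions):
    # Each processor sorts the records it currently holds.
    return [sorted(p) for p in partitions]


def parallel_merge_all_sort(partitions):
    """Local sort on every processor, then one host performs a k-way merge of all runs."""
    return list(heapq.merge(*local_sorts(partitions)))


def parallel_partitioned_sort(partitions, splitters):
    """Range-redistribute records by key, then sort each range locally.

    Range i holds keys in [splitters[i-1], splitters[i]), so the ranges are already
    in global order and no final merge step is needed.
    """
    ranges = [[] for _ in range(len(splitters) + 1)]
    for part in partitions:
        for x in part:
            ranges[bisect.bisect_right(splitters, x)].append(x)
    return local_sorts(ranges)
```

The design trade-off is visible in the code: merge-all sort funnels all data through one host in its final step, whereas partitioned sort spreads that work but depends on splitters that balance the ranges; skewed splitters leave one processor with most of the records.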

A Taxonomy of Indexing Schemes for Parallel Database Systems

Distributed and Parallel Databases, 2002

Co-Authors: David Taniar, Wenny Rahayu

Abstract:

In this paper, we present a taxonomy of indexing schemes in parallel database systems. Index partitioning is not yet widely recognized. One reason is that most index structures are trees rather than flat structures like tables; consequently, index partitioning is more complex than ordinary data partitioning for tables. We present three parallel indexing schemes and discuss their maintenance strategies. We also analyze their storage requirements.

Sivarama P. Dandamudi - One of the best experts on this subject based on the ideXlab platform.

A Comparative Study of Soft Real-Time Transaction Scheduling Policies in Parallel Database Systems

High Performance Computing Systems and Applications, 1998

Co-Authors: S. Takkar, Sivarama P. Dandamudi

Abstract:

Real-time database systems support transactions with timing constraints such as deadlines. In addition to preserving the consistency of the database as traditional transactions do, real-time transactions must meet their deadlines. Scheduling real-time transactions has received considerable attention in centralized and distributed databases. However, real-time transaction scheduling in parallel database systems has not received much attention. This paper focuses on real-time transaction scheduling in shared-nothing parallel database systems. We evaluate the performance of a new priority-based scheduling policy in which all scheduling decisions are made locally by each node. In contrast, several other algorithms proposed for distributed systems require communication among the nodes to globally synchronize their local block and abort decisions. Such synchronization can degrade performance, because real-time transactions must meet deadlines. We use miss ratio and average lateness as the performance metrics and show that, in general, the new policy provides superior performance for the workload and system parameters considered in this study.
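The two metrics can be illustrated with a single-node, purely local scheduler. The sketch below assumes non-preemptive earliest-deadline-first (EDF) ordering, a common priority assignment chosen here for illustration; the abstract does not specify the paper's new policy, so this is not that policy. It also assumes lateness is `max(0, finish - deadline)` averaged over all transactions, which is one of several definitions in use. The function name is hypothetical.

```python
import heapq


def simulate_edf(transactions):
    """Non-preemptive earliest-deadline-first (EDF) scheduling on one node.

    transactions: list of (arrival, service_time, deadline) tuples.
    Returns (miss_ratio, average_lateness); lateness is max(0, finish - deadline),
    averaged over all transactions (one of several common definitions).
    """
    if not transactions:
        return 0.0, 0.0
    pending = sorted(transactions)  # ordered by arrival time
    ready = []                      # min-heap keyed by deadline
    clock, i = 0.0, 0
    misses, total_lateness = 0, 0.0
    while i < len(pending) or ready:
        if not ready and pending[i][0] > clock:
            clock = pending[i][0]  # node is idle until the next arrival
        while i < len(pending) and pending[i][0] <= clock:
            _, service, deadline = pending[i]
            heapq.heappush(ready, (deadline, i, service))  # i breaks deadline ties
            i += 1
        deadline, _, service = heapq.heappop(ready)
        clock += service
        lateness = max(0.0, clock - deadline)
        if lateness > 0:
            misses += 1
        total_lateness += lateness
    n = len(transactions)
    return misses / n, total_lateness / n
```

For the transactions `(0, 4, 10)`, `(1, 2, 4)`, and `(2, 3, 20)`, the one with deadline 4 arrives while a longer transaction is already running, finishes at time 6, and is the only miss, giving a miss ratio of 1/3 and an average lateness of 2/3. No information from other nodes is consulted, which mirrors the local decision-making the abstract describes.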

IPDPS - Dynamic versus static locking in real-time Parallel Database systems

18th International Parallel and Distributed Processing Symposium 2004. Proceedings., 1

Co-Authors: A. Mittal, Sivarama P. Dandamudi

Abstract:

Summary form only given. Parallel database systems can provide significant performance gains in terms of transaction processing rates. These gains are realized by running many transactions concurrently. A requirement in real-time transaction scheduling is to complete transactions within their deadlines. Due to its simplicity, two-phase locking (2PL) is one of the most commonly used concurrency control mechanisms. Locks in the 2PL protocol can be secured in two alternative ways: static locking, in which a transaction acquires all of its locks before it starts executing, or dynamic locking, in which locks are acquired as data items are accessed. We report the performance of these two 2PL variants under various degrees of resource and data contention in a real-time parallel database system.
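The behavioral difference between the two 2PL variants can be sketched with a toy lock table. To keep it small, this hypothetical sketch (names are illustrative) supports only exclusive locks and has no wait queues: a request either succeeds immediately or fails. Static locking is modeled as all-or-nothing acquisition before execution, and dynamic locking as acquiring each lock on first access. It is not the simulation model used in the paper.

```python
class LockTable:
    """Exclusive locks only; a request either succeeds immediately or fails."""

    def __init__(self):
        self.owner = {}  # data item -> id of the transaction holding its lock

    def is_free_for(self, txn, item):
        return self.owner.get(item, txn) == txn

    def try_lock(self, txn, item):
        if not self.is_free_for(txn, item):
            return False
        self.owner[item] = txn
        return True

    def release_all(self, txn):
        # Shrinking phase: release every lock the transaction holds at commit or abort.
        for item in [k for k, v in self.owner.items() if v == txn]:
            del self.owner[item]


def static_lock(table, txn, items):
    """Static (conservative) 2PL: acquire all locks before execution, or none at all."""
    if not all(table.is_free_for(txn, item) for item in items):
        return False  # the transaction waits while holding nothing, so it cannot join a deadlock cycle
    for item in items:
        table.try_lock(txn, item)
    return True


def dynamic_lock(table, txn, item):
    """Dynamic 2PL: acquire each lock when the item is first accessed."""
    return table.try_lock(txn, item)
```

With dynamic locking, if T1 locks A and T2 locks B, each transaction's next request (T1 for B, T2 for A) fails while both still hold a lock: a deadlock. With static locking, T2's request for B and A fails up front and it holds nothing. The price is that static locking must know the full access set in advance and holds locks for longer, which is the kind of trade-off that resource and data contention levels can tip either way.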

MASCOTS - Performance of hard real-time transaction scheduling policies in Parallel Database systems

Proceedings. Sixth International Symposium on Modeling Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.98TB100247), 1

Co-Authors: S. Takkar, Sivarama P. Dandamudi

Abstract:

In addition to preserving the consistency of the database as traditional transactions do, real-time transactions must meet their deadlines. Scheduling real-time transactions in parallel database systems has not received much attention. This paper focuses on real-time transaction scheduling in shared-nothing parallel database systems. We evaluate the performance of a new priority-based scheduling policy in which all scheduling decisions are made locally by each node. In contrast, several other algorithms proposed for distributed systems require communication among the nodes to globally synchronize their local block and abort decisions. Such synchronization can degrade performance, because real-time transactions must meet deadlines. We use miss ratio as the performance metric and show that, in general, the new policy provides superior performance for the workload and system parameters considered in this study.

Feilong Liu - One of the best experts on this subject based on the ideXlab platform.

Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems

ACM Transactions on Database Systems, 2019

Co-Authors: Feilong Liu, Lingyan Yin, Spyros Blanas

Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems

European Conference on Computer Systems, 2017

Co-Authors: Feilong Liu, Lingyan Yin, Spyros Blanas

Rimma V. Nehme - One of the best experts on this subject based on the ideXlab platform.

Resource bricolage and resource selection for Parallel Database systems

The VLDB Journal, 2017

Co-Authors: Jiexing Li, Jeffrey F. Naughton, Rimma V. Nehme

Abstract:

Running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds. Performance differences among machines in the same cluster pose new challenges for parallel database systems. First, for database systems running in a heterogeneous cluster, the default uniform data partitioning strategy may overload some of the slow machines, while at the same time it may underutilize the more powerful machines. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in significant query performance degradation. Second, since machines might have varying resources or performance, different choices of machines may lead to different costs or performance for executing the same workload. By carefully selecting the most suitable machines for running a workload, we may achieve better performance with the same budget, or we may meet the same performance requirements at a lower cost. We address these challenges by introducing techniques we call resource bricolage and resource selection that improve database performance in heterogeneous environments. Our approaches quantify the performance differences among machines with various resources as they process workloads with diverse resource requirements. For the purpose of better resource utilization, we formalize the problem of minimizing workload execution time and view it as an optimization problem, and then we employ linear programming to obtain a recommended data partitioning scheme. For the purpose of better resource selection, we formalize two problems: one minimizes the total workload execution time with a given budget, and the other minimizes the total budget with a given performance target. We then employ different mixed-integer programs to search for the optimal resource selection decisions. We verify the effectiveness of both resource bricolage and resource selection techniques with an extensive experimental study.
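Both ideas can be illustrated under a deliberately simplified, single-resource model in which each machine is summarized by one number: the time it would need to process the whole workload alone. In that model, the linear program "minimize T subject to c_i · x_i ≤ T, Σ x_i = 1, x_i ≥ 0" has a closed-form optimum: give each machine a share proportional to its speed, so that all machines finish together at T = 1 / Σ(1/c_i). The sketch below uses that closed form in place of an LP solver and an exhaustive subset search in place of the paper's mixed-integer programs. The function names, the per-second pricing, and the cost model (sum of prices times execution time) are assumptions; the paper models multiple resources and diverse workloads.

```python
from itertools import combinations


def bricolage_partition(seconds_alone):
    """Split one workload across machines so that all of them finish at the same time.

    seconds_alone[i] is the time machine i would need to process the whole workload.
    This is the closed-form optimum of the LP: minimize T subject to
    seconds_alone[i] * x[i] <= T, sum(x) == 1, x[i] >= 0.
    """
    speeds = [1.0 / c for c in seconds_alone]
    total_speed = sum(speeds)
    shares = [s / total_speed for s in speeds]
    return shares, 1.0 / total_speed  # (data fraction per machine, execution time)


def select_machines(machines, budget):
    """Brute-force stand-in for a mixed-integer program: fastest subset within budget.

    machines: list of (name, seconds_alone, price_per_second); a subset costs
    (sum of its prices) * (its execution time). Returns (time, names) or None.
    """
    best = None
    for k in range(1, len(machines) + 1):
        for subset in combinations(machines, k):
            _, t = bricolage_partition([m[1] for m in subset])
            cost = sum(m[2] for m in subset) * t
            if cost <= budget and (best is None or t < best[0]):
                best = (t, [m[0] for m in subset])
    return best
```

For three machines needing 10, 20, and 40 seconds alone, uniform partitioning takes max(10, 20, 40) / 3 ≈ 13.3 seconds because the slowest machine dominates, while proportional shares finish in 40/7 ≈ 5.7 seconds. With the hypothetical prices 4.0, 1.0, and 0.5 per second, a budget of 25 selects the two slower machines, and a budget of 32 selects all three; adding the slow, cheap machine to the two faster ones both shortens the run and lowers its cost under this cost model.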

Resource bricolage for Parallel Database systems

Proceedings of the VLDB Endowment, 2014

Co-Authors: Jiexing Li, Jeffrey Naughton, Rimma V. Nehme

Abstract:

Running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds. For database systems running in a heterogeneous cluster, the default uniform data partitioning strategy may overload some of the slow machines while at the same time it may under-utilize the more powerful machines. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in significant query performance degradation.

We take a first step to address this problem by introducing a technique we call resource bricolage that improves database performance in heterogeneous environments. Our approach quantifies the performance differences among machines with various resources as they process workloads with diverse resource requirements. We formalize the problem of minimizing workload execution time and view it as an optimization problem, and then we employ linear programming to obtain a recommended data partitioning scheme. We verify the effectiveness of our technique with an extensive experimental study on a commercial database system.
