Parallel Database

The Experts below are selected from a list of 29499 Experts worldwide ranked by ideXlab platform

Spyros Blanas - One of the best experts on this subject based on the ideXlab platform.

  • Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems
    ACM Transactions on Database Systems, 2019
    Co-Authors: Feilong Liu, Lingyan Yin, Spyros Blanas
    Abstract:

    The commoditization of high-performance networking has sparked research interest in the RDMA capability of this hardware. One-sided RDMA primitives, in particular, have generated substantial excitement due to the ability to directly access remote memory from within an application without involving the TCP/IP stack or the remote CPU. This article considers how to leverage RDMA to improve the analytical performance of parallel database systems. To shuffle data efficiently using RDMA, one needs to consider a complex design space that includes (1) the number of open connections, (2) the contention for the shared network interface, (3) the RDMA transport function, and (4) how much memory should be reserved to exchange data between nodes during query processing. We contribute eight designs that capture salient tradeoffs in this design space as well as an adaptive algorithm to dynamically manage RDMA-registered memory. We comprehensively evaluate how transport-layer decisions impact the query performance of a database system for different generations of InfiniBand. We find that a shuffling operator that uses the RDMA Send/Receive transport function over the Unreliable Datagram transport service can transmit data up to 4× faster than an RDMA-capable MPI implementation in a 16-node cluster. The response time of TPC-H queries improves by as much as 2×.

  • Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems
    European Conference on Computer Systems, 2017
    Co-Authors: Feilong Liu, Lingyan Yin, Spyros Blanas
    Abstract:

    The commoditization of high-performance networking has sparked research interest in the RDMA capability of this hardware. One-sided RDMA primitives, in particular, have generated substantial excitement due to the ability to directly access remote memory from within an application without involving the TCP/IP stack or the remote CPU. This paper considers how to leverage RDMA to improve the analytical performance of parallel database systems. To shuffle data efficiently using RDMA, one needs to consider a complex design space that includes (1) the number of open connections, (2) the contention for the shared network interface, (3) the RDMA transport function, and (4) how much memory should be reserved to exchange data between nodes during query processing. We contribute six designs that capture salient trade-offs in this design space. We comprehensively evaluate how transport-layer decisions impact the query performance of a database system for different generations of InfiniBand. We find that a shuffling operator that uses the RDMA Send/Receive transport function over the Unreliable Datagram transport service can transmit data up to 4× faster than an RDMA-capable MPI implementation in a 16-node cluster. The response time of TPC-H queries improves by as much as 2×.
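To make the term "shuffle" concrete, the following is a minimal single-process sketch of hash-based repartitioning, the data movement that a shuffling operator performs between nodes. It is not the authors' operator and uses no RDMA: nodes are simulated as Python lists, and the names `destination` and `shuffle`, as well as the choice of CRC32 as the hash function, are assumptions made for illustration only.

```python
from collections import defaultdict
import zlib


def destination(key, num_nodes):
    """Map a key to a node id with a deterministic hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def shuffle(local_partitions, key_of):
    """Repartition tuples so that equal keys end up on the same node.

    local_partitions: one list of tuples per simulated node.
    key_of: function extracting the partitioning key from a tuple.
    Returns the per-node lists after the exchange.
    """
    num_nodes = len(local_partitions)
    received = [[] for _ in range(num_nodes)]
    for tuples in local_partitions:
        # Each sender fills one outgoing buffer per destination node.
        # A real system would hand these buffers to the network layer.
        outgoing = defaultdict(list)
        for t in tuples:
            outgoing[destination(key_of(t), num_nodes)].append(t)
        # Simulated transfer: append each buffer to the receiver's input.
        for dest, buf in outgoing.items():
            received[dest].extend(buf)
    return received


if __name__ == "__main__":
    parts = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")], [(2, "e")]]
    for node, rows in enumerate(shuffle(parts, key_of=lambda t: t[0])):
        print(node, rows)
```

The per-destination buffers in the sketch correspond loosely to design question (4) in the abstract: how much memory to reserve for exchanging data between nodes.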

David Taniar - One of the best experts on this subject based on the ideXlab platform.

  • APPT - Multi-scheduler Concurrency Control for Parallel Database Systems
    Lecture Notes in Computer Science, 2003
    Co-Authors: Sushant Goel, Hema Sharda, David Taniar
    Abstract:

    The growth in the amount of stored data and the need for fast response times have motivated research in parallel database systems (PDS). Data correctness remains one of the major issues. Concurrency control algorithms used by PDS take a single-scheduler approach, which has some inherent weaknesses: very large lock tables, an overloaded centralized scheduler, and a larger number of messages in the system. In this paper we investigate the possibility of multiple schedulers and conclude that single-scheduler algorithms cannot be migrated in their present form to a multi-scheduler environment. Next, we propose a multi-scheduler concurrency control algorithm for PDS that distributes the scheduling responsibilities to the respective processing elements. Correctness of the proposed algorithm is then discussed using a different serializability criterion, Parallel Database Quasi-Serializability.

  • Global B+ tree indexing in Parallel Database systems
    Lecture Notes in Computer Science, 2003
    Co-Authors: David Taniar, Wenny Rahayu
    Abstract:

    In this paper, we propose a global B+ indexing tree structure for parallel database systems, where the index tree is partitioned across multiple processors, with possible overlap. We also present algorithms for maintenance of global B+ indexing trees (e.g. insertion and deletion of nodes), and describe operation algorithms (e.g. search and join) on tables that are indexed using global B+ indexing trees.

  • IDEAL - Global B+ tree indexing in Parallel Database systems
    Intelligent Data Engineering and Automated Learning, 2003
    Co-Authors: David Taniar, Wenny Rahayu
    Abstract:

    In this paper, we propose a global B+ indexing tree structure for parallel database systems, where the index tree is partitioned across multiple processors, with possible overlap. We also present algorithms for maintenance of global B+ indexing trees (e.g. insertion and deletion of nodes), and describe operation algorithms (e.g. search and join) on tables that are indexed using global B+ indexing trees.

  • Parallel Database sorting
    Information Sciences, 2002
    Co-Authors: David Taniar, Wenny Rahayu
    Abstract:

    Sorting in database processing is frequently required through the use of ORDER BY and DISTINCT clauses in SQL. Sorting is also widely studied in the computer science community at large. Sorting in general covers internal and external sorting. Past published work has focused extensively on external sorting on uni-processors (serial external sorting) and internal sorting on multi-processors (parallel internal sorting). External sorting on multi-processors (parallel external sorting) has received surprisingly little attention; furthermore, the way current parallel database systems do sorting is far from optimal in many scenarios. In this paper, we present a taxonomy for parallel sorting in parallel database systems, which covers five sorting methods: parallel merge-all sort, parallel binary-merge sort, parallel redistribution binary-merge sort, parallel redistribution merge-all sort, and parallel partitioned sort. The first two methods are previously proposed approaches to parallel external sorting that have been adopted as the status quo of parallel database sorting, whereas the latter three methods, which are based on redistribution and repartitioning, are new and have not been discussed in the literature on parallel external sorting. The performance of these five methods is investigated and the results are reported.
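The abstract names the five methods without defining them. As a rough illustration of the redistribution idea, the sketch below assumes that a partitioned sort means: range-partition the values by a set of splitter keys so that each processor owns one key range, then sort each range locally, so no final merge is needed. This reading, the function name `partitioned_sort`, and the `splitters` parameter are assumptions, not taken from the paper; processors are simulated sequentially as lists.

```python
import bisect


def partitioned_sort(local_partitions, splitters):
    """Range-redistribute values, then sort each range independently.

    local_partitions: one unsorted list per simulated processor.
    splitters: sorted boundary keys; len(splitters) + 1 output ranges.
    Returns one sorted list per range; concatenating them in order
    yields the globally sorted result.
    """
    buckets = [[] for _ in range(len(splitters) + 1)]
    # Redistribution step: every value goes to the processor that owns its range.
    for part in local_partitions:
        for value in part:
            buckets[bisect.bisect_right(splitters, value)].append(value)
    # Local sort step: each processor sorts only its own range.
    return [sorted(bucket) for bucket in buckets]


if __name__ == "__main__":
    parts = [[9, 1, 5], [3, 7, 2], [8, 4, 6]]
    ranges = partitioned_sort(parts, splitters=[3, 6])
    print(ranges)
    print([v for r in ranges for v in r])
```

Under this assumption, the quality of the splitters determines load balance: skewed splitters leave some processors with most of the data.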

  • A Taxonomy of Indexing Schemes for Parallel Database Systems
    Distributed and Parallel Databases, 2002
    Co-Authors: David Taniar, Wenny Rahayu
    Abstract:

    In this paper, we present a taxonomy of indexing schemes in parallel database systems. Index partitioning is not yet widely recognized. One reason is that most index structures are trees, not flat structures like tables; consequently, index partitioning is more complex than ordinary data partitioning for tables. We present three parallel indexing schemes and discuss their maintenance strategies. We also analyze their storage requirements.

Sivarama P. Dandamudi - One of the best experts on this subject based on the ideXlab platform.

  • A Comparative Study of Soft Real-Time Transaction Scheduling Policies in Parallel Database Systems
    High Performance Computing Systems and Applications, 1998
    Co-Authors: S. Takkar, Sivarama P. Dandamudi
    Abstract:

    Real-time database systems support transactions with timing constraints such as deadlines. Real-time transactions, in addition to preserving the consistency of the database as traditional transactions do, have to meet their deadlines. Scheduling real-time transactions has received considerable attention in centralized and distributed databases. However, real-time transaction scheduling in parallel database systems has not received much attention. This paper focuses on real-time transaction scheduling in shared-nothing parallel database systems. We evaluate the performance of a new priority-based scheduling policy in which all scheduling decisions are made locally by each node. In contrast, several other algorithms, proposed for distributed systems, require communication among the nodes to globally synchronize their local block and abort decisions. Such synchronization can degrade performance because real-time transactions have to meet deadlines. We use miss ratio and average lateness as the performance metrics and show that, in general, the new policy provides superior performance for the workload and system parameters considered in this study.
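The two metrics named in the abstract can be computed as in the short sketch below. The abstract does not give formal definitions, so the ones used here are assumptions: miss ratio is the fraction of transactions that finish after their deadline, and average lateness is the mean overrun among the transactions that missed. Each transaction is represented as a hypothetical `(finish_time, deadline)` pair.

```python
def miss_ratio(transactions):
    """Fraction of transactions whose finish time exceeds the deadline."""
    if not transactions:
        return 0.0
    missed = sum(1 for finish, deadline in transactions if finish > deadline)
    return missed / len(transactions)


def average_lateness(transactions):
    """Mean (finish - deadline) over the transactions that missed their deadline.

    Assumption: on-time transactions are excluded from the average.
    """
    overruns = [finish - deadline
                for finish, deadline in transactions
                if finish > deadline]
    if not overruns:
        return 0.0
    return sum(overruns) / len(overruns)


if __name__ == "__main__":
    # (finish_time, deadline) pairs in arbitrary time units.
    completed = [(5, 10), (12, 10), (20, 15), (7, 7)]
    print(miss_ratio(completed))
    print(average_lateness(completed))
```

A transaction that finishes exactly at its deadline counts as on time in this sketch; a study could equally define lateness over all transactions, which would give a different average.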

  • IPDPS - Dynamic versus static locking in real-time Parallel Database systems
    Proceedings of the 18th International Parallel and Distributed Processing Symposium, 2004
    Co-Authors: A. Mittal, Sivarama P. Dandamudi
    Abstract:

    Summary form only given. Parallel database systems are capable of providing significant performance gains in terms of transaction processing rates. These gains are realized by running many transactions concurrently. A requirement in real-time transaction scheduling is to complete transactions within their deadlines. Due to its simplicity, two-phase locking (2PL) is one of the most commonly used concurrency control mechanisms. Two alternative methods of securing locks in the 2PL protocol are static locking and dynamic locking. We report the performance of the two locking variants of the 2PL protocol under various degrees of resource and data contention in a real-time parallel database system.
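The difference between the two lock-acquisition styles can be sketched as follows. This is an illustrative toy, not the system evaluated in the paper: it assumes exclusive locks only, a single in-memory lock table, and no waiting or deadlock handling. Static locking is modeled as all-or-nothing acquisition before the transaction starts; dynamic locking acquires locks one at a time in access order. The class and function names are hypothetical.

```python
class LockTable:
    """Toy lock table: maps each data item to the transaction holding it."""

    def __init__(self):
        self.owner = {}

    def try_lock(self, txn, item):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False

    def release_all(self, txn):
        """Release every lock held by txn (the shrinking phase, done at once)."""
        for item in [i for i, o in self.owner.items() if o == txn]:
            del self.owner[item]


def static_acquire(table, txn, items):
    """Static locking: take every lock up front, or take none at all."""
    if any(table.owner.get(i) not in (None, txn) for i in items):
        return False
    for i in items:
        table.owner[i] = txn
    return True


def dynamic_acquire(table, txn, items):
    """Dynamic locking: lock items as they are accessed.

    Returns the items locked before the first conflict; a real scheduler
    would block or abort the transaction at that point.
    """
    locked = []
    for i in items:
        if not table.try_lock(txn, i):
            break
        locked.append(i)
    return locked


if __name__ == "__main__":
    table = LockTable()
    table.try_lock("T1", "c")
    print(static_acquire(table, "T2", ["a", "b", "c"]))
    print(dynamic_acquire(table, "T2", ["a", "b", "c"]))
```

In the toy, a conflicting transaction under static locking holds nothing while it waits, whereas under dynamic locking it already holds part of its lock set, which is one source of the contention trade-off the abstract refers to.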

  • MASCOTS - Performance of hard real-time transaction scheduling policies in Parallel Database systems
    Proceedings of the Sixth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.98TB100247)
    Co-Authors: S. Takkar, Sivarama P. Dandamudi
    Abstract:

    Real-time transactions, in addition to preserving the consistency of the database as traditional transactions do, have to meet their deadlines. Scheduling real-time transactions in parallel database systems has not received much attention. This paper focuses on real-time transaction scheduling in shared-nothing parallel database systems. We evaluate the performance of a new priority-based scheduling policy in which all scheduling decisions are made locally by each node. In contrast, several other algorithms, proposed for distributed systems, require communication among the nodes to globally synchronize their local block and abort decisions. Such synchronization can degrade performance because real-time transactions have to meet deadlines. We use miss ratio as the performance metric and show that, in general, the new policy provides superior performance for the workload and system parameters considered in this study.

Feilong Liu - One of the best experts on this subject based on the ideXlab platform.

  • Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems
    ACM Transactions on Database Systems, 2019
    Co-Authors: Feilong Liu, Lingyan Yin, Spyros Blanas

  • Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems
    European Conference on Computer Systems, 2017
    Co-Authors: Feilong Liu, Lingyan Yin, Spyros Blanas

Rimma V. Nehme - One of the best experts on this subject based on the ideXlab platform.

  • Resource bricolage and resource selection for Parallel Database systems
    The VLDB Journal, 2017
    Co-Authors: Jiexing Li, Jeffrey F. Naughton, Rimma V. Nehme
    Abstract:

    Running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds. Performance differences among machines in the same cluster pose new challenges for parallel database systems. First, for database systems running in a heterogeneous cluster, the default uniform data partitioning strategy may overload some of the slow machines, while at the same time it may underutilize the more powerful machines. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in a significant query performance degradation. Second, since machines might have varying resources or performance, different choices of machines may lead to different costs or performance for executing the same workload. By carefully selecting the most suitable machines for running a workload, we may achieve better performance with the same budget, or we may meet the same performance requirements with a lower cost. We address these challenges by introducing techniques we call resource bricolage and resource selection that improve database performance in heterogeneous environments. Our approaches quantify the performance differences among machines with various resources as they process workloads with diverse resource requirements. For the purpose of better resource utilization, we formalize the problem of minimizing workload execution time and view it as an optimization problem, and then, we employ linear programming to obtain a recommended data partitioning scheme. For the purpose of better resource selection, we formalize two problems: one minimizes the total workload execution time with a given budget, and the other minimizes the total budget with a given performance target. We then employ different mixed-integer programs to search for the optimal resource selection decisions. We verify the effectiveness of both resource bricolage and resource selection techniques with an extensive experimental study.
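The effect of non-uniform partitioning can be seen in a deliberately simplified model. The sketch below does not use linear programming and is not the authors' formulation: it assumes each machine is described by a single processing rate (data units per second), in which case the partitioning that minimizes the finish time of the slowest machine simply gives each machine a share proportional to its rate. The function names and the example rates are hypothetical.

```python
def uniform_partition(speeds):
    """Default strategy: every machine receives the same share of the data."""
    return [1.0 / len(speeds)] * len(speeds)


def proportional_partition(speeds):
    """Share of data per machine, proportional to its processing rate.

    Assumption: one rate per machine; under this model all machines
    finish at the same time, which minimizes the overall finish time.
    """
    total = sum(speeds)
    return [s / total for s in speeds]


def execution_time(fractions, speeds, data_size):
    """A parallel query finishes when the slowest machine finishes."""
    return max(f * data_size / s for f, s in zip(fractions, speeds))


if __name__ == "__main__":
    speeds = [1.0, 1.0, 2.0]   # hypothetical rates, data units per second
    data_size = 100.0
    print(execution_time(uniform_partition(speeds), speeds, data_size))
    print(execution_time(proportional_partition(speeds), speeds, data_size))
```

With several resource types per machine and workloads that stress them differently, a single rate no longer captures machine performance, which is where an optimization formulation such as the linear program mentioned in the abstract becomes necessary.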

  • Resource bricolage for Parallel Database systems
    Proceedings of the VLDB Endowment, 2014
    Co-Authors: Jiexing Li, Jeffrey Naughton, Rimma V. Nehme
    Abstract:

    Running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds. For database systems running in a heterogeneous cluster, the default uniform data partitioning strategy may overload some of the slow machines while at the same time it may under-utilize the more powerful machines. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in a significant query performance degradation. We take a first step to address this problem by introducing a technique we call resource bricolage that improves database performance in heterogeneous environments. Our approach quantifies the performance differences among machines with various resources as they process workloads with diverse resource requirements. We formalize the problem of minimizing workload execution time and view it as an optimization problem, and then we employ linear programming to obtain a recommended data partitioning scheme. We verify the effectiveness of our technique with an extensive experimental study on a commercial database system.