Data Warehouse Appliance

The Experts below are selected from a list of 78 Experts worldwide, ranked by the ideXlab platform

Ravi Krishnamurthy - One of the best experts on this subject based on the ideXlab platform.

  • SIGMOD Conference - A Data Warehouse Appliance for the mass market
    Proceedings of the 35th SIGMOD international conference on Management of data - SIGMOD '09, 2009
    Co-Authors: Ravi Krishnamurthy
    Abstract:

    The vast majority of Data Warehouses hold less than a few terabytes of Data, yet their performance for complex queries on traditional Database systems is often unsatisfactory. Data Warehouse Appliances have been announced by vendors (HP Oracle Exadata Storage Server, HP Neoview, Netezza, etc.) to address this burgeoning need. Most of these build a large parallel Database system by scaling out commodity machines and/or pushing filters into the disk retrieval system to reduce the Data brought into memory, along the lines pioneered by research projects such as Gamma, Bubba, and other prior Database machine research. These approaches deliver performance by deploying many CPUs, large amounts of memory, and a large number of disk heads and much disk space, in effect extracting performance by under-utilizing resources -- albeit very inexpensive commodity resources. In contrast, we propose a Database system in a box (i.e., a single system) that delivers high performance for complex queries while using far fewer resources (memory, disks, etc.); i.e., better resource utilization and therefore lower cost. This approach uses a column store (pioneered in the Bubba project), which has the effect of 1) reducing the need for a large number of disk heads (i.e., I/O bandwidth) and 2) reducing the amount of memory needed to achieve memory-resident query execution. Having mitigated the disk I/O problem using the column store and memory, the Von Neumann bottleneck becomes the dominant constraint. This problem has been pursued by Database researchers in the context of cache-conscious query execution. Unfortunately, traditional CPUs provide limited control to "page" Data into the cache and retain it there to leverage the cache effectively. Our approach is to leverage a custom Dataflow machine coupled with a large memory, thereby practically eliminating the Von Neumann bottleneck. Beyond mitigating this bottleneck, exploiting fine-grained pipelined and operator parallelism in hardware provides significant further performance improvement. The result is a low-cost, high-performance Database Appliance for the vast majority of the Data Warehouse market. Kickfire has shown that such an Appliance can deliver both price/performance and raw performance compared to the competitive approaches. Note that this high-performance Appliance does not preclude scale-out; i.e., it can itself be used as a building block to scale out to a much larger Database in the future.
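
    The column-store argument above lends itself to a small illustration. The following is a minimal, hypothetical Python sketch (not Kickfire's implementation; the table and column names are invented): for a query that touches only two attributes, a columnar layout scans just those two columns, whereas a row layout must read every attribute of every row -- which is where the reduced I/O-bandwidth and memory requirements come from.

        # Hypothetical sketch: the same aggregate evaluated against a row-wise
        # and a column-wise copy of a toy table. Only the columns referenced
        # by the query are touched in the columnar case.
        from typing import Dict, List

        rows = [
            {"order_id": 1, "region": "EU", "amount": 120.0, "comment": "..."},
            {"order_id": 2, "region": "US", "amount": 75.5,  "comment": "..."},
            {"order_id": 3, "region": "EU", "amount": 310.0, "comment": "..."},
        ]
        # Column-wise copy: one list per attribute.
        columns: Dict[str, List] = {key: [r[key] for r in rows] for key in rows[0]}

        def row_store_sum(region: str) -> float:
            """Row store: every attribute of every row is read, used or not."""
            return sum(r["amount"] for r in rows if r["region"] == region)

        def column_store_sum(region: str) -> float:
            """Column store: only the 'region' and 'amount' columns are scanned."""
            return sum(a for reg, a in zip(columns["region"], columns["amount"])
                       if reg == region)

        assert row_store_sum("EU") == column_store_sum("EU") == 430.0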

Marco Czech - One of the best experts on this subject based on the ideXlab platform.

  • Architecture of a Highly Scalable Data Warehouse Appliance Integrated to Mainframe Database Systems
    BTW, 2011
    Co-Authors: Knut Stolze, Felix Beier, Kai-Uwe Sattler, Sebastian Sprenger, Carlos Caballero Grolimund, Marco Czech
    Abstract:

    Main memory processing and Data compression are valuable techniques to address the new challenges of Data warehousing regarding scalability, large Data volumes, near real-time response times, and the tight connection to OLTP. The IBM Smart Analytics Optimizer (ISAOPT) is a Data Warehouse Appliance that implements a main memory Database system for OLAP workloads using a cluster-based architecture. It is tightly integrated with IBM DB2 for z/OS (DB2) to speed up complex queries issued against DB2. In this paper, we focus on autonomic cluster management, high availability, and incremental update mechanisms for Data maintenance in ISAOPT.
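
    As a rough illustration of the two mechanisms named in the abstract -- an in-memory copy of Warehouse Data that answers OLAP queries on behalf of the primary Database, and incremental update batches that keep that copy in sync -- here is a hypothetical Python sketch. It is not the ISAOPT implementation; the class and its methods are invented for illustration.

        # Hypothetical sketch of an in-memory accelerator kept in sync with a
        # primary Database via small incremental-update batches rather than
        # full reloads.
        from dataclasses import dataclass, field
        from typing import Dict, List, Tuple

        @dataclass
        class InMemoryAccelerator:
            # key -> (region, amount); stands in for a compressed columnar snapshot
            data: Dict[int, Tuple[str, float]] = field(default_factory=dict)

            def load_snapshot(self, rows: List[Tuple[int, str, float]]) -> None:
                """Bulk-load a consistent snapshot taken from the primary Database."""
                self.data = {k: (region, amount) for k, region, amount in rows}

            def apply_incremental_update(self,
                                         upserts: List[Tuple[int, str, float]],
                                         deletes: List[int]) -> None:
                """Apply a small maintenance batch instead of reloading everything."""
                for k, region, amount in upserts:
                    self.data[k] = (region, amount)
                for k in deletes:
                    self.data.pop(k, None)

            def sum_by_region(self, region: str) -> float:
                """An OLAP-style aggregate answered entirely from memory."""
                return sum(a for reg, a in self.data.values() if reg == region)

        accel = InMemoryAccelerator()
        accel.load_snapshot([(1, "EU", 100.0), (2, "US", 50.0)])
        accel.apply_incremental_update(upserts=[(3, "EU", 25.0)], deletes=[2])
        assert accel.sum_by_region("EU") == 125.0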

Knut Stolze - One of the best experts on this subject based on the ideXlab platform.

  • Architecture of a Highly Scalable Data Warehouse Appliance Integrated to Mainframe Database Systems
    BTW, 2011
    Co-Authors: Knut Stolze, Felix Beier, Kai-Uwe Sattler, Sebastian Sprenger, Carlos Caballero Grolimund, Marco Czech
    Abstract:

    Main memory processing and Data compression are valuable techniques to address the new challenges of Data warehousing regarding scalability, large Data volumes, near real-time response times, and the tight connection to OLTP. The IBM Smart Analytics Optimizer (ISAOPT) is a Data Warehouse Appliance that implements a main memory Database system for OLAP workloads using a cluster-based architecture. It is tightly integrated with IBM DB2 for z/OS (DB2) to speed up complex queries issued against DB2. In this paper, we focus on autonomic cluster management, high availability, and incremental update mechanisms for Data maintenance in ISAOPT.

Vipin Chaudhary - One of the best experts on this subject based on the ideXlab platform.

  • DISCS@SC - Large Data and computation in a hazard map workflow using Hadoop and Netezza architectures
    Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems - DISCS-2013, 2013
    Co-Authors: Shivaswamy Rohit, Abani K. Patra, Vipin Chaudhary
    Abstract:

    Uncertainty Quantification (UQ) using simulation ensembles leads to the twin challenges of managing large amounts of Data and performing CPU-intensive computing. While algorithmic innovations using surrogates, localization, and parallelization can make the problem feasible, one is still left with very large Data and compute tasks. Such integration of large Data analytics and computationally expensive tasks is increasingly common. We present here an approach to solving this problem by using a mix of hardware and a workflow that maps tasks to the appropriate hardware. We experiment with two computing environments -- the first is an integration of a Netezza Data Warehouse Appliance and a high performance cluster, and the second a Hadoop-based environment. Our approach is based on segregating the Data-intensive and compute-intensive tasks and assigning the right architecture to each. We present the computing models and the new schemes in the context of generating probabilistic hazard maps using ensemble runs of the volcanic debris avalanche simulator TITAN2D and UQ methodology.
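
    The central idea -- tagging each workflow step as Data-intensive or compute-intensive and dispatching it to the matching backend -- can be sketched as follows. This is a hypothetical Python illustration, not the authors' workflow code; the step names and backend functions are invented.

        # Hypothetical sketch: route each hazard-map workflow step to the
        # backend whose resource profile matches it.
        from typing import Callable, Dict

        def run_on_warehouse(step: str) -> str:
            # Stand-in for a set-oriented aggregation pushed to a Data
            # Warehouse Appliance (e.g., Netezza) or a Hadoop job.
            return f"{step}: executed close to the Data"

        def run_on_hpc_cluster(step: str) -> str:
            # Stand-in for a CPU-intensive task (e.g., a TITAN2D ensemble
            # member) submitted to a high performance cluster.
            return f"{step}: executed on the compute cluster"

        STEP_BACKEND: Dict[str, Callable[[str], str]] = {
            "run simulation ensemble": run_on_hpc_cluster,
            "aggregate flow depths over the ensemble": run_on_warehouse,
            "compute exceedance probabilities": run_on_warehouse,
            "render hazard map": run_on_hpc_cluster,
        }

        for step, backend in STEP_BACKEND.items():
            print(backend(step))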

  • XtremeData dbX: An FPGA-Based Data Warehouse Appliance
    Computing in Science & Engineering, 2010
    Co-Authors: Todd C. Scofield, Jeffrey A. Delmerico, Vipin Chaudhary, Geno Valente
    Abstract:

    FPGA-based architectures are known for their applicability to embedded systems. The article looks at how recent developments make it possible to exploit this technology's benefits for large-scale systems targeting compute- and Data-intensive applications.

  • HiPC - Comparing the performance of clusters, Hadoop, and Active Disks on microarray correlation computations
    2009 International Conference on High Performance Computing (HiPC), 2009
    Co-Authors: Jeffrey A. Delmerico, Nathanial A. Byrnes, Andrew E. Bruno, Matthew D. Jones, Steven M. Gallo, Vipin Chaudhary
    Abstract:

    Microarray-based comparative genomic hybridization (aCGH) offers an increasingly fine-grained method for detecting copy number variations in DNA. These copy number variations can directly influence the expression of the proteins that are encoded in the genes in question. A useful analysis of the Data produced from these microarray experiments is pairwise correlation. However, the high resolution of today's microarray technology requires that supercomputing computation and storage resources be leveraged in order to perform this analysis. This application is an exemplar of the class of Data-intensive problems which require high-throughput I/O in order to be tractable. Although the performance of these types of applications on a cluster can be improved by parallelization, storage hardware and network limitations restrict the scalability of an I/O-bound application such as this. The Hadoop software framework is designed to enable Data-intensive applications on cluster architectures, and offers significantly better scalability due to its distributed file system. However, a specialized architecture adhering to the Active Disk paradigm, in which compute power is placed close to the disk instead of across a network, can further improve performance. The Netezza Corporation's Database systems are designed around the Active Disk approach, and offer tremendous gains in implementing this application over the traditional cluster architecture. We present methods and performance analyses of several implementations of this application: on a cluster, on a cluster with a parallel file system, with Hadoop on a cluster, and using a Netezza Data Warehouse Appliance. Our results offer benchmarks for the performance of Data-intensive applications within these distributed computing paradigms.
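
    The computational core -- pairwise correlation across all probes -- is simple to state but quadratic in the number of probes, which is what pushes the authors toward Hadoop and Active Disk implementations. Below is a hypothetical Python/NumPy sketch of that core (not the paper's code); the array sizes are toy values.

        # Hypothetical sketch: dense pairwise Pearson correlation over probes.
        # Real aCGH arrays have orders of magnitude more probes, so the output
        # matrix alone becomes enormous.
        import numpy as np

        rng = np.random.default_rng(0)
        n_probes, n_samples = 1_000, 50
        probes = rng.normal(size=(n_probes, n_samples))

        corr = np.corrcoef(probes)          # n_probes x n_probes matrix

        # In a distributed setting each worker handles a block of probe pairs;
        # here we just cross-check one entry against the direct formula.
        i, j = 3, 7
        x, y = probes[i] - probes[i].mean(), probes[j] - probes[j].mean()
        direct = (x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum())
        assert np.isclose(corr[i, j], direct)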

Felix Beier - One of the best experts on this subject based on the ideXlab platform.

  • Architecture of a Highly Scalable Data Warehouse Appliance Integrated to Mainframe Database Systems
    BTW, 2011
    Co-Authors: Knut Stolze, Felix Beier, Kai-Uwe Sattler, Sebastian Sprenger, Carlos Caballero Grolimund, Marco Czech
    Abstract:

    Main memory processing and Data compression are valuable techniques to address the new challenges of Data warehousing regarding scalability, large Data volumes, near real-time response times, and the tight connection to OLTP. The IBM Smart Analytics Optimizer (ISAOPT) is a Data Warehouse Appliance that implements a main memory Database system for OLAP workloads using a cluster-based architecture. It is tightly integrated with IBM DB2 for z/OS (DB2) to speed up complex queries issued against DB2. In this paper, we focus on autonomic cluster management, high availability, and incremental update mechanisms for Data maintenance in ISAOPT.
