Database Analytics

The experts below are selected from a list of 7,413 experts worldwide ranked by the ideXlab platform.

Samuel Madden - One of the best experts on this subject based on the ideXlab platform.

  • A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics
    International Conference on Management of Data, 2020
    Co-Authors: Anil Shanbhag, Samuel Madden
    Abstract:

    There has been a significant amount of excitement and recent work on GPU-based database systems. Previous work has claimed that these systems can perform orders of magnitude better than CPU-based database systems on analytical workloads such as those found in decision support and business intelligence applications. A hardware expert would view these claims with suspicion. Given the general notion that database operators are memory-bandwidth bound, one would expect the maximum gain to be roughly equal to the ratio of the memory bandwidth of the GPU to that of the CPU. In this paper, we adopt a model-based approach to understand when and why the performance gains of running queries on GPUs versus CPUs vary from the bandwidth ratio (which is roughly 16× on modern hardware). We propose Crystal, a library of parallel routines that can be combined to run full SQL queries on a GPU with minimal materialization overhead. We implement individual query operators to show that while the speedups for selection, projection, and sorts are near the bandwidth ratio, joins achieve less speedup due to differences in hardware capabilities. Interestingly, we show on a popular analytical workload that the full-query performance gain from running on a GPU exceeds the bandwidth ratio, even though individual operators have speedups below it, because of limitations in vectorizing chained operators on CPUs; the result is a 25× speedup for GPUs over CPUs on the benchmark.
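    The bandwidth-ratio argument in this abstract can be illustrated with a small sketch. This is not the paper's model or the Crystal library; it only encodes the simplifying assumption that an operator's runtime is the number of bytes it scans divided by memory bandwidth. The function names and both bandwidth figures are hypothetical, chosen only so that the ratio comes out at 16.

```python
def bandwidth_bound_time(bytes_scanned: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on runtime for an operator limited only by memory bandwidth."""
    return bytes_scanned / bandwidth_bytes_per_s


def expected_speedup(cpu_bandwidth: float, gpu_bandwidth: float) -> float:
    """Speedup predicted by the bandwidth-bound assumption: the ratio of bandwidths."""
    return gpu_bandwidth / cpu_bandwidth


# Hypothetical bandwidths in bytes per second (assumptions, not measured values).
CPU_BANDWIDTH = 50e9
GPU_BANDWIDTH = 800e9

# Scanning one column of 10^9 four-byte integers.
column_bytes = 4 * 10**9
cpu_time = bandwidth_bound_time(column_bytes, CPU_BANDWIDTH)
gpu_time = bandwidth_bound_time(column_bytes, GPU_BANDWIDTH)

if __name__ == "__main__":
    ratio = expected_speedup(CPU_BANDWIDTH, GPU_BANDWIDTH)
    print(f"bandwidth ratio: {ratio:.1f}x")
    print(f"scan time: CPU {cpu_time:.3f} s, GPU {gpu_time:.4f} s")
```

    Under this assumption every operator would speed up by exactly the bandwidth ratio, which is why the abstract treats speedups below it (joins) and above it (full queries) as the cases that need explaining.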

  • Starling: A Scalable Query Engine on Cloud Functions
    International Conference on Management of Data, 2020
    Co-Authors: Matthew Perron, Raul Fernandez, David J Dewitt, Samuel Madden
    Abstract:

    Much like on-premises systems, the natural choice for running database analytics workloads in the cloud is to provision a cluster of nodes to run a database instance. However, analytics workloads are often bursty or low volume, leaving clusters idle much of the time, so customers pay for compute resources even when they are underutilized. The ability of cloud function services, such as AWS Lambda or Azure Functions, to run small, fine-granularity tasks makes them appear to be a natural choice for query processing in such settings. But implementing an analytics system on cloud functions comes with its own set of challenges. These include managing hundreds of tiny, stateless, resource-constrained workers, handling stragglers, and shuffling data through opaque cloud services. In this paper we present Starling, a query execution engine built on cloud function services that employs a number of techniques to mitigate these challenges, providing interactive query latency at a lower total cost than provisioned systems with low-to-moderate utilization. In particular, on a 1TB TPC-H dataset in cloud storage, Starling is less expensive than the best provisioned systems for workloads in which queries arrive 1 minute apart or more. Starling also has lower latency than competing systems reading from cloud object stores and can scale to larger datasets.

  • Starling: A Scalable Query Engine on Cloud Function Services
    arXiv: Databases, 2019
    Co-Authors: Matthew Perron, Raul Fernandez, David J Dewitt, Samuel Madden
    Abstract:

    Much like on-premises systems, the natural choice for running database analytics workloads in the cloud is to provision a cluster of nodes to run a database instance. However, analytics workloads are often bursty or low volume, leaving clusters idle much of the time, so customers pay for compute resources even when they are unused. The ability of cloud function services, such as AWS Lambda or Azure Functions, to run small, fine-granularity tasks makes them appear to be a natural choice for query processing in such settings. But implementing an analytics system on cloud functions comes with its own set of challenges. These include managing hundreds of tiny, stateless, resource-constrained workers, handling stragglers, and shuffling data through opaque cloud services. In this paper we present Starling, a query execution engine built on cloud function services that employs a number of techniques to mitigate these challenges, providing interactive query latency at a lower total cost than provisioned systems with low-to-moderate utilization. In particular, on a 1TB TPC-H dataset in cloud storage, Starling is less expensive than the best provisioned systems for workloads in which queries arrive 1 minute apart or more. Starling also has lower latency than competing systems reading from cloud object stores and can scale to larger datasets.
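    The cost trade-off described in the Starling abstracts (an always-on cluster versus paying per query) can be sketched as a break-even calculation. This is an illustration only, not Starling's cost model: it assumes one query per arrival interval, a cluster that is billed continuously, and a flat per-query price on the cloud-function side. Both prices below are hypothetical placeholders, not figures from the paper or from any provider's price list.

```python
def provisioned_cost_per_query(cluster_cost_per_hour: float,
                               seconds_between_queries: float) -> float:
    """Cost attributed to one query when the cluster is billed for the whole interval."""
    return cluster_cost_per_hour * seconds_between_queries / 3600.0


def break_even_interval_seconds(cluster_cost_per_hour: float,
                                function_cost_per_query: float) -> float:
    """Arrival interval at which both options cost the same per query."""
    return function_cost_per_query * 3600.0 / cluster_cost_per_hour


# Hypothetical prices in dollars (assumptions for illustration only).
CLUSTER_COST_PER_HOUR = 6.00
FUNCTION_COST_PER_QUERY = 0.10

if __name__ == "__main__":
    interval = break_even_interval_seconds(CLUSTER_COST_PER_HOUR, FUNCTION_COST_PER_QUERY)
    print(f"break-even arrival interval: {interval:.0f} s")
    for gap in (30, 120):
        cost = provisioned_cost_per_query(CLUSTER_COST_PER_HOUR, gap)
        cheaper = "cluster" if cost < FUNCTION_COST_PER_QUERY else "cloud functions"
        print(f"queries every {gap} s: cluster costs ${cost:.3f} per query, cheaper option: {cheaper}")
```

    With sparser arrivals the idle time of the cluster dominates its per-query cost, which is the regime in which the abstracts report that a function-based engine is less expensive.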

Ronald R Yager - One of the best experts on this subject based on the ideXlab platform.

  • Advances in Computational Intelligence, Part III: 14th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2012, Catania, Italy, July 2012, Proceedings, Part III
    International Conference Information Processing, 2012
    Co-Authors: Salvatore Greco, Bernadette Bouchon-Meunier, Giulianella Coletti, Mario Fedrizzi, Benedetto Matarazzo, Ronald R Yager
    Abstract:

    These four volumes (CCIS 297, 298, 299, 300) constitute the proceedings of the 14th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2012, held in Catania, Italy, in July 2012. The 258 revised full papers presented together with six invited talks were carefully reviewed and selected from numerous submissions. The papers are organized in topical sections on fuzzy machine learning and on-line modeling; computing with words and decision making; soft computing in computer vision; rough sets and complex data analysis: theory and applications; intelligent databases and information system; information fusion systems; philosophical and methodological aspects of soft computing; basic issues in rough sets; 40th anniversary of the measures of fuzziness; SPS11 uncertainty in profiling systems and applications; handling uncertainty with copulas; formal methods to deal with uncertainty of many-valued events; linguistic summarization and description of data; fuzzy implications: theory and applications; sensing and data mining for teaching and learning; theory and applications of intuitionistic fuzzy sets; approximate aspects of data mining and database analytics; fuzzy numbers and their applications; information processing and management of uncertainty in knowledge-based systems; aggregation functions; imprecise probabilities; probabilistic graphical models with imprecision: theory and applications; belief function theory: basics and/or applications; fuzzy uncertainty in economics and business; new trends in De Finetti's approach; fuzzy measures and integrals; multicriteria decision making; uncertainty in privacy and security; uncertainty in the spirit of Pietro Benvenuti; coopetition; game theory; probabilistic approach.

  • Advances on Computational Intelligence: 14th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Proceedings, Part 1
    2012
    Co-Authors: Salvatore Greco, Bernadette Bouchon-Meunier, Giulianella Coletti, Mario Fedrizzi, Benedetto Matarazzo, Ronald R Yager

Varia Mayank - One of the best experts on this subject based on the ideXlab platform.

  • Parallel Vectorized Algebraic AES in MATLAB for Rapid Prototyping of Encrypted Sensor Processing Algorithms and Database Analytics
    Institute of Electrical and Electronics Engineers (IEEE), 2015
    Co-Authors: Kepner Jeremy, Gadepally Vijay, Hancock Braden, Michaleas Peter, Michel Elizabeth, Varia Mayank
    Abstract:

    The increasing use of networked sensor systems and networked databases has led to an increased interest in incorporating encryption directly into sensor algorithms and database analytics. MATLAB is the dominant tool for rapid prototyping of sensor algorithms and has extensive database analytics capabilities. The advent of high-level and high-performance Galois Field mathematical environments allows encryption algorithms to be expressed succinctly and efficiently. This work leverages the Galois Field primitives found in the MATLAB Communication Toolbox to implement a mode of the Advanced Encryption Standard (AES) based on first-principles mathematics. The resulting implementation requires 100x less code than standard AES implementations and delivers speed that is effective for many design purposes. The parallel version achieves speed comparable to native OpenSSL on a single node and is sufficient for real-time prototyping of many sensor processing algorithms and database analytics. Comment: 6 pages; accepted to IEEE High Performance Extreme Computing Conference (HPEC) 201

Mayank Varia - One of the best experts on this subject based on the ideXlab platform.

  • Parallel Vectorized Algebraic AES in MATLAB for Rapid Prototyping of Encrypted Sensor Processing Algorithms and Database Analytics
    IEEE High Performance Extreme Computing Conference, 2015
    Co-Authors: Jeremy Kepner, Vijay Gadepally, Braden Hancock, Peter Michaleas, Elizabeth Michel, Mayank Varia
    Abstract:

    The increasing use of networked sensor systems and networked databases has led to an increased interest in incorporating encryption directly into sensor algorithms and database analytics. MATLAB is the dominant tool for rapid prototyping of sensor algorithms and has extensive database analytics capabilities. The advent of high-level and high-performance Galois Field mathematical environments allows encryption algorithms to be expressed succinctly and efficiently. This work leverages the Galois Field primitives found in the MATLAB Communication Toolbox to implement a mode of the Advanced Encryption Standard (AES) based on first-principles mathematics. The resulting implementation requires 100x less code than standard AES implementations and delivers speed that is effective for many design purposes. The parallel version achieves speed comparable to native OpenSSL on a single node and is sufficient for real-time prototyping of many sensor processing algorithms and database analytics.
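    To show what expressing AES through Galois Field arithmetic can look like, here is a minimal Python sketch of the AES S-box built from GF(2^8) multiplication and inversion. It is not the authors' MATLAB implementation and does not use the toolbox they mention; the helper names (`gf_mul`, `gf_inv`, `aes_sbox`) are made up for this illustration. The reduction polynomial 0x11B and the affine constant 0x63 are the ones defined by the AES standard.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES reduction polynomial


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as polynomials over GF(2), reduced modulo AES_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY        # reduce when the degree reaches 8
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0 as in AES."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox(x: int) -> int:
    """AES S-box: field inversion followed by the standard affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


if __name__ == "__main__":
    print(hex(gf_mul(0x57, 0x83)))
    print(hex(aes_sbox(0x53)))
```

    Computing the S-box algebraically rather than from a lookup table keeps the code short, which is the kind of brevity the abstract attributes to a Galois Field formulation; a table-based version would normally be faster.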

Salvatore Greco - One of the best experts on this subject based on the ideXlab platform.

  • Advances in Computational Intelligence, Part III: 14th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2012, Catania, Italy, July 2012, Proceedings, Part III
    International Conference Information Processing, 2012
    Co-Authors: Salvatore Greco, Bernadette Bouchon-Meunier, Giulianella Coletti, Mario Fedrizzi, Benedetto Matarazzo, Ronald R Yager

  • Advances on Computational Intelligence: 14th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Proceedings, Part 1
    2012
    Co-Authors: Salvatore Greco, Bernadette Bouchon-Meunier, Giulianella Coletti, Mario Fedrizzi, Benedetto Matarazzo, Ronald R Yager