Query Execution

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 20,946 Experts worldwide ranked by the ideXlab platform

Jeffrey F Naughton - One of the best experts on this subject based on the ideXlab platform.

  • Uncertainty Aware Query Execution Time Prediction
    arXiv: Databases, 2014
    Co-Authors: Hakan Hacigümuş, Jeffrey F Naughton
    Abstract:

    Predicting query execution time is a fundamental issue underlying many database management tasks. Existing predictors rely on information such as cardinality estimates and system performance constants that are difficult to know exactly; as a result, accurate prediction remains elusive for many queries. Moreover, existing predictors provide a single point estimate of the true execution time but fail to characterize the uncertainty in the prediction. In this paper, we take a first step towards providing uncertainty information along with query execution time predictions. We use the query optimizer's cost model to represent the query execution time as a function of the selectivities of operators in the query plan as well as the constants that describe the cost of CPU and I/O operations in the system. By treating these quantities as random variables rather than constants, we show that with low overhead we can infer the distribution of likely prediction errors. We further show that the prediction errors estimated by our proposed techniques are strongly correlated with the actual prediction errors.
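
The approach described above can be sketched as a small Monte Carlo propagation: treat operator selectivities and the CPU/I/O cost constants as random variables, push each draw through an optimizer-style cost formula, and read the prediction uncertainty off the resulting distribution. The cost function, the chosen distributions, and all numbers below are illustrative assumptions, not the paper's actual model.

```python
import random

def predict_with_uncertainty(plan_cost_fn, selectivity_dists, cost_unit_dists,
                             n_samples=1000):
    """Sample selectivities and cost constants, propagate each draw through
    the optimizer-style cost function, and summarize the distribution of
    predicted execution times."""
    samples = []
    for _ in range(n_samples):
        sels = {op: dist() for op, dist in selectivity_dists.items()}
        units = {k: dist() for k, dist in cost_unit_dists.items()}
        samples.append(plan_cost_fn(sels, units))
    samples.sort()
    return {
        "median": samples[n_samples // 2],
        "p05": samples[int(0.05 * n_samples)],
        "p95": samples[int(0.95 * n_samples)],
    }

# Toy cost function: a sequential scan plus a filter, with invented constants.
def toy_cost(sels, units):
    rows = 1_000_000
    io_cost = rows / 100 * units["io_per_page"]          # 100 rows per page
    cpu_cost = rows * sels["filter"] * units["cpu_per_tuple"]
    return io_cost + cpu_cost

random.seed(0)
est = predict_with_uncertainty(
    toy_cost,
    {"filter": lambda: random.betavariate(2, 8)},        # uncertain selectivity
    {"io_per_page": lambda: random.gauss(1.0, 0.1),      # uncertain cost units
     "cpu_per_tuple": lambda: random.gauss(0.001, 0.0001)},
)
```

The 5th/95th percentile band is the uncertainty information a point-estimate predictor cannot provide.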

  • Towards predicting Query Execution time for concurrent and dynamic database workloads
    Proceedings of the VLDB Endowment, 2013
    Co-Authors: Yun Chi, Hakan Hacigümuş, Jeffrey F Naughton
    Abstract:

    Predicting query execution time is crucial for many database management tasks, including admission control, query scheduling, and progress monitoring. While a number of recent papers have explored this problem, the bulk of the existing work either considers prediction for a single query or prediction for a static workload of concurrent queries, where by "static" we mean that the queries to be run are fixed and known. In this paper, we consider the more general problem of dynamic concurrent workloads. Unlike most previous work on query execution time prediction, our proposed framework is based on analytic modeling rather than machine learning. We first use the optimizer's cost model to estimate the I/O and CPU requirements for each pipeline of each query in isolation, and then use a combined queueing model and buffer pool model that merges the I/O and CPU requests from concurrent queries to predict running times. We compare the proposed approach with a machine-learning-based approach that is a variant of previous work. Our experiments show that our analytic-model-based approach achieves competitive and often better prediction accuracy than its machine-learning-based counterpart.
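
A drastically simplified sketch of the analytic idea: start from each query's isolated CPU and I/O demands (as the optimizer's cost model would supply them) and stretch them under contention. This assumes a crude fair processor-sharing model rather than the paper's combined queueing and buffer pool models, and the demand figures are invented.

```python
def predict_concurrent_runtimes(queries):
    """queries maps a query name to its isolated resource demands (seconds
    of CPU and of I/O service time). Under a fair processor-sharing
    assumption, n concurrent queries each receive 1/n of the CPU and of
    the disk, so every phase stretches by a factor of n; CPU and I/O
    phases are assumed not to overlap."""
    n = len(queries)
    return {name: n * (d["cpu"] + d["io"]) for name, d in queries.items()}

# Two queries sharing the machine: each runs twice as slowly as when alone.
preds = predict_concurrent_runtimes({
    "q1": {"cpu": 1.0, "io": 2.0},   # 3 s alone -> 6 s predicted
    "q2": {"cpu": 0.5, "io": 0.5},   # 1 s alone -> 2 s predicted
})
```

A real model must also capture buffer pool hits (which remove I/O requests entirely) and queueing delay at the disk, which is where the paper's machinery comes in.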

  • Predicting Query Execution time: Are optimizer cost models really unusable?
    Proceedings - International Conference on Data Engineering, 2013
    Co-Authors: Wentao Wu, Junichi Tatemura, Hakan Hacigümuş, Shenghuo Zhu, Yun Chi, Jeffrey F Naughton
    Abstract:

    Predicting query execution time is useful for many database management tasks, including admission control, query scheduling, progress monitoring, and system sizing. Recently, the research community has been exploring the use of statistical machine learning approaches to build predictive models for this task. An implicit assumption behind this work is that the cost models used by query optimizers are insufficient for query execution time prediction. In this paper, we challenge this assumption and show that while the simple approach of scaling the optimizer's estimated cost indeed fails, a properly calibrated optimizer cost model is surprisingly effective. However, even a well-tuned optimizer cost model will fail in the presence of errors in cardinality estimates. Accordingly, we investigate the novel idea of spending extra resources to refine estimates for the query plan after it has been chosen by the optimizer but before execution. In our experiments, we find that a well-calibrated query optimizer model along with cardinality estimation refinement provides a low-overhead way to produce estimates that are always competitive with, and often much better than, the best reported numbers from the machine learning approaches.
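
The calibration step can be illustrated as a least-squares fit: profile a few queries, count how many of each optimizer cost unit (e.g. pages read, tuples processed) each query consumes, and solve for the per-unit time constants. The two-unit model and the synthetic profiling numbers below are assumptions for illustration, not the paper's experimental setup.

```python
def calibrate_cost_units(feature_rows, observed_times):
    """feature_rows[i] = (n_pages_i, n_tuples_i): counts of each cost unit
    the optimizer assigns to profiled query i; observed_times[i] is its
    measured runtime in seconds. Solves the 2x2 normal equations of
    ordinary least squares to recover per-page and per-tuple constants."""
    s_pp = s_pt = s_tt = s_py = s_ty = 0.0
    for (p, t), y in zip(feature_rows, observed_times):
        s_pp += p * p; s_pt += p * t; s_tt += t * t
        s_py += p * y; s_ty += t * y
    det = s_pp * s_tt - s_pt * s_pt          # assumes non-degenerate profiling data
    c_page = (s_py * s_tt - s_ty * s_pt) / det
    c_tuple = (s_ty * s_pp - s_py * s_pt) / det
    return c_page, c_tuple

# Synthetic profiling data generated with c_page=0.01, c_tuple=0.0005.
rows = [(100, 1000), (500, 200), (50, 5000)]
times = [0.01 * p + 0.0005 * t for p, t in rows]
c_page, c_tuple = calibrate_cost_units(rows, times)
```

With calibrated constants in hand, the optimizer's abstract cost becomes a wall-clock prediction; the remaining error source is then the cardinality estimates themselves, which the paper attacks with pre-execution refinement.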

Hakan Hacigümuş - One of the best experts on this subject based on the ideXlab platform.

  • Uncertainty Aware Query Execution Time Prediction
    arXiv: Databases, 2014
    Co-Authors: Hakan Hacigümuş, Jeffrey F Naughton
    Abstract:

    Predicting query execution time is a fundamental issue underlying many database management tasks. Existing predictors rely on information such as cardinality estimates and system performance constants that are difficult to know exactly; as a result, accurate prediction remains elusive for many queries. Moreover, existing predictors provide a single point estimate of the true execution time but fail to characterize the uncertainty in the prediction. In this paper, we take a first step towards providing uncertainty information along with query execution time predictions. We use the query optimizer's cost model to represent the query execution time as a function of the selectivities of operators in the query plan as well as the constants that describe the cost of CPU and I/O operations in the system. By treating these quantities as random variables rather than constants, we show that with low overhead we can infer the distribution of likely prediction errors. We further show that the prediction errors estimated by our proposed techniques are strongly correlated with the actual prediction errors.

  • Towards predicting Query Execution time for concurrent and dynamic database workloads
    Proceedings of the VLDB Endowment, 2013
    Co-Authors: Yun Chi, Hakan Hacigümuş, Jeffrey F Naughton
    Abstract:

    Predicting query execution time is crucial for many database management tasks, including admission control, query scheduling, and progress monitoring. While a number of recent papers have explored this problem, the bulk of the existing work either considers prediction for a single query or prediction for a static workload of concurrent queries, where by "static" we mean that the queries to be run are fixed and known. In this paper, we consider the more general problem of dynamic concurrent workloads. Unlike most previous work on query execution time prediction, our proposed framework is based on analytic modeling rather than machine learning. We first use the optimizer's cost model to estimate the I/O and CPU requirements for each pipeline of each query in isolation, and then use a combined queueing model and buffer pool model that merges the I/O and CPU requests from concurrent queries to predict running times. We compare the proposed approach with a machine-learning-based approach that is a variant of previous work. Our experiments show that our analytic-model-based approach achieves competitive and often better prediction accuracy than its machine-learning-based counterpart.

  • Predicting Query Execution time: Are optimizer cost models really unusable?
    Proceedings - International Conference on Data Engineering, 2013
    Co-Authors: Wentao Wu, Junichi Tatemura, Hakan Hacigümuş, Shenghuo Zhu, Yun Chi, Jeffrey F Naughton
    Abstract:

    Predicting query execution time is useful for many database management tasks, including admission control, query scheduling, progress monitoring, and system sizing. Recently, the research community has been exploring the use of statistical machine learning approaches to build predictive models for this task. An implicit assumption behind this work is that the cost models used by query optimizers are insufficient for query execution time prediction. In this paper, we challenge this assumption and show that while the simple approach of scaling the optimizer's estimated cost indeed fails, a properly calibrated optimizer cost model is surprisingly effective. However, even a well-tuned optimizer cost model will fail in the presence of errors in cardinality estimates. Accordingly, we investigate the novel idea of spending extra resources to refine estimates for the query plan after it has been chosen by the optimizer but before execution. In our experiments, we find that a well-calibrated query optimizer model along with cardinality estimation refinement provides a low-overhead way to produce estimates that are always competitive with, and often much better than, the best reported numbers from the machine learning approaches.

Yun Chi - One of the best experts on this subject based on the ideXlab platform.

  • Towards predicting Query Execution time for concurrent and dynamic database workloads
    Proceedings of the VLDB Endowment, 2013
    Co-Authors: Yun Chi, Hakan Hacigümuş, Jeffrey F Naughton
    Abstract:

    Predicting query execution time is crucial for many database management tasks, including admission control, query scheduling, and progress monitoring. While a number of recent papers have explored this problem, the bulk of the existing work either considers prediction for a single query or prediction for a static workload of concurrent queries, where by "static" we mean that the queries to be run are fixed and known. In this paper, we consider the more general problem of dynamic concurrent workloads. Unlike most previous work on query execution time prediction, our proposed framework is based on analytic modeling rather than machine learning. We first use the optimizer's cost model to estimate the I/O and CPU requirements for each pipeline of each query in isolation, and then use a combined queueing model and buffer pool model that merges the I/O and CPU requests from concurrent queries to predict running times. We compare the proposed approach with a machine-learning-based approach that is a variant of previous work. Our experiments show that our analytic-model-based approach achieves competitive and often better prediction accuracy than its machine-learning-based counterpart.

  • Predicting Query Execution time: Are optimizer cost models really unusable?
    Proceedings - International Conference on Data Engineering, 2013
    Co-Authors: Wentao Wu, Junichi Tatemura, Hakan Hacigümuş, Shenghuo Zhu, Yun Chi, Jeffrey F Naughton
    Abstract:

    Predicting query execution time is useful for many database management tasks, including admission control, query scheduling, progress monitoring, and system sizing. Recently, the research community has been exploring the use of statistical machine learning approaches to build predictive models for this task. An implicit assumption behind this work is that the cost models used by query optimizers are insufficient for query execution time prediction. In this paper, we challenge this assumption and show that while the simple approach of scaling the optimizer's estimated cost indeed fails, a properly calibrated optimizer cost model is surprisingly effective. However, even a well-tuned optimizer cost model will fail in the presence of errors in cardinality estimates. Accordingly, we investigate the novel idea of spending extra resources to refine estimates for the query plan after it has been chosen by the optimizer but before execution. In our experiments, we find that a well-calibrated query optimizer model along with cardinality estimation refinement provides a low-overhead way to produce estimates that are always competitive with, and often much better than, the best reported numbers from the machine learning approaches.

Angela Demke Brown - One of the best experts on this subject based on the ideXlab platform.

  • Speeding up spatial database query execution using GPUs
    Procedia Computer Science (ICCS), 2012
    Co-Authors: Bogdan Simion, Suprio Ray, Angela Demke Brown
    Abstract:

    Spatial databases are used in a wide variety of real-world applications, such as land surveying, urban planning, and environmental assessments, as well as geospatial Web services. As uses of spatial databases become more widespread, there is a growing need for good performance of spatial applications. In spatial workloads, queries tend to be computationally intensive due to the complex processing of geometric relationships. Furthermore, a significant fraction of spatial query execution time is spent on CPU stalls due to memory accesses, caused by the ever-increasing processor-memory speed gap. With the advent of massively parallel graphics-processing hardware (GPUs) and frameworks like CUDA, opportunities for speeding up spatial processing have emerged. In addition to massive parallelism, GPUs can also better hide memory latency. We aim to speed up spatial query execution using CUDA and recent GPU cards. One of the main challenges in using GPUs is the transfer time from main memory to GPU memory. We implement a set of six typical spatial queries and achieve a baseline speedup (excluding the transfer cost) of 62-318x over the CPU counterparts. We show that the transfer cost can be amortized over the execution of each individual query. For simpler spatial queries, the transfer time is a significant fraction of the query execution time, but we still achieve a 6-10x speedup. For more complex spatial queries, the transfer time becomes negligible compared to the processing time, and we obtain a 62-240x speedup.
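
The transfer-cost trade-off reduces to simple arithmetic: end-to-end speedup divides CPU time by kernel time plus host-to-device transfer time, while the "baseline" speedup ignores the transfer. The timings below are invented for illustration, not the paper's measurements.

```python
def effective_speedup(cpu_time, gpu_kernel_time, transfer_time):
    """End-to-end GPU speedup once the host-to-device transfer is counted.
    All arguments are in seconds."""
    return cpu_time / (gpu_kernel_time + transfer_time)

# Complex query (hypothetical): transfer is negligible next to processing,
# so the end-to-end speedup stays close to the kernel-only figure.
complex_q = effective_speedup(cpu_time=120.0, gpu_kernel_time=0.5, transfer_time=0.3)

# Simple query (hypothetical): transfer dominates and the speedup shrinks,
# mirroring the 6-10x vs. 62-240x gap the abstract reports.
simple_q = effective_speedup(cpu_time=2.0, gpu_kernel_time=0.02, transfer_time=0.25)
```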

Rik Van De Walle - One of the best experts on this subject based on the ideXlab platform.

  • Query Execution optimization for clients of triple pattern fragments
    European Semantic Web Conference, 2015
    Co-Authors: Joachim Van Herwegen, Ruben Verborgh, Erik Mannens, Rik Van De Walle
    Abstract:

    In order to reduce the server-side cost of publishing queryable Linked Data, Triple Pattern Fragments (TPF) were introduced as a simple interface to RDF triples. They allow for SPARQL query execution at low server cost by partially shifting the load from servers to clients. The previously proposed query execution algorithm uses more HTTP requests than necessary and only makes partial use of the available metadata. In this paper, we propose a new query execution algorithm for a client communicating with a TPF server. In contrast to a greedy solution, we maintain an overview of the entire query to find the optimal steps for solving a given query. We show multiple cases in which our algorithm reaches solutions with far fewer HTTP requests, without significantly increasing the cost in other cases. This improves the efficiency of common SPARQL queries against TPF interfaces, augmenting their viability compared to the more powerful, but more costly, SPARQL interface.
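
One concrete piece of metadata-aware planning: each TPF response carries an estimated total triple count for its pattern, and a client can use those counts to evaluate the most selective pattern first, minimizing HTTP page requests and binding variables early. This greedy ordering is only a sketch of one common heuristic, not the paper's full algorithm; the patterns and counts below are invented.

```python
def plan_order(patterns, count_metadata):
    """Order triple patterns by the estimated count the TPF server
    reported for each, cheapest (most selective) first."""
    return sorted(patterns, key=lambda p: count_metadata[p])

# Hypothetical BGP for "films by directors born in Belgium".
patterns = [
    "?film dbo:director ?dir",
    "?film rdf:type dbo:Film",
    "?dir dbo:birthPlace dbr:Belgium",
]
counts = {
    "?film dbo:director ?dir": 90_000,
    "?film rdf:type dbo:Film": 120_000,
    "?dir dbo:birthPlace dbr:Belgium": 800,
}
ordered = plan_order(patterns, counts)
```

Starting from the 800-result pattern means at most a few result pages are fetched before the remaining patterns can be evaluated with variables already bound; the paper goes further by re-planning over the whole query rather than committing greedily.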
