Database Vendor

The experts below are selected from a list of 2205 experts worldwide ranked by the ideXlab platform.

David A Patterson - One of the best experts on this subject based on the ideXlab platform.

  • Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning
    International Conference on Data Engineering, 2009
    Co-Authors: Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Michael I Jordan, Janet L Wiener, David A Patterson
    Abstract:

    One of the most challenging aspects of managing a very large data warehouse is identifying how queries will behave before they start executing. Yet knowing their performance characteristics (their runtimes and resource usage) can solve two important problems. First, every database vendor struggles with managing unexpectedly long-running queries. When these long-running queries can be identified before they start, they can be rejected or scheduled for a time when they will not cause extreme resource contention for the other queries in the system. Second, deciding whether a system can complete a given workload in a given time period (or whether a bigger system is necessary) depends on knowing the resource requirements of the queries in that workload. We have developed a system that uses machine learning to accurately predict the performance metrics of database queries whose execution times range from milliseconds to hours. For training and testing our system, we used both real customer queries and queries generated from an extended set of TPC-DS templates. The extensions mimic queries that caused customer problems. We used these queries to compare how accurately different techniques predict metrics such as elapsed time, records used, disk I/Os, and message bytes. The most promising technique was not only the most accurate, but also predicted these metrics simultaneously, using only information available prior to query execution. We validated the accuracy of this machine learning technique on a number of HP Neoview configurations. We were able to predict individual query elapsed time within 20% of its actual time for 85% of the test queries. Most importantly, we were able to correctly identify both the short and long-running (up to two-hour) queries to inform workload management and capacity planning.
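
The idea of predicting several metrics at once from pre-execution information, and of scoring predictions by the share that land within 20% of the actual value, can be sketched as follows. This is only an illustration: the abstract does not name the learning technique, so a plain k-nearest-neighbor average stands in for it, and the feature vectors, metric names, and both function names are hypothetical.

```python
import math

def predict_metrics(train, features, k=3):
    """Predict several metrics at once by averaging the k nearest past queries.

    train: list of (feature_vector, metrics_dict) pairs from executed queries.
    features: feature vector of the new query, known before it runs.
    Note: this is a stand-in technique, not the one used in the paper.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], features))[:k]
    names = nearest[0][1].keys()
    return {name: sum(m[name] for _, m in nearest) / len(nearest) for name in names}

def fraction_within(predicted, actual, tolerance=0.20):
    """Share of predictions whose relative error is at most `tolerance`."""
    hits = sum(abs(p - a) <= tolerance * a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Hypothetical training set: (plan-derived features) -> observed metrics.
train = [
    ((1.0, 0.0), {"elapsed_s": 0.5, "disk_ios": 10}),
    ((2.0, 1.0), {"elapsed_s": 1.5, "disk_ios": 30}),
    ((8.0, 4.0), {"elapsed_s": 600.0, "disk_ios": 9000}),
    ((9.0, 5.0), {"elapsed_s": 1200.0, "disk_ios": 20000}),
]

# One call yields every metric for the new query.
prediction = predict_metrics(train, (8.5, 4.5), k=2)
print(prediction)

# Two of these three predictions fall within 20% of the actual value.
print(fraction_within([90, 130, 50], [100, 100, 50]))
```

A real system would need features on comparable scales and far more training queries; the sketch only shows the shape of the inputs and outputs.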

Archana Ganapathi - One of the best experts on this subject based on the ideXlab platform.

  • Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning
    International Conference on Data Engineering, 2009
    Co-Authors: Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Michael I Jordan, Janet L Wiener, David A Patterson

Robert Marks - One of the best experts on this subject based on the ideXlab platform.

  • A Metadata Driven Approach to Performing Multi-Vendor Database Schema Upgrades
    Engineering of Computer-Based Systems, 2012
    Co-Authors: Robert Marks
    Abstract:

    This paper discusses problems associated with relational database schema migrations, which commonly occur with major upgrade releases of enterprise software. The most prevalent method of performing a schema migration is to execute SQL script files before or after the software upgrade. This approach performs poorly with large or complex database migrations and also requires separate script files for each supported database vendor. A tool that uses XML in a metadata-driven approach was developed for a complex database upgrade of an enterprise product. The key advantages include the ability to abstract complexity, provide support for multiple database vendors, and make the database migration more manageable between software releases.
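
A minimal sketch of the metadata-driven idea: one vendor-neutral XML description of the upgrade is rendered into SQL for each supported vendor, instead of maintaining a separate script per vendor. The XML element names, the type mapping, and the `render_upgrade` function are all assumptions made for illustration; the abstract does not describe the tool's actual metadata format.

```python
import xml.etree.ElementTree as ET

# Hypothetical upgrade metadata; the real tool's XML schema is not given.
UPGRADE_XML = """
<upgrade from="1.0" to="2.0">
  <addColumn table="customer" name="notes" type="text"/>
  <addColumn table="customer" name="active" type="boolean"/>
</upgrade>
"""

# Illustrative mapping from abstract types to vendor-specific column types.
TYPE_MAP = {
    "oracle": {"text": "CLOB", "boolean": "NUMBER(1)"},
    "postgresql": {"text": "TEXT", "boolean": "BOOLEAN"},
}

def render_upgrade(xml_text, vendor):
    """Turn vendor-neutral upgrade metadata into SQL statements for one vendor."""
    types = TYPE_MAP[vendor]
    root = ET.fromstring(xml_text)
    statements = []
    for step in root.iter("addColumn"):
        column_type = types[step.get("type")]
        statements.append(
            f"ALTER TABLE {step.get('table')} ADD {step.get('name')} {column_type}"
        )
    return statements

# The same metadata yields a different script per vendor.
for vendor in TYPE_MAP:
    print(vendor, render_upgrade(UPGRADE_XML, vendor))
```

Supporting another vendor in this sketch means adding one entry to the type mapping rather than writing a new script, which is the kind of abstraction the abstract describes.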

Janet L Wiener - One of the best experts on this subject based on the ideXlab platform.

  • Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning
    International Conference on Data Engineering, 2009
    Co-Authors: Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Michael I Jordan, Janet L Wiener, David A Patterson

Michael I Jordan - One of the best experts on this subject based on the ideXlab platform.

  • Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning
    International Conference on Data Engineering, 2009
    Co-Authors: Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Michael I Jordan, Janet L Wiener, David A Patterson