Distributed Databases

The experts below are selected from a list of 37,377 experts worldwide, ranked by the ideXlab platform.

Rajkumar Buyya - One of the best experts on this subject based on the ideXlab platform.

  • Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable OLTP Applications
    Future Generation Computer Systems, 2016
    Co-Authors: Joarder Mohammad Mustafa Kamal, Manzur Murshed, Rajkumar Buyya
    Abstract:

    On-line Transaction Processing (OLTP) applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. k-way min-cut graph-clustering-based database repartitioning algorithms can be used to reduce the number of DTs with an acceptable level of load balancing. In Web applications, where the DT profile changes over time due to dynamically varying workload patterns, frequent database repartitioning is needed to keep up with the change. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. The potential risk of load imbalance is mitigated by applying the graph clustering algorithm on finer logical partitions instead of the servers, and by relying on a random one-to-one cluster-to-partition mapping that naturally balances out load. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster: the many-to-one version minimising data migrations alone, and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to handle data migration efficiently without affecting scalability. The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. To compare the performance of any incremental repartitioning framework without bias from the external min-cut algorithm due to graph size variations, a transaction generation model is developed that can maintain a target number of unique transactions in any arbitrary observation window, irrespective of the new-transaction arrival rate. The overall impact of DTs at any instance is estimated from the exponential moving average of the recurrence period of unique transactions, to avoid transient fluctuations. The effectiveness and adaptability of the proposed incremental repartitioning framework for transactional workloads have been established with extensive simulations on both range-partitioned and consistent-hash-partitioned databases.

    Highlights: We propose incremental repartitioning of distributed OLTP databases for high scalability. We model two incremental repartitioning algorithms and a lookup mechanism. We develop a unique-transaction generation model for simulation. We derive novel impact metrics for distributed transactions. Simulation results indicate the adaptability of the methods to scalable OLTP applications.
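
    A minimal sketch, assuming a simple event-time model, of the exponential-moving-average (EMA) impact estimate described above; the class name `DTImpactTracker`, the smoothing factor `alpha`, and the 1/period impact formula are illustrative assumptions, not the paper's definitions.

    ```python
    # Sketch: track the recurrence period of each unique transaction with an
    # EMA, then weigh distributed transactions (DTs) by how often they recur.
    class DTImpactTracker:
        def __init__(self, alpha=0.2):
            self.alpha = alpha          # EMA smoothing factor (assumed value)
            self.last_seen = {}         # txn id -> last observation time
            self.ema_period = {}        # txn id -> EMA of recurrence period

        def observe(self, txn_id, now):
            """Record one occurrence of a unique transaction at time `now`."""
            if txn_id in self.last_seen:
                period = now - self.last_seen[txn_id]
                prev = self.ema_period.get(txn_id, period)
                self.ema_period[txn_id] = self.alpha * period + (1 - self.alpha) * prev
            self.last_seen[txn_id] = now

        def impact(self, distributed_txns):
            """Frequently recurring DTs (small EMA period) weigh more."""
            return sum(1.0 / self.ema_period[t]
                       for t in distributed_txns
                       if self.ema_period.get(t, 0) > 0)
    ```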

  • Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud Applications
    2015
    Co-Authors: Joarder Mohammad Mustafa Kamal, Manzur Murshed, Rajkumar Buyya
    Abstract:

    Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering algorithm has been found effective in reducing the number of DTs with an acceptable level of load balancing. The benefits of such a static partitioning scheme, however, are short-lived in Cloud applications with dynamically varying workload patterns, where the DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT.
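
    A minimal sketch of how the special sub-graph could be assembled, assuming transactions are sets of tuple ids and a `location` map gives each tuple's server; the data model and co-access edge weighting are assumptions, not the paper's implementation.

    ```python
    # Sketch: keep every DT plus every non-DT that shares a tuple with a DT,
    # then form a weighted graph over co-accessed tuples for min-cut clustering.
    from itertools import combinations

    def build_subgraph(transactions, location):
        """transactions: list of tuple-id sets; location: tuple id -> server id."""
        def is_dt(txn):
            return len({location[t] for t in txn}) > 1   # spans >1 server

        dts = [txn for txn in transactions if is_dt(txn)]
        dt_tuples = set().union(*dts) if dts else set()
        kept = [txn for txn in transactions if is_dt(txn) or txn & dt_tuples]

        edges = {}                                       # (u, v) -> co-access count
        for txn in kept:
            for u, v in combinations(sorted(txn), 2):
                edges[(u, v)] = edges.get((u, v), 0) + 1
        return edges
    ```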

  • Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud Applications
    IEEE/ACM International Conference on Utility and Cloud Computing, 2014
    Co-Authors: Joarder Mohammad Mustafa Kamal, Manzur Murshed, Rajkumar Buyya
    Abstract:

    Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering algorithm has been found effective in reducing the number of DTs with an acceptable level of load balancing. The benefits of such a static partitioning scheme, however, are short-lived in Cloud applications with dynamically varying workload patterns, where the DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. The potential risk of load imbalance is mitigated by applying the graph clustering algorithm on finer logical partitions instead of the servers, and by relying on a random one-to-one cluster-to-partition mapping that naturally balances out load. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster: the many-to-one version minimising data migrations alone, and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to handle data migration efficiently without affecting scalability. The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. Simulation results convincingly support incremental repartitioning against static partitioning.
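
    A minimal sketch of the two migration-aware cluster-to-partition mappings described above, assuming clusters are given as tuple-id sets and the current placement as a tuple-to-partition map; the greedy tie-breaking in the one-to-one variant is an assumption, not the paper's exact procedure.

    ```python
    # Sketch: map min-cut clusters back to partitions while limiting migrations.
    from collections import Counter

    def many_to_one(clusters, current_partition):
        """Each cluster goes to the partition already holding most of its tuples
        (minimises migrations; several clusters may share one partition)."""
        return {c: Counter(current_partition[t] for t in tuples).most_common(1)[0][0]
                for c, tuples in clusters.items()}

    def one_to_one(clusters, current_partition):
        """Greedy distinct assignment: highest tuple-affinity pairs first, each
        partition used at most once (fewer migrations, load balance preserved)."""
        partitions = set(current_partition.values())
        affinity = [(-Counter(current_partition[t] for t in tuples)[p], c, p)
                    for c, tuples in clusters.items()
                    for p in partitions]
        mapping, used = {}, set()
        for _, c, p in sorted(affinity):         # most-overlapping pairs first
            if c not in mapping and p not in used:
                mapping[c] = p
                used.add(p)
        return mapping
    ```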

Joarder Mohammad Mustafa Kamal - One of the best experts on this subject based on the ideXlab platform.

  • Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable OLTP Applications
    Future Generation Computer Systems, 2016
    Co-Authors: Joarder Mohammad Mustafa Kamal, Manzur Murshed, Rajkumar Buyya
    Abstract:

    On-line Transaction Processing (OLTP) applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. k-way min-cut graph-clustering-based database repartitioning algorithms can be used to reduce the number of DTs with an acceptable level of load balancing. In Web applications, where the DT profile changes over time due to dynamically varying workload patterns, frequent database repartitioning is needed to keep up with the change. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. The potential risk of load imbalance is mitigated by applying the graph clustering algorithm on finer logical partitions instead of the servers, and by relying on a random one-to-one cluster-to-partition mapping that naturally balances out load. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster: the many-to-one version minimising data migrations alone, and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to handle data migration efficiently without affecting scalability. The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. To compare the performance of any incremental repartitioning framework without bias from the external min-cut algorithm due to graph size variations, a transaction generation model is developed that can maintain a target number of unique transactions in any arbitrary observation window, irrespective of the new-transaction arrival rate. The overall impact of DTs at any instance is estimated from the exponential moving average of the recurrence period of unique transactions, to avoid transient fluctuations. The effectiveness and adaptability of the proposed incremental repartitioning framework for transactional workloads have been established with extensive simulations on both range-partitioned and consistent-hash-partitioned databases.

    Highlights: We propose incremental repartitioning of distributed OLTP databases for high scalability. We model two incremental repartitioning algorithms and a lookup mechanism. We develop a unique-transaction generation model for simulation. We derive novel impact metrics for distributed transactions. Simulation results indicate the adaptability of the methods to scalable OLTP applications.
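
    A minimal sketch of a roaming-style lookup, assuming each tuple has a hash-determined home partition that keeps a forwarding pointer after migration (the analogue of a home location register in mobile networks); class and method names are illustrative, not the paper's API.

    ```python
    # Sketch: locate migrated tuples with at most one extra hop via the home
    # partition's forwarding table, so lookups stay scalable after migration.
    class RoamingDirectory:
        def __init__(self, n_partitions):
            self.n = n_partitions
            self.forward = [{} for _ in range(n_partitions)]  # per-home pointers

        def home(self, key):
            return hash(key) % self.n        # static, hash-determined home

        def migrate(self, key, new_partition):
            """On migration, only the home partition's pointer is updated."""
            self.forward[self.home(key)][key] = new_partition

        def locate(self, key):
            """Ask the home partition, then follow its forwarding pointer."""
            return self.forward[self.home(key)].get(key, self.home(key))
    ```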

  • Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud Applications
    IEEE/ACM International Conference on Utility and Cloud Computing, 2014
    Co-Authors: Joarder Mohammad Mustafa Kamal, Manzur Murshed, Rajkumar Buyya
    Abstract:

    Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering algorithm has been found effective in reducing the number of DTs with an acceptable level of load balancing. The benefits of such a static partitioning scheme, however, are short-lived in Cloud applications with dynamically varying workload patterns, where the DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. The potential risk of load imbalance is mitigated by applying the graph clustering algorithm on finer logical partitions instead of the servers, and by relying on a random one-to-one cluster-to-partition mapping that naturally balances out load. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster: the many-to-one version minimising data migrations alone, and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to handle data migration efficiently without affecting scalability. The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. Simulation results convincingly support incremental repartitioning against static partitioning.
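
    A minimal sketch of a workload generator that holds the number of unique transactions near a target irrespective of the arrival rate, in the spirit of the evaluation methodology described in the companion journal version; the 10% new-transaction probability and the oldest-first retirement policy are assumptions.

    ```python
    # Sketch: emit a transaction stream whose pool of *unique* transactions
    # stays near `target_unique`, regardless of how fast events arrive.
    import random
    from collections import deque

    def transaction_stream(target_unique, n_events, seed=42):
        rng = random.Random(seed)
        pool, next_id = deque(), 0
        for _ in range(n_events):
            if len(pool) < target_unique or rng.random() < 0.1:  # assumed rate
                pool.append(next_id)          # mint a new unique transaction
                next_id += 1
                if len(pool) > target_unique:
                    pool.popleft()            # retire the oldest unique txn
            yield rng.choice(pool)            # replay a current unique txn
    ```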

Manzur Murshed - One of the best experts on this subject based on the ideXlab platform.

  • Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable OLTP Applications
    Future Generation Computer Systems, 2016
    Co-Authors: Joarder Mohammad Mustafa Kamal, Manzur Murshed, Rajkumar Buyya
    Abstract:

    On-line Transaction Processing (OLTP) applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. k-way min-cut graph-clustering-based database repartitioning algorithms can be used to reduce the number of DTs with an acceptable level of load balancing. In Web applications, where the DT profile changes over time due to dynamically varying workload patterns, frequent database repartitioning is needed to keep up with the change. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. The potential risk of load imbalance is mitigated by applying the graph clustering algorithm on finer logical partitions instead of the servers, and by relying on a random one-to-one cluster-to-partition mapping that naturally balances out load. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster: the many-to-one version minimising data migrations alone, and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to handle data migration efficiently without affecting scalability. The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. To compare the performance of any incremental repartitioning framework without bias from the external min-cut algorithm due to graph size variations, a transaction generation model is developed that can maintain a target number of unique transactions in any arbitrary observation window, irrespective of the new-transaction arrival rate. The overall impact of DTs at any instance is estimated from the exponential moving average of the recurrence period of unique transactions, to avoid transient fluctuations. The effectiveness and adaptability of the proposed incremental repartitioning framework for transactional workloads have been established with extensive simulations on both range-partitioned and consistent-hash-partitioned databases.

    Highlights: We propose incremental repartitioning of distributed OLTP databases for high scalability. We model two incremental repartitioning algorithms and a lookup mechanism. We develop a unique-transaction generation model for simulation. We derive novel impact metrics for distributed transactions. Simulation results indicate the adaptability of the methods to scalable OLTP applications.
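
    A minimal sketch of the random one-to-one cluster-to-partition mapping the abstract relies on for load balance: clustering is done over many fine logical partitions, and a random permutation then spreads cluster sizes evenly in expectation. Representing clusters and partitions by plain ids is an assumption.

    ```python
    # Sketch: assign each min-cut cluster to a distinct logical partition by
    # random permutation, which balances load in expectation.
    import random

    def random_one_to_one(cluster_ids, partition_ids, seed=7):
        assert len(cluster_ids) == len(partition_ids)
        shuffled = list(partition_ids)
        random.Random(seed).shuffle(shuffled)
        return dict(zip(cluster_ids, shuffled))
    ```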

  • Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud Applications
    2015
    Co-Authors: Joarder Mohammad Mustafa Kamal, Manzur Murshed, Rajkumar Buyya
    Abstract:

    Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering algorithm has been found effective in reducing the number of DTs with an acceptable level of load balancing. The benefits of such a static partitioning scheme, however, are short-lived in Cloud applications with dynamically varying workload patterns, where the DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT.
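
    A minimal sketch of the objective such repartitioning minimises: the count of transactions whose tuples span more than one server under a given placement. The data model (transactions as tuple-id sets, placement as a map) is assumed for illustration.

    ```python
    # Sketch: count distributed transactions under a given tuple placement.
    def count_dts(transactions, placement):
        """transactions: iterable of tuple-id sets; placement: tuple id -> server."""
        return sum(1 for txn in transactions
                   if len({placement[t] for t in txn}) > 1)
    ```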

  • Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud Applications
    IEEE/ACM International Conference on Utility and Cloud Computing, 2014
    Co-Authors: Joarder Mohammad Mustafa Kamal, Manzur Murshed, Rajkumar Buyya
    Abstract:

    Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering algorithm has been found effective in reducing the number of DTs with an acceptable level of load balancing. The benefits of such a static partitioning scheme, however, are short-lived in Cloud applications with dynamically varying workload patterns, where the DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. The potential risk of load imbalance is mitigated by applying the graph clustering algorithm on finer logical partitions instead of the servers, and by relying on a random one-to-one cluster-to-partition mapping that naturally balances out load. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster: the many-to-one version minimising data migrations alone, and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to handle data migration efficiently without affecting scalability. The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. Simulation results convincingly support incremental repartitioning against static partitioning.
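
    A minimal sketch of the three workload representations the abstract evaluates: pairwise graph, hypergraph (one hyperedge per transaction), and compressed hypergraph (tuples hashed into virtual nodes). The compression factor `n_virtual` and hash-based collapsing are assumptions about how compression might be done.

    ```python
    # Sketch: convert a transactional workload into the three representations
    # commonly fed to min-cut partitioners.
    from itertools import combinations

    def to_graph(transactions):
        """Pairwise edges between co-accessed tuples."""
        return [e for txn in transactions for e in combinations(sorted(txn), 2)]

    def to_hypergraph(transactions):
        """One hyperedge per transaction."""
        return [frozenset(txn) for txn in transactions]

    def to_compressed_hypergraph(transactions, n_virtual=1024):
        """Hyperedges over virtual nodes; tuples collapsed by hash."""
        return [frozenset(hash(t) % n_virtual for t in txn) for txn in transactions]
    ```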

Takashi Kuremoto - One of the best experts on this subject based on the ideXlab platform.

  • Combination of Genetic Network Programming and Knapsack Problem to Support Record Clustering on Distributed Databases
    Expert Systems With Applications, 2016
    Co-Authors: Wirarama Wedashwara, Shingo Mabu, Masanao Obayashi, Takashi Kuremoto
    Abstract:

    Highlights: A decision-support algorithm for record clustering in databases is proposed. A capacity limitation problem is introduced to make the clustering application general. Rule extraction from datasets is realised by the proposed evolutionary algorithm. Rule clustering under capacity limitations is solved as a knapsack problem. Simulations of record clustering show the advantages of the proposed method.

    This research involves the implementation of genetic network programming (GNP) and standard dynamic programming to solve the knapsack problem (KP) as a decision support system for record clustering in distributed databases. Fragment allocation with a storage capacity limitation is the background of the proposed method. The storage capacity problem is to distribute sets of fragments into several sites (clusters): the total amount of fragments in each site must not exceed the capacity of the site, while the distribution process must preserve the relation (similarity) between fragments within each site. The objective is to distribute big data to certain sites with limited capacities by considering the similarity of the distributed data in each site. To solve this problem, GNP is used to extract rules from big data by considering the characteristics (value ranges) of each attribute in a dataset. The proposed method also provides a partial random rule extraction method in GNP to discover frequent patterns in a database, improving the clustering algorithm, especially for large data problems. The concept of the KP is applied to the storage capacity problem, and standard dynamic programming is used to distribute rules to each site by considering the similarity (value) and data amount (weight) related to each rule to match the site capacities. The simulation results clarify that the proposed method shows advantages over conventional clustering algorithms; it therefore provides a new clustering method that additionally handles the storage capacity problem.
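
    A minimal sketch of the standard dynamic-programming knapsack the abstract applies to site allocation, with rule similarity as value and data amount as integer weight; the function shape and per-site usage are assumptions, not the authors' code.

    ```python
    # Sketch: 0/1 knapsack by standard DP, choosing the rule set of maximum
    # total similarity whose data amount fits one site's capacity.
    def knapsack(rules, capacity):
        """rules: list of (value, weight); returns (best value, chosen indices)."""
        best = [0] * (capacity + 1)
        keep = [[False] * (capacity + 1) for _ in rules]
        for i, (v, w) in enumerate(rules):
            for c in range(capacity, w - 1, -1):   # reverse pass: 0/1 semantics
                if best[c - w] + v > best[c]:
                    best[c] = best[c - w] + v
                    keep[i][c] = True
        chosen, c = [], capacity                   # trace back the choices
        for i in range(len(rules) - 1, -1, -1):
            if keep[i][c]:
                chosen.append(i)
                c -= rules[i][1]
        return best[capacity], chosen[::-1]
    ```

    One plausible way to reuse the routine across several sites is to fill sites one at a time, removing each chosen rule set before solving for the next site's capacity.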

  • Implementation of Genetic Network Programming and Knapsack Problem for Record Clustering on Distributed Database
    Society of Instrument and Control Engineers of Japan, 2014
    Co-Authors: Wirarama Wedashwara, Shingo Mabu, Masanao Obayashi, Takashi Kuremoto
    Abstract:

    This research involves the implementation of genetic network programming (GNP) and the knapsack problem (KP) to solve record clustering on distributed databases. The objective is to distribute big data to certain sites with limited capacities by considering the similarity of the distributed data in each site. GNP is used to extract rules from big data by considering the characteristics (value ranges) of each attribute in a dataset. The KP is used to distribute rules to each site by considering the similarity (value) and data amount (weight) related to each rule to match the site capacities.
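
    A minimal sketch, assuming GNP-extracted rules take the form of per-attribute value ranges, of how records could be grouped by rule before the capacity-aware allocation; the rule shape and the first-match policy are assumptions, not the paper's procedure.

    ```python
    # Sketch: match records to value-range rules, grouping records per rule.
    def matches(record, rule):
        """rule: attribute -> (low, high); record: attribute -> value."""
        return all(lo <= record.get(a, float("nan")) <= hi
                   for a, (lo, hi) in rule.items())   # missing attr -> no match

    def cluster_records(records, rules):
        clusters = {i: [] for i in range(len(rules))}
        for rec in records:
            for i, rule in enumerate(rules):
                if matches(rec, rule):
                    clusters[i].append(rec)
                    break                             # first matching rule wins
        return clusters
    ```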

Jeff Scott - One of the best experts on this subject based on the ideXlab platform.

  • A Large-Scale Digital Library System to Integrate Heterogeneous Data of Distributed Databases
    European Conference on Parallel Processing, 2004
    Co-Authors: Mariella Di Giacomo, Mark L. B. Martinez, Jeff Scott
    Abstract:

    The Web has become the primary means for information dissemination of all kinds; our interest is in the dissemination of scientific information from on-line digital libraries. We have designed a Web application, called SearchPlus, based on a distributed, scalable, fault-tolerant, and secure architecture, to allow access to tens of millions of scientific bibliographic records and their citations, integrating information from multiple heterogeneous data sources and making this information available for querying and analysis. A full-scale test-bed environment has been developed to assess hardware and software configuration and performance. This paper gives the motivations for building such a system, describes the architecture of our distributed database system, and highlights performance analyses and subsequent improvements.
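
    A minimal sketch of the integration pattern the abstract describes: fanning a query out to heterogeneous sources in parallel and de-duplicating the merged records. The source interface and merge key are hypothetical, not SearchPlus's actual API.

    ```python
    # Sketch: parallel federated search over heterogeneous bibliographic
    # sources, merging results and de-duplicating by DOI or normalised title.
    from concurrent.futures import ThreadPoolExecutor

    def federated_search(query, sources):
        """sources: callables query -> list of dicts with 'title' (maybe 'doi')."""
        with ThreadPoolExecutor(max_workers=len(sources)) as pool:
            batches = pool.map(lambda s: s(query), sources)
        merged = {}
        for rec in (r for batch in batches for r in batch):
            key = rec.get("doi") or rec["title"].casefold().strip()
            merged.setdefault(key, rec)      # first source wins (assumed policy)
        return list(merged.values())
    ```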