Google Bigtable

The experts below are selected from a list of 60 experts worldwide, ranked by the ideXlab platform.

Yousefi Arman - One of the best experts on this subject based on the ideXlab platform.

  • Competitive Data-Structure Dynamization
    2020
    Co-Authors: Mathieu Claire, Young, Neal E., Rajaraman Rajmohan, Yousefi Arman
    Abstract:

    Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and a constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, but can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant. Comment: SODA 2021
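
    To make the cost model above concrete, here is a minimal Python sketch of the second (query-bounded) variant under a deliberately naive merge policy: whenever the cover would exceed $k$ components, merge the two smallest components. The function name, policy, and example input are illustrative assumptions; this is not the paper's $k$-competitive algorithm.

    ```python
    # Minimal sketch of the query-bounded dynamization model from the abstract above.
    # Each arriving batch is represented by the total weight of its (disjoint) items;
    # the maintained cover is a list of component weights. Building a component costs
    # its weight, and the cover size (the per-step query cost) may never exceed k.
    # Naive policy for illustration only: merge the two smallest components until
    # the bound holds. This is NOT the algorithm analyzed in the paper.

    def query_bounded_dynamization(batch_weights, k):
        """Return (total build cost, final cover) after processing all batches."""
        cover = []                      # weights of the current cover's components
        build_cost = 0.0
        for w in batch_weights:
            cover.append(w)             # a new component holding just this batch
            build_cost += w             # building it costs its weight
            while len(cover) > k:       # enforce the query-cost bound
                cover.sort()
                merged = cover.pop(0) + cover.pop(0)
                cover.append(merged)
                build_cost += merged    # rebuilding the merged component
        return build_cost, cover

    if __name__ == "__main__":
        # Hypothetical batch weights; with k=3 the cover never exceeds 3 components.
        print(query_bounded_dynamization([5, 1, 1, 8, 2, 2, 1], k=3))
    ```

    Under such a naive policy the total build cost can be far from optimal; bounding that gap (a competitive ratio of $k$ for this variant) is what the paper's analysis provides.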

  • Competitive Data-Structure Dynamization
    eScholarship University of California, 2020
    Co-Authors: Mathieu Claire, Young, Neal E., Rajaraman Rajmohan, Yousefi Arman
    Abstract:

    Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and a constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, but can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant.

  • Bigtable Merge Compaction
    2015
    Co-Authors: Mathieu Claire, Staelin Carl, Young, Neal E., Yousefi Arman
    Abstract:

    NoSQL databases are widely used for massive data storage and real-time web applications. Yet important aspects of these data structures are not well understood. For example, NoSQL databases write most of their data to a collection of files on disk, while periodically compacting subsets of these files. A compaction policy must choose which files to compact, and when to compact them, without knowing the future workload. Although these choices can affect computational efficiency by orders of magnitude, the existing literature lacks tools for designing and analyzing online compaction policies; policies are now chosen largely by trial and error. Here we introduce tools for the design and analysis of compaction policies for Google Bigtable, propose new policies, give average-case and worst-case competitive analyses, and present preliminary empirical benchmarks.
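
    As a concrete, intentionally simplified illustration of the trade-off just described, the sketch below simulates one extreme policy: rewrite every file on disk into a single file whenever the file count exceeds a threshold. Reads must consult every live file, so read cost grows with the file count, while each compaction costs the bytes rewritten. The function name, parameters, and workload are hypothetical; this is not the Bigtable policy analyzed in the paper.

    ```python
    # Illustrative simulator for the read/write trade-off behind compaction policies.
    # Each flush adds one file; every read must consult all live files, so read cost
    # is proportional to the file count; compacting costs the bytes rewritten.
    # Policy here: rewrite everything into one file once more than max_files exist.
    # This extreme policy and the workload below are hypothetical examples.

    def full_merge_policy(flush_sizes, max_files=4, reads_per_flush=10):
        """Return (write_cost, read_cost) for a sequence of flush sizes."""
        files = []                                  # sizes of files currently on disk
        write_cost = read_cost = 0
        for size in flush_sizes:
            files.append(size)
            write_cost += size                      # writing the freshly flushed file
            if len(files) > max_files:              # over budget: compact everything
                merged = sum(files)
                files = [merged]
                write_cost += merged                # rewriting all data into one file
            read_cost += reads_per_flush * len(files)  # each read probes every file
        return write_cost, read_cost

    if __name__ == "__main__":
        print(full_merge_policy([10, 12, 9, 11, 100, 10, 10, 10]))
    ```

    Lowering max_files reduces read cost but inflates write cost (write amplification); raising it does the opposite. Choosing where to sit on that curve online, without knowing the future mix of reads and writes, is the policy-design problem the paper formalizes.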

Mathieu Claire - One of the best experts on this subject based on the ideXlab platform.

  • Competitive Data-Structure Dynamization
    2020
    Co-Authors: Mathieu Claire, Young, Neal E., Rajaraman Rajmohan, Yousefi Arman
    Abstract:

    Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and a constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, but can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant. Comment: SODA 2021

  • Competitive Data-Structure Dynamization
    eScholarship University of California, 2020
    Co-Authors: Mathieu Claire, Young, Neal E., Rajaraman Rajmohan, Yousefi Arman
    Abstract:

    Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and a constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, but can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant.

  • Bigtable Merge Compaction
    2015
    Co-Authors: Mathieu Claire, Staelin Carl, Young, Neal E., Yousefi Arman
    Abstract:

    NoSQL databases are widely used for massive data storage and real-time web applications. Yet important aspects of these data structures are not well understood. For example, NoSQL databases write most of their data to a collection of files on disk, while periodically compacting subsets of these files. A compaction policy must choose which files to compact, and when to compact them, without knowing the future workload. Although these choices can affect computational efficiency by orders of magnitude, the existing literature lacks tools for designing and analyzing online compaction policies; policies are now chosen largely by trial and error. Here we introduce tools for the design and analysis of compaction policies for Google Bigtable, propose new policies, give average-case and worst-case competitive analyses, and present preliminary empirical benchmarks.

Young, Neal E. - One of the best experts on this subject based on the ideXlab platform.

  • Competitive Data-Structure Dynamization
    2020
    Co-Authors: Mathieu Claire, Young, Neal E., Rajaraman Rajmohan, Yousefi Arman
    Abstract:

    Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and a constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, but can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant. Comment: SODA 2021

  • Competitive Data-Structure Dynamization
    eScholarship University of California, 2020
    Co-Authors: Mathieu Claire, Young, Neal E., Rajaraman Rajmohan, Yousefi Arman
    Abstract:

    Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and a constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, but can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant.

  • Bigtable Merge Compaction
    2015
    Co-Authors: Mathieu Claire, Staelin Carl, Young, Neal E., Yousefi Arman
    Abstract:

    NoSQL databases are widely used for massive data storage and real-time web applications. Yet important aspects of these data structures are not well understood. For example, NoSQL databases write most of their data to a collection of files on disk, while periodically compacting subsets of these files. A compaction policy must choose which files to compact, and when to compact them, without knowing the future workload. Although these choices can affect computational efficiency by orders of magnitude, the existing literature lacks tools for designing and analyzing online compaction policies; policies are now chosen largely by trial and error. Here we introduce tools for the design and analysis of compaction policies for Google Bigtable, propose new policies, give average-case and worst-case competitive analyses, and present preliminary empirical benchmarks.

Rajaraman Rajmohan - One of the best experts on this subject based on the ideXlab platform.

  • Competitive Data-Structure Dynamization
    eScholarship University of California, 2020
    Co-Authors: Mathieu Claire, Young, Neal E., Rajaraman Rajmohan, Yousefi Arman
    Abstract:

    Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and a constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, but can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant.

  • Competitive Data-Structure Dynamization
    2020
    Co-Authors: Mathieu Claire, Young, Neal E., Rajaraman Rajmohan, Yousefi Arman
    Abstract:

    Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and a constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, but can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant. Comment: SODA 2021

Sogand Shirinbad - One of the best experts on this subject based on the ideXlab platform.

  • An Energy-Aware Adaptation Model for Big Data Platforms
    2016 IEEE International Conference on Autonomic Computing (ICAC), 2016
    Co-Authors: Emiliano Casalicchio, Lars Lundberg, Sogand Shirinbad
    Abstract:

    Platforms for big data include mechanisms and tools to model, organize, store and access big data (e.g. Apache Cassandra, HBase, Amazon SimpleDB, Dynamo, Google Bigtable). Resource management for these platforms is a complex task and must also account for multi-tenancy and infrastructure scalability. Human-assisted control of big data platforms is unrealistic, and there is a growing demand for autonomic solutions. In this paper we propose a QoS- and energy-aware adaptation model designed to cope with the real case of a Cassandra-as-a-Service provider.
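
    The abstract does not spell out the adaptation model itself, but the general shape of an autonomic, QoS- and energy-aware control loop can be sketched as follows. The metric names, thresholds, and scale-out/scale-in actions below are purely illustrative assumptions, not the model proposed in the paper.

    ```python
    # Hypothetical sketch of one iteration of a QoS- and energy-aware adaptation
    # loop for a Cassandra-like cluster. Thresholds, metrics, and actions are
    # illustrative assumptions only; the paper's adaptation model is not reproduced here.

    def adaptation_step(read_latency_ms, power_watts, nodes,
                        latency_slo_ms=50.0, power_budget_watts=2000.0):
        """Return the node count suggested for the next control interval."""
        if read_latency_ms > latency_slo_ms:
            return nodes + 1                 # QoS violated: scale out
        if power_watts > power_budget_watts and nodes > 1:
            return nodes - 1                 # energy budget exceeded: scale in
        return nodes                         # both constraints satisfied

    if __name__ == "__main__":
        print(adaptation_step(read_latency_ms=72.0, power_watts=1500.0, nodes=6))
    ```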