Join Algorithm - Explore the Science & Experts

The experts below are selected from a list of 12984 experts worldwide ranked by the ideXlab platform.

Tok Wang Ling - One of the best experts on this subject based on the ideXlab platform.

DASFAA - TwigStackList¬: A holistic twig Join Algorithm for twig query with not-predicates on XML data

Database Systems for Advanced Applications, 2006

Co-Authors: Tok Wang Ling

Abstract:

As businesses and enterprises generate and exchange XML data more often, there is an increasing need for searching and querying XML data. Much research has been done on matching XML twig queries. However, as far as we know, very little work has examined the efficient processing of XML twig queries with not-predicates. In this paper, we propose a novel holistic twig join algorithm, called TwigStackList¬, which is designed for efficiently matching an XML twig pattern with negation. We show that TwigStackList¬ guarantees I/O optimality for a large class of queries. Finally, we run extensive experiments that validate our algorithm and show the efficiency and effectiveness of TwigStackList¬.

TwigStackList¬: A holistic twig Join Algorithm for twig query with not-predicates on XML data

Lecture Notes in Computer Science, 2006

Co-Authors: Tok Wang Ling

PathStack¬: A holistic path Join Algorithm for path query with not-predicates on XML data

Database Systems for Advanced Applications, 2005

Co-Authors: Enhua Jiao, Tok Wang Ling, Cheeyong Chan

Abstract:

The evaluation of path queries forms the basis of complex XML query processing, which has attracted a lot of research attention. However, none of this work has examined the processing of more complex queries that contain not-predicates. In this paper, we present the first study on evaluating path queries with not-predicates. We propose an efficient holistic path join algorithm, PathStack¬, which has the following advantages: (1) it requires only one scan of the relevant data to evaluate path queries with not-predicates; (2) it does not generate any intermediate results; and (3) its memory space requirement is bounded by the longest path in the input XML document. We also present an improved variant of PathStack¬ that further minimizes unnecessary computations.
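To make the notion of a path query with a not-predicate concrete, here is a minimal sketch of what such a query asks for. It is a naive nested-loop check, not PathStack¬ itself. It assumes each element carries a (start, end) interval so that ancestry can be tested by interval containment; that encoding and the names `Element`, `is_ancestor`, and `match_without_descendant` are illustrative assumptions, not taken from the paper.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    tag: str
    start: int  # position of the opening tag (assumed interval encoding)
    end: int    # position of the closing tag


def is_ancestor(a: Element, d: Element) -> bool:
    """True if a strictly contains d, i.e. a is an ancestor of d."""
    return a.start < d.start and d.end < a.end


def match_without_descendant(elements: List[Element],
                             outer_tag: str,
                             negated_tag: str) -> List[Element]:
    """Naively evaluate outer_tag[not(.//negated_tag)].

    Returns every element tagged outer_tag that has no descendant
    tagged negated_tag. Quadratic in the worst case; for illustration only.
    """
    negated = [e for e in elements if e.tag == negated_tag]
    return [
        e for e in elements
        if e.tag == outer_tag and not any(is_ancestor(e, n) for n in negated)
    ]


# Document: <A><B/></A><A><C/></A>
doc = [Element("A", 1, 4), Element("B", 2, 3),
       Element("A", 5, 8), Element("C", 6, 7)]
result = match_without_descendant(doc, "A", "B")
```

Here only the second `A` qualifies, because the first one contains a `B`. A holistic algorithm aims to produce the same answers in a single scan without the repeated comparisons this sketch performs.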


Curt J Ellmann - One of the best experts on this subject based on the ideXlab platform.

A non-blocking parallel spatial Join Algorithm

International Conference on Data Engineering, 2002

Co-Authors: Gang Luo, Jeffrey F Naughton, Curt J Ellmann

Abstract:

Interest in incremental and adaptive query processing has led to the investigation of equijoin evaluation algorithms that are non-blocking. This investigation has yielded a number of algorithms, including the symmetric hash join, the XJoin, the ripple join, and their variants. However, to our knowledge no one has proposed a non-blocking spatial join algorithm. In this paper, we propose a parallel non-blocking spatial join algorithm that uses duplicate avoidance rather than duplicate elimination. Results from a prototype implementation in a commercial parallel object-relational DBMS show that it generates answer tuples steadily even in the presence of memory overflow, and that its rate of producing answer tuples scales with the number of processors. Also, when allowed to run to completion, its performance is comparable with the state-of-the-art blocking parallel spatial join algorithm.
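As background for the non-blocking equijoins mentioned above, the following is a minimal in-memory sketch of a symmetric hash join. It is not the spatial algorithm proposed in the paper, and it ignores memory overflow. The input format (a single interleaved stream of tuples tagged "L" or "R") and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Iterator, Tuple


def symmetric_hash_join(
    stream: Iterable[Tuple[str, Hashable, object]]
) -> Iterator[Tuple[Hashable, object, object]]:
    """Yield (key, left_payload, right_payload) as soon as a match exists.

    Each arriving tuple is inserted into the hash table of its own side
    and then probed against the hash table of the other side, so output
    is produced without first consuming either input completely.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in stream:
        other = "R" if side == "L" else "L"
        tables[side][key].append(payload)
        # .get avoids creating empty buckets in the other side's table.
        for match in tables[other].get(key, []):
            if side == "L":
                yield (key, payload, match)
            else:
                yield (key, match, payload)


arrivals = [("L", 1, "a"), ("R", 1, "x"), ("R", 2, "y"),
            ("L", 2, "b"), ("L", 1, "c")]
joined = list(symmetric_hash_join(arrivals))
```

Because every pair is emitted exactly once, at the moment its later tuple arrives, this basic form needs no separate duplicate handling as long as everything fits in memory.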

A scalable hash ripple Join Algorithm

International Conference on Management of Data, 2002

Co-Authors: Gang Luo, Curt J Ellmann, Peter J Haas, Jeffrey F Naughton

Abstract:

Recently, Haas and Hellerstein proposed the hash ripple join algorithm in the context of online aggregation. Although the algorithm rapidly gives a good estimate for many join-aggregate problem instances, the convergence can be slow if the number of tuples that satisfy the join predicate is small or if there are many groups in the output. Furthermore, if memory overflows (for example, because the user allows the algorithm to run to completion for an exact answer), the algorithm degenerates to block ripple join and performance suffers. In this paper, we build on the work of Haas and Hellerstein and propose a new algorithm that (a) combines parallelism with sampling to speed convergence, and (b) maintains good performance in the presence of memory overflow. Results from a prototype implementation in a parallel DBMS show that its rate of convergence scales with the number of processors, and that when allowed to run to completion, even in the presence of memory overflow, it is competitive with the traditional parallel hybrid hash join algorithm.
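The online-aggregation idea behind a hash ripple join can be sketched for the simplest case, a running estimate of the equijoin COUNT. This is a single-node illustration only, not the parallel algorithm of the paper. It assumes both inputs fit in memory and are already in random order; the function name and the alternating one-tuple-per-side schedule are illustrative choices.

```python
from collections import defaultdict
from itertools import zip_longest

_MISSING = object()  # sentinel for the shorter input once it is exhausted


def hash_ripple_count(r_keys, s_keys):
    """Yield running estimates of the equijoin size |R join S|.

    Tuples are drawn alternately from both inputs. The matches found
    among the tuples seen so far are scaled up by
    (|R| * |S|) / (seen_from_R * seen_from_S).
    """
    seen_r, seen_s = defaultdict(int), defaultdict(int)
    n_r = n_s = matches = 0
    for r, s in zip_longest(r_keys, s_keys, fillvalue=_MISSING):
        if r is not _MISSING:
            n_r += 1
            seen_r[r] += 1
            matches += seen_s.get(r, 0)  # new R tuple vs. all seen S tuples
        if s is not _MISSING:
            n_s += 1
            seen_s[s] += 1
            matches += seen_r.get(s, 0)  # new S tuple vs. all seen R tuples
        if n_r and n_s:
            yield matches * (len(r_keys) * len(s_keys)) / (n_r * n_s)


estimates = list(hash_ripple_count([1, 2, 2, 3], [2, 2, 4]))
```

When both inputs have been consumed the scale factor becomes 1, so the last estimate equals the exact count. With few matching tuples the early estimates fluctuate strongly, which is the slow-convergence case the abstract describes.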

SIGMOD Conference - A scalable hash ripple Join Algorithm

Proceedings of the 2002 ACM SIGMOD international conference on Management of data - SIGMOD '02, 2002

Co-Authors: Gang Luo, Curt J Ellmann, Peter J Haas, Jeffrey F Naughton

ICDE - A non-blocking parallel spatial Join Algorithm

Proceedings 18th International Conference on Data Engineering, 1

Co-Authors: Gang Luo, Jeffrey F Naughton, Curt J Ellmann


Daniel Manuel Dias - One of the best experts on this subject based on the ideXlab platform.

A parallel hash Join Algorithm for managing data skew

IEEE Transactions on Parallel and Distributed Systems, 1993

Co-Authors: Joel L Wolf, John Turek, Daniel Manuel Dias

Abstract:

This paper presents a parallel hash join algorithm that is based on the concept of hierarchical hashing, to address the problem of data skew. The proposed algorithm splits the usual hash phase into a hash phase and an explicit transfer phase, and adds an extra scheduling phase between these two. During the scheduling phase, a heuristic optimization algorithm, using the output of the hash phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the hash partitions with the largest skew values and splits them as necessary, assigning each of them to an optimal number of processors. Assuming for concreteness a Zipf-like distribution of the values in the join column, a join phase which is CPU-bound, and a shared-nothing environment, the algorithm is shown to achieve good join phase load balancing, and to be robust relative to the degree of data skew and the total number of processors. The overall speedup due to this algorithm is compared to some existing parallel hash join methods. The proposed method does considerably better in high skew situations.
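The role of a scheduling phase can be illustrated with a generic load-balancing heuristic. The sketch below uses greedy longest-processing-time assignment: partitions are taken in decreasing size and each goes to the currently least-loaded processor. This is a stand-in technique, not the paper's heuristic, and unlike the paper it does not split oversized partitions. The function name and the use of partition sizes as the cost estimate are assumptions.

```python
import heapq


def schedule_partitions(partition_sizes, num_processors):
    """Assign hash partitions to processors with greedy LPT.

    partition_sizes maps a partition id to its estimated join cost
    (here simply its size). Returns (assignment, loads), both keyed
    by processor index.
    """
    heap = [(0, p) for p in range(num_processors)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processors)}
    # Largest partitions first; ties broken by partition id for determinism.
    ordered = sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for pid, size in ordered:
        load, proc = heapq.heappop(heap)
        assignment[proc].append(pid)
        heapq.heappush(heap, (load + size, proc))
    loads = {p: sum(partition_sizes[i] for i in parts)
             for p, parts in assignment.items()}
    return assignment, loads


sizes = {0: 10, 1: 7, 2: 6, 3: 5, 4: 2}
assignment, loads = schedule_partitions(sizes, 2)
```

In this example both processors end up with a load of 15. If one partition alone exceeded the average load, no assignment of whole partitions could balance the processors, which is why splitting heavily skewed partitions across several processors matters.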

A parallel sort merge Join Algorithm for managing data skew

IEEE Transactions on Parallel and Distributed Systems, 1993

Co-Authors: Joel L Wolf, Daniel Manuel Dias

Abstract:

A parallel sort-merge join algorithm which uses a divide-and-conquer approach to address the data skew problem is proposed. The proposed algorithm adds an extra, low-cost scheduling phase to the usual sort, transfer, and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements, and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution of data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase, and is shown to be very robust relative, among other things, to the degree of data skew and the total number of processors.
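For reference, the sort and join phases that the proposed algorithm builds on can be sketched on a single node as follows. This is the textbook sort-merge join, without the transfer or scheduling phases; the tuple layout `(key, payload)` and the function name are assumptions made for illustration.

```python
def sort_merge_join(left, right):
    """Equijoin two lists of (key, payload) tuples by sorting and merging."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # A key with many duplicates produces a cross product here;
            # this is where a skewed key dominates the join cost.
            for _, lp in left[i:i_end]:
                for _, rp in right[j:j_end]:
                    out.append((lk, lp, rp))
            i, j = i_end, j_end
    return out


pairs = sort_merge_join([(2, "b"), (1, "a"), (2, "c")],
                        [(2, "x"), (3, "z"), (2, "y")])
```

The work for one key is proportional to the product of its run lengths on both sides, so a few heavily repeated keys can account for most of the join phase.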


A Ohara - One of the best experts on this subject based on the ideXlab platform.

Hash-based symmetric data structure and Join Algorithm for OLAP applications

International Database Engineering and Applications Symposium, 1999

Co-Authors: Motomichi Toyama, A Ohara

Abstract:

The star schema is often used in dimensional approaches applied to OLAP applications. The fact table in the star schema typically contains a huge amount of data. When some of the dimension tables are also very large, it may take too much time and storage to join the fact table with these dimension tables. The performance of the join algorithm becomes critical under such a condition. The fluent join is a join algorithm that operates on relations organized as multidimensional linear hash files. Like a merge join on relations which are already sorted on the joining key, its execution reads each page in the operand relations no more than once and does not create intermediate result files. Unlike sorting, the multidimensional linear hash can cluster records on several keys symmetrically. In this paper, the concept of the fluent join is applied to an OLAP system to cluster records in each table on the joining keys. As a result, the algorithm yields symmetric performance on joins with different dimension tables.

IDEAS - Hash-based symmetric data structure and Join Algorithm for OLAP applications

Proceedings. IDEAS'99. International Database Engineering and Applications Symposium (Cat. No.PR00265), 1999

Co-Authors: Motomichi Toyama, A Ohara


Joel L Wolf - One of the best experts on this subject based on the ideXlab platform.

A parallel hash Join Algorithm for managing data skew

IEEE Transactions on Parallel and Distributed Systems, 1993

Co-Authors: Joel L Wolf, John Turek, Daniel Manuel Dias

A parallel sort merge Join Algorithm for managing data skew

IEEE Transactions on Parallel and Distributed Systems, 1993

Co-Authors: Joel L Wolf, Daniel Manuel Dias

