Join Algorithm

The experts below are selected from a list of 12,984 experts worldwide, ranked by the ideXlab platform.

Tok Wang Ling - One of the best experts on this subject based on the ideXlab platform.

  • DASFAA - TwigStackList¬: A holistic twig join algorithm for twig query with not-predicates on XML data
    Database Systems for Advanced Applications, 2006
    Co-Authors: Tok Wang Ling
    Abstract:

    As businesses and enterprises generate and exchange XML data more often, there is an increasing need for searching and querying XML data. Much research has been done on matching XML twig queries. However, as far as we know, very little work has examined the efficient processing of XML twig queries with not-predicates. In this paper, we propose a novel holistic twig join algorithm, called TwigStackList¬, which is designed for efficiently matching an XML twig pattern with negation. We show that TwigStackList¬ identifies a large query class for which I/O optimality is guaranteed. Finally, we run extensive experiments that validate our algorithm and show the efficiency and effectiveness of TwigStackList¬.

  • TwigStackList¬: A holistic twig join algorithm for twig query with not-predicates on XML data
    Lecture Notes in Computer Science, 2006
    Co-Authors: Tok Wang Ling
    Abstract:

    As businesses and enterprises generate and exchange XML data more often, there is an increasing need for searching and querying XML data. Much research has been done on matching XML twig queries. However, as far as we know, very little work has examined the efficient processing of XML twig queries with not-predicates. In this paper, we propose a novel holistic twig join algorithm, called TwigStackList¬, which is designed for efficiently matching an XML twig pattern with negation. We show that TwigStackList¬ identifies a large query class for which I/O optimality is guaranteed. Finally, we run extensive experiments that validate our algorithm and show the efficiency and effectiveness of TwigStackList¬.

  • PathStack¬: A holistic path join algorithm for path query with not-predicates on XML data
    Database Systems for Advanced Applications, 2005
    Co-Authors: Enhua Jiao, Tok Wang Ling, Cheeyong Chan
    Abstract:

    The evaluation of path queries forms the basis of complex XML query processing, which has attracted a lot of research attention. However, none of these works has examined the processing of more complex queries that contain not-predicates. In this paper, we present the first study on evaluating path queries with not-predicates. We propose an efficient holistic path join algorithm, PathStack¬, which has the following advantages: (1) it requires only one scan of the relevant data to evaluate path queries with not-predicates; (2) it does not generate any intermediate results; and (3) its memory space requirement is bounded by the longest path in the input XML document. We also present an improved variant of PathStack¬ that further minimizes unnecessary computations.
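
As a rough illustration of the single-scan, stack-based style of evaluation described in this abstract, the following Python sketch answers the path query //a[not(.//b)]. It is not PathStack¬ itself: the `Node` region labels (`start`, `end`), the function name, and the assumption that elements arrive in document order are all hypothetical choices made for this example.

```python
from typing import Iterable, List, NamedTuple


class Node(NamedTuple):
    tag: str
    start: int  # position of the opening tag (assumed region encoding)
    end: int    # position of the closing tag


def a_without_b_descendant(nodes: Iterable[Node], a: str, b: str) -> List[Node]:
    """Return every `a` element with no `b` descendant, i.e. //a[not(.//b)].

    Assumes `nodes` is already in document order (ascending `start`) and that
    regions are properly nested, as in a well-formed XML document.
    """
    stack: List[list] = []      # open `a` elements as [node, has_b_descendant]
    result: List[Node] = []

    def close_finished(position: float) -> None:
        # Pop `a` elements that ended before `position`; they cannot gain
        # further descendants, so their not-predicate can be decided now.
        while stack and stack[-1][0].end < position:
            node, has_b = stack.pop()
            if not has_b:
                result.append(node)

    for node in nodes:
        if node.tag not in (a, b):
            continue
        close_finished(node.start)
        if node.tag == b:
            # Every `a` still on the stack encloses this `b`.
            for entry in stack:
                entry[1] = True
        if node.tag == a:
            stack.append([node, False])

    close_finished(float("inf"))
    return sorted(result, key=lambda n: n.start)


doc = [
    Node("a", 1, 10), Node("b", 2, 3),
    Node("a", 11, 20), Node("c", 12, 13),
    Node("a", 21, 30), Node("a", 22, 25), Node("b", 26, 27),
]
matches = a_without_b_descendant(doc, "a", "b")
```

In this sketch the stack holds at most as many entries as the nesting depth of the `a` elements, which loosely mirrors the memory bound stated in the abstract; the input is read once and no intermediate path matches are materialized.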

Curt J Ellmann - One of the best experts on this subject based on the ideXlab platform.

  • A non-blocking parallel spatial join algorithm
    International Conference on Data Engineering, 2002
    Co-Authors: Gang Luo, Jeffrey F Naughton, Curt J Ellmann
    Abstract:

    Interest in incremental and adaptive query processing has led to the investigation of equijoin evaluation algorithms that are non-blocking. This investigation has yielded a number of algorithms, including the symmetric hash join, the XJoin, the ripple join, and their variants. However, to our knowledge no one has proposed a non-blocking spatial join algorithm. In this paper, we propose a parallel non-blocking spatial join algorithm that uses duplicate avoidance rather than duplicate elimination. Results from a prototype implementation in a commercial parallel object-relational DBMS show that it generates answer tuples steadily even in the presence of memory overflow, and that its rate of producing answer tuples scales with the number of processors. Also, when allowed to run to completion, its performance is comparable with the state-of-the-art blocking parallel spatial join algorithm.
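
To make "non-blocking" concrete, here is a minimal Python sketch of the symmetric hash join that the abstract cites as prior work on equijoins. It is not the spatial join proposed in the paper, and it ignores memory overflow; the `("L", row)` / `("R", row)` arrival encoding and all names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple[Hashable, str]  # (join key, payload); a hypothetical row layout


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Row]]
) -> Iterator[Tuple[Row, Row]]:
    """Join two interleaved streams on the first field of each row.

    `arrivals` yields ("L", row) or ("R", row) in arrival order. Each input
    has its own hash table: an arriving row is inserted into its side's table
    and immediately probed against the other side's table, so results are
    produced without waiting for either input to finish.
    """
    tables: Dict[str, Dict[Hashable, List[Row]]] = {
        "L": defaultdict(list),
        "R": defaultdict(list),
    }
    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        key = row[0]
        tables[side][key].append(row)
        for match in tables[other].get(key, []):
            # Always emit pairs as (left row, right row).
            yield (row, match) if side == "L" else (match, row)


stream = [
    ("L", (1, "a")),
    ("R", (1, "x")),
    ("R", (2, "y")),
    ("L", (2, "b")),
    ("L", (1, "c")),
]
pairs = list(symmetric_hash_join(stream))
```

Each matching pair is emitted exactly once, at the moment the later of its two rows arrives, which is why no duplicate elimination is needed in this simple equijoin setting.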

  • A scalable hash ripple join algorithm
    International Conference on Management of Data, 2002
    Co-Authors: Gang Luo, Curt J Ellmann, Peter J Haas, Jeffrey F Naughton
    Abstract:

    Recently, Haas and Hellerstein proposed the hash ripple join algorithm in the context of online aggregation. Although the algorithm rapidly gives a good estimate for many join-aggregate problem instances, the convergence can be slow if the number of tuples that satisfy the join predicate is small or if there are many groups in the output. Furthermore, if memory overflows (for example, because the user allows the algorithm to run to completion for an exact answer), the algorithm degenerates to block ripple join and performance suffers. In this paper, we build on the work of Haas and Hellerstein and propose a new algorithm that (a) combines parallelism with sampling to speed convergence, and (b) maintains good performance in the presence of memory overflow. Results from a prototype implementation in a parallel DBMS show that its rate of convergence scales with the number of processors, and that when allowed to run to completion, even in the presence of memory overflow, it is competitive with the traditional parallel hybrid hash join algorithm.
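
The idea of a running estimate that converges to the exact answer can be sketched as follows. This is a single-threaded, in-memory illustration of a hash ripple join for a COUNT aggregate, assuming both inputs fit in memory and can be shuffled to simulate random sampling; it is not the parallel algorithm proposed in the paper, and the function name and parameters are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterator, Sequence, Tuple


def ripple_count_estimates(
    r_keys: Sequence[Hashable], s_keys: Sequence[Hashable], seed: int = 0
) -> Iterator[Tuple[int, float]]:
    """Yield (tuples read so far, estimated size of the equijoin R ⋈ S).

    Each step reads one more tuple from each input (while available), probes
    the hash table of the other input, and scales the number of matches seen
    so far by |R||S| / (n_r * n_s), the inverse of the sampled fraction.
    """
    if not r_keys or not s_keys:
        return
    rng = random.Random(seed)
    r, s = list(r_keys), list(s_keys)
    rng.shuffle(r)  # random order stands in for random sampling
    rng.shuffle(s)

    seen_r: Dict[Hashable, int] = {}
    seen_s: Dict[Hashable, int] = {}
    matches = n_r = n_s = 0

    for i in range(max(len(r), len(s))):
        if i < len(r):
            key = r[i]
            matches += seen_s.get(key, 0)   # probe S's table
            seen_r[key] = seen_r.get(key, 0) + 1
            n_r += 1
        if i < len(s):
            key = s[i]
            matches += seen_r.get(key, 0)   # probe R's table
            seen_s[key] = seen_s.get(key, 0) + 1
            n_s += 1
        yield n_r + n_s, matches * (len(r) * len(s)) / (n_r * n_s)


estimates = list(ripple_count_estimates([1, 1, 2, 3], [1, 2, 2, 4]))
```

Once both inputs have been read completely the scaling factor is 1, so the final estimate equals the exact join size. When few tuples satisfy the join predicate, early estimates are built from very few matches, which is the slow-convergence case the abstract describes.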

Daniel Manuel Dias - One of the best experts on this subject based on the ideXlab platform.

  • A parallel hash join algorithm for managing data skew
    IEEE Transactions on Parallel and Distributed Systems, 1993
    Co-Authors: Joel L Wolf, John Turek, Daniel Manuel Dias
    Abstract:

    Presents a parallel hash join algorithm that is based on the concept of hierarchical hashing, to address the problem of data skew. The proposed algorithm splits the usual hash phase into a hash phase and an explicit transfer phase, and adds an extra scheduling phase between these two. During the scheduling phase, a heuristic optimization algorithm, using the output of the hash phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the hash partitions with the largest skew values and splits them as necessary, assigning each of them to an optimal number of processors. Assuming for concreteness a Zipf-like distribution of the values in the join column, a join phase which is CPU-bound, and a shared-nothing environment, the algorithm is shown to achieve good join phase load balancing, and to be robust relative to the degree of data skew and the total number of processors. The overall speedup due to this algorithm is compared to some existing parallel hash join methods. The proposed method does considerably better in high-skew situations.
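
The role of a scheduling phase can be illustrated with a simple greedy stand-in. The sketch below is not the heuristic from the paper: it assumes a per-partition join cost estimate is already available from the hash phase, splits any partition larger than the ideal per-processor load into equal pieces, and then assigns pieces largest-first to the least loaded processor. All names and the cost model are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple


def schedule_partitions(
    costs: Dict[str, float], processors: int
) -> List[List[Tuple[str, float]]]:
    """Return, per processor, the (partition name, cost share) pieces it joins."""
    if processors < 1:
        raise ValueError("need at least one processor")
    ideal = sum(costs.values()) / processors  # perfectly balanced load

    # Split skewed partitions. Assumption: a partition's join work can be
    # divided evenly, which in practice requires shipping the matching tuples
    # of the other relation to every processor that receives a piece.
    pieces: List[Tuple[float, str]] = []
    for name, cost in costs.items():
        parts = max(1, math.ceil(cost / ideal)) if ideal > 0 else 1
        pieces.extend((cost / parts, name) for _ in range(parts))

    # Greedy assignment: largest piece first, to the least loaded processor.
    pieces.sort(reverse=True)
    heap = [(0.0, p) for p in range(processors)]
    heapq.heapify(heap)
    plan: List[List[Tuple[str, float]]] = [[] for _ in range(processors)]
    for cost, name in pieces:
        load, p = heapq.heappop(heap)
        plan[p].append((name, cost))
        heapq.heappush(heap, (load + cost, p))
    return plan


skewed = {"h0": 80.0, "h1": 10.0, "h2": 10.0, "h3": 10.0, "h4": 10.0}
plan = schedule_partitions(skewed, processors=4)
loads = [sum(cost for _, cost in pieces) for pieces in plan]
```

Without splitting, the processor that receives partition `h0` would carry a load of at least 80 out of 120; with splitting, the heaviest processor in this example carries well under half of that.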

  • A parallel sort merge join algorithm for managing data skew
    IEEE Transactions on Parallel and Distributed Systems, 1993
    Co-Authors: Joel L Wolf, Daniel Manuel Dias
    Abstract:

    A parallel sort-merge-join algorithm which uses a divide-and-conquer approach to address the data skew problem is proposed. The proposed algorithm adds an extra, low-cost scheduling phase to the usual sort, transfer, and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements, and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution of data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase, and is shown to be very robust relative, among other things, to the degree of data skew and the total number of processors.
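
For readers unfamiliar with the underlying operation, the following Python sketch shows a plain sequential sort-merge join, the building block that the paper parallelizes. It contains no scheduling or skew handling; the `(key, payload)` row layout and the function name are assumptions made for this example.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload); a hypothetical row layout


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Equijoin two relations on their first field using sort, then merge."""
    left = sorted(left, key=lambda row: row[0])    # sort phase
    right = sorted(right, key=lambda row: row[0])
    out: List[Tuple[int, str, str]] = []
    i = j = 0
    while i < len(left) and j < len(right):        # merge (join) phase
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # A key with m left rows and n right rows yields m * n results,
            # which is why a few heavily repeated keys dominate the join cost.
            for _, l_payload in left[i:i_end]:
                for _, r_payload in right[j:j_end]:
                    out.append((lk, l_payload, r_payload))
            i, j = i_end, j_end
    return out


joined = sort_merge_join(
    [(2, "b"), (1, "a"), (2, "c")],
    [(2, "x"), (3, "y"), (2, "z")],
)
```

The nested loop over a run of equal keys is where skew hurts: if one processor is handed a key with long runs on both sides, it performs far more work than its peers, which motivates assigning such keys to several processors.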

A Ohara - One of the best experts on this subject based on the ideXlab platform.

  • Hash-based symmetric data structure and join algorithm for OLAP applications
    International Database Engineering and Applications Symposium, 1999
    Co-Authors: Motomichi Toyama, A Ohara
    Abstract:

    The star schema is often used in dimensional approaches applied to OLAP applications. The fact table in the star schema typically contains a huge amount of data. When some of the dimension tables are also very large, it may take too much time and storage to join the fact table with these dimension tables. The performance of the join algorithm becomes critical under such a condition. The fluent join is a join algorithm that operates on relations organized as multidimensional linear hash files. Like a merge join on relations which are already sorted on the joining key, its execution reads each page in the operand relations no more than once and does not create intermediate result files. Unlike sorting, the multidimensional linear hash can cluster records on several keys symmetrically. In this paper, the concept of the fluent join is applied to an OLAP system to cluster records in each table on the joining keys. As a result, the algorithm yields symmetric performance on joins with different dimension tables.

Joel L Wolf - One of the best experts on this subject based on the ideXlab platform.

  • A parallel hash join algorithm for managing data skew
    IEEE Transactions on Parallel and Distributed Systems, 1993
    Co-Authors: Joel L Wolf, John Turek, Daniel Manuel Dias
    Abstract:

    Presents a parallel hash join algorithm that is based on the concept of hierarchical hashing, to address the problem of data skew. The proposed algorithm splits the usual hash phase into a hash phase and an explicit transfer phase, and adds an extra scheduling phase between these two. During the scheduling phase, a heuristic optimization algorithm, using the output of the hash phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the hash partitions with the largest skew values and splits them as necessary, assigning each of them to an optimal number of processors. Assuming for concreteness a Zipf-like distribution of the values in the join column, a join phase which is CPU-bound, and a shared-nothing environment, the algorithm is shown to achieve good join phase load balancing, and to be robust relative to the degree of data skew and the total number of processors. The overall speedup due to this algorithm is compared to some existing parallel hash join methods. The proposed method does considerably better in high-skew situations.

  • A parallel sort merge join algorithm for managing data skew
    IEEE Transactions on Parallel and Distributed Systems, 1993
    Co-Authors: Joel L Wolf, Daniel Manuel Dias
    Abstract:

    A parallel sort-merge-join algorithm which uses a divide-and-conquer approach to address the data skew problem is proposed. The proposed algorithm adds an extra, low-cost scheduling phase to the usual sort, transfer, and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements, and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution of data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase, and is shown to be very robust relative, among other things, to the degree of data skew and the total number of processors.