Join Operation

The Experts below are selected from a list of 10098 Experts worldwide ranked by ideXlab platform

Jian Chen - One of the best experts on this subject based on the ideXlab platform.

  • Heads-Join: Efficient Earth Mover's Distance Similarity Joins on Hadoop
    IEEE Transactions on Parallel and Distributed Systems, 2016
    Co-Authors: Jin Huang, Rui Zhang, Rajkumar Buyya, Jian Chen
    Abstract:

    The Earth Mover's Distance (EMD) similarity join has a number of important applications, such as near-duplicate image retrieval and distribution-based pattern analysis. However, the computational cost of EMD is supercubic, and consequently the EMD similarity join operation is prohibitive for datasets of even medium size. We propose to employ the Hadoop platform to speed up the operation. Simply porting the state-of-the-art metric distance similarity join algorithms to Hadoop results in inefficiency because they involve excessive distance computations and are vulnerable to skewed data distributions. We propose a novel framework, named Heads-Join, which transforms data into the space of EMD lower bounds and performs pruning and partitioning at a low cost because computing these EMD lower bounds has constant or linear complexity. We investigate both range and top-$k$ joins, and design efficient algorithms on three popular Hadoop computation paradigms, i.e., MapReduce, Bulk Synchronous Parallel, and Spark. We conduct extensive experiments on both real and synthetic datasets. The results show that Heads-Join outperforms the state-of-the-art metric similarity join technique, i.e., QuickJoin, by up to an order of magnitude and scales out well.

  • Melody-Join: Efficient Earth Mover's Distance Similarity Joins Using MapReduce
    International Conference on Data Engineering, 2014
    Co-Authors: Jin Huang, Rui Zhang, Rajkumar Buyya, Jian Chen
    Abstract:

    The Earth Mover's Distance (EMD) similarity join retrieves pairs of records with EMD below a given threshold. It has a number of important applications, such as near-duplicate image retrieval and pattern analysis in probabilistic datasets. However, the computational cost of EMD is supercubic in the number of bins in the histograms used to represent the data objects. Consequently, the EMD similarity join operation is prohibitive for large datasets. This is the first paper that specifically addresses the EMD similarity join, and we propose to use MapReduce to approach this problem. The MapReduce algorithms designed for generic metric distance similarity joins are inefficient for the EMD similarity join because they involve a large number of distance computations and have unbalanced workloads on reducers when dealing with skewed datasets. We propose a novel framework, named Melody-Join, which transforms data into the space of EMD lower bounds and performs pruning and partitioning at a low cost because computing these EMD lower bounds has constant complexity. Furthermore, we address two key problems, the limited pruning power and the unbalanced workloads, by enhancing each phase in the Melody-Join framework. We conduct extensive experiments on real datasets. The results show that Melody-Join outperforms the state-of-the-art technique by an order of magnitude, scales up better on large datasets than the state-of-the-art technique, and scales out well on distributed machines.
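Both abstracts above rely on the same idea: a cheap lower bound on EMD can discard most candidate pairs before the expensive exact distance is computed. The sketch below illustrates that filter-and-refine pattern on a single machine. It is not the Heads-Join or Melody-Join algorithm. It assumes 1-D histograms of equal total mass with ground distance |i - j| between bins (where exact EMD reduces to a running sum), and it uses the centroid distance as the lower bound. The function names and the choice of `<=` for the threshold test are illustrative assumptions.

```python
from itertools import combinations

def emd_1d(p, q):
    """Exact EMD between two 1-D histograms of equal total mass,
    assuming ground distance |i - j| between bins i and j."""
    work, carry = 0.0, 0.0
    for pi, qi in zip(p, q):
        carry += pi - qi   # surplus mass that must move past this bin
        work += abs(carry)
    return work

def centroid(h):
    """Center of mass of a histogram; linear in the number of bins."""
    return sum(i * x for i, x in enumerate(h))

def emd_range_join(records, eps):
    """Return (pairs with EMD <= eps, number of exact EMD computations)."""
    cents = [centroid(r) for r in records]
    matches, refined = [], 0
    for i, j in combinations(range(len(records)), 2):
        # Filter: the centroid distance never exceeds emd_1d for equal-mass
        # histograms, so a bound above eps rules the pair out.
        if abs(cents[i] - cents[j]) > eps:
            continue
        refined += 1
        d = emd_1d(records[i], records[j])  # refine with the exact distance
        if d <= eps:
            matches.append((i, j, d))
    return matches, refined

# Hypothetical toy data: four normalized histograms with four bins each.
records = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.5, 0.5],
]
matches, refined = emd_range_join(records, eps=1.0)
print(matches, refined)
```

In this toy setting the bound costs one subtraction per pair once the centroids are precomputed, which is the property that makes pruning cheap. A distributed version would additionally have to partition records across workers, which this sketch does not attempt.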

Nigel Martin - One of the best experts on this subject based on the ideXlab platform.

  • DBJ: A Dynamic Balancing Hash Join Algorithm in Multiprocessor Database Systems
    Extending Database Technology, 1994
    Co-Authors: X Zhao, Roger Johnson, Nigel Martin
    Abstract:

    The Dynamic Balancing Hash Join (DBJ) has been proposed to handle the problem of skewed data in the join operation in multiprocessor database systems. The objective of this new algorithm is to avoid the high cost of preprocessing inherent in existing algorithms. The new algorithm redistributes only a small portion of the partitioned data and thereby achieves a balanced output with little extra cost. This is achieved dynamically, without knowledge of the input distribution and without any coordinating processor. A performance analysis shows that the new algorithm performs better than existing balancing hash join algorithms across a wide range of skew.
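The skew problem that the abstract refers to can be seen in a plain partitioned hash join. The sketch below is not DBJ and performs no rebalancing. It simulates workers as list partitions, assigns rows by hashing the join key, and reports how much output each worker produces. All names and the toy data are hypothetical, and integer keys are assumed so that partition assignment is deterministic in CPython.

```python
from collections import defaultdict

def partition(rows, n_workers):
    """Assign each (key, payload) row to a worker by hashing its join key."""
    parts = [[] for _ in range(n_workers)]
    for key, payload in rows:
        parts[hash(key) % n_workers].append((key, payload))
    return parts

def local_hash_join(r_part, s_part):
    """Classic build/probe hash join on one worker's partitions."""
    table = defaultdict(list)
    for key, payload in r_part:            # build phase on R
        table[key].append(payload)
    return [(key, r_payload, s_payload)    # probe phase with S
            for key, s_payload in s_part
            for r_payload in table.get(key, [])]

def parallel_hash_join(r, s, n_workers):
    """Join each pair of co-partitioned inputs; one output list per worker."""
    r_parts = partition(r, n_workers)
    s_parts = partition(s, n_workers)
    return [local_hash_join(rp, sp) for rp, sp in zip(r_parts, s_parts)]

# Hypothetical skewed input: key 0 dominates relation S.
r = [(k, f"r{k}") for k in range(8)]
s = [(0, f"s0_{i}") for i in range(20)] + [(k, f"s{k}") for k in range(1, 8)]

outputs = parallel_hash_join(r, s, n_workers=4)
sizes = [len(out) for out in outputs]
print(sizes)
```

Because every row with the hot key lands on the same worker, that worker produces most of the join output while the others sit nearly idle. This imbalance is what a balancing hash join sets out to correct.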

Markus Puschel - One of the best experts on this subject based on the ideXlab platform.

  • A Discrete Signal Processing Framework for Meet/Join Lattices with Applications to Hypergraphs and Trees
    International Conference on Acoustics Speech and Signal Processing, 2019
    Co-Authors: Markus Puschel
    Abstract:

    We introduce a novel discrete signal processing framework, called discrete-lattice SP, for signals indexed by a finite lattice. A lattice is a partially ordered set that supports a meet operation, which returns the greatest element below two given elements (or, dually, a join operation, which returns the least element above them). Discrete-lattice SP chooses the meet as the shift operation and derives the associated notions of (meet-invariant) convolution, Fourier transform, frequency response, and a convolution theorem. Examples of lattices include sets of sets that are closed under intersection, and trees. Thus our framework is applicable to certain sparse set functions, signals on sparse hypergraphs, and signals on trees. Another view of discrete-lattice SP is as an SP framework for a certain class of directed graphs. However, it is fundamentally different from prior graph SP, as it is based on more than one basic shift and all shifts are always simultaneously diagonalizable.
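A small example may help make the meet-as-shift idea concrete. The sketch below takes the powerset of a two-element set as the lattice, with set intersection as the meet, and treats a signal as a mapping from lattice elements to numbers. The definitions of `shift` and `convolve` follow one natural reading of the abstract (the shift by q reads the value at x from the element x meet q). They are assumptions made for illustration, not code from the paper.

```python
from itertools import combinations

def powerset(ground):
    """All subsets of `ground` as frozensets; closed under intersection."""
    items = sorted(ground)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]

def meet(a, b):
    """Greatest element below both a and b; here, set intersection."""
    return a & b

def shift(signal, q):
    """Shift by q: the value at x is read from x meet q."""
    return {x: signal[meet(x, q)] for x in signal}

def convolve(filt, signal):
    """Meet-invariant convolution: (h * s)[x] = sum_q h[q] * s[x meet q]."""
    return {x: sum(h * signal[meet(x, q)] for q, h in filt.items())
            for x in signal}

lattice = powerset({"a", "b"})
# Hypothetical signal: one value per lattice element, in powerset order.
signal = dict(zip(lattice, [1.0, 2.0, 3.0, 4.0]))

p, q = frozenset({"a"}), frozenset({"b"})
shifted = shift(signal, p)
# Shifts commute, and composing two shifts equals shifting by their meet.
commutes = shift(shift(signal, p), q) == shift(shift(signal, q), p)
composes = shift(shift(signal, p), q) == shift(signal, meet(p, q))
print(shifted, commutes, composes)
```

Since the meet is commutative and associative, shifts by different elements commute, which is consistent with the abstract's remark that the framework has more than one basic shift.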
