Sorting Algorithm

The Experts below are selected from a list of 12921 Experts worldwide ranked by ideXlab platform

Paolo Ienne - One of the best experts on this subject based on the ideXlab platform.

  • High performance comparison-based Sorting Algorithm on many-core GPUs
    2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
    Co-Authors: Xiaochun Ye, Nan Yuan, Paolo Ienne
    Abstract:

    Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sorting on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. First, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in the bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting fully coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and that it achieves up to 30% higher performance than previous optimized comparison-based GPU sorting algorithms on input sequences with millions of elements.
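
The two phases named in the abstract (a bitonic sort on small blocks, then a merge sort over the resulting runs) can be sketched sequentially in Python. This is only an illustration of the overall structure, not the authors' GPU implementation: the function names, the block size of 32 (chosen to echo a warp), and the padding with infinity are assumptions made for this sketch, and nothing here models warps, barriers, or coalesced memory access.

```python
def bitonic_sort(block):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(block)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):  # on a GPU, each i would be handled by a thread
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def merge_sorted(left, right):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def warpsort_sketch(data, block=32):
    """Bitonic-sort fixed-size blocks, then merge the runs pairwise."""
    if not data:
        return []
    pad = (-len(data)) % block  # assumption: pad the last block with +inf
    padded = list(data) + [float("inf")] * pad
    runs = [bitonic_sort(padded[i:i + block])
            for i in range(0, len(padded), block)]
    while len(runs) > 1:
        # Each pair of runs is independent, so pairs could be merged in parallel.
        merged = [merge_sorted(runs[i], runs[i + 1])
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])
        runs = merged
    return runs[0][:len(data)]  # drop the padding, which sorts to the end
```

The inner loop of `bitonic_sort` performs the same compare-and-swap for every index, which is the kind of uniform work the abstract describes as avoiding branch divergence within a warp.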

Xiaochun Ye - One of the best experts on this subject based on the ideXlab platform.

  • High performance comparison-based Sorting Algorithm on many-core GPUs
    2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
    Co-Authors: Xiaochun Ye, Nan Yuan, Paolo Ienne
    Abstract:

    Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sorting on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. First, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in the bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting fully coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and that it achieves up to 30% higher performance than previous optimized comparison-based GPU sorting algorithms on input sequences with millions of elements.

Viktor K Prasanna - One of the best experts on this subject based on the ideXlab platform.

  • An Optimal Sorting Algorithm on Reconfigurable Mesh
    Journal of Parallel and Distributed Computing, 1995
    Co-Authors: Juwook Jang, Viktor K Prasanna
    Abstract:

    This paper shows nontrivial ways to use the reconfigurable mesh to solve several basic arithmetic problems in constant time. These solutions are obtained by novel ways of representing numbers and by exploiting the reconfigurability of the architecture. In particular, a constant-time algorithm to add n k-bit numbers using an n × nk bit model of the reconfigurable mesh is shown. Using these techniques, an optimal sorting algorithm on the reconfigurable mesh is derived. The algorithm sorts n numbers in constant time using n × n processors. It uses a mesh of optimal size to sort n numbers in constant time and satisfies the AT² lower bound of Ω(n²) for sorting n numbers in a variation of the word model of VLSI. The sorting algorithm runs on all known variations of the reconfigurable mesh model.
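
One common way to sort n numbers with an n × n grid of comparisons is rank (enumeration) sort: cell (i, j) records whether element j precedes element i, and the sum of row i is the final position of element i. The abstract does not say that the paper proceeds exactly this way, so the Python below is only a sequential sketch under that assumption. The name `mesh_rank_sort` and the tie-breaking by index are choices made for this illustration, and the row sums that a reconfigurable mesh would compute in constant time over its buses are simply added up in a loop here.

```python
def mesh_rank_sort(values):
    """Rank sort: one comparison per (i, j) cell of a conceptual n x n grid."""
    n = len(values)
    # Cell (i, j) holds 1 if values[j] must come before values[i].
    # Ties are broken by index so that every element gets a distinct rank.
    precedes = [
        [1 if (values[j], j) < (values[i], i) else 0 for j in range(n)]
        for i in range(n)
    ]
    out = [None] * n
    for i in range(n):
        rank = sum(precedes[i])  # number of elements placed before values[i]
        out[rank] = values[i]
    return out
```

All n² comparisons are independent of one another, which is why the cost of the approach is dominated by how quickly each row of bits can be summed.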

  • An Optimal Sorting Algorithm on Reconfigurable Mesh
    International Parallel Processing Symposium, 1992
    Co-Authors: Juwook Jang, Viktor K Prasanna
    Abstract:

    An optimal sorting algorithm on the reconfigurable mesh is proposed. The algorithm sorts n numbers in constant time using n × n processors. The best known previous result uses O(n × n log² n) processors. The presented algorithm satisfies the AT² lower bound of Ω(n²) for sorting n numbers in the word model of VLSI. Modification to the algorithm for area-time trade-off is shown, to achieve AT² optimality over 1 >

Ming Chen - One of the best experts on this subject based on the ideXlab platform.

  • A Fast Nondominated Sorting Algorithm
    2005 International Conference on Neural Networks and Brain, 2005
    Co-Authors: Ming Chen
    Abstract:

    Nondominated sorting is one of the main time-consuming parts of a multiobjective evolutionary algorithm (MOEA). Designing a fast nondominated sorting algorithm is crucial to improving the performance of MOEAs. The paper uses a Better function to compare solutions, and theoretical analysis shows that the Better function has the properties of general symmetry and transitivity. Based on these properties, the Better nondominated sorting algorithm (BNS) is designed to markedly reduce the number of comparisons among solutions. Simulation experiments and a comparative study show that the new algorithm does indeed speed up nondominated sorting.
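
For context, the task that BNS aims to speed up can be sketched with the widely used baseline procedure, which compares every pair of solutions and then peels off fronts one at a time. This is not the BNS algorithm, and it does not use the paper's Better function, whose definition the abstract does not give. The Python below assumes that all objectives are minimized, and the function names are chosen only for this illustration.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def nondominated_sort(points):
    """Return a list of fronts, each a list of indices into points."""
    n = len(points)
    dominated = [[] for _ in range(n)]  # dominated[i]: indices that i dominates
    counts = [0] * n                    # counts[i]: how many points dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1
    fronts = []
    current = [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        following = []
        for i in current:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:  # every dominator of j is already placed
                    following.append(j)
        current = sorted(following)
    return fronts
```

The nested loop over all pairs is the cost that the abstract identifies as the bottleneck; a method that exploits transitivity of the comparison can skip some of those pairwise checks.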

Nan Yuan - One of the best experts on this subject based on the ideXlab platform.

  • High performance comparison-based Sorting Algorithm on many-core GPUs
    2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
    Co-Authors: Xiaochun Ye, Nan Yuan, Paolo Ienne
    Abstract:

    Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sorting on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. First, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in the bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting fully coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and that it achieves up to 30% higher performance than previous optimized comparison-based GPU sorting algorithms on input sequences with millions of elements.
