The Experts below are selected from a list of 12921 Experts worldwide ranked by ideXlab platform
Paolo Ienne - One of the best experts on this subject based on the ideXlab platform.
-
High performance comparison-based sorting algorithm on many-core GPUs
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
Co-Authors: Xiaochun Ye, Nan Yuan, Paolo Ienne
Abstract: Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sort on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. Firstly, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting totally coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and it achieves up to 30% higher performance than previous optimized comparison-based GPU sorting algorithm on input sequences with millions of elements.
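The two-phase structure named in the abstract (a bitonic sort followed by a merge sort) can be illustrated with a small sequential sketch. This is not the GPU-Warpsort implementation: it runs on the CPU, the names `bitonic_sort` and `block_then_merge_sort` and the `block_size` parameter are hypothetical, and the warp-level mapping, barrier elimination and coalesced memory accesses described in the abstract are not modelled.

```python
from heapq import merge


def bitonic_sort(block):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(block)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:            # distance between compared elements
            for i in range(n):   # on a GPU, each i would be handled in parallel
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def block_then_merge_sort(data, block_size=4):
    """Phase 1: bitonic-sort fixed-size blocks. Phase 2: merge pairs of runs."""
    assert len(data) % block_size == 0, "assumption: input is a whole number of blocks"
    runs = [bitonic_sort(data[i:i + block_size])
            for i in range(0, len(data), block_size)]
    while len(runs) > 1:
        # Each pair of sorted runs is independent, so pairs could be merged concurrently.
        merged = [list(merge(runs[i], runs[i + 1]))
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])
        runs = merged
    return runs[0] if runs else []
```

The inner `for i` loop performs the same compare-exchange step for every index, which is the kind of homogeneous operation that suits threads executing in lockstep; the pairwise merges are independent of each other, which is what allows them to be distributed across workers.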
Xiaochun Ye - One of the best experts on this subject based on the ideXlab platform.
-
High performance comparison-based sorting algorithm on many-core GPUs
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
Co-Authors: Xiaochun Ye, Nan Yuan, Paolo Ienne
Abstract: Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sort on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. Firstly, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting totally coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and it achieves up to 30% higher performance than previous optimized comparison-based GPU sorting algorithm on input sequences with millions of elements.
Viktor K Prasanna - One of the best experts on this subject based on the ideXlab platform.
-
An optimal sorting algorithm on reconfigurable mesh
Journal of Parallel and Distributed Computing, 1995
Co-Authors: Juwook Jang, Viktor K Prasanna
Abstract: This paper shows nontrivial ways to use the Reconfigurable Mesh to solve several basic arithmetic problems in constant time. These solutions are obtained by novel ways to represent numbers and by exploiting the reconfigurability of the architecture. In particular, a constant time algorithm to add n k-bit numbers using an n × nk bit model of Reconfigurable Mesh is shown. Using these techniques, an optimal sorting algorithm on the Reconfigurable Mesh is derived. The algorithm sorts n numbers in constant time using n × n processors. Our algorithm uses a mesh of optimal size to sort n numbers in constant time and satisfies the AT² lower bound of Ω(n²) for sorting n numbers in a variation of the word model of VLSI. The sorting algorithm runs on all known variations of the Reconfigurable Mesh model.
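The abstract does not spell out the steps of the sorting algorithm, so the following is only an illustration of one common way an n × n grid of comparisons, followed by counting, yields a sorted order: rank (enumeration) sort. It is an assumption that this resembles the paper's approach. The function name `rank_sort` is hypothetical, and this sequential simulation takes O(n²) time rather than the constant time claimed for the mesh.

```python
def rank_sort(values):
    """Sort by computing each element's rank from all n x n pairwise comparisons."""
    n = len(values)
    ranks = []
    for i in range(n):
        # Conceptually, cell (i, j) of an n x n grid holds one comparison result.
        # Ties are broken by index so that every element gets a distinct rank.
        row = [
            1 if (values[j] < values[i] or (values[j] == values[i] and j < i)) else 0
            for j in range(n)
        ]
        # Summing a row of 0/1 flags is the addition step; here it is a plain sum.
        ranks.append(sum(row))
    out = [None] * n
    for i, r in enumerate(ranks):
        out[r] = values[i]  # rank r is the final position of values[i]
    return out
```

Every comparison is independent of the others, which is why the work maps naturally onto n × n processing elements; the remaining cost is adding up each row, which is where a fast addition technique matters.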
-
An optimal sorting algorithm on reconfigurable mesh
International Parallel Processing Symposium, 1992
Co-Authors: Juwook Jang, Viktor K Prasanna
Abstract: An optimal sorting algorithm on the reconfigurable mesh is proposed. The algorithm sorts n numbers in constant time using n × n processors. The best known previous result uses O(n × n log² n) processors. The presented algorithm satisfies the AT² lower bound of Ω(n²) for sorting n numbers in the word model of VLSI. Modification to the algorithm for area-time trade-off is shown, to achieve AT² optimality over 1 >
Ming Chen - One of the best experts on this subject based on the ideXlab platform.
-
A Fast Nondominated Sorting Algorithm
2005 International Conference on Neural Networks and Brain, 2005
Co-Authors: Ming Chen
Abstract: Nondominated sorting is one of the main time-consuming parts of a multiobjective evolutionary algorithm (MOEA). Designing a fast nondominated sorting algorithm is crucial to improving the performance of MOEAs. The paper uses a Better function to compare solutions, and theoretical analysis shows that the Better function has the properties of general symmetry and transitivity. Based on these properties, the Better nondominated sorting algorithm (BNS) is designed to markedly reduce the number of comparisons among solutions. Simulation experiments and a comparative study show that the new algorithm does indeed speed up nondominated sorting.
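For context, the task that BNS accelerates can be sketched with a plain baseline. The abstract does not define the Better function, so the code below uses standard Pareto dominance (assuming all objectives are minimized) and a naive front-peeling loop. The names `dominates` and `nondominated_fronts` are hypothetical, and this is not the BNS algorithm itself.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated_fronts(points):
    """Split point indices into successive nondominated fronts (naive baseline)."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        in_front = set(front)
        remaining = [i for i in remaining if i not in in_front]
    return fronts
```

Usage, with two objectives per solution:

```python
points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(nondominated_fronts(points))  # [[0, 1, 2], [3], [4]]
```

The baseline compares every pair of remaining solutions for every front, which is the comparison cost that faster nondominated sorting methods aim to reduce.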
Nan Yuan - One of the best experts on this subject based on the ideXlab platform.
-
High performance comparison-based sorting algorithm on many-core GPUs
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
Co-Authors: Xiaochun Ye, Nan Yuan, Paolo Ienne
Abstract: Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sort on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. Firstly, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting totally coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and it achieves up to 30% higher performance than previous optimized comparison-based GPU sorting algorithm on input sequences with millions of elements.