The Experts below are selected from a list of 4617 Experts worldwide ranked by ideXlab platform
Minghui Wang - One of the best experts on this subject based on the ideXlab platform.
-
A structure-preserving method for the quaternion LU Decomposition in quaternionic quantum theory
Computer Physics Communications, 2013. Co-Authors: Minghui Wang. Abstract: In this paper, for the first time, the structure-preserving Gauss transformation is defined. Then, by means of its real representation matrix, we present a novel structure-preserving algorithm for the LU Decomposition of a quaternion matrix. Numerical experiments show that the structure-preserving algorithm is better than that in the newest Quaternion Toolbox for MATLAB (QTFM).
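The abstract above does not spell out which real representation it uses. As a rough illustration of the general idea only, the sketch below maps a single quaternion a + bi + cj + dk to one common 4x4 real matrix (the left-multiplication convention, an assumption here) and checks that the mapping turns quaternion products into real matrix products. The names `qmul`, `real_rep`, and `matmul` are hypothetical helpers, not taken from the paper or from QTFM.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def real_rep(q):
    """4x4 real matrix of left multiplication by q (one common convention)."""
    a, b, c, d = q
    return [
        [a, -b, -c, -d],
        [b, a, -d, c],
        [c, d, a, -b],
        [d, -c, b, a],
    ]


def matmul(x, y):
    """Plain dense matrix product of two lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


# The representation is multiplicative: rep(p * q) == rep(p) @ rep(q).
p = (1, 2, -1, 3)
q = (0, 1, 4, -2)
print(real_rep(qmul(p, q)) == matmul(real_rep(p), real_rep(q)))
```

Because every block of such a matrix is determined by four real numbers, an algorithm that preserves this block pattern only has to store and update a fraction of the entries, which is presumably the kind of saving a structure-preserving method aims for.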
Barbara Chapman - One of the best experts on this subject based on the ideXlab platform.
-
ESPM2@SC - A scalable task parallelism approach for LU Decomposition with multicore CPUs
2016. Co-Authors: Verinder S. Rana, Meifeng Lin, Barbara Chapman. Abstract: Many scientific applications have linear systems A · x = b which need to be solved for different vectors b. LU Decomposition, which is a variant of Gaussian Elimination, is an efficient technique to solve a linear system. The main idea of the LU Decomposition is to factorize A into an upper (U) triangular and a lower (L) triangular matrix such that A = LU. This paper presents an OpenMP task parallel approach for the LU factorization of dense matrices. The tasking model is based on the individual computational tasks which occur during the block-wise LU factorization. We describe the right-looking variant of the LU Decomposition algorithm in the task parallel approach, and provide an efficient implementation of the algorithm for shared memory machines. We demonstrate that with the task scheduling features provided by OpenMP 4.0, the right-looking LU Decomposition can scale well. We then conduct an experimental evaluation of the task parallel implementation in comparison with the parallel-for implementation of the Gaussian elimination with pivoting and LU Decomposition using the GNU Scientific Library on a multicore platform. From the experiments we conclude that the proposed task-based implementation is a good solution for solving large systems of linear equations using LU Decomposition.
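To make the "factor once, solve for many b" idea in this abstract concrete, here is a minimal serial sketch in pure Python. It is an unblocked right-looking elimination with partial pivoting, not the paper's blocked OpenMP task implementation; the function names `lu_factor` and `lu_solve` and the example matrix are assumptions chosen for illustration.

```python
def lu_factor(a):
    """Factor P*A = L*U in place on a copy; returns (lu, perm).

    L (unit diagonal) is stored below the diagonal of lu, U on and above it.
    perm[i] is the index of the original row that ended up in position i.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Right-looking step: update the whole trailing submatrix now.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A*x = b using the factors from lu_factor."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L*y = P*b
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: U*x = y
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x


A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
lu, perm = lu_factor(A)  # O(n^3) work, done once
for b in ([5.0, -2.0, 9.0], [2.0, 4.0, -2.0]):
    print(lu_solve(lu, perm, b))  # O(n^2) work per right-hand side
```

In a blocked variant, the trailing-submatrix update in the inner loops becomes a set of independent block updates, which is the natural unit to hand to a task scheduler.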
-
A Scalable Task Parallelism Approach for LU Decomposition with Multicore CPUs
2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), 2016. Co-Authors: Verinder S. Rana, Barbara Chapman.
Stefan Lüpke - One of the best experts on this subject based on the ideXlab platform.
-
PARLE - LU-Decomposition on a Massively Parallel Transputer System
Lecture Notes in Computer Science, 1993. Co-Authors: Stefan Lüpke. Abstract: Two algorithms for LU-Decomposition on a transputer based reconfigurable MIMD parallel computer with distributed memory have been analyzed in view of the interdependence of granularity and execution time. In order to investigate this experimentally, LU-Decomposition algorithms have been implemented on a parallel computer, the Parsytec SuperCluster 128. The results of this investigation may be summarized as follows. The LU-Decomposition algorithms are very efficient on the parallel computer, if the ratio between problem size and number of processors is not too small. No loss of efficiency is to be expected, if the number of processors is increased only proportionally to the number of elements in the matrix being decomposed.
Xi Ning - One of the best experts on this subject based on the ideXlab platform.
-
A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors
2013. Co-Authors: Kai Zhang, Shuming Chen, Wei Liu, Xi Ning. Abstract: LU Decomposition is a widely used method for solving dense linear algebra problems in many scientific computing applications. In recent years, single instruction multiple data (SIMD) technology has been a popular method to accelerate the LU Decomposition. However, the pipeline parallelism and memory bandwidth utilization are low when the LU Decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU Decomposition on SIMD processors. The fine-grained algorithm exploits the data dependences of the native algorithm to expose fine-grained parallelism among all the computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm can achieve high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique can achieve a speedup of 1.04x to 1.82x over the native algorithm and about 89% of the peak performance on the SIMD processor.
-
NPC - A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors
Lecture Notes in Computer Science, 2013. Co-Authors: Kai Zhang, Shuming Chen, Wei Liu, Xi Ning.
H. Sugihara - One of the best experts on this subject based on the ideXlab platform.
-
Complete LU Decomposition conjugate residual method and its performance for large-scale circuit simulation
1988. IEEE International Symposium on Circuits and Systems, 1. Co-Authors: A. Yajima, F. Yamamoto, T. Morioka, H. Sugihara. Abstract: A method for solving large unsymmetric (LU) systems of linear equations arising from circuit transient analysis is proposed. This approach is based on the conjugate residual method, but is reinforced by the stability of LU Decomposition. Unlike other preconditioned iterative methods, complete LU Decomposition of a matrix at a previous time point is taken as a preconditioner of the current matrix to be solved. Only after the iterative process is judged to be nonconvergent is the current matrix decomposed. A novel test for the halt residual reduction is used to detect such a situation. Using this method, 10 LSI circuits with matrices ranging from 35 to 3668 equations have been analyzed. Correct transient solutions were obtained with only one to seven LU Decompositions per hundred Newton-Raphson iterations and 8 to 40 iterations of conjugate residual on the average.
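The control flow described in this abstract (reuse the previous time point's LU factors as a preconditioner, and refactor only when the iteration stalls) can be sketched as follows. This is a simplified illustration, not the authors' method: it swaps in a plain preconditioned Richardson iteration in place of the conjugate residual method, and its stall test (residual norm failed to decrease) is an assumption standing in for the paper's own test. All function names are hypothetical.

```python
def lu_factor(a):
    """LU with partial pivoting; returns (lu, perm) with P*A = L*U."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Forward and back substitution with the factors from lu_factor."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def solve_with_stale_lu(a, b, stale, tol=1e-10, max_iter=50):
    """Solve a*x = b, preconditioning with LU factors of an earlier matrix.

    Returns (x, factors, refactored). The factors are replaced only when
    the iteration is judged nonconvergent.
    """
    lu, perm = stale
    x = [0.0] * len(b)
    prev_norm = float("inf")
    for _ in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(a, x))]
        norm = max(abs(ri) for ri in r)
        if norm <= tol:
            return x, stale, False
        if norm >= prev_norm:  # residual stopped shrinking: give up
            break
        prev_norm = norm
        dx = lu_solve(lu, perm, r)  # apply the stale factors as preconditioner
        x = [xi + di for xi, di in zip(x, dx)]
    fresh = lu_factor(a)  # fall back to a full decomposition
    return lu_solve(fresh[0], fresh[1], b), fresh, True


previous = lu_factor([[4.0, 1.0], [2.0, 3.0]])
x, factors, refactored = solve_with_stale_lu(
    [[4.1, 1.0], [2.0, 3.05]], [1.0, 2.0], previous
)
print(x, refactored)
```

When the matrix changes only slightly between time points, the stale factors are a good preconditioner and the expensive factorization is skipped, which is consistent with the low count of decompositions per hundred Newton-Raphson iterations reported above.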