LU Decomposition



The Experts below are selected from a list of 4617 Experts worldwide ranked by ideXlab platform

Minghui Wang - One of the best experts on this subject based on the ideXlab platform.

Barbara Chapman - One of the best experts on this subject based on the ideXlab platform.

  • ESPM2@SC - A scalable task parallelism approach for LU Decomposition with multicore CPUs
    2016
    Co-Authors: Verinder S. Rana, Meifeng Lin, Barbara Chapman
    Abstract:

    Many scientific applications involve linear systems A · x = b that need to be solved for different vectors b. LU Decomposition, a variant of Gaussian Elimination, is an efficient technique for solving such systems. The main idea of the LU Decomposition is to factorize A into an upper triangular matrix (U) and a lower triangular matrix (L) such that A = LU. This paper presents an OpenMP task parallel approach for the LU factorization of dense matrices. The tasking model is based on the individual computational tasks which occur during the block-wise LU factorization. We describe the right-looking variant of the LU Decomposition algorithm in the task parallel approach, and provide an efficient implementation of the algorithm for shared memory machines. We demonstrate that with the task scheduling features provided by OpenMP 4.0, the right-looking LU Decomposition can scale well. We then conduct an experimental evaluation of the task parallel implementation in comparison with the parallel-for implementation of Gaussian elimination with pivoting and the LU Decomposition from the GNU Scientific Library on a multicore platform. From the experiments we conclude that the proposed task-based implementation is a good solution for solving large systems of linear equations using LU Decomposition.
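
The factorization and reuse described above can be illustrated with a minimal NumPy sketch of the right-looking variant (sequential and without pivoting, so this is not the authors' OpenMP task implementation): at each step k the column of L is computed and a rank-1 update is applied to the trailing submatrix, and the resulting factors can then be reused to solve for many right-hand sides b.

```python
import numpy as np

def lu_right_looking(A):
    """Right-looking LU factorization without pivoting.

    Returns a packed matrix whose strict lower triangle holds L
    (unit diagonal implied) and whose upper triangle holds U,
    with A = L @ U.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n):
        A[k+1:, k] /= A[k, k]                              # column of L
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # trailing update
    return A

def solve_lu(LU, b):
    """Solve A x = b given the packed LU factors; reusable for many b."""
    n = LU.shape[0]
    x = b.astype(float).copy()
    for i in range(n):                 # forward substitution with unit L
        x[i] -= LU[i, :i] @ x[:i]
    for i in range(n - 1, -1, -1):     # back substitution with U
        x[i] = (x[i] - LU[i, i+1:] @ x[i+1:]) / LU[i, i]
    return x

A = np.array([[4.0, 3.0], [6.0, 3.0]])
LU = lu_right_looking(A)
x = solve_lu(LU, np.array([10.0, 12.0]))   # x solves A @ x = [10, 12]
```

The block-wise parallel formulation in the paper applies the same three step types (factor a diagonal block, update a block column/row, rank-update the trailing blocks) at block granularity, which is what exposes independent tasks.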

  • A Scalable Task Parallelism Approach for LU Decomposition with Multicore CPUs
    2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), 2016
    Co-Authors: Verinder S. Rana, Barbara Chapman
    Abstract:

    Many scientific applications involve linear systems A · x = b that need to be solved for different vectors b. LU Decomposition, a variant of Gaussian Elimination, is an efficient technique for solving such systems. The main idea of the LU Decomposition is to factorize A into an upper triangular matrix (U) and a lower triangular matrix (L) such that A = LU. This paper presents an OpenMP task parallel approach for the LU factorization of dense matrices. The tasking model is based on the individual computational tasks which occur during the block-wise LU factorization. We describe the right-looking variant of the LU Decomposition algorithm in the task parallel approach, and provide an efficient implementation of the algorithm for shared memory machines. We demonstrate that with the task scheduling features provided by OpenMP 4.0, the right-looking LU Decomposition can scale well. We then conduct an experimental evaluation of the task parallel implementation in comparison with the parallel-for implementation of Gaussian elimination with pivoting and the LU Decomposition from the GNU Scientific Library on a multicore platform. From the experiments we conclude that the proposed task-based implementation is a good solution for solving large systems of linear equations using LU Decomposition.

Stefan Lüpke - One of the best experts on this subject based on the ideXlab platform.

  • PARLE - LU-Decomposition on a Massively Parallel Transputer System
    Lecture Notes in Computer Science, 1993
    Co-Authors: Stefan Lüpke
    Abstract:

    Two algorithms for LU-Decomposition on a transputer-based reconfigurable MIMD parallel computer with distributed memory have been analyzed with regard to the interdependence of granularity and execution time. To investigate this experimentally, the LU-Decomposition algorithms have been implemented on a parallel computer, the Parsytec SuperCluster 128. The results of this investigation may be summarized as follows. The LU-Decomposition algorithms are very efficient on the parallel computer if the ratio between problem size and number of processors is not too small. No loss of efficiency is to be expected if the number of processors is increased only proportionally to the number of elements in the matrix being decomposed.

Xi Ning - One of the best experts on this subject based on the ideXlab platform.

  • A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors
    2013
    Co-Authors: Kai Zhang, Shuming Chen, Wei Liu, Xi Ning
    Abstract:

    The LU Decomposition is a widely used method for solving dense linear systems in many scientific computing applications. In recent years, single instruction multiple data (SIMD) technology has become a popular means of accelerating the LU Decomposition. However, pipeline parallelism and memory bandwidth utilization are low when the LU Decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU Decomposition on SIMD processors. The fine-grained algorithm exploits the data dependences of the native algorithm to expose fine-grained parallelism among all the computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm achieves high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique achieves a speedup of 1.04x to 1.82x over the native algorithm and reaches about 89% of the peak performance on the SIMD processor.
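
The coalesced-versus-strided distinction the abstract draws can be sketched in NumPy (this only illustrates the access-pattern idea, not the authors' processor-specific pipeline): the step-k trailing update of right-looking LU is the same rank-1 operation whether written element by element or as one outer product over contiguous slices, but only the latter is the kind of coalesced, data-parallel pattern SIMD lanes can stream through.

```python
import numpy as np

def lu_update_scalar(A, k):
    """Step-k trailing update written element by element:
    many small, strided accesses, unfriendly to SIMD units."""
    n = A.shape[0]
    for i in range(k + 1, n):
        for j in range(k + 1, n):
            A[i, j] -= A[i, k] * A[k, j]

def lu_update_vectorized(A, k):
    """The same rank-1 update as one outer product over contiguous
    slices: the coalesced access pattern that maps onto SIMD lanes."""
    A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])

rng = np.random.default_rng(0)
A1 = rng.standard_normal((6, 6))
A2 = A1.copy()
A1[1:, 0] /= A1[0, 0]          # first column of L, step k = 0
A2[1:, 0] /= A2[0, 0]
lu_update_scalar(A1, 0)        # both variants produce identical factors
lu_update_vectorized(A2, 0)
```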

  • NPC - A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors
    Lecture Notes in Computer Science, 2013
    Co-Authors: Kai Zhang, Shuming Chen, Wei Liu, Xi Ning
    Abstract:

    The LU Decomposition is a widely used method for solving dense linear systems in many scientific computing applications. In recent years, single instruction multiple data (SIMD) technology has become a popular means of accelerating the LU Decomposition. However, pipeline parallelism and memory bandwidth utilization are low when the LU Decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU Decomposition on SIMD processors. The fine-grained algorithm exploits the data dependences of the native algorithm to expose fine-grained parallelism among all the computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm achieves high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique achieves a speedup of 1.04x to 1.82x over the native algorithm and reaches about 89% of the peak performance on the SIMD processor.

H. Sugihara - One of the best experts on this subject based on the ideXlab platform.

  • Complete LU Decomposition conjugate residual method and its performance for large-scale circuit simulation
    1988 IEEE International Symposium on Circuits and Systems, 1988
    Co-Authors: A. Yajima, F. Yamamoto, T. Morioka, H. Sugihara
    Abstract:

    A method for solving large unsymmetric systems of linear equations arising from circuit transient analysis is proposed. This approach is based on the conjugate residual method, but is reinforced by the stability of LU Decomposition. Unlike other preconditioned iterative methods, the complete LU Decomposition of a matrix at a previous time point is taken as the preconditioner for the current matrix to be solved. Only after the iterative process is judged to be nonconvergent is the current matrix decomposed. A novel test for the halt of residual reduction is used to detect such a situation. Using this method, 10 LSI circuits with matrices ranging from 35 to 3668 equations have been analyzed. Correct transient solutions were obtained with only one to seven LU Decompositions per hundred Newton-Raphson iterations and 8 to 40 conjugate residual iterations on average.
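
The stale-factorization idea can be sketched as follows, assuming SciPy's dense `lu_factor`/`lu_solve` and substituting a plain preconditioned Richardson iteration for the paper's conjugate residual method (the matrix names and tolerances here are illustrative, not from the paper): the LU factors of the matrix from a previous time step serve as the preconditioner, and the current matrix is only refactorized if the iteration fails to converge.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def solve_with_stale_lu(A_curr, b, lu_prev, tol=1e-10, max_iter=50):
    """Solve A_curr @ x = b iteratively, preconditioning with the
    complete LU factors of a previously factorized matrix (lu_prev).

    Converges quickly when A_curr is close to that earlier matrix,
    e.g. across consecutive time points of a transient analysis."""
    x = lu_solve(lu_prev, b)              # initial guess from stale factors
    for it in range(max_iter):
        r = b - A_curr @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it, True            # converged: no refactorization
        x += lu_solve(lu_prev, r)         # preconditioned correction
    return x, max_iter, False             # nonconvergent: refactorize A_curr

rng = np.random.default_rng(1)
A_prev = rng.standard_normal((40, 40)) + 40 * np.eye(40)
lu_prev = lu_factor(A_prev)                              # factor once
A_curr = A_prev + 0.01 * rng.standard_normal((40, 40))   # small time step
b = rng.standard_normal(40)
x, iters, ok = solve_with_stale_lu(A_curr, b, lu_prev)   # no new LU needed
```

The payoff is the one reported in the abstract: when consecutive matrices differ little, one factorization can serve many solves, and a full LU is only paid for when the residual stops shrinking.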