Tridiagonal System

The experts below are selected from a list of 252 experts worldwide, ranked by the ideXlab platform.

Holger Bischof - One of the best experts on this subject based on the ideXlab platform.

  • A cost-optimal parallel implementation of a Tridiagonal System solver using skeletons
    Future Generation Computer Systems, 2005
    Co-Authors: Holger Bischof, Sergei Gorlatch
    Abstract:

    We address the task of systematically designing efficient programs for parallel machines. Our approach starts with a sequential algorithm and proceeds by expressing it in terms of standard, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using a tridiagonal system solver as our example application. We develop a cost-optimal parallel version of our application and implement it in the Message Passing Interface (MPI). The performance of our solution is demonstrated experimentally on a Cray T3E machine.

  • Design and Implementation of a Cost-Optimal Parallel Tridiagonal System Solver Using Skeletons
    Parallel Computing Technologies, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    We address the problem of systematically designing correct parallel programs and developing their efficient implementations on parallel machines. The design process starts with an intuitive, sequential algorithm and proceeds by expressing it in terms of well-defined, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using the tridiagonal system solver as our example application. We develop, step by step, three provably correct parallel versions of our application and finally arrive at a cost-optimal implementation in MPI (Message Passing Interface). The performance of our solutions is demonstrated experimentally on a Cray T3E machine.

  • Cost Optimality and Predictability of Parallel Programming with Skeletons
    European Conference on Parallel Processing, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    Skeletons are reusable, parameterized components with well-defined semantics and pre-packaged, efficient parallel implementations. This paper develops a new, provably cost-optimal implementation of the DS (double-scan) skeleton for the divide-and-conquer paradigm. Our implementation is based on a novel data structure called plist (pointed list); the implementation's performance is estimated using an analytical model. We demonstrate the use of the DS skeleton for parallelizing a tridiagonal system solver and report experimental results for its MPI implementation on a Cray T3E and a Linux cluster: they confirm the performance improvement achieved by the cost-optimal implementation and demonstrate its good predictability by our performance model.

  • PaCT - Design and Implementation of a Cost-Optimal Parallel Tridiagonal System Solver Using Skeletons
    Lecture Notes in Computer Science, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    We address the problem of systematically designing correct parallel programs and developing their efficient implementations on parallel machines. The design process starts with an intuitive, sequential algorithm and proceeds by expressing it in terms of well-defined, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using the tridiagonal system solver as our example application. We develop, step by step, three provably correct parallel versions of our application and finally arrive at a cost-optimal implementation in MPI (Message Passing Interface). The performance of our solutions is demonstrated experimentally on a Cray T3E machine.
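
The papers listed above build parallel programs from skeletons: reusable, parameterized components (such as scan) whose implementation can be swapped, say from sequential to block-parallel, without changing the program that uses them. The following Python sketch illustrates that idea only in miniature with a prefix-scan skeleton offering two interchangeable implementations; it is not the papers' DS (double-scan) skeleton or plist data structure, and the function names are ours.

    # A minimal "skeleton" sketch (illustrative only; the papers' DS skeleton and
    # plist data structure are considerably more elaborate than this).

    def scan_seq(op, xs):
        """Inclusive prefix scan with an associative operator op (sequential)."""
        out, acc = [], None
        for x in xs:
            acc = x if acc is None else op(acc, x)
            out.append(acc)
        return out

    def scan_blocked(op, xs, nblocks=4):
        """Same interface, block-wise implementation: local scans per block, a scan
        of the block totals, then an offset pass.  On a real machine each block
        would be handled by one processor; here the blocks are processed in a loop."""
        if not xs:
            return []
        size = -(-len(xs) // nblocks)                     # ceiling division
        blocks = [xs[i:i + size] for i in range(0, len(xs), size)]
        local = [scan_seq(op, b) for b in blocks]         # step 1: local scans
        offsets = scan_seq(op, [b[-1] for b in local])    # step 2: scan of block sums
        result = list(local[0])
        for off, blk in zip(offsets[:-1], local[1:]):     # step 3: apply offsets
            result.extend(op(off, v) for v in blk)
        return result

    if __name__ == "__main__":
        data = list(range(1, 11))
        add = lambda a, b: a + b
        assert scan_blocked(add, data) == scan_seq(add, data)
        print(scan_blocked(add, data))                    # [1, 3, 6, 10, 15, ...]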

Emanuel Kitzelmann - One of the best experts on this subject based on the ideXlab platform.

  • Design and Implementation of a Cost-Optimal Parallel Tridiagonal System Solver Using Skeletons
    Parallel Computing Technologies, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    We address the problem of systematically designing correct parallel programs and developing their efficient implementations on parallel machines. The design process starts with an intuitive, sequential algorithm and proceeds by expressing it in terms of well-defined, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using the tridiagonal system solver as our example application. We develop, step by step, three provably correct parallel versions of our application and finally arrive at a cost-optimal implementation in MPI (Message Passing Interface). The performance of our solutions is demonstrated experimentally on a Cray T3E machine.

  • Cost Optimality and Predictability of Parallel Programming with Skeletons
    European Conference on Parallel Processing, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    Skeletons are reusable, parameterized components with well-defined semantics and pre-packaged, efficient parallel implementations. This paper develops a new, provably cost-optimal implementation of the DS (double-scan) skeleton for the divide-and-conquer paradigm. Our implementation is based on a novel data structure called plist (pointed list); the implementation's performance is estimated using an analytical model. We demonstrate the use of the DS skeleton for parallelizing a tridiagonal system solver and report experimental results for its MPI implementation on a Cray T3E and a Linux cluster: they confirm the performance improvement achieved by the cost-optimal implementation and demonstrate its good predictability by our performance model.

  • PaCT - Design and Implementation of a Cost-Optimal Parallel Tridiagonal System Solver Using Skeletons
    Lecture Notes in Computer Science, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    We address the problem of systematically designing correct parallel programs and developing their efficient implementations on parallel machines. The design process starts with an intuitive, sequential algorithm and proceeds by expressing it in terms of well-defined, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using the tridiagonal system solver as our example application. We develop, step by step, three provably correct parallel versions of our application and finally arrive at a cost-optimal implementation in MPI (Message Passing Interface). The performance of our solutions is demonstrated experimentally on a Cray T3E machine.

Sergei Gorlatch - One of the best experts on this subject based on the ideXlab platform.

  • A cost-optimal parallel implementation of a Tridiagonal System solver using skeletons
    Future Generation Computer Systems, 2005
    Co-Authors: Holger Bischof, Sergei Gorlatch
    Abstract:

    We address the task of systematically designing efficient programs for parallel machines. Our approach starts with a sequential algorithm and proceeds by expressing it in terms of standard, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using a tridiagonal system solver as our example application. We develop a cost-optimal parallel version of our application and implement it in the Message Passing Interface (MPI). The performance of our solution is demonstrated experimentally on a Cray T3E machine.

  • Design and Implementation of a Cost-Optimal Parallel Tridiagonal System Solver Using Skeletons
    Parallel Computing Technologies, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    We address the problem of systematically designing correct parallel programs and developing their efficient implementations on parallel machines. The design process starts with an intuitive, sequential algorithm and proceeds by expressing it in terms of well-defined, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using the tridiagonal system solver as our example application. We develop, step by step, three provably correct parallel versions of our application and finally arrive at a cost-optimal implementation in MPI (Message Passing Interface). The performance of our solutions is demonstrated experimentally on a Cray T3E machine.

  • Cost Optimality and Predictability of Parallel Programming with Skeletons
    European Conference on Parallel Processing, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    Skeletons are reusable, parameterized components with well-defined semantics and pre-packaged, efficient parallel implementations. This paper develops a new, provably cost-optimal implementation of the DS (double-scan) skeleton for the divide-and-conquer paradigm. Our implementation is based on a novel data structure called plist (pointed list); the implementation's performance is estimated using an analytical model. We demonstrate the use of the DS skeleton for parallelizing a tridiagonal system solver and report experimental results for its MPI implementation on a Cray T3E and a Linux cluster: they confirm the performance improvement achieved by the cost-optimal implementation and demonstrate its good predictability by our performance model.

  • PaCT - Design and Implementation of a Cost-Optimal Parallel Tridiagonal System Solver Using Skeletons
    Lecture Notes in Computer Science, 2003
    Co-Authors: Holger Bischof, Sergei Gorlatch, Emanuel Kitzelmann
    Abstract:

    We address the problem of systematically designing correct parallel programs and developing their efficient implementations on parallel machines. The design process starts with an intuitive, sequential algorithm and proceeds by expressing it in terms of well-defined, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using the tridiagonal system solver as our example application. We develop, step by step, three provably correct parallel versions of our application and finally arrive at a cost-optimal implementation in MPI (Message Passing Interface). The performance of our solutions is demonstrated experimentally on a Cray T3E machine.

Ramón Doallo - One of the best experts on this subject based on the ideXlab platform.

  • Parallel prefix operations on GPU: Tridiagonal System solvers and scan operators
    The Journal of Supercomputing, 2019
    Co-Authors: Adrián P. Diéguez, Margarita Amor, Ramón Doallo
    Abstract:

    Modern GPUs can achieve high computing power at low cost, but exploiting this power still requires much time and effort. Tridiagonal system and scan solvers are examples of widely used algorithms that can take advantage of these devices. In this article, one tridiagonal system solver and two scan primitive operators are implemented on CUDA GPUs. To do so, a tuning strategy based on three phases is developed. Additionally, a performance analysis is performed for two different CUDA GPU architectures, resulting in a substantial improvement with respect to the state of the art.

  • A Fast Solver for Large Tridiagonal Systems on Multi-Core Processors (Lass Library)
    IEEE Access, 2019
    Co-Authors: Pedro Valero-lara, Diego Andrade, Raül Sirvent, Jesús Labarta, Basilio B. Fraguela, Ramón Doallo
    Abstract:

    Many problems of industrial and scientific interest require solving tridiagonal linear systems. This paper presents several implementations for the parallel solution of large tridiagonal systems on multi-core architectures, using the OmpSs programming model. The parallelization strategy is based on combining two existing algorithms, PCR and Thomas. The Thomas algorithm, which cannot be parallelized, requires the fewest floating-point operations. The PCR algorithm is the most popular parallel method, but it is computationally more expensive than Thomas. The method proposed in this paper starts by applying the PCR algorithm to break one large tridiagonal system down into a set of smaller, independent ones. In a second step, these independent systems are solved concurrently using Thomas. The paper also contains an analytical study of the best point at which to switch from PCR to Thomas. In addition, the paper addresses the main performance issues of combining PCR and Thomas by proposing a set of alternative implementations, some of which involve algorithmic changes. The performance evaluation shows that the best implementation achieves a peak speedup of 4 with respect to the Intel MKL counterpart routine and of 2.5 with respect to a single-threaded Thomas implementation.
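
    (A minimal sketch of this PCR-then-Thomas combination appears after this list of publications.)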

  • Solving Large Problem Sizes of Index-Digit Algorithms on GPU: FFT and Tridiagonal System Solvers
    IEEE Transactions on Computers, 2018
    Co-Authors: Adrian Perez Dieguez, Margarita Amor, Jacobo Lobeiras, Ramón Doallo
    Abstract:

    Current Graphics Processing Units (GPUs) are capable of obtaining high computational performance in scientific applications. Nevertheless, programmers have to use suitable parallel algorithms for these architectures and usually have to apply optimization techniques in the implementation in order to achieve that performance. There are many efficient proposals for limited-size problems that fit directly in the shared memory of CUDA GPUs; however, there are few GPU proposals that tackle the design of efficient algorithms for large problem sizes that exceed the shared-memory storage capacity. In this work, we present a tuning strategy that addresses this problem for some parallel prefix algorithms that can be represented according to a set of common permutations of the digits of their element indices [1], denoted Index-Digit (ID) algorithms. Specifically, our strategy has been applied to develop flexible Multi-Stage (MS) algorithms for the Fast Fourier Transform (MS-ID-FFT) and a tridiagonal system solver (MS-ID-TS) on the GPU. The resulting implementation is compact and outperforms other well-known, commonly used state-of-the-art libraries, with an improvement of up to 1.47x with respect to NVIDIA's complex CUFFT and of up to 33.2x in comparison with NVIDIA's CUSPARSE for real-data tridiagonal systems.
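
As a rough, serial illustration of the PCR-plus-Thomas combination described in the Lass Library paper above, the Python sketch below performs one step of parallel cyclic reduction, which decouples a tridiagonal system into two independent half-size systems, and then solves each with the Thomas algorithm. The function names and the test case are ours; in the paper, the independent systems are solved concurrently under OmpSs and the number of PCR steps is chosen analytically.

    def thomas(a, b, c, d):
        """Sequential Thomas solve of a tridiagonal system.
        a: sub-diagonal (a[0] unused), b: main diagonal, c: super-diagonal
        (c[-1] unused), d: right-hand side.  Returns the solution as a list."""
        n = len(d)
        cp, dp = [0.0] * n, [0.0] * n
        cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
        for i in range(1, n):
            m = b[i] - a[i] * cp[i - 1]
            cp[i] = c[i] / m if i < n - 1 else 0.0
            dp[i] = (d[i] - a[i] * dp[i - 1]) / m
        x = [0.0] * n
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x

    def pcr_step(a, b, c, d):
        """One parallel-cyclic-reduction step: each row eliminates its immediate
        neighbours, so the remaining couplings skip one index and the even- and
        odd-indexed unknowns split into two independent tridiagonal systems."""
        n = len(d)
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):
            alpha = -a[i] / b[i - 1] if i > 0 else 0.0      # cancels coupling to i-1
            gamma = -c[i] / b[i + 1] if i < n - 1 else 0.0  # cancels coupling to i+1
            na[i] = alpha * a[i - 1] if i > 0 else 0.0      # new coupling to i-2
            nc[i] = gamma * c[i + 1] if i < n - 1 else 0.0  # new coupling to i+2
            nb[i] = (b[i] + (alpha * c[i - 1] if i > 0 else 0.0)
                          + (gamma * a[i + 1] if i < n - 1 else 0.0))
            nd[i] = (d[i] + (alpha * d[i - 1] if i > 0 else 0.0)
                          + (gamma * d[i + 1] if i < n - 1 else 0.0))
        return na, nb, nc, nd

    def hybrid_pcr_thomas(a, b, c, d):
        """One PCR step, then solve the two decoupled half-size systems with Thomas
        (serially here; the paper solves the independent systems concurrently)."""
        na, nb, nc, nd = pcr_step(a, b, c, d)
        x = [0.0] * len(d)
        for parity in (0, 1):
            idx = list(range(parity, len(d), 2))
            sub = thomas([na[i] for i in idx], [nb[i] for i in idx],
                         [nc[i] for i in idx], [nd[i] for i in idx])
            for i, v in zip(idx, sub):
                x[i] = v
        return x

    if __name__ == "__main__":
        n = 8
        a = [0.0] + [-1.0] * (n - 1)                 # sub-diagonal, a[0] is zero
        b = [2.0] * n                                # main diagonal
        c = [-1.0] * (n - 1) + [0.0]                 # super-diagonal, c[-1] is zero
        x_true = [float(i + 1) for i in range(n)]
        d = [(a[i] * x_true[i - 1] if i > 0 else 0.0) + b[i] * x_true[i]
             + (c[i] * x_true[i + 1] if i < n - 1 else 0.0) for i in range(n)]
        print(hybrid_pcr_thomas(a, b, c, d))         # recovers 1.0 ... 8.0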

Keqin Li - One of the best experts on this subject based on the ideXlab platform.

  • A Hybrid Parallel Solving Algorithm on GPU for Quasi-Tridiagonal System of Linear Equations
    IEEE Transactions on Parallel and Distributed Systems, 2016
    Co-Authors: Kenli Li, Wangdong Yang, Keqin Li
    Abstract:

    Quasi-tridiagonal systems of linear equations arise in numerical simulations, and existing solving algorithms face great challenges as such systems grow to millions of dimensions and beyond. We present a solving method that mixes direct and iterative methods and needs less storage space during the computation. With our method, a quasi-tridiagonal matrix is split into a tridiagonal matrix and a sparse matrix, and the tridiagonal equation can then be solved by a direct method within each iteration. Because the approximate solutions obtained by the direct method are closer to the exact solutions, the convergence speed of solving the quasi-tridiagonal system of linear equations is improved. Furthermore, we present an improved cyclic reduction algorithm that uses a partition strategy to solve tridiagonal equations on the GPU, with the intermediate data stored in shared memory so as to significantly reduce the latency of memory accesses. In our experiments on 10 test cases, the average number of iterations of our method is significantly lower than those of Jacobi, GS, GMRES, and BiCG, and close to those of BiCGSTAB, BiCRSTAB, and TFQMR. In parallel mode, the computing efficiency of our method is raised by the partition strategy, and its performance is better than that of the commonly used iterative and direct methods because each iteration requires less computation.
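
The hybrid method described in the abstract above splits the quasi-tridiagonal matrix as A = T + S, with T tridiagonal and S holding the few remaining entries, and then repeatedly performs a direct tridiagonal solve of T x_{k+1} = b - S x_k. The Python sketch below illustrates only this splitting iteration; np.linalg.solve stands in for the fast tridiagonal direct solver (the paper uses an improved cyclic reduction on the GPU), and the function names, the test matrix, and the stopping rule are ours.

    import numpy as np

    def split_quasi_tridiagonal(A):
        """Split A = T + S: T keeps the three central diagonals, S keeps the rest."""
        n = A.shape[0]
        T = np.zeros_like(A)
        for k in (-1, 0, 1):
            idx = np.arange(max(0, -k), min(n, n - k))
            T[idx, idx + k] = A[idx, idx + k]
        return T, A - T

    def hybrid_solve(A, rhs, tol=1e-10, max_iter=200):
        """Iterate T x_{k+1} = rhs - S x_k; each step is one direct tridiagonal
        solve.  np.linalg.solve is used here as a stand-in for a fast tridiagonal
        direct solver (Thomas or cyclic reduction) applied to T."""
        T, S = split_quasi_tridiagonal(A)
        x = np.zeros_like(rhs)
        for _ in range(max_iter):
            x_new = np.linalg.solve(T, rhs - S @ x)
            if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x_new), 1.0):
                break
            x = x_new
        return x_new

    if __name__ == "__main__":
        # Quasi-tridiagonal example: a diagonally dominant tridiagonal matrix plus
        # two corner entries, as arise from periodic boundary conditions.
        n = 6
        A = (np.diag(np.full(n, 4.0))
             + np.diag(np.full(n - 1, -1.0), 1)
             + np.diag(np.full(n - 1, -1.0), -1))
        A[0, -1] = A[-1, 0] = -1.0
        rhs = A @ np.arange(1.0, n + 1.0)
        print(hybrid_solve(A, rhs))                  # converges to [1, 2, 3, 4, 5, 6]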