Synchronize Thread

The experts below are selected from a list of 12 experts worldwide, ranked by the ideXlab platform.

Li Yuan - One of the best experts on this subject based on the ideXlab platform.

  • Solving Seven-Equation Model of Compressible Two-Phase Flow Using CUDA-GPU
    Communications in Computer and Information Science, 2014
    Co-Authors: Shan Liang, Wei Liu, Li Yuan
    Abstract:

    In this paper, a numerical method that combines an HLLC-type approximate Riemann solver with the third-order TVD Runge-Kutta method is presented for the two-pressure, two-velocity seven-equation model of compressible two-phase flow of Saurel and Abgrall. Based on Abgrall's idea that “a multiphase flow, uniform in pressure and velocity at t = 0, will remain uniform on the same variables during time evolution”, discretization schemes for the non-conservative terms and for the volume fraction evolution equation are derived consistently with the HLLC solver adopted for the conservative terms. To attain high temporal accuracy, the third-order TVD Runge-Kutta method is combined with an operator splitting technique and made robust by reordering the sequence of operators. Numerical tests on several one- and two-dimensional compressible two-fluid flow problems with high density and pressure ratios demonstrate that the proposed method is accurate and robust. In addition, the algorithm is implemented on multiple graphics processing units (GPUs) using CUDA. An appropriate data structure is adopted to maintain high memory bandwidth; techniques such as atomic operations and counters are used to synchronize thread blocks; and an overlapping domain decomposition method is applied for task assignment. On a single GPU, we observe a 31× speedup relative to a single-core CPU computation; linear speedup can be achieved with multi-GPU parallel computing, although per-GPU performance may decrease slightly.
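
The abstract mentions atomic operations and a counter used to synchronize thread blocks, but does not give the exact procedure. As a minimal sketch of the general counter-plus-flag (sense-reversing) barrier idea, the Python below uses OS threads to stand in for thread blocks and a lock to stand in for a hardware atomic decrement. `CounterBarrier` and `run_phases` are hypothetical names, not the authors' code.

```python
import threading
import time


class CounterBarrier:
    """Sense-reversing barrier: one shared counter plus one shared flag.

    Hypothetical sketch: the lock stands in for a hardware atomic decrement
    (on a GPU, an atomic on a counter in global memory); waiters spin on the flag.
    """

    def __init__(self, participants: int) -> None:
        self.total = participants
        self.count = participants
        self.sense = False
        self._lock = threading.Lock()

    def wait(self, local_sense: bool) -> bool:
        """Block until every participant arrives; return the caller's new local sense."""
        local_sense = not local_sense
        with self._lock:                   # emulated atomic fetch-and-decrement
            self.count -= 1
            last = self.count == 0
            if last:
                self.count = self.total    # reset before anyone can re-enter
                self.sense = local_sense   # flipping the flag releases all waiters
        while not last and self.sense != local_sense:
            time.sleep(0)                  # yield instead of a tight spin
        return local_sense


def run_phases(workers: int = 4, steps: int = 50) -> list[list[tuple[int, ...]]]:
    """Each step: every worker publishes a value, then reads all published values."""
    barrier = CounterBarrier(workers)
    stage = [-1] * workers
    seen: list[list[tuple[int, ...]]] = [[] for _ in range(workers)]

    def worker(w: int) -> None:
        sense = False
        for s in range(steps):
            stage[w] = s                   # phase 1: write own slot
            sense = barrier.wait(sense)    # all writes of step s are done
            seen[w].append(tuple(stage))   # phase 2: read every slot
            sense = barrier.wait(sense)    # no worker starts step s + 1 early

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

With four workers, every entry of `run_phases()` is `[(0, 0, 0, 0), (1, 1, 1, 1), ...]`, because no worker reads before all writes of a step finish and none writes the next step before all reads finish. On a real GPU, a spin barrier across blocks only terminates if all blocks are resident at the same time; CUDA cooperative groups (a cooperative launch plus a grid-wide `sync()`) is the supported alternative to relaunching a kernel for each synchronization point.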

  • Solving seven-equation model for compressible two-phase flow using multiple GPUs
    Computers & Fluids, 2014
    Co-Authors: Shan Liang, Wei Liu, Li Yuan
    Abstract:

    In this paper, an HLLC-type approximate Riemann solver combined with the third-order TVD Runge–Kutta method is applied to the seven-equation compressible two-phase model on multiple graphics processing units (GPUs). Based on the idea proposed by Abgrall et al. that “a multiphase flow, uniform in pressure and velocity at t = 0, will remain uniform on the same variables during time evolution”, discretization schemes for the non-conservative terms and for the volume fraction evolution equation are derived consistently with the HLLC solver used for the conservative terms. To attain high temporal accuracy, the third-order TVD Runge–Kutta method is combined with an operator splitting technique in which the sequence of operators is reordered so that free surface problems can be computed robustly. For large-scale simulations, the numerical method is implemented using an MPI/PThread-CUDA parallelization paradigm for multiple GPUs: a domain decomposition method distributes data to the GPUs, parallel computation inside each GPU is done with CUDA, and communication between GPUs is performed via MPI or PThread. Efficient data structures and GPU memory usage maintain high device memory bandwidth, while a special procedure synchronizes thread blocks to reduce the frequency of kernel launches. Numerical tests on several one- and two-dimensional compressible two-phase flow problems with high density and pressure ratios demonstrate that the present method is accurate and robust. Timing tests show an overall speedup of 31× for one NVIDIA Tesla C2075 GPU compared with one Intel Xeon Westmere 5675 CPU core, and nearly 70% parallel efficiency when using 8 GPUs.
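
The third-order TVD Runge–Kutta method referred to above is commonly written in the Shu–Osher form, where each stage is a convex combination of forward-Euler steps (which is what preserves the TVD property under a suitable CFL limit). The sketch below applies it to a scalar ODE only; the spatial operator `L` (which in the paper would come from the HLLC discretization) and the operator-splitting and reordering details are not reproduced, and the function names are hypothetical.

```python
import math
from typing import Callable


def tvd_rk3_step(u: float, dt: float, L: Callable[[float], float]) -> float:
    """One third-order TVD (SSP) Runge-Kutta step in Shu-Osher form for du/dt = L(u)."""
    u1 = u + dt * L(u)                              # forward Euler stage
    u2 = 0.75 * u + 0.25 * (u1 + dt * L(u1))        # convex combination of Euler steps
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * L(u2))  # final convex combination


def integrate(u0: float, t_end: float, n_steps: int, L: Callable[[float], float]) -> float:
    """Advance u0 from t = 0 to t_end with n_steps equal TVD RK3 steps."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = tvd_rk3_step(u, dt, L)
    return u


# Convergence check on du/dt = -u, whose exact solution is exp(-t).
errors = [abs(integrate(1.0, 1.0, n, lambda u: -u) - math.exp(-1.0)) for n in (20, 40)]
print(errors[0] / errors[1])  # close to 8, as expected for a third-order method
```

Halving the time step should cut the error by roughly a factor of 2^3 = 8, which is a quick way to confirm that an implementation really is third order.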

Ben Lee - One of the best experts on this subject based on the ideXlab platform.

  • ISCA PDCS - Dynamic Simultaneous MultiThreaded Architecture
    2003
    Co-Authors: Daniel Ortiz-arroyo, Ben Lee
    Abstract:

    This paper presents the Dynamic Simultaneous MultiThreaded Architecture (DSMT). DSMT efficiently executes multiple threads from a single program on an SMT processor core. To accomplish this, threads are generated dynamically from a predictable flow of control and then executed speculatively. Data obtained during DSMT's single-context, nonspeculative execution phase is used as a hint to speculate on the subsequent behavior of multiple threads. DSMT employs simple mechanisms based on state bits that track inter-thread dependencies in registers and memory, synchronize thread execution, and control recovery from misspeculation. Moreover, DSMT uses a novel greedy policy to choose the sections of code that provide the highest performance based on their past execution history. The DSMT architecture was simulated with a new cycle-accurate, execution-driven simulator. Our simulation results show that DSMT has very good potential to improve SMT performance, even when only a single program is available. However, we found that dynamic thread behavior combined with frequent misspeculation may also produce diminishing returns in performance. Therefore, the challenge is to maximize the thread-level parallelism that DSMT can exploit while reducing the frequency of misspeculation.
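
The abstract says DSMT uses state bits to track inter-thread register dependencies and to detect misspeculation, but it does not describe their layout. Purely as a hypothetical illustration (a generic bitmask scheme, not the DSMT design), the Python sketch below keeps three bitmasks per speculative thread, ordered from oldest to youngest: registers it is predicted to write, registers it has written, and registers it read before producing them itself. `ThreadState` and `RegisterStateBits` are invented names.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Hypothetical per-thread state bits; bit r refers to architectural register r."""
    pending: int = 0  # predicted to be written by this thread, not yet written
    written: int = 0  # already written by this thread
    read: int = 0     # read before this thread produced the value itself


class RegisterStateBits:
    """Hypothetical bitmask scoreboard; thread 0 is the oldest (least speculative)."""

    def __init__(self, n_threads: int) -> None:
        self.threads = [ThreadState() for _ in range(n_threads)]

    def predict_write(self, tid: int, reg: int) -> None:
        # Hint, e.g. gathered during an earlier nonspeculative run.
        self.threads[tid].pending |= 1 << reg

    def can_read(self, tid: int, reg: int) -> bool:
        # A read stalls while any older thread still owes a predicted write to reg.
        bit = 1 << reg
        return not any(t.pending & bit for t in self.threads[:tid])

    def read(self, tid: int, reg: int) -> None:
        t = self.threads[tid]
        if not t.written & (1 << reg):  # reading its own value is not an inter-thread dependence
            t.read |= 1 << reg

    def write(self, tid: int, reg: int) -> list[int]:
        """Record a write; return younger threads that consumed a stale value of reg."""
        bit = 1 << reg
        t = self.threads[tid]
        t.pending &= ~bit
        t.written |= bit
        squash = []
        for j in range(tid + 1, len(self.threads)):
            younger = self.threads[j]
            if younger.read & bit:
                squash.append(j)  # misspeculation: j read reg before this write
            if younger.written & bit:
                break             # threads after j see j's value instead
        return squash
```

For example, if thread 2 reads register 2 and the older thread 0 later writes it without having predicted that write, `write(0, 2)` returns `[2]`. This sketch only reports direct consumers; an actual design might also need to squash every thread younger than the first victim.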

  • Dynamic Simultaneous MultiThreading Architecture
    2003
    Co-Authors: Daniel Ortiz-arroyo, Ben Lee

Shan Liang - One of the best experts on this subject based on the ideXlab platform.

  • Solving Seven-Equation Model of Compressible Two-Phase Flow Using CUDA-GPU
    Communications in Computer and Information Science, 2014
    Co-Authors: Shan Liang, Wei Liu, Li Yuan

  • Solving seven-equation model for compressible two-phase flow using multiple GPUs
    Computers & Fluids, 2014
    Co-Authors: Shan Liang, Wei Liu, Li Yuan

Daniel Ortiz-arroyo - One of the best experts on this subject based on the ideXlab platform.

  • ISCA PDCS - Dynamic Simultaneous MultiThreaded Architecture
    2003
    Co-Authors: Daniel Ortiz-arroyo, Ben Lee

  • Dynamic Simultaneous MultiThreading Architecture
    2003
    Co-Authors: Daniel Ortiz-arroyo, Ben Lee

Wei Liu - One of the best experts on this subject based on the ideXlab platform.

  • Solving Seven-Equation Model of Compressible Two-Phase Flow Using CUDA-GPU
    Communications in Computer and Information Science, 2014
    Co-Authors: Shan Liang, Wei Liu, Li Yuan

  • Solving seven-equation model for compressible two-phase flow using multiple GPUs
    Computers & Fluids, 2014
    Co-Authors: Shan Liang, Wei Liu, Li Yuan