The experts below are selected from a list of 1779 experts worldwide ranked by the ideXlab platform.
Issei Fujishiro - One of the best experts on this subject based on the ideXlab platform.
IWOMP - Optimization Strategies Using Hybrid MPI+OpenMP Parallelization for Large-Scale Data Visualization on Earth Simulator
Lecture Notes in Computer Science, 2007
Co-Authors: Li Chen, Issei Fujishiro
Abstract: An efficient parallel visualization library has been developed for the Earth Simulator. Because of its SMP cluster architecture, a three-level hybrid parallel programming model was adopted: message passing for inter-SMP node communication, OpenMP loop directives for intra-SMP node parallelization, and vectorization for each processing element (PE). To obtain good speedup with OpenMP parallelization, several strategies are used in implementing the visualization modules: thread parallelization by OpenMP that considers seed point distributions and flow features for parallel streamline generation, multi-coloring reordering to avoid data races on shared variables, some kinds of coherence removal, and hybrid image-space and object-space parallelization for volume rendering. Experimental results on the Earth Simulator demonstrate the feasibility and effectiveness of these methods.
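The multi-coloring idea mentioned in the abstract can be illustrated with a small sketch. This is not the library's code: it is written in Python rather than OpenMP, the function names `color_elements` and `group_by_color` are hypothetical, and the greedy coloring and the 1D mesh are assumptions chosen only to show the principle. Elements that share a node receive different colors, so all elements of one color can update their nodes concurrently without touching the same shared variable.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: elements that share a node get different colors."""
    # Build a reverse map from each node to the elements that touch it.
    node_to_elements = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elements[n].append(e)

    colors = [-1] * len(elements)  # -1 means "not colored yet"
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors that share at least one node.
        used = {
            colors[other]
            for n in nodes
            for other in node_to_elements[n]
            if colors[other] >= 0
        }
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def group_by_color(colors):
    """Return one list of element indices per color, in color order."""
    groups = defaultdict(list)
    for e, c in enumerate(colors):
        groups[c].append(e)
    return [groups[c] for c in sorted(groups)]


# Hypothetical 1D mesh: element i touches nodes i and i + 1.
elements = [(i, i + 1) for i in range(6)]
colors = color_elements(elements)
groups = group_by_color(colors)
```

In an OpenMP setting, the loop over the elements of a single group would be the parallel loop, and the groups themselves would be processed one after another.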
G.V. Lo - One of the best experts on this subject based on the ideXlab platform.
An SPMD-Like Algorithm for Parallelizing Molecular Dynamics Using OpenMP
Computing in Science & Engineering, 2013
Co-Authors: Mingze Bai, Shixin Sun, Yusheng Dou, Hong Tang, G.V. Lo
Abstract: The efficiency and scalability of early efforts to parallelize molecular dynamics calculations on shared-memory systems using OpenMP have been limited by the measures taken to avoid data races. Recent work has produced better performance, but requires significant revisions to the serial code. A new algorithm addresses these limitations.
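The abstract does not describe the new algorithm, so the sketch below shows only the conventional way the data race arises and is avoided, not the authors' SPMD-like method. When Newton's third law is used, each pair (i, j) updates the force on both particles, so two threads can write to the same entry. A common remedy is to give each worker a private force array and sum the arrays afterwards. The sketch is in Python rather than OpenMP, and the toy `pair_force`, the function names, and the particle positions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pair_force(xi, xj):
    """Toy 1D linear-spring force on particle i due to j (hypothetical)."""
    return xj - xi


def partial_forces(positions, pairs):
    """Accumulate forces for a subset of pairs into a private array."""
    local = [0.0] * len(positions)
    for i, j in pairs:
        f = pair_force(positions[i], positions[j])
        local[i] += f  # force on i
        local[j] -= f  # equal and opposite force on j
    return local


def parallel_forces(positions, pairs, workers=4):
    """Split the pairs among workers, then reduce the private arrays."""
    chunks = [pairs[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: partial_forces(positions, c), chunks))
    # Reduction step: sum the private arrays entry by entry.
    return [sum(column) for column in zip(*partials)]


positions = [0.0, 1.0, 3.0, 6.0]
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
forces = parallel_forces(positions, pairs)
```

The private arrays cost extra memory and a reduction pass per step, which is one reason this pattern can limit scalability as the thread count grows.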
Mahta Moghaddam - One of the best experts on this subject based on the ideXlab platform.
GPU accelerated 3D nonlinear time domain inversion of realistic breast phantoms with multiparameter optimization
2013 USNC-URSI Radio Science Meeting (Joint with AP-S Symposium), 2013
Co-Authors: Guanbo Chen, Mahta Moghaddam
Abstract: Summary form only given. The detection of early-stage breast tumors with microwave imagers has received considerable attention in recent years. However, reconstructing the complex dielectric profile of a realistic breast phantom remains computationally costly. This paper presents a GPU-accelerated 3D time-domain nonlinear inverse scattering technique to reconstruct the complex dielectric profile of realistic breast phantoms. The technique is based on the Born iterative method (BIM). BIM assumes that, in the first iteration, the total field inside the object can be approximated by the incident field. Within each BIM iteration, the forward problem and the inverse problem are each solved once. Here both calculations are accelerated by an Nvidia Tesla C2075 GPU. The acceleration method is based on the Compute Unified Device Architecture (CUDA), introduced by Nvidia to leverage the parallel computation power of its general-purpose GPUs. In our method, the forward problem is solved with the Auxiliary Differential Equation Finite Difference Time Domain (ADE FDTD) method with a convolutional perfectly matched layer (CPML). The main ADE FDTD algorithm, which updates the E and H fields, and the algorithm that updates the six CPML boundaries on the six sides of the domain are accelerated by different GPU kernels. Within each kernel, all the field points are calculated in parallel. However, the kernels are launched sequentially to avoid data races, because different kernels may update the same field in the same region where the PML slabs overlap.
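The overlap of PML slabs can be made concrete with a short sketch. It is only an illustration under stated assumptions, not the authors' CUDA code: the cubic grid size, the slab thickness `p`, the slab names, and the function `slab_membership` are all hypothetical. The sketch lists, for every boundary cell, the slabs that contain it; cells on edges and corners belong to more than one slab, which is where two boundary kernels running at the same time could write to the same field value.

```python
from itertools import product


def slab_membership(nx, ny, nz, p):
    """Map each boundary cell to the names of the slabs that contain it."""
    membership = {}
    for i, j, k in product(range(nx), range(ny), range(nz)):
        slabs = []
        if i < p:
            slabs.append("x-low")
        if i >= nx - p:
            slabs.append("x-high")
        if j < p:
            slabs.append("y-low")
        if j >= ny - p:
            slabs.append("y-high")
        if k < p:
            slabs.append("z-low")
        if k >= nz - p:
            slabs.append("z-high")
        if slabs:  # interior cells belong to no slab and are skipped
            membership[(i, j, k)] = slabs
    return membership


# Hypothetical 8 x 8 x 8 grid with slabs two cells thick.
membership = slab_membership(8, 8, 8, 2)
shared = [cell for cell, slabs in membership.items() if len(slabs) > 1]
```

Launching the six boundary kernels one after another, as the abstract describes, ensures that the updates to these shared cells never run concurrently.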
The inversion is carried out with a regularized local optimization process, in which a multi-parameter optimization scheme accommodates the three sets of unknowns: the real part of the permittivity, the conductivity, and a dispersion parameter. This process is also accelerated with the GPU, both in formulating the inversion matrix and in solving it with the conjugate gradient method. The acceleration achieves a speedup factor of at least 25 for the forward problem and a speedup factor of 5 for the inversion when reconstructing the realistic breast phantom at a 2 mm voxel size. The realistic Wisconsin breast phantoms, derived from MRI data, are used here. Each phantom provides a complex dielectric profile of the breast tissue, based on a single-pole Debye relaxation model, over the frequency range of interest, 0.5 to 3.7 GHz. Imaging results for several phantoms will be shown and will demonstrate the reconstructed spatial distribution of the fibroglandular tissue of the breast. A comparison of the total computational expense between GPU and CPU will also be presented.
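For reference, a single-pole Debye relaxation model is commonly written in the form below. The abstract does not give the authors' exact parameterization, so the symbols are assumptions: $\epsilon_\infty$ is the high-frequency permittivity, $\Delta\epsilon$ the dispersion magnitude, $\tau$ the relaxation time, $\sigma_s$ the static conductivity, and $\epsilon_0$ the permittivity of free space.

```latex
\epsilon_r(\omega) = \epsilon_\infty
  + \frac{\Delta\epsilon}{1 + j\omega\tau}
  + \frac{\sigma_s}{j\omega\epsilon_0}
```

Under this form, the three sets of unknowns named in the abstract would plausibly correspond to $\epsilon_\infty$, $\sigma_s$, and $\Delta\epsilon$ at each voxel, although that mapping is not stated in the source.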