Numerical Library

The experts below are selected from a list of 22,419 experts worldwide, ranked by the ideXlab platform

Fabrice Vidal - One of the best experts on this subject based on the ideXlab platform.

  • Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System
    IEEE International Conference on High Performance Computing Data and Analytics, 2014
    Co-Authors: Hatem Ltaief, Arnaud Sevin, Ahmad Abdelfattah, Carine Morel, Eric Gendron, David Elliot Keyes, Damien Gratadour, Fabrice Vidal
    Abstract:

    The European Extremely Large Telescope (E-ELT) project is one of Europe's highest priorities in ground-based astronomy. ELTs are built on top of a variety of highly sensitive and critical astronomical instruments. In particular, a new instrument called MOSAIC has been proposed to perform multi-object spectroscopy using the Multi-Object Adaptive Optics (MOAO) technique. The core of the simulation lies in the intensive computation of a tomographic reconstructor (TR), which is used to drive the deformable mirror in real time from the measurements. A new numerical algorithm is proposed (1) to capture the actual experimental noise and (2) to substantially speed up previous implementations by exposing more concurrency while reducing the number of floating-point operations. Based on the Matrices Over Runtime System at Exascale (MORSE) numerical library, a dynamic scheduler drives all computational stages of the tomographic reconstructor simulation, pipelining and running tasks out of order across different stages on heterogeneous systems while ensuring data coherency and dependencies. The proposed TR simulation asymptotically outperforms previous state-of-the-art implementations, with up to a 13-fold speedup. At more than 50,000 unknowns, this appears to be the largest-scale AO problem submitted to computation to date, and it opens new research directions for extreme-scale AO simulations.
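    The scheduling idea described in the abstract, where tasks become ready as soon as their dependencies complete so that work from different pipeline stages can interleave, can be sketched with a toy dependency-driven executor (an illustrative Python sketch, not the MORSE/StarPU API; all task and stage names below are hypothetical):

    ```python
    # Toy dependency-driven task executor: a task is released as soon as
    # its inputs are produced, so tasks from different pipeline stages
    # can run out of order relative to each other, as in the paper's
    # runtime-scheduled tomographic-reconstructor simulation.
    from collections import defaultdict, deque

    def run_dataflow(tasks):
        """tasks: dict name -> (list of dependency names, callable body).
        Returns the order in which tasks executed."""
        indeg = {t: len(deps) for t, (deps, _) in tasks.items()}
        children = defaultdict(list)
        for t, (deps, _) in tasks.items():
            for d in deps:
                children[d].append(t)
        ready = deque(t for t, n in indeg.items() if n == 0)
        order = []
        while ready:
            t = ready.popleft()
            tasks[t][1]()          # execute the task body
            order.append(t)
            for c in children[t]:  # release tasks whose inputs are now done
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
        return order

    # Two pipeline "stages" over two data tiles: stage 2 on tile 0 may
    # run before stage 1 on tile 1, i.e. stages interleave.
    log = []
    tasks = {
        "s1_t0": ([], lambda: log.append("s1_t0")),
        "s1_t1": ([], lambda: log.append("s1_t1")),
        "s2_t0": (["s1_t0"], lambda: log.append("s2_t0")),
        "s2_t1": (["s1_t1"], lambda: log.append("s2_t1")),
    }
    order = run_dataflow(tasks)
    ```

    Only the per-tile dependency (stage 1 before stage 2 on the same tile) is enforced; a real runtime such as the one used in the paper adds data-coherency tracking and heterogeneous (CPU/GPU) worker dispatch on top of this core idea.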

Hatem Ltaief - One of the best experts on this subject based on the ideXlab platform.

  • Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems
    International Parallel and Distributed Processing Symposium, 2021
    Co-Authors: Qinglei Cao, Hatem Ltaief, Yu Pei, Kadir Akbudak, George Bosilca, David E Keyes, Jack Dongarra
    Abstract:

    The task-based programming model associated with dynamic runtime systems has gained popularity for challenging problems involving workload imbalance, heterogeneous resources, or extreme concurrency. During the last decade, low-rank matrix approximations, whose main idea consists of exploiting data sparsity, typically by compressing off-diagonal tiles up to an application-specific accuracy threshold, have been adopted to address the curse of dimensionality at extreme scale. In this paper, we create a bridge between the runtime and the linear algebra by communicating knowledge of the data sparsity to the runtime. We design and implement this synergistic approach with high user productivity in mind, in the context of the PaRSEC runtime system and the HiCMA numerical library. This requires extending PaRSEC with new features to integrate rank information into the dataflow so that proper decisions can be made at runtime. We focus on the tile low-rank (TLR) Cholesky factorization for solving 3D data-sparse covariance matrix problems arising in environmental applications. In particular, we employ the 3D exponential model of the Matérn matrix kernel, which exhibits challenging nonuniform high ranks in off-diagonal tiles. We first provide dynamic data structure management driven by a performance model to reduce extra floating-point operations. Next, we optimize the memory footprint of the application by relying on a dynamic memory allocator, supported by a rank-aware data distribution to cope with the workload imbalance. Finally, we expose further parallelism using kernel recursive formulations to shorten the critical path. Our resulting high-performance implementation outperforms existing data-sparse TLR Cholesky factorizations by up to 7-fold on a large-scale distributed-memory system, while reducing the memory footprint by up to 44-fold. This multidisciplinary work highlights the need to empower runtime systems beyond their original duty of task scheduling for servicing next-generation low-rank matrix algebra libraries.
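    The data-sparsity idea the abstract relies on, compressing an off-diagonal tile up to an accuracy threshold, can be sketched with a truncated SVD (a minimal NumPy illustration, not the HiCMA API; the Gaussian kernel and tolerance below are assumptions chosen to make the rank decay visible):

    ```python
    # Tile low-rank (TLR) compression sketch: replace a dense tile by
    # truncated SVD factors U @ V, keeping only singular values above
    # an accuracy threshold. Smooth covariance-like tiles are
    # numerically low rank, so the factors are much smaller than the tile.
    import numpy as np

    def compress_tile(tile, tol):
        """Return factors (U, V) with small relative error ||tile - U @ V||."""
        U, s, Vt = np.linalg.svd(tile, full_matrices=False)
        # keep singular values above a relative threshold
        thresh = tol * s[0] if s[0] > 0 else 0.0
        k = max(1, int(np.sum(s > thresh)))
        return U[:, :k] * s[:k], Vt[:k, :]

    # A smooth Gaussian-kernel tile standing in for a covariance block.
    x = np.linspace(0.0, 1.0, 64)
    tile = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)
    U, V = compress_tile(tile, 1e-8)
    rank = U.shape[1]  # well below the full dimension of 64
    err = np.linalg.norm(tile - U @ V) / np.linalg.norm(tile)
    ```

    Storing the factors costs 2 * 64 * rank entries instead of 64 * 64, which is the memory saving TLR formats trade against the accuracy threshold; the Matérn kernel used in the paper compresses less readily than this smooth stand-in, which is exactly why its nonuniform ranks are challenging.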

  • Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System
    IEEE International Conference on High Performance Computing Data and Analytics, 2014
    Co-Authors: Hatem Ltaief, Arnaud Sevin, Ahmad Abdelfattah, Carine Morel, Eric Gendron, David Elliot Keyes, Damien Gratadour, Fabrice Vidal
    Abstract:

    The European Extremely Large Telescope (E-ELT) project is one of Europe's highest priorities in ground-based astronomy. ELTs are built on top of a variety of highly sensitive and critical astronomical instruments. In particular, a new instrument called MOSAIC has been proposed to perform multi-object spectroscopy using the Multi-Object Adaptive Optics (MOAO) technique. The core of the simulation lies in the intensive computation of a tomographic reconstructor (TR), which is used to drive the deformable mirror in real time from the measurements. A new numerical algorithm is proposed (1) to capture the actual experimental noise and (2) to substantially speed up previous implementations by exposing more concurrency while reducing the number of floating-point operations. Based on the Matrices Over Runtime System at Exascale (MORSE) numerical library, a dynamic scheduler drives all computational stages of the tomographic reconstructor simulation, pipelining and running tasks out of order across different stages on heterogeneous systems while ensuring data coherency and dependencies. The proposed TR simulation asymptotically outperforms previous state-of-the-art implementations, with up to a 13-fold speedup. At more than 50,000 unknowns, this appears to be the largest-scale AO problem submitted to computation to date, and it opens new research directions for extreme-scale AO simulations.

Roman Iakymchuk - One of the best experts on this subject based on the ideXlab platform.

  • Reproducibility, Accuracy and Performance of the Feltor Code and Library on Parallel Computer Architectures
    Computer Physics Communications, 2019
    Co-Authors: Matthias Wiesenberger, Lukas Einkemmer, Markus Held, Albert Gutierrezmilla, X Saez, Roman Iakymchuk
    Abstract:

    Feltor is a modular and free scientific software package. It allows developing platform-independent code that runs on a variety of parallel computer architectures, ranging from laptop CPUs to multi-GPU distributed-memory systems. Feltor consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations, with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses, especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behavior. In fact, in the physical model slightly different initial conditions lead to vastly different end states. This behavior translates to its numerical representation. Pointwise convergence, even in principle, becomes impossible for long simulation times. We briefly discuss alternative methods to ensure the correctness of results, such as the convergence of reduced physical quantities of interest, ensemble simulations, invariants, or reduced simulation times. In a second part, we explore important performance-tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in Feltor and test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 10^-1 and 10^3 MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node needed to achieve a scaling efficiency above 50% (both strong and weak).
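    The reproducibility problem and the exactly rounded remedy can be demonstrated in a few lines. Here math.fsum, Python's correctly rounded summation, stands in for the long-accumulator dot product the paper adopts (a minimal sketch of the same principle, not Feltor's implementation):

    ```python
    # Floating-point addition is not associative, so changing the
    # reduction order -- as parallel execution does -- can change the
    # result, catastrophically so when magnitudes differ widely. This is
    # the non-determinism the Feltor authors eliminate. A correctly
    # rounded sum is order-independent by construction.
    import math

    xs = [1e16, 1.0, -1e16]   # the same three values ...
    ys = [1e16, -1e16, 1.0]   # ... reduced in a different order

    naive_x = sum(xs)         # 1e16 + 1.0 absorbs the 1.0, result 0.0
    naive_y = sum(ys)         # cancellation happens first, result 1.0

    exact_x = math.fsum(xs)   # correctly rounded: 1.0
    exact_y = math.fsum(ys)   # bitwise identical regardless of order
    ```

    The naive sums disagree even though the inputs are identical, which is exactly what a parallel reduction with a different tree shape does to a simulation; the exactly rounded version returns the same bits for every ordering, restoring reproducibility.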

Tomoyoshi Ito - One of the best experts on this subject based on the ideXlab platform.

  • Computational Wave Optics Library for C++: CWO++ Library
    Computer Physics Communications, 2012
    Co-Authors: Tomoyoshi Shimobaba, Jian Tong Weng, Takahiro Sakurai, Naohisa Okada, Takashi Nishitsuji, Naoki Takada, Atsushi Shiraki, Nobuyuki Masuda, Tomoyoshi Ito
    Abstract:

    Diffraction calculations, such as the angular spectrum method and Fresnel diffraction, are used for calculating scalar light propagation. These calculations are used in wide-ranging optics fields: for example, computer-generated holograms (CGHs), digital holography, diffractive optical elements, microscopy, image encryption and decryption, three-dimensional analysis of optical devices, and so on. However, the increasing demands made by large-scale diffraction calculations have rendered the computational power of recent computers insufficient. We have already developed a numerical library for diffraction calculations using a Graphics Processing Unit (GPU), named the GWO library. However, the GWO library is not user-friendly, since it is based on the C language and runs only on a GPU. In this paper, we develop a new C++ class library for diffraction and CGH calculations, referred to as the CWO++ library, running on a CPU and GPU. We also describe the structure, performance, and usage examples of the CWO++ library.
    Program summary
    Program title: CWO++
    Catalogue identifier: AELL_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELL_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 109,809
    No. of bytes in distributed program, including test data, etc.: 4,181,911
    Distribution format: tar.gz
    Programming language: C++
    Computer: General computers and general computers with NVIDIA GPUs
    Operating system: Windows XP, Vista, 7
    Has the code been vectorized or parallelized?: Yes (one CPU core; many GPU cores)
    RAM: 256 MB
    Classification: 18
    External routines: CImg, FFTW
    Nature of problem: The CWO++ library provides diffraction calculations useful for computer-generated holograms (CGHs), digital holography, diffractive optical elements, microscopy, image encryption and decryption, and three-dimensional analysis of optical devices.
    Solution method: FFT-based diffraction calculations; computer-generated holograms by direct integration.
    Running time: The sample runs provided take approximately 5 minutes for the C++ version and 5 seconds for the C++ with GPUs version.
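    The class of FFT-based diffraction calculation the library provides, the angular spectrum method, can be sketched in a few lines of NumPy (an illustrative sketch under assumed sampling parameters, not the CWO++ API):

    ```python
    # Angular spectrum method: FFT the field, multiply by the free-space
    # transfer function exp(i * kz * d), and inverse FFT. This is the
    # standard scheme for scalar light propagation over short distances.
    import numpy as np

    def angular_spectrum(field, wavelength, pitch, distance):
        """Propagate a square sampled complex field (all units in metres)."""
        n = field.shape[0]
        fx = np.fft.fftfreq(n, d=pitch)            # spatial frequencies
        fx2 = fx[None, :] ** 2 + fx[:, None] ** 2
        k = 2.0 * np.pi / wavelength
        arg = 1.0 - (wavelength ** 2) * fx2
        kz = k * np.sqrt(np.maximum(arg, 0.0))     # drop evanescent waves
        H = np.exp(1j * kz * distance)             # transfer function
        return np.fft.ifft2(np.fft.fft2(field) * H)

    # Sanity check: a uniform plane wave only picks up a phase, so its
    # intensity (and total power) is unchanged by propagation.
    n = 64
    field = np.ones((n, n), dtype=complex)
    out = angular_spectrum(field, wavelength=633e-9, pitch=10e-6, distance=0.01)
    power_in = np.sum(np.abs(field) ** 2)
    power_out = np.sum(np.abs(out) ** 2)
    ```

    Because the transfer function is diagonal in frequency space, the whole propagation costs two FFTs plus an elementwise multiply, which is why GPU FFT throughput dominates the running time of libraries like this one.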

Arnaud Sevin - One of the best experts on this subject based on the ideXlab platform.

  • Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System
    IEEE International Conference on High Performance Computing Data and Analytics, 2014
    Co-Authors: Hatem Ltaief, Arnaud Sevin, Ahmad Abdelfattah, Carine Morel, Eric Gendron, David Elliot Keyes, Damien Gratadour, Fabrice Vidal
    Abstract:

    The European Extremely Large Telescope (E-ELT) project is one of Europe's highest priorities in ground-based astronomy. ELTs are built on top of a variety of highly sensitive and critical astronomical instruments. In particular, a new instrument called MOSAIC has been proposed to perform multi-object spectroscopy using the Multi-Object Adaptive Optics (MOAO) technique. The core of the simulation lies in the intensive computation of a tomographic reconstructor (TR), which is used to drive the deformable mirror in real time from the measurements. A new numerical algorithm is proposed (1) to capture the actual experimental noise and (2) to substantially speed up previous implementations by exposing more concurrency while reducing the number of floating-point operations. Based on the Matrices Over Runtime System at Exascale (MORSE) numerical library, a dynamic scheduler drives all computational stages of the tomographic reconstructor simulation, pipelining and running tasks out of order across different stages on heterogeneous systems while ensuring data coherency and dependencies. The proposed TR simulation asymptotically outperforms previous state-of-the-art implementations, with up to a 13-fold speedup. At more than 50,000 unknowns, this appears to be the largest-scale AO problem submitted to computation to date, and it opens new research directions for extreme-scale AO simulations.
