Parallel Software

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 109,434 Experts worldwide ranked by ideXlab platform

Wolfgang Rosenstiel - One of the best experts on this subject based on the ideXlab platform.

  • fast and accurate source level simulation of Software timing considering complex code optimizations
    Design Automation Conference, 2011
    Co-Authors: Stefan Stattelmann, Oliver Bringmann, Wolfgang Rosenstiel
    Abstract:

    This paper presents an approach for accurately estimating the execution time of parallel software components in complex embedded systems. Timing annotations obtained from highly optimized binary code are added to the source code of software components, which is then integrated into a SystemC transaction-level simulation. This approach allows a fast evaluation of software execution times while being as accurate as conventional instruction set simulators. By simulating binary-level control flow in parallel with the original functionality of the software, even compiler optimizations that heavily modify the structure of the generated code can be modeled accurately. Experimental results show that the presented method produces timing estimates within the same level of accuracy as an established commercial tool for cycle-accurate instruction set simulation while being at least 20 times faster.
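    The general idea of source-level timing simulation can be illustrated with a small sketch (this is an illustration of the concept, not the paper's tool): each basic block of the optimized binary contributes a cycle cost, and the instrumented source code accumulates those costs as it executes natively on the host. The block names and cycle values below are made up.

```python
# Hypothetical sketch of source-level timing annotation: cycle costs per
# binary basic block (e.g. from static timing analysis of the optimized
# binary; values here are invented) are charged as the source code runs.
BLOCK_CYCLES = {"entry": 4, "loop_body": 12, "exit": 3}

class TimingContext:
    """Accumulates estimated target cycles during host execution."""
    def __init__(self):
        self.cycles = 0
    def consume(self, block):
        self.cycles += BLOCK_CYCLES[block]

def saxpy(ctx, a, xs, ys):
    ctx.consume("entry")
    out = []
    for x, y in zip(xs, ys):
        ctx.consume("loop_body")   # annotation mirrors the binary loop body
        out.append(a * x + y)
    ctx.consume("exit")
    return out

ctx = TimingContext()
result = saxpy(ctx, 2.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result, ctx.cycles)  # functional result plus estimated cycle count
```

    Running the annotated function yields both the functional result and an estimated cycle count, which a transaction-level simulation could then consume as the component's timing.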

Markus Steinberger - One of the best experts on this subject based on the ideXlab platform.

  • on the fly vertex reuse for massively Parallel Software geometry processing
    Proceedings of the ACM on Computer Graphics and Interactive Techniques, 2018
    Co-Authors: Michael Kenzel, Bernhard Kerbl, Wolfgang Tatzgern, Elena Ivanchenko, Dieter Schmalstieg, Markus Steinberger
    Abstract:

    Due to its flexibility, compute mode is becoming more and more attractive as a way to implement many of the algorithms that make up a state-of-the-art rendering pipeline. A key problem commonly encountered in graphics applications is streaming vertex and geometry processing. In a typical triangle mesh, the same vertex is referenced six times on average. To avoid redundant computation during rendering, a post-transform cache is traditionally employed to reuse vertex processing results. However, such a vertex cache generally cannot be implemented efficiently in software and does not scale well as parallelism increases. We explore alternative strategies for reusing per-vertex results on the fly during massively parallel software geometry processing. Given an input stream divided into batches, we analyze the effectiveness of sorting, hashing, and intra-thread-group communication for identifying and exploiting local reuse potential. We design and present four vertex reuse strategies tailored to modern GPU architectures. We demonstrate that, in a variety of applications, these strategies not only achieve effective reuse of vertex processing results but can also boost performance by up to 2-3x compared to a naive approach. Curiously, our experiments also show that our batch-based approaches exhibit behavior similar to the OpenGL implementation on current graphics hardware.
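    The hashing-based flavor of batch-local reuse can be sketched as follows (a simplified stand-in for the paper's GPU implementation; the shader and index stream are invented): within each batch of the index stream, a vertex is shaded only on its first occurrence, and later references reuse the cached result.

```python
# Illustrative batch-based vertex reuse via hashing (a dict stands in for
# the intra-batch reuse table; not the paper's GPU kernels).

def shade(v):
    # Stand-in vertex shader: here just a trivial transform.
    return v * 2

def process_batches(indices, batch_size):
    shader_invocations = 0
    output = []
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        cache = {}  # hash-based reuse table, local to the batch
        for idx in batch:
            if idx not in cache:          # miss: run the vertex shader
                cache[idx] = shade(idx)
                shader_invocations += 1
            output.append(cache[idx])     # hit or miss: reuse the result
    return output, shader_invocations

# A strip-like index stream where vertices repeat heavily.
out, n = process_batches([0, 1, 2, 1, 2, 3, 2, 3, 4], batch_size=9)
print(n)  # 5 unique vertices shaded instead of 9 shader invocations
```

    With the roughly six references per vertex typical of triangle meshes, this kind of batch-local deduplication is where the reuse potential comes from.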

  • on the fly vertex reuse for massively Parallel Software geometry processing
    arXiv: Graphics, 2018
    Co-Authors: Michael Kenzel, Bernhard Kerbl, Wolfgang Tatzgern, Elena Ivanchenko, Dieter Schmalstieg, Markus Steinberger
    Abstract:

    Compute-mode rendering is becoming more and more attractive for non-standard rendering applications due to the high flexibility of compute-mode execution. These newly designed pipelines often include streaming vertex and geometry processing stages. In typical triangle meshes, the same transformed vertex is required six times on average during rendering. To avoid redundant computation, a post-transform cache is traditionally suggested to enable reuse of vertex processing results. However, traditional caching neither scales well as the hardware becomes more parallel nor can it be implemented efficiently in a software design. We investigate alternative strategies for reusing vertex shading results on the fly during massively parallel software geometry processing. Forming static and dynamic batches from the input data stream, we analyze the effectiveness of identifying potential local reuse based on sorting, hashing, and efficient intra-thread-group communication. Altogether, we present four vertex reuse strategies tailored to modern parallel architectures. Our simulations show that our batch-based strategies significantly outperform parallel caches in terms of reuse. On actual GPU hardware, our evaluation shows that our strategies not only lead to good reuse of processing results but also boost performance by 2-3x compared to naively ignoring reuse in a variety of practical applications.
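    The sorting-based variant of reuse detection can also be sketched in a few lines (a simplified illustration, not the paper's kernels; the batch contents are invented): sorting a batch by vertex index places duplicates next to each other, so only the first element of each run of equal indices needs to invoke the vertex shader.

```python
# Sketch of sorting-based reuse detection within one batch: duplicates
# become adjacent after sorting, so runs of equal indices mark reuse.

def reuse_by_sorting(batch):
    order = sorted(range(len(batch)), key=lambda i: batch[i])
    shade_mask = [False] * len(batch)
    for pos, i in enumerate(order):
        # The first element of a run of equal indices does the shading.
        if pos == 0 or batch[order[pos - 1]] != batch[i]:
            shade_mask[i] = True
    return shade_mask

batch = [7, 3, 7, 5, 3, 7]
mask = reuse_by_sorting(batch)
print(mask, sum(mask))  # three shader invocations for indices 7, 3, 5
```

    On a GPU, the sort and the neighbor comparison would run cooperatively within a thread group; the serial loop here only conveys the logic.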

Gaetano Zanghirati - One of the best experts on this subject based on the ideXlab platform.

  • Parallel Software for training large scale support vector machines on multiprocessor systems
    Journal of Machine Learning Research, 2006
    Co-Authors: Luca Zanni, Thomas Serafini, Gaetano Zanghirati
    Abstract:

    Parallel software for solving the quadratic program arising in training support vector machines for classification problems is introduced. The software implements an iterative decomposition technique and exploits both the storage and the computing resources available on multiprocessor systems by distributing the heaviest computational tasks of each decomposition iteration. Based on a wide range of recent theoretical advances, relevant decomposition issues, such as the quadratic subproblem solution, the gradient updating, and the working set selection, are systematically described, and their careful combination to obtain an effective parallel tool is discussed. A comparison with state-of-the-art packages on benchmark problems demonstrates the good accuracy and the remarkable time savings achieved by the proposed software. Furthermore, challenging experiments on real-world data sets with millions of training samples highlight how the software makes large-scale standard nonlinear support vector machines effectively tractable on common multiprocessor systems, a feature not offered by any of the other available codes.
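    The gradient update, typically the heaviest task of a decomposition iteration, parallelizes naturally: after the subproblem changes the working-set multipliers, each worker updates the gradient entries for its own slice of the training set. The sketch below is a stdlib-only illustration of that idea (not the authors' code; the kernel and data are invented).

```python
# Illustrative distributed gradient update for SVM decomposition:
#   g_i += sum_{j in working set} K(x_i, x_j) * (alpha_j_new - alpha_j_old)
# with disjoint row slices handled by separate workers.

from concurrent.futures import ThreadPoolExecutor

def kernel(xi, xj):
    # Hypothetical linear kernel on 2-D points.
    return xi[0] * xj[0] + xi[1] * xj[1]

def update_gradient(g, X, working, delta_alpha, n_workers=2):
    def work(rows):
        for i in rows:
            for j, d in zip(working, delta_alpha):
                g[i] += kernel(X[i], X[j]) * d
    # Strided, disjoint row slices: no two workers touch the same g[i].
    chunks = [range(w, len(X), n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(n_workers) as pool:
        list(pool.map(work, chunks))
    return g

X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
g = [0.0, 0.0, 0.0, 0.0]
update_gradient(g, X, working=[0], delta_alpha=[0.5])
print(g)  # each g_i moves by 0.5 * K(x_i, x_0)
```

    Distributing rows also distributes the kernel matrix storage, which is what makes million-sample problems fit on a multiprocessor system.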

Jaswinder Pal Singh - One of the best experts on this subject based on the ideXlab platform.

  • real time Parallel mpeg 2 decoding in Software
    International Parallel Processing Symposium, 1997
    Co-Authors: Angelos Bilas, Jason E Fritts, Jaswinder Pal Singh
    Abstract:

    The growing demand for high-quality compressed video has led to an increasing need for real-time MPEG decoding at greater resolutions and picture sizes. With the widespread availability of small-scale multiprocessors, a parallel software implementation may provide an effective solution to the decoding problem. We present a parallel decoder for the MPEG standard, implemented on a shared-memory multiprocessor. The goal of this work is to provide an all-software solution for real-time, high-quality video decoding and to investigate the important properties of this application as they pertain to multiprocessor systems. Both coarse- and fine-grained implementations are considered for parallelizing the decoder. The coarse-grained approach exploits parallelism at the group-of-pictures level, while the fine-grained approach parallelizes within pictures, at the slice level. A comparative evaluation of these methods is made, with results presented in terms of speedup, memory requirements, load balance, synchronization time, and temporal and spatial locality. Both methods demonstrate very good speedups and locality properties.
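    The coarse-grained scheme can be sketched abstractly (this is an illustration of GOP-level parallelism, not an actual MPEG-2 decoder; the stream contents are invented): groups of pictures are independent decode units, so worker threads can process them concurrently and the picture order is restored afterwards.

```python
# Coarse-grained parallel decoding sketch: one group of pictures (GOP)
# per worker, with output reassembled in stream order.

from concurrent.futures import ThreadPoolExecutor

def decode_gop(gop):
    # Stand-in for real GOP decoding: "decode" each encoded picture.
    gop_id, pictures = gop
    return gop_id, [p.upper() for p in pictures]

stream = [(0, ["i0", "b0", "p0"]), (1, ["i1", "b1", "p1"]), (2, ["i2"])]

with ThreadPoolExecutor(max_workers=3) as pool:
    decoded = list(pool.map(decode_gop, stream))  # one GOP per worker

# pool.map preserves submission order, so pictures stay in stream order.
frames = [pic for _, pics in sorted(decoded) for pic in pics]
print(frames)
```

    The fine-grained slice-level scheme would instead parallelize within one picture, trading lower latency and memory use for more frequent synchronization.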

Stefan Stattelmann - One of the best experts on this subject based on the ideXlab platform.

  • fast and accurate source level simulation of Software timing considering complex code optimizations
    Design Automation Conference, 2011
    Co-Authors: Stefan Stattelmann, Oliver Bringmann, Wolfgang Rosenstiel
    Abstract:

    This paper presents an approach for accurately estimating the execution time of parallel software components in complex embedded systems. Timing annotations obtained from highly optimized binary code are added to the source code of software components, which is then integrated into a SystemC transaction-level simulation. This approach allows a fast evaluation of software execution times while being as accurate as conventional instruction set simulators. By simulating binary-level control flow in parallel with the original functionality of the software, even compiler optimizations that heavily modify the structure of the generated code can be modeled accurately. Experimental results show that the presented method produces timing estimates within the same level of accuracy as an established commercial tool for cycle-accurate instruction set simulation while being at least 20 times faster.
