Memory Programming

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 65,361 Experts worldwide ranked by the ideXlab platform

Martin Schulz - One of the best experts on this subject based on the ideXlab platform.

  • SMiLE: An Integrated Multi-Paradigm Software Infrastructure for SCI-Based Clusters
    Future Generation Computer Systems, 2003
    Co-Authors: Martin Schulz, Jie Tao, Carsten Trinitis, Wolfgang Karl
    Abstract:

    The availability of a comprehensive software infrastructure is essential for the success of a parallel architecture. In order to allow for the greatest possible flexibility, an infrastructure has to be designed in an integrated, easy-to-use manner and with the support of multiple parallel programming paradigms and models to address a wide code base. Shared Memory in a LAN-like Environment (SMiLE) provides such an infrastructure for SCI (Scalable Coherent Interface) based clusters. It includes support for a large range of message passing libraries as well as for almost arbitrary shared memory programming models. In addition, SMiLE contains initial work on an appropriate tool set for performance optimization. The complete infrastructure is closely optimized for the underlying hardware and therefore offers its benefits to the user without significant overheads.

  • SMiLE: An Integrated Multi-Paradigm Software Infrastructure for SCI-Based Clusters
    Cluster Computing and the Grid, 2002
    Co-Authors: Martin Schulz, Jie Tao, Carsten Trinitis, Wolfgang Karl
    Abstract:

    The availability of a comprehensive software infrastructure is essential for the success of a parallel architecture. In order to allow for the greatest possible flexibility, an infrastructure has to be designed in an integrated, easy-to-use manner and with the support of multiple programming paradigms and models to address a wide base of codes. SMiLE provides such an infrastructure for SCI (Scalable Coherent Interface) based clusters. It includes support both for a large range of message passing libraries and for almost arbitrary shared memory programming models. In addition, SMiLE also contains initial work on appropriate tool sets for performance optimizations. The complete infrastructure is implemented in a way that is closely tied to the underlying hardware and is therefore capable of exploiting the benefits of the underlying network fabric and offering them to the user without significant overheads.

  • Multithreaded Programming of PC Clusters
    International Conference on Parallel Architectures and Compilation Techniques, 2000
    Co-Authors: Martin Schulz
    Abstract:

    Modern operating systems offer comprehensive and flexible thread APIs that allow the efficient implementation of multithreaded applications. These APIs can, however, only be utilized within Symmetric Multiprocessors (SMPs), which have a very limited scalability. For larger systems, which are in the PC world mostly represented as clusters of SMPs, other paradigms like message passing or handcrafted hybrid systems have to be used. These approaches are generally more difficult to program and require, in contrast to pure shared memory programming models, major code changes compared to sequential codes. This work presents a system that extends the native thread APIs of Windows NT and Linux to clusters of PCs, providing both ease-of-use and scalability. It is based on an efficient hybrid hardware/software DSM solution that is responsible for the creation of a global virtual memory abstraction. On top of this global memory, multithreaded applications can be executed directly using the well known thread API calls. This drastically eases the use of clusters and opens cluster architectures to a whole range of new potential users and applications.

  • True Shared Memory Programming on SCI-Based Clusters
    SCI: Scalable Coherent Interface Architecture and Software for High-Performance Compute Clusters, 1999
    Co-Authors: Martin Schulz
    Abstract:

    Due to their excellent price-performance ratio, clusters built of commodity off-the-shelf PCs and connected with low-latency network fabrics are becoming increasingly commonplace and are even starting to replace massively parallel systems. Owing to their loosely coupled architecture, they are traditionally programmed using the message passing paradigm. This trend was further supported by the wide availability of high-level message passing libraries like PVM [3] and MPI [20] and by intensive research in low-latency, user-level communication architectures [8, 26, 34].

Wolfgang Karl - One of the best experts on this subject based on the ideXlab platform.

  • SMiLE: An Integrated Multi-Paradigm Software Infrastructure for SCI-Based Clusters
    Future Generation Computer Systems, 2003
    Co-Authors: Martin Schulz, Jie Tao, Carsten Trinitis, Wolfgang Karl
    Abstract:

    The availability of a comprehensive software infrastructure is essential for the success of a parallel architecture. In order to allow for the greatest possible flexibility, an infrastructure has to be designed in an integrated, easy-to-use manner and with the support of multiple parallel programming paradigms and models to address a wide code base. Shared Memory in a LAN-like Environment (SMiLE) provides such an infrastructure for SCI (Scalable Coherent Interface) based clusters. It includes support for a large range of message passing libraries as well as for almost arbitrary shared memory programming models. In addition, SMiLE contains initial work on an appropriate tool set for performance optimization. The complete infrastructure is closely optimized for the underlying hardware and therefore offers its benefits to the user without significant overheads.

  • SMiLE: An Integrated Multi-Paradigm Software Infrastructure for SCI-Based Clusters
    Cluster Computing and the Grid, 2002
    Co-Authors: Martin Schulz, Jie Tao, Carsten Trinitis, Wolfgang Karl
    Abstract:

    The availability of a comprehensive software infrastructure is essential for the success of a parallel architecture. In order to allow for the greatest possible flexibility, an infrastructure has to be designed in an integrated, easy-to-use manner and with the support of multiple programming paradigms and models to address a wide base of codes. SMiLE provides such an infrastructure for SCI (Scalable Coherent Interface) based clusters. It includes support both for a large range of message passing libraries and for almost arbitrary shared memory programming models. In addition, SMiLE also contains initial work on appropriate tool sets for performance optimizations. The complete infrastructure is implemented in a way that is closely tied to the underlying hardware and is therefore capable of exploiting the benefits of the underlying network fabric and offering them to the user without significant overheads.

Mats Brorsson - One of the best experts on this subject based on the ideXlab platform.

  • A Fully Compliant OpenMP Implementation on Software Distributed Shared Memory
    IEEE International Conference on High Performance Computing Data and Analytics, 2002
    Co-Authors: Sven Karlsson, Sungwoo Lee, Mats Brorsson
    Abstract:

    OpenMP is a relatively new industry standard for programming parallel computers with a shared memory programming model. Given that clusters of workstations are a cost-effective solution for building parallel platforms, it would of course be highly interesting if the OpenMP model could be extended to these systems as well as to the standard shared memory architectures for which it was originally intended.

  • A Fully Compliant OpenMP Implementation on Software Distributed Shared Memory
    Lecture Notes in Computer Science, 2002
    Co-Authors: Sven Karlsson, Sungwoo Lee, Mats Brorsson
    Abstract:

    OpenMP is a relatively new industry standard for programming parallel computers with a shared memory programming model. Given that clusters of workstations are a cost-effective solution for building parallel platforms, it would of course be highly interesting if the OpenMP model could be extended to these systems as well as to the standard shared memory architectures for which it was originally intended. We present in this paper a fully compliant implementation of the OpenMP specification 1.0 for C targeting networks of workstations. We have used an experimental software distributed shared memory system called Coherent Virtual Machine to implement a run-time library which is the target of a source-to-source OpenMP translator also developed in this project. The system has been evaluated using an OpenMP micro-benchmark suite to evaluate the effect of some memory coherence protocol improvements. We have also used OpenMP versions of three Splash-2 applications, obtaining reasonable speedups on an IBM SP2 machine. This is also the first study to investigate the subtle mechanisms of consistency in OpenMP on software distributed shared memory systems.

Sarita V Adve - One of the best experts on this subject based on the ideXlab platform.

  • DeNovoND: Efficient Hardware for Disciplined Nondeterminism
    IEEE Micro, 2014
    Co-Authors: Hyojin Sung, Rakesh Komuravelli, Sarita V Adve
    Abstract:

    Recent research in disciplined shared-memory programming models presents a unique opportunity for rethinking the multicore memory hierarchy for better efficiency in terms of complexity, performance, and energy. The DeNovo hardware system showed that for deterministic programs written using such disciplined models, hardware can be much more efficient than the current state of the art. For DeNovo to be adopted by commercial systems, however, it is necessary to extend it to support nondeterministic applications as well; for example, applications using lock synchronization. This article proposes DeNovoND, a system that provides support for disciplined nondeterministic codes with locks while retaining the simplicity, performance, and energy benefits of DeNovo. The authors designed and implemented simple memory consistency semantics for safe nondeterminism using distributed queue-based locks and access signatures. The resulting protocol avoids transient states, invalidation traffic, directory sharer-lists, and false sharing, which are all significant sources of inefficiency in existing protocols. Their experiments showed that DeNovoND provides comparable or better execution time for applications designed for lock synchronization. In addition, it incurs 33 percent less network traffic on average relative to a state-of-the-art invalidation-based protocol, which directly translates into energy savings.

  • DeNovoND: Efficient Hardware Support for Disciplined Non-Determinism
    Architectural Support for Programming Languages and Operating Systems, 2013
    Co-Authors: Hyojin Sung, Rakesh Komuravelli, Sarita V Adve
    Abstract:

    Recent work has shown that disciplined shared-memory programming models that provide deterministic-by-default semantics can simplify both parallel software and hardware. Specifically, the DeNovo hardware system has shown that the software guarantees of such models (e.g., data-race-freedom and explicit side-effects) can enable simpler, higher performance, and more energy-efficient hardware than the current state-of-the-art for deterministic programs. Many applications, however, contain non-deterministic parts; e.g., using lock synchronization. For commercial hardware to exploit the benefits of DeNovo, it is therefore necessary to extend DeNovo to support non-deterministic applications. This paper proposes DeNovoND, a system that supports lock-based, disciplined non-determinism, with the simplicity, performance, and energy benefits of DeNovo. We use a combination of distributed queue-based locks and access signatures to implement simple memory consistency semantics for safe non-determinism, with a coherence protocol that does not require transient states, invalidation traffic, or directories, and does not incur false sharing. The resulting system is simpler, shows comparable or better execution time, and has 33% less network traffic on average (translating directly into energy savings) relative to a state-of-the-art invalidation-based protocol for 8 applications designed for lock synchronization.

  • DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
    International Conference on Parallel Architectures and Compilation Techniques, 2011
    Co-Authors: Byn Choi, Rakesh Komuravelli, Hyojin Sung, Robert Smolinski, Nima Honarmand, Sarita V Adve, Vikram Adve, Nicholas P Carter, Chingtsun Chou
    Abstract:

    For parallelism to become tractable for mass programmers, shared-memory languages and environments must evolve to enforce disciplined practices that ban "wild shared-memory behaviors," e.g., unstructured parallelism, arbitrary data races, and ubiquitous non-determinism. This software evolution is a rare opportunity for hardware designers to rethink hardware from the ground up to exploit opportunities exposed by such disciplined software models. Such a co-designed effort is more likely to achieve many-core scalability than a software-oblivious hardware evolution. This paper presents DeNovo, a hardware architecture motivated by these observations. We show how a disciplined parallel programming model greatly simplifies cache coherence and consistency, while enabling a more efficient communication and cache architecture. The DeNovo coherence protocol is simple because it eliminates transient states -- verification using model checking shows 15X fewer reachable states than a state-of-the-art implementation of the conventional MESI protocol. The DeNovo protocol is also more extensible. Adding two sophisticated optimizations, flexible communication granularity and direct cache-to-cache transfers, did not introduce additional protocol states (unlike MESI). Finally, DeNovo shows better cache hit rates and network traffic, translating to better performance and energy. Overall, a disciplined shared-memory programming model allows DeNovo to seamlessly integrate message passing-like interactions within a global address space for improved design complexity, performance, and efficiency.

Barbara Chapman - One of the best experts on this subject based on the ideXlab platform.

  • Open Source Software Support for the OpenMP Runtime API for Profiling
    International Conference on Parallel Processing, 2009
    Co-Authors: Oscar R Hernandez, Barbara Chapman, Ramachandra Nanjegowda, Van Bui, Richard Kufrin
    Abstract:

    OpenMP is a de facto standard API for shared memory programming with widespread vendor support and a large user base. The OpenMP Architecture Review Board has sanctioned an interface specification known as the "OpenMP Runtime API for Profiling" to enable tools to collect performance data for OpenMP programs. This paper describes the interface and our experiences implementing it in OpenUH, an open source OpenMP compiler.

  • Evolving OpenMP in an Age of Extreme Parallelism: 5th International Workshop on OpenMP, IWOMP 2009, Dresden, Germany, June 3-5, 2009, Proceedings
    2009
    Co-Authors: Matthias S Muller, Bronis R De Supinski, Barbara Chapman
    Abstract:

    Fifth International Workshop on OpenMP, IWOMP 2009. Contents:
    - Parallel Simulation of Bevel Gear Cutting Processes with OpenMP Tasks
    - Evaluation of Multicore Processors for Embedded Systems by Parallel Benchmark Program Using OpenMP
    - Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore
    - Scalability Evaluation of Barrier Algorithms for OpenMP
    - Use of Cluster OpenMP with the Gaussian Quantum Chemistry Code: A Preliminary Performance Analysis
    - Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs
    - Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective
    - Scalability of Gaussian 03 on SGI Altix: The Importance of Data Locality on CC-NUMA Architecture
    - Providing Observability for OpenMP 3.0 Applications
    - A Microbenchmark Suite for Mixed-Mode OpenMP/MPI
    - Performance Profiling for OpenMP Tasks
    - Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP
    - A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
    - Identifying Inter-task Communication in Shared Memory Programming Models

  • Strategies and Implementation for Translating OpenMP Code for Clusters
    High Performance Computing and Communications, 2007
    Co-Authors: Deepak Eachempati, Lei Huang, Barbara Chapman
    Abstract:

    OpenMP is a portable shared memory programming interface that promises high programmer productivity for multithreaded applications. It is designed for small and medium-sized shared memory systems. We have developed strategies to extend OpenMP to clusters via compiler translation to a Global Arrays program. In this paper, we describe our implementation of the translation in the Open64 compiler, and we focus on strategies to improve the translation of sequential regions. Our work is based upon the open source Open64 compiler suite for C, C++, and Fortran 90/95.

  • Vienna Fortran: A Fortran Language Extension for Distributed-Memory Multiprocessors
    Parallel Computing, 1992
    Co-Authors: Barbara Chapman, Piyush Mehrotra, Hans P Zima
    Abstract:

    Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the placement of data. In this paper, we present the basic features of Vienna Fortran along with a set of examples illustrating the use of these features.

  • Programming in Vienna Fortran
    Scientific Programming, 1992
    Co-Authors: Barbara Chapman, Piyush Mehrotra, Hans P Zima
    Abstract:

    Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the data distribution. In this paper, we present the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features.