Multiprocessor

The Experts below are selected from a list of 49173 Experts worldwide ranked by ideXlab platform

Sanjoy Baruah - One of the best experts on this subject based on the ideXlab platform.

  • Schedulability analysis of global EDF
    2008
    Co-Authors: Sanjoy Baruah, Theodore Baker
    Abstract:

    The multiprocessor EDF scheduling of sporadic task systems is studied. A new sufficient schedulability test is presented and proved correct. It is shown that this test generalizes the previously known exact uniprocessor EDF-schedulability test, and that it offers non-trivial quantitative guarantees (including a resource augmentation bound) on multiprocessors.
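
    To give a flavour of what such a sufficient test looks like, the C sketch below checks a classic earlier density bound for global EDF on m identical processors (the sum of task densities must not exceed m minus (m - 1) times the largest density). This is not the new test of the paper above, only an illustrative condition of the same kind; the task set in main is made up.

      #include <stdbool.h>
      #include <stdio.h>

      typedef struct {
          double wcet;      /* worst-case execution time C_i */
          double deadline;  /* relative deadline D_i */
          double period;    /* minimum inter-arrival time T_i */
      } task_t;

      /* Density of a sporadic task: C_i / min(D_i, T_i). */
      static double density(const task_t *t)
      {
          double d = t->deadline < t->period ? t->deadline : t->period;
          return t->wcet / d;
      }

      /* Sufficient condition for global EDF on m identical processors:
       * sum of densities <= m - (m - 1) * max density.                 */
      static bool gedf_sufficient(const task_t *tasks, int n, int m)
      {
          double sum = 0.0, max = 0.0;
          for (int i = 0; i < n; i++) {
              double lam = density(&tasks[i]);
              sum += lam;
              if (lam > max)
                  max = lam;
          }
          return sum <= m - (m - 1) * max;
      }

      int main(void)
      {
          task_t ts[] = { {1, 4, 4}, {2, 5, 5}, {3, 10, 10} };  /* made-up task set */
          printf("%s\n", gedf_sufficient(ts, 3, 2) ? "schedulable" : "test inconclusive");
          return 0;
      }

    Because the test is only sufficient, a negative answer is reported as inconclusive rather than unschedulable.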

  • Robustness results concerning EDF scheduling upon uniform multiprocessors
    2003
    Co-Authors: Sanjoy Baruah, Shelby Funk, Joël Goossens
    Abstract:

    Each processor in a uniform multiprocessor machine is characterized by a speed or computing capacity, with the interpretation that a job executing on a processor with speed s for t time units completes (s × t) units of execution. The earliest deadline first (EDF) scheduling of hard-real-time systems upon uniform multiprocessor machines is considered. It is known that online algorithms tend to perform very poorly in scheduling such hard-real-time systems on multiprocessors; resource-augmentation techniques are presented here that permit online algorithms in general (EDF in particular) to perform better than may be expected given these inherent limitations. It is shown that EDF scheduling upon uniform multiprocessors is robust with respect to both job execution requirements and processor computing capacity.
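
    A minimal sketch of the machine model and the scheduling rule used in this line of work: at each instant the jobs with the earliest deadlines run on the fastest processors, and a job running on a speed-s processor for t time units completes s × t units of work. The job set, speeds, and interval length below are invented for the illustration.

      #include <stdio.h>
      #include <stdlib.h>

      typedef struct { double deadline, remaining; } job_t;

      static int by_deadline(const void *a, const void *b)
      {
          double da = ((const job_t *)a)->deadline, db = ((const job_t *)b)->deadline;
          return (da > db) - (da < db);
      }

      static int by_speed_desc(const void *a, const void *b)
      {
          double sa = *(const double *)a, sb = *(const double *)b;
          return (sa < sb) - (sa > sb);
      }

      /* Run the active jobs for dt time units under EDF on a uniform machine:
       * earliest deadlines go to the fastest processors, each completing s*dt work. */
      static void run_interval(job_t *jobs, int n, double *speeds, int m, double dt)
      {
          qsort(jobs, n, sizeof *jobs, by_deadline);        /* earliest deadline first  */
          qsort(speeds, m, sizeof *speeds, by_speed_desc);  /* fastest processor first  */
          for (int i = 0; i < n && i < m; i++)
              jobs[i].remaining -= speeds[i] * dt;          /* completes s * t of work  */
      }

      int main(void)
      {
          job_t jobs[] = { {10, 6}, {5, 2}, {8, 4} };       /* made-up jobs            */
          double speeds[] = { 2.0, 1.0 };                   /* made-up uniform machine */
          run_interval(jobs, 3, speeds, 2, 1.0);
          for (int i = 0; i < 3; i++)
              printf("job with deadline %.0f has %.1f work left\n",
                     jobs[i].deadline, jobs[i].remaining);
          return 0;
      }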

  • On-line scheduling on uniform multiprocessors
    2001
    Co-Authors: Shelby Funk, Joël Goossens, Sanjoy Baruah
    Abstract:

    Each processor in a uniform multiprocessor machine is characterized by a speed or computing capacity, with the interpretation that a job executing on a processor with speed s for t time units completes (s × t) units of execution. The on-line scheduling of hard-real-time systems, in which all jobs must complete by specified deadlines, on uniform multiprocessor machines is considered. It is known that online algorithms tend to perform very poorly in scheduling such hard-real-time systems on multiprocessors; resource-augmentation techniques are presented here that permit online algorithms to perform better than may be expected given the inherent limitations. The results derived here are applied to the scheduling of periodic task systems on uniform multiprocessor machines.
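
    The periodic-task application rests on a commonly cited feasibility condition for implicit-deadline periodic tasks on a uniform multiprocessor: for every k, the k heaviest task utilizations must fit on the k fastest processors, and the total utilization must not exceed the total computing capacity. The sketch below checks that condition; it is a simplified statement rather than this paper's full development, and the utilizations and speeds are made up.

      #include <stdbool.h>
      #include <stdio.h>
      #include <stdlib.h>

      static int desc(const void *a, const void *b)
      {
          double x = *(const double *)a, y = *(const double *)b;
          return (x < y) - (x > y);
      }

      /* util[i] = C_i / T_i for each task, speed[j] = capacity of processor j. */
      static bool feasible_on_uniform(double *util, int n, double *speed, int m)
      {
          qsort(util, n, sizeof *util, desc);
          qsort(speed, m, sizeof *speed, desc);

          double u = 0.0, s = 0.0;
          int k_max = n < m ? n : m;
          for (int k = 0; k < k_max; k++) {
              u += util[k];                 /* k+1 heaviest tasks              */
              s += speed[k];                /* k+1 fastest processors          */
              if (u > s)
                  return false;             /* heaviest tasks do not fit       */
          }
          for (int k = k_max; k < n; k++) u += util[k];
          for (int k = k_max; k < m; k++) s += speed[k];
          return u <= s;                    /* total demand vs. total capacity */
      }

      int main(void)
      {
          double util[]  = { 0.9, 0.6, 0.5 };   /* made-up utilizations    */
          double speed[] = { 1.0, 0.5, 0.5 };   /* made-up processor speeds */
          printf("%s\n", feasible_on_uniform(util, 3, speed, 3) ? "feasible" : "infeasible");
          return 0;
      }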

Mendel Rosenblum - One of the best experts on this subject based on the ideXlab platform.

  • Disco: running commodity operating systems on scalable multiprocessors
    1997
    Co-Authors: Edouard Bugnion, Scott W Devine, Kinshuk Govil, Mendel Rosenblum
    Abstract:

    In this article we examine the problem of extending modern operating systems to run efficiently on large-scale shared-memory multiprocessors without a large implementation effort. Our approach brings back an idea popular in the 1970s: virtual machine monitors. We use virtual machines to run multiple commodity operating systems on a scalable multiprocessor. This solution addresses many of the challenges facing the system software for these machines. We demonstrate our approach with a prototype called Disco that runs multiple copies of Silicon Graphics' IRIX operating system on a multiprocessor. Our experience shows that the overheads of the monitor are small and that the approach provides scalability as well as the ability to deal with the nonuniform memory access time of these systems. To reduce the memory overheads associated with running multiple operating systems, virtual machines transparently share major data structures such as the program code and the file system buffer cache. We use the distributed-system support of modern operating systems to export a partial single system image to the users. The overall solution achieves most of the benefits of operating systems customized for scalable multiprocessors, yet it can be achieved with a significantly smaller implementation effort.
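
    The memory-saving idea mentioned above, transparently sharing identical pages across virtual machines, can be pictured with the toy bookkeeping below: guest pages with identical contents are backed by one machine page and mapped read-only, and a write breaks the sharing with a private copy. This is only an illustration of the general copy-on-write sharing idea, not Disco's actual mechanism, and every name and constant here is invented.

      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define PAGE_SIZE 4096
      #define MAX_MACHINE_PAGES 1024

      typedef struct {
          uint8_t data[PAGE_SIZE];
          int refcount;            /* number of guest mappings sharing this page */
      } machine_page_t;

      static machine_page_t mpages[MAX_MACHINE_PAGES];
      static int n_mpages;

      /* Back a guest page with an existing identical machine page, or a new one. */
      static int share_or_allocate(const uint8_t *contents)
      {
          for (int i = 0; i < n_mpages; i++) {
              if (memcmp(mpages[i].data, contents, PAGE_SIZE) == 0) {
                  mpages[i].refcount++;     /* share: map read-only into the VM */
                  return i;
              }
          }
          assert(n_mpages < MAX_MACHINE_PAGES);
          memcpy(mpages[n_mpages].data, contents, PAGE_SIZE);
          mpages[n_mpages].refcount = 1;
          return n_mpages++;
      }

      /* On a write fault to a shared page, break sharing with a private copy. */
      static int copy_on_write(int mpn)
      {
          if (mpages[mpn].refcount == 1)
              return mpn;                   /* sole owner: write in place */
          assert(n_mpages < MAX_MACHINE_PAGES);
          mpages[mpn].refcount--;
          memcpy(mpages[n_mpages].data, mpages[mpn].data, PAGE_SIZE);
          mpages[n_mpages].refcount = 1;
          return n_mpages++;
      }

      int main(void)
      {
          uint8_t zeros[PAGE_SIZE] = {0};
          int a = share_or_allocate(zeros);   /* VM 1 maps an all-zero page   */
          int b = share_or_allocate(zeros);   /* VM 2 maps the same contents  */
          int c = copy_on_write(b);           /* VM 2 writes: sharing breaks  */
          printf("a=%d b=%d after-write=%d refcount[a]=%d\n",
                 a, b, c, mpages[a].refcount);
          return 0;
      }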

  • Disco: running commodity operating systems on scalable multiprocessors
    1997
    Co-Authors: Edouard Bugnion, Scott W Devine, Mendel Rosenblum
    Abstract:

    In this paper we examine the problem of extending modern operating systems to run efficiently on large-scale shared memory multiprocessors without a large implementation effort. Our approach brings back an idea popular in the 1970s: virtual machine monitors. We use virtual machines to run multiple commodity operating systems on a scalable multiprocessor. This solution addresses many of the challenges facing the system software for these machines. We demonstrate our approach with a prototype called Disco that can run multiple copies of Silicon Graphics' IRIX operating system on a multiprocessor. Our experience shows that the overheads of the monitor are small and that the approach provides scalability as well as the ability to deal with the non-uniform memory access time of these systems. To reduce the memory overheads associated with running multiple operating systems, we have developed techniques where the virtual machines transparently share major data structures such as the program code and the file system buffer cache. We use the distributed system support of modern operating systems to export a partial single system image to the users. The overall solution achieves most of the benefits of operating systems customized for scalable multiprocessors, yet it can be achieved with a significantly smaller implementation effort.

  • Memory system performance of UNIX on CC-NUMA multiprocessors
    1995
    Co-Authors: John Chapin, Mendel Rosenblum, A Herrod, Anoop Gupta
    Abstract:

    This study characterizes the performance of a variant of UNIX SVR4 on a large shared-memory multiprocessor and analyzes the effects of possible OS and architectural changes. We use a nonintrusive cache miss monitor to trace the execution of an OS-intensive multiprogrammed workload on the Stanford DASH, a 32-CPU CC-NUMA multiprocessor (CC-NUMA multiprocessors have cache-coherent shared memory that is physically distributed across the machine). We find that our version of UNIX accounts for 24% of the workload's total execution time. A surprisingly large fraction of OS time (79%) is spent on memory system stalls, divided equally between instruction and data cache miss time. In analyzing techniques to reduce instruction cache miss stall time, we find that replication of only 7% of the OS code would allow 80% of instruction cache misses to be serviced locally on a CC-NUMA machine. For data cache misses, we find that a small number of routines account for 96% of OS data cache stall time. We find that most of these misses are coherence (communication) misses, and larger caches will not necessarily help. After presenting detailed performance data, we analyze the benefits of several OS changes and predict the effects of altering the cache configuration, degree of clustering, and cache coherence mechanism of the machine. (This paper is available via http://wwwflash.stanford.edu.)

Anant Agarwal - One of the best experts on this subject based on the ideXlab platform.

  • Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors
    1995
    Co-Authors: Anant Agarwal, D A Kranz, V Natarajan
    Abstract:

    Presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane partitioning of iteration spaces to reduce communication traffic, the problem of deriving the optimal tiling parameters for minimal communication in loops with general affine index expressions has remained open. Our paper solves this open problem by presenting a method for deriving an optimal hyperparallelepiped tiling of iteration spaces for minimal communication in multiprocessors with caches. We show that the same theoretical framework can also be used to determine optimal tiling parameters for both data and loop partitioning in distributed memory multicomputers. Our framework uses matrices to represent iteration and data space mappings and the notion of uniformly intersecting references to capture temporal locality in array references. We introduce the notion of data footprints to estimate the communication traffic between processors and use linear algebraic methods and lattice theory to compute precisely the size of data footprints. We have implemented this framework in a compiler for Alewife, a distributed shared-memory multiprocessor.
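
    The flavour of the tiling problem can be seen in a drastically simplified special case: an N × N parallel loop with a nearest-neighbour reference pattern is tiled into a p × q grid of rectangular blocks over P processors, so per-tile communication is roughly proportional to the tile boundary, N/p + N/q. The sketch below just searches the divisors of P for the shape that minimizes that quantity; the paper's framework handles general affine index expressions and parallelepiped tiles, and the numbers here are made up.

      #include <stdio.h>

      int main(void)
      {
          const int N = 1024, P = 16;       /* made-up loop size and processor count */
          int best_p = 1;
          double best_comm = 1e30;

          for (int p = 1; p <= P; p++) {
              if (P % p != 0)
                  continue;                 /* only rectangular p x q grids with p*q = P */
              int q = P / p;
              double comm = (double)N / p + (double)N / q;  /* boundary elements per tile */
              if (comm < best_comm) {
                  best_comm = comm;
                  best_p = p;
              }
          }
          printf("best tile grid: %d x %d (about %.0f boundary elements per tile)\n",
                 best_p, P / best_p, best_comm);
          return 0;
      }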

  • LimitLESS directories: a scalable cache coherence scheme
    1991
    Co-Authors: David Chaiken, John Kubiatowicz, Anant Agarwal
    Abstract:

    Caches enhance the performance of multiprocessors by reducing network traffic and average memory access latency. However, cache-based systems must address the problem of cache coherence. We propose the LimitLESS directory protocol to solve this problem. The LimitLESS scheme uses a combination of hardware and software techniques to realize the performance of a full-map directory with the memory overhead of a limited directory. This protocol is supported by Alewife, a large-scale multiprocessor. We describe the architectural interfaces needed to implement the LimitLESS directory, and evaluate its performance through simulations of the Alewife machine.
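
    The combination of a few hardware pointers with a software-handled overflow can be sketched as a data structure. In the toy model below, each directory entry holds a small fixed number of sharer pointers; once they are exhausted, a simulated software handler spills them into a full bit-vector and maintains the complete sharer list from then on. Constants and field names are invented for the illustration.

      #include <stdbool.h>
      #include <stdio.h>

      #define HW_POINTERS 4           /* pointers the "hardware" directory can hold */
      #define MAX_NODES   64

      typedef struct {
          int  sharers[HW_POINTERS];  /* node IDs caching this memory block         */
          int  count;                 /* hardware pointers currently in use         */
          bool overflow;              /* set once software takes over the full list */
          bool sw_sharers[MAX_NODES]; /* software-maintained bit vector after overflow */
      } dir_entry_t;

      /* Record a new sharer; fall back to software emulation when hardware is full. */
      static void add_sharer(dir_entry_t *e, int node)
      {
          if (!e->overflow && e->count < HW_POINTERS) {
              e->sharers[e->count++] = node;        /* fast, all-hardware path */
              return;
          }
          if (!e->overflow) {                       /* first overflow: trap to software */
              e->overflow = true;
              for (int i = 0; i < e->count; i++)    /* spill hardware pointers to memory */
                  e->sw_sharers[e->sharers[i]] = true;
          }
          e->sw_sharers[node] = true;               /* software full-map directory */
      }

      /* On a write, every recorded sharer must be sent an invalidation. */
      static void invalidate_sharers(const dir_entry_t *e)
      {
          if (!e->overflow) {
              for (int i = 0; i < e->count; i++)
                  printf("invalidate node %d\n", e->sharers[i]);
          } else {
              for (int n = 0; n < MAX_NODES; n++)
                  if (e->sw_sharers[n])
                      printf("invalidate node %d\n", n);
          }
      }

      int main(void)
      {
          dir_entry_t e = {0};
          for (int node = 0; node < 6; node++)      /* 6 sharers overflow 4 HW pointers */
              add_sharer(&e, node);
          invalidate_sharers(&e);
          return 0;
      }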

Yi Zhang - One of the best experts on this subject based on the ideXlab platform.

  • A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems
    2001
    Co-Authors: Philippas Tsigas, Yi Zhang
    Abstract:

    A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper. The algorithm is very simple, fast, and scales very well on both symmetric and non-symmetric multiprocessor shared memory systems. Experiments on a 64-node SUN Enterprise 10000 (a symmetric multiprocessor system) and on a 64-node SGI Origin 2000 (a cache-coherent non-uniform memory access multiprocessor system) indicate that our algorithm considerably outperforms the best of the known alternatives on both multiprocessors at any level of multiprogramming. This work introduces two new, simple algorithmic mechanisms. The first lowers the contention on key variables used by the concurrent enqueue and/or dequeue operations, which consequently results in the good performance of the algorithm; the second deals with the pointer recycling problem, an inconsistency problem that all non-blocking algorithms based on the compare-and-swap synchronisation primitive have to address. In our construction we chose to use compare-and-swap because it is an atomic primitive that scales well under contention and either is supported by modern multiprocessors or can be implemented efficiently on them.
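
    For a feel of how compare-and-swap is used in such queues, the sketch below gives a generic linked-list lock-free queue in C11 atomics (essentially the well-known Michael-Scott construction), not the algorithm of this paper. It also deliberately ignores memory reclamation, so it does not address the pointer-recycling problem the paper tackles: dequeued nodes are simply leaked.

      #include <stdatomic.h>
      #include <stdlib.h>

      typedef struct node {
          void *val;
          _Atomic(struct node *) next;
      } node_t;

      typedef struct {
          _Atomic(node_t *) head;   /* points at a dummy node */
          _Atomic(node_t *) tail;
      } queue_t;

      static void queue_init(queue_t *q)
      {
          node_t *dummy = malloc(sizeof *dummy);
          dummy->val = NULL;
          atomic_init(&dummy->next, NULL);
          atomic_init(&q->head, dummy);
          atomic_init(&q->tail, dummy);
      }

      static void enqueue(queue_t *q, void *val)
      {
          node_t *n = malloc(sizeof *n);
          n->val = val;
          atomic_init(&n->next, NULL);
          for (;;) {
              node_t *tail = atomic_load(&q->tail);
              node_t *next = atomic_load(&tail->next);
              if (tail != atomic_load(&q->tail))
                  continue;                                    /* tail moved, retry */
              if (next == NULL) {
                  /* Try to link the new node after the current last node. */
                  if (atomic_compare_exchange_weak(&tail->next, &next, n)) {
                      /* Swing tail forward; fine if another thread beat us to it. */
                      atomic_compare_exchange_weak(&q->tail, &tail, n);
                      return;
                  }
              } else {
                  /* Tail is lagging: help advance it, then retry. */
                  atomic_compare_exchange_weak(&q->tail, &tail, next);
              }
          }
      }

      static void *dequeue(queue_t *q)
      {
          for (;;) {
              node_t *head = atomic_load(&q->head);
              node_t *tail = atomic_load(&q->tail);
              node_t *next = atomic_load(&head->next);
              if (head != atomic_load(&q->head))
                  continue;
              if (next == NULL)
                  return NULL;                                 /* queue is empty */
              if (head == tail) {
                  /* Tail is lagging behind head: help advance it. */
                  atomic_compare_exchange_weak(&q->tail, &tail, next);
                  continue;
              }
              void *val = next->val;
              if (atomic_compare_exchange_weak(&q->head, &head, next))
                  return val;      /* old dummy node is leaked (no reclamation here) */
          }
      }

      int main(void)
      {
          queue_t q;
          int x = 42;
          queue_init(&q);
          enqueue(&q, &x);
          int *p = dequeue(&q);
          return p ? 0 : 1;
      }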

Karin Petersen - One of the best experts on this subject based on the ideXlab platform.

  • Cache coherence for shared memory multiprocessors based on virtual memory support
    1993
    Co-Authors: Karin Petersen
    Abstract:

    This paper presents a software cache coherence scheme that uses virtual memory (VM) support to maintain cache coherence for shared memory multiprocessors. Traditional VM translation hardware in each processor is used to detect memory access attempts that would violate cache coherence, and system software is used to enforce coherence. The implementation of this class of coherence schemes is very economical: it requires neither special multiprocessor hardware nor compiler support, and easily incorporates different consistency models. The authors evaluated two consistency models for the VM-based approach: sequential consistency and lazy release consistency. The VM-based schemes are compared with a bus-based snoopy caching architecture, and the authors' trace-driven simulation results show that the VM-based cache coherence schemes are practical for small-scale, shared memory multiprocessors.
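
    The core mechanism, using ordinary virtual-memory protection hardware to catch accesses that the coherence software must mediate, can be sketched on a modern Unix system with mprotect and a SIGSEGV handler, as below. A real system would fetch the page from its home node and apply the chosen consistency model inside the handler; here the handler merely upgrades access rights, and the region size and names are made up.

      #include <signal.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <unistd.h>

      #define REGION_PAGES 16
      static uint8_t *shared;        /* local view of the shared region */
      static long page_size;

      /* Fault handler standing in for the coherence software: a real system
       * would obtain the current page contents from its owner and enforce the
       * consistency model; this sketch just grants read/write access.        */
      static void coherence_fault(int sig, siginfo_t *si, void *ctx)
      {
          (void)sig; (void)ctx;
          uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(page_size - 1);
          mprotect((void *)page, page_size, PROT_READ | PROT_WRITE);
      }

      int main(void)
      {
          page_size = sysconf(_SC_PAGESIZE);
          /* Pages start out inaccessible, so every first touch traps to software. */
          shared = mmap(NULL, REGION_PAGES * page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          struct sigaction sa = { .sa_sigaction = coherence_fault,
                                  .sa_flags = SA_SIGINFO };
          sigaction(SIGSEGV, &sa, NULL);

          shared[0] = 42;            /* faults once; handler grants access */
          printf("%d\n", shared[0]);
          return 0;
      }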