Pipelining

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 534933 Experts worldwide ranked by ideXlab platform

Stephen J Allan - One of the best experts on this subject based on the ideXlab platform.

  • Software pipelining
    ACM Computing Surveys, 1995
    Co-Authors: Vicki H Allan, Reese B Jones, Stephen J Allan
    Abstract:

    Utilizing parallelism at the instruction level is an important way to improve performance. Because the time spent in loop execution dominates total execution time, a large body of optimizations focuses on decreasing the time to execute each iteration. Software pipelining is a technique that reforms the loop so that a faster execution rate is realized. Iterations are executed in overlapped fashion to increase parallelism. Let $\{ABC\}^{n}$ represent a loop containing operations A, B, C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achieved if the entire loop is considered rather than a single iteration. The software pipelining transformation utilizes the fact that a loop $\{ABC\}^{n}$ is equivalent to $A\{BCA\}^{n-1}BC$. Although the operations contained in the loop do not change, the operations are from different iterations of the original loop. Various algorithms for software pipelining exist. A comparison of the alternative methods for software pipelining is presented. The relationships between the methods are explored and possibilities for improvement highlighted.
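
    The equivalence stated in the abstract can be checked with a small trace. The sketch below is only an illustration: the function names and the (operation, iteration) tuples are hypothetical, and it assumes each operation can be tagged with the iteration of the original loop it belongs to.

```python
def original_loop(n):
    """Trace of {ABC}^n: iteration i runs A, B, C in order."""
    trace = []
    for i in range(n):
        trace += [("A", i), ("B", i), ("C", i)]
    return trace


def pipelined_loop(n):
    """Trace of A {BCA}^(n-1) BC, valid for n >= 1."""
    trace = [("A", 0)]                     # prologue: A of iteration 0
    for i in range(n - 1):                 # new loop body runs n-1 times
        # B and C finish iteration i while A already starts iteration i+1
        trace += [("B", i), ("C", i), ("A", i + 1)]
    trace += [("B", n - 1), ("C", n - 1)]  # epilogue: finish the last iteration
    return trace


def kernel_iterations(n):
    """Group the new loop body to show which original iterations it mixes."""
    body = pipelined_loop(n)[1:-2]         # drop prologue and epilogue
    return [body[k:k + 3] for k in range(0, len(body), 3)]


if __name__ == "__main__":
    assert original_loop(5) == pipelined_loop(5)
    for group in kernel_iterations(3):
        print(group)
```

    Every pass through the new body {BCA} mixes operations from two adjacent iterations of the original loop. If A of iteration i+1 does not depend on B and C of iteration i, a scheduler is free to overlap them, which is where the extra parallelism comes from.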

Zhizhong Tang - One of the best experts on this subject based on the ideXlab platform.

  • Overcoming Static Register Pressure for Software Pipelining in the Itanium Architecture
    Lecture Notes in Computer Science, 2003
    Co-Authors: Wenlong Li, Zhizhong Tang
    Abstract:

    Software pipelining techniques have been shown to significantly improve the performance of loop-intensive scientific programs. The Itanium architecture contains many features to enhance parallel execution, such as support for efficient software pipelining of loops. The drawback of software pipelining is its high register requirements, which may cause software pipelining to fail because of the limited number of static general registers in Itanium. This paper evaluates the register requirements of software-pipelined loops. It then presents a novel register allocation scheme, which allocates stacked registers to serve as static registers. Experimental results show that this method yields an average performance improvement of 2.4% and a peak improvement of 18% on the NAS Benchmarks.

  • An improvement on data dependence analysis supporting software pipelining technique
    Proceedings. Advances in Parallel and Distributed Computing, 1997
    Co-Authors: Chihong Zhang, Zhizhong Tang
    Abstract:

    The accuracy of data dependence analysis determines the extent to which a compiler can exploit the potential parallelism of a client program. Most current work on dependence analysis is based on the dependence equation and constraint inequalities on the loop variable bounds (sometimes augmented with the direction vector). Unfortunately, these methods cannot detect dependences exactly, which may greatly affect the parallel optimization of the client program when software pipelining is employed. In this paper, we give a more effective constraint inequality that reflects the characteristics of software pipelining and improves the power of dependence analysis of most current algorithms when applied to software pipelining.
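
    For readers unfamiliar with the baseline the abstract refers to, the sketch below shows one classical form of a dependence equation with loop-bound constraints. It is not the paper's improved inequality, which the abstract does not spell out. It assumes two accesses X[a*i + b] and X[c*j + d] inside a loop with bounds lo <= i, j <= hi; the function names and parameters are hypothetical.

```python
from math import gcd


def gcd_test(a, b, c, d):
    """Necessary condition for a*i + b == c*j + d to have an integer solution.

    The equation can only be solved in integers if gcd(a, c) divides d - b.
    Loop bounds are ignored, so a True result means "dependence possible".
    """
    g = gcd(a, c)
    if g == 0:                 # both subscripts are constants
        return b == d
    return (d - b) % g == 0


def exact_dependences(a, b, c, d, lo, hi):
    """Enumerate every (i, j) within the loop bounds that touches the same element.

    Brute force, so only suitable for small illustrative bounds.
    """
    return [(i, j)
            for i in range(lo, hi + 1)
            for j in range(lo, hi + 1)
            if a * i + b == c * j + d]


if __name__ == "__main__":
    # X[2*i] versus X[2*j + 1]: even and odd elements never collide.
    print(gcd_test(2, 0, 2, 1))
    # X[i] versus X[j + 100] with 0 <= i, j <= 9: the equation is solvable,
    # but not inside the loop bounds.
    print(gcd_test(1, 0, 1, 100), exact_dependences(1, 0, 1, 100, 0, 9))
```

    The second case shows why the bounds matter: a test based only on the equation reports a possible dependence that the bounds rule out, and a conservative answer of this kind restricts how freely iterations can be overlapped.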

Krste Asanovic - One of the best experts on this subject based on the ideXlab platform.

  • Power-optimal pipelining in deep submicron technology
    International Symposium on Low Power Electronics and Design, 2004
    Co-Authors: Krste Asanovic
    Abstract:

    This paper explores the effectiveness of pipelining as a power saving tool, where the reduction in logic depth per stage is used to reduce supply voltage at a fixed clock frequency. We examine power-optimal pipelining in deep submicron technology, both analytically and by simulation. Simulation uses a 70 nm predictive process with a fanout-of-four inverter chain model including input/output flip-flops, and results are shown to match theory well. The simulation results show that power-optimal logic depth is 6 to 8 FO4 and optimal power saving varies from 55 to 80% compared to a 24 FO4 logic depth, depending on threshold voltage, activity factor, and presence of clock-gating. We decompose the power consumption of a circuit into three components, switching power, leakage power, and idle power, and present the following insights into power-optimal pipelining. First, power-optimal logic depth decreases and optimal power savings increase for larger activity factors, where switching power dominates over leakage and idle power. Second, pipelining is more effective with lower threshold voltages at high activity factors, but higher threshold voltages give better results at lower activity factors where leakage current dominates. Lastly, clock-gating enables deeper pipelining and more power saving because it reduces timing element overhead when the activity factor is low.
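
    The trade-off described above, shallower stages allowing a lower supply voltage at the same clock frequency, can be illustrated with a rough first-order model. This is not the paper's model. It assumes an alpha-power-law gate delay, counts switching power only (leakage and idle power are left out), and every constant except the 24 FO4 baseline is an arbitrary assumption, so the numbers it prints are not the paper's results.

```python
V_NOM = 1.0            # assumed nominal supply voltage (V)
V_TH = 0.2             # assumed threshold voltage (V)
ALPHA = 1.3            # assumed alpha-power-law exponent
BASE_DEPTH = 24.0      # baseline logic depth per stage in FO4 (from the abstract)
FF_OVERHEAD = 3.0      # assumed flip-flop delay per stage in FO4
FF_POWER_SHARE = 0.1   # assumed flip-flop share of baseline switching power


def gate_delay(v):
    """Relative gate delay at supply voltage v, normalised to 1.0 at V_NOM."""
    def raw(x):
        return x / (x - V_TH) ** ALPHA
    return raw(v) / raw(V_NOM)


def required_vdd(depth):
    """Lowest voltage at which a stage of `depth` FO4 meets the baseline cycle time."""
    if not 0 < depth <= BASE_DEPTH:
        raise ValueError("depth must be in (0, BASE_DEPTH]")
    # Shallower stages may run this much slower per gate and still meet timing.
    budget = (BASE_DEPTH + FF_OVERHEAD) / (depth + FF_OVERHEAD)
    lo, hi = V_TH + 1e-6, V_NOM
    for _ in range(60):                    # bisection: delay falls as v rises
        mid = (lo + hi) / 2
        if gate_delay(mid) > budget:
            lo = mid                       # too slow, voltage must be higher
        else:
            hi = mid
    return hi


def relative_switching_power(depth):
    """Switching power relative to the 24 FO4 baseline at V_NOM."""
    v = required_vdd(depth)
    stages = BASE_DEPTH / depth            # more stages, hence more flip-flops
    capacitance = (1 - FF_POWER_SHARE) + FF_POWER_SHARE * stages
    return capacitance * (v / V_NOM) ** 2


if __name__ == "__main__":
    for depth in (24, 16, 12, 8, 6, 4):
        print(depth, round(required_vdd(depth), 3),
              round(relative_switching_power(depth), 3))
```

    The model captures two opposing effects: the quadratic gain from a lower supply voltage and the growing flip-flop overhead as stages get shallower. Where the balance falls depends entirely on the assumed constants.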

T. Mudge - One of the best experts on this subject based on the ideXlab platform.

  • Total power-optimal pipelining and parallel processing under process variations in nanometer technology
    IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2005
    Co-Authors: Taeho Kgil, K. Bowman, V. De, T. Mudge
    Abstract:

    This paper explores the effectiveness of the simultaneous application of pipelining and parallel processing as a total power (static plus dynamic) reduction technique in digital systems. Previous studies have been limited to either pipelining or parallel processing, but both techniques can be used together to reduce supply voltage at a fixed throughput point. According to our first-order analyses, there exist optimal combinations of pipelining depth and parallel processing width to minimize total power consumption. We show that the leakage power from both subthreshold and gate-oxide tunneling plays a significant role in determining the optimal combination of pipelining depth and parallel processing width. Our experiments are conducted with timing information derived from a 65 nm technology and fanout-of-four (FO4) inverter chains. The experiments show that the optimal combinations of both pipelining and parallel processing - 8 to 12 × FO4 logic depth pipelining with 2- to 3-wide parallel processing - can reduce the total power by as much as 40% compared to an optimal system using only pipelining or parallel processing alone. We extend our study to show how process parameter variations - an increasingly important factor in nanometer technologies - affect these results. Our analyses reveal that the variations shift the optimal points to shallower pipelining and narrower parallel processing - 12 × FO4 logic depth with 2-wide parallel processing - at a fixed yield point.
