Register Allocation

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 303 Experts worldwide ranked by ideXlab platform

Jens Palsberg - One of the best experts on this subject based on the ideXlab platform.

  • CC - SSA Elimination after Register Allocation
    Lecture Notes in Computer Science, 2009
    Co-Authors: Fernando Magno Quintão Pereira, Jens Palsberg
    Abstract:

    Compilers such as gcc use static-single-assignment (SSA) form as an intermediate representation and usually perform SSA elimination before Register Allocation. But the order could as well be the opposite: the recent approach of SSA-based Register Allocation performs SSA elimination after Register Allocation. SSA elimination before Register Allocation is straightforward and standard, while previously described approaches to SSA elimination after Register Allocation have shortcomings; in particular, they have problems with implementing copies between memory locations. We present spill-free SSA elimination , a simple and efficient algorithm for SSA elimination after Register Allocation that avoids increasing the number of spilled variables. We also present three optimizations of the core algorithm. Our experiments show that spill-free SSA elimination takes less than five percent of the total compilation time of a JIT compiler. Our optimizations reduce the number of memory accesses by more than 9% and improve the program execution time by more than 1.8%.

  • Register Allocation via coloring of chordal graphs
    Symposium on the Theory of Computing, 2007
    Co-Authors: Jens Palsberg
    Abstract:

    A revolution in Register Allocation is going on right now: compared to the state of the art, the new Register allocators are much simpler and generate code of equivalent quality. In 2005, Bouchez, Brisk et al., and Hack independently discovered that if a program is in static-single assignment (SSA) form, then a core Register Allocation problem can be solved in polynomial time. Many compilers and virtual machines represent programs in SSA form internally, including gcc version 4.0, Sun Microsystem's Java HotSpot Virtual Machine, and IBM's Java virtual machine Jikes RVM. A compiler can translate any program to SSA form in polynomial time, and amazingly thereby change the Register Allocation problem from NP-complete to polynomial time; the reason is that the transformation may change the need for Registers. Luckily, the program in SSA form needs a number of Registers which is the same or smaller than that of the original program. We have a win-win situation here: translating to SSA form changes an NP-complete problem to a simple polynomial-time problem and it decreases the need for Registers. The key insight is that a program in SSA form has a chordal interference graph. A greedy algorithm can optimally color a chordal graph in linear time in the number of edges. Completing the picture, Hack et al. have presented a new SSA-elimination algorithm that does not demand extra Registers. The combined result is a simple, optimal, polynomial-time algorithm for a core Register Allocation problem. The new SSA-elimination algorithm of Hack et al. is essential because classical SSA elimination leads to an NP-complete core Register Allocation problem, as shown by Palsberg and Pereira. Register Allocation with spilling for SSA form remains NP-complete. We can get most of the advantages of SSA form without actually transforming to SSA form: a vast majority of realistic benchmark programs have chordal interference graphs. For example, 95% of the methods in the Java 1.5 library have chordal interference graphs when compiled with the JoeQ compiler. Pereira and Palsberg have shown how to add simple heuristics for spilling and coalescing to the greedy coloring algorithm. The result is a simple and efficient Register Allocation algorithm which compares well with the iterated Register coalescing algorithm of George and Appel.

  • CATS - Register Allocation via coloring of chordal graphs
    2007
    Co-Authors: Jens Palsberg
    Abstract:

    A revolution in Register Allocation is going on right now: compared to the state of the art, the new Register allocators are much simpler and generate code of equivalent quality. In 2005, Bouchez, Brisk et al., and Hack independently discovered that if a program is in static-single assignment (SSA) form, then a core Register Allocation problem can be solved in polynomial time. Many compilers and virtual machines represent programs in SSA form internally, including gcc version 4.0, Sun Microsystem's Java HotSpot Virtual Machine, and IBM's Java virtual machine Jikes RVM. A compiler can translate any program to SSA form in polynomial time, and amazingly thereby change the Register Allocation problem from NP-complete to polynomial time; the reason is that the transformation may change the need for Registers. Luckily, the program in SSA form needs a number of Registers which is the same or smaller than that of the original program. We have a win-win situation here: translating to SSA form changes an NP-complete problem to a simple polynomial-time problem and it decreases the need for Registers. The key insight is that a program in SSA form has a chordal interference graph. A greedy algorithm can optimally color a chordal graph in linear time in the number of edges. Completing the picture, Hack et al. have presented a new SSA-elimination algorithm that does not demand extra Registers. The combined result is a simple, optimal, polynomial-time algorithm for a core Register Allocation problem. The new SSA-elimination algorithm of Hack et al. is essential because classical SSA elimination leads to an NP-complete core Register Allocation problem, as shown by Palsberg and Pereira. Register Allocation with spilling for SSA form remains NP-complete. We can get most of the advantages of SSA form without actually transforming to SSA form: a vast majority of realistic benchmark programs have chordal interference graphs. For example, 95% of the methods in the Java 1.5 library have chordal interference graphs when compiled with the JoeQ compiler. Pereira and Palsberg have shown how to add simple heuristics for spilling and coalescing to the greedy coloring algorithm. The result is a simple and efficient Register Allocation algorithm which compares well with the iterated Register coalescing algorithm of George and Appel.

  • Register Allocation after classical ssa elimination is np complete
    Foundations of Software Science and Computation Structure, 2006
    Co-Authors: Fernando Magno Quintão Pereira, Jens Palsberg
    Abstract:

    Chaitin proved that Register Allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do Register Allocation in polynomial time by first transforming the program to SSA form, then performing Register Allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: Register Allocation after classical SSA elimination is NP-complete. Chaitin's proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.

  • FoSSaCS - Register Allocation after classical SSA elimination is NP-Complete
    Lecture Notes in Computer Science, 2006
    Co-Authors: Fernando Magno Quintão Pereira, Jens Palsberg
    Abstract:

    Chaitin proved that Register Allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do Register Allocation in polynomial time by first transforming the program to SSA form, then performing Register Allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: Register Allocation after classical SSA elimination is NP-complete. Chaitin's proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.

Guang R. Gao - One of the best experts on this subject based on the ideXlab platform.

  • Register Allocation for software pipelined multidimensional loops
    ACM Transactions on Programming Languages and Systems, 2008
    Co-Authors: Hongbo Rong, Alban Douillet, Guang R. Gao
    Abstract:

    This article investigates Register Allocation for software pipelined multidimensional loops where the execution of successive iterations from an n-dimensional loop is overlapped. For single loop software pipelining, the lifetimes of a loop variable in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology [Rau 1992]) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multidimensional loop is considerably more complex and so are the vector lifetimes in it. In this article, we develop a way to normalize and represent the vector lifetimes, which captures their complexity, while exposing their regularity that enables a simple solution. The problem is formulated as bin-packing of the multidimensional vector lifetimes on the surface of a space-time cylinder. A metric, called distance, is calculated either conservatively or aggressively to guide the bin-packing process, so that there is no overlapping between any two vector lifetimes, and the Register requirement is minimized. This approach subsumes the classical Register Allocation for software pipelined single loops as a special case. The method has been implemented in the ORC compiler and produced code for the IA-64 architecture. Experimental results show the effectiveness. Several strategies for Register Allocation are compared and analyzed.

  • Register Allocation for software pipelined multi-dimensional loops
    2005
    Co-Authors: Hongbo Rong, Alban Douillet, Guang R. Gao
    Abstract:

    Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates Register Allocation for software pipelined multi-dimensional loops. For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau’s terminology) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it. In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the Register Allocation of the vector lifetimes under a bin-packing algorithm framework. The classical Register Allocation for software pipelined single loops is subsumed by our method as a special case. The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for Register Allocation are compared and analyzed

  • PLDI - Register Allocation for software pipelined multi-dimensional loops
    Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation - PLDI '05, 2005
    Co-Authors: Hongbo Rong, Alban Douillet, Guang R. Gao
    Abstract:

    Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates Register Allocation for software pipelined multi-dimensional loops.For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it.In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the Register Allocation of the vector lifetimes under a bin-packing algorithm framework. The classical Register Allocation for software pipelined single loops is subsumed by our method as a special case.The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for Register Allocation are compared and analyzed.

Hongbo Rong - One of the best experts on this subject based on the ideXlab platform.

  • MICRO - Tree Register Allocation
    Proceedings of the 42nd Annual IEEE ACM International Symposium on Microarchitecture - Micro-42, 2009
    Co-Authors: Hongbo Rong
    Abstract:

    This paper presents tree Register Allocation, which maps the lifetimes of the variables in a program into a set of trees, colors each tree in a greedy style, which is optimal when there is no spilling, and connects dataflow between and within the trees afterward. This approach generalizes and subsumes as special cases SSA-based, linear scan, and local Register Allocation. It keeps their simplicity and low throughput cost, and exposes a wide solution space beyond them. Its flexibility enables control flow structure and/or profile information to be better reflected in the trees. This approach has been prototyped in the Phoenix production compiler framework. Preliminary experiments suggest this is a promising direction with great potential. Register Allocation based on two special kinds of trees, extended basic blocks and the maximal spanning tree, are found to be competitive alternatives to SSA-based Register Allocation, and they all tend to generate better code than linear scan.

  • Register Allocation for software pipelined multidimensional loops
    ACM Transactions on Programming Languages and Systems, 2008
    Co-Authors: Hongbo Rong, Alban Douillet, Guang R. Gao
    Abstract:

    This article investigates Register Allocation for software pipelined multidimensional loops where the execution of successive iterations from an n-dimensional loop is overlapped. For single loop software pipelining, the lifetimes of a loop variable in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology [Rau 1992]) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multidimensional loop is considerably more complex and so are the vector lifetimes in it. In this article, we develop a way to normalize and represent the vector lifetimes, which captures their complexity, while exposing their regularity that enables a simple solution. The problem is formulated as bin-packing of the multidimensional vector lifetimes on the surface of a space-time cylinder. A metric, called distance, is calculated either conservatively or aggressively to guide the bin-packing process, so that there is no overlapping between any two vector lifetimes, and the Register requirement is minimized. This approach subsumes the classical Register Allocation for software pipelined single loops as a special case. The method has been implemented in the ORC compiler and produced code for the IA-64 architecture. Experimental results show the effectiveness. Several strategies for Register Allocation are compared and analyzed.

  • Register Allocation for software pipelined multi-dimensional loops
    2005
    Co-Authors: Hongbo Rong, Alban Douillet, Guang R. Gao
    Abstract:

    Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates Register Allocation for software pipelined multi-dimensional loops. For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau’s terminology) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it. In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the Register Allocation of the vector lifetimes under a bin-packing algorithm framework. The classical Register Allocation for software pipelined single loops is subsumed by our method as a special case. The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for Register Allocation are compared and analyzed

  • PLDI - Register Allocation for software pipelined multi-dimensional loops
    Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation - PLDI '05, 2005
    Co-Authors: Hongbo Rong, Alban Douillet, Guang R. Gao
    Abstract:

    Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates Register Allocation for software pipelined multi-dimensional loops.For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it.In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the Register Allocation of the vector lifetimes under a bin-packing algorithm framework. The classical Register Allocation for software pipelined single loops is subsumed by our method as a special case.The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for Register Allocation are compared and analyzed.

M. S. Schlansker - One of the best experts on this subject based on the ideXlab platform.

  • Register Allocation for software pipelined loops
    ACM SIGPLAN Notices, 1992
    Co-Authors: B. R. Rau, M. Lee, P. P. Tirumalai, M. S. Schlansker
    Abstract:

    Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This paper studies the task of Register Allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register Allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This paper formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best Register Allocation strategy, both with and without the hardware support for software pipelining.

  • PLDI - Register Allocation for software pipelined loops
    Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation - PLDI '92, 1992
    Co-Authors: B. R. Rau, M. Lee, P. P. Tirumalai, M. S. Schlansker
    Abstract:

    Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This paper studies the task of Register Allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register Allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This paper formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best Register Allocation strategy, both with and without the hardware support for software pipelining.

  • Register Allocation for Modulo Scheduled Loops: Strategies, Algorithms and Heuristics
    1992
    Co-Authors: B. R. Rau, M. Lee, P. P. Tirumalai, M. S. Schlansker
    Abstract:

    Register Allocation, modulo scheduling, software pipelining, instruction scheduling, code generation, instruction-level parallelism, multiple operation issue, VLIW processors, "ery Long Instruction word processors Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This technical report studies the task of Register Allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register Allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This technical report formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best Register Allocation strategy, both with and without the hardware support for software pipelining.

Fernando Magno Quintão Pereira - One of the best experts on this subject based on the ideXlab platform.

  • CC - SSA Elimination after Register Allocation
    Lecture Notes in Computer Science, 2009
    Co-Authors: Fernando Magno Quintão Pereira, Jens Palsberg
    Abstract:

    Compilers such as gcc use static-single-assignment (SSA) form as an intermediate representation and usually perform SSA elimination before Register Allocation. But the order could as well be the opposite: the recent approach of SSA-based Register Allocation performs SSA elimination after Register Allocation. SSA elimination before Register Allocation is straightforward and standard, while previously described approaches to SSA elimination after Register Allocation have shortcomings; in particular, they have problems with implementing copies between memory locations. We present spill-free SSA elimination , a simple and efficient algorithm for SSA elimination after Register Allocation that avoids increasing the number of spilled variables. We also present three optimizations of the core algorithm. Our experiments show that spill-free SSA elimination takes less than five percent of the total compilation time of a JIT compiler. Our optimizations reduce the number of memory accesses by more than 9% and improve the program execution time by more than 1.8%.

  • Register Allocation after classical ssa elimination is np complete
    Foundations of Software Science and Computation Structure, 2006
    Co-Authors: Fernando Magno Quintão Pereira, Jens Palsberg
    Abstract:

    Chaitin proved that Register Allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do Register Allocation in polynomial time by first transforming the program to SSA form, then performing Register Allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: Register Allocation after classical SSA elimination is NP-complete. Chaitin's proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.

  • FoSSaCS - Register Allocation after classical SSA elimination is NP-Complete
    Lecture Notes in Computer Science, 2006
    Co-Authors: Fernando Magno Quintão Pereira, Jens Palsberg
    Abstract:

    Chaitin proved that Register Allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do Register Allocation in polynomial time by first transforming the program to SSA form, then performing Register Allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: Register Allocation after classical SSA elimination is NP-complete. Chaitin's proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.