The Experts below are selected from a list of 303 Experts worldwide ranked by ideXlab platform
Jens Palsberg - One of the best experts on this subject based on the ideXlab platform.
-
CC - SSA Elimination after Register Allocation
Lecture Notes in Computer Science, 2009Co-Authors: Fernando Magno Quintão Pereira, Jens PalsbergAbstract:Compilers such as gcc use static-single-assignment (SSA) form as an intermediate representation and usually perform SSA elimination before Register Allocation. But the order could as well be the opposite: the recent approach of SSA-based Register Allocation performs SSA elimination after Register Allocation. SSA elimination before Register Allocation is straightforward and standard, while previously described approaches to SSA elimination after Register Allocation have shortcomings; in particular, they have problems with implementing copies between memory locations. We present spill-free SSA elimination , a simple and efficient algorithm for SSA elimination after Register Allocation that avoids increasing the number of spilled variables. We also present three optimizations of the core algorithm. Our experiments show that spill-free SSA elimination takes less than five percent of the total compilation time of a JIT compiler. Our optimizations reduce the number of memory accesses by more than 9% and improve the program execution time by more than 1.8%.
-
Register Allocation via coloring of chordal graphs
Symposium on the Theory of Computing, 2007Co-Authors: Jens PalsbergAbstract:A revolution in Register Allocation is going on right now: compared to the state of the art, the new Register allocators are much simpler and generate code of equivalent quality. In 2005, Bouchez, Brisk et al., and Hack independently discovered that if a program is in static-single assignment (SSA) form, then a core Register Allocation problem can be solved in polynomial time. Many compilers and virtual machines represent programs in SSA form internally, including gcc version 4.0, Sun Microsystem's Java HotSpot Virtual Machine, and IBM's Java virtual machine Jikes RVM. A compiler can translate any program to SSA form in polynomial time, and amazingly thereby change the Register Allocation problem from NP-complete to polynomial time; the reason is that the transformation may change the need for Registers. Luckily, the program in SSA form needs a number of Registers which is the same or smaller than that of the original program. We have a win-win situation here: translating to SSA form changes an NP-complete problem to a simple polynomial-time problem and it decreases the need for Registers. The key insight is that a program in SSA form has a chordal interference graph. A greedy algorithm can optimally color a chordal graph in linear time in the number of edges. Completing the picture, Hack et al. have presented a new SSA-elimination algorithm that does not demand extra Registers. The combined result is a simple, optimal, polynomial-time algorithm for a core Register Allocation problem. The new SSA-elimination algorithm of Hack et al. is essential because classical SSA elimination leads to an NP-complete core Register Allocation problem, as shown by Palsberg and Pereira. Register Allocation with spilling for SSA form remains NP-complete. We can get most of the advantages of SSA form without actually transforming to SSA form: a vast majority of realistic benchmark programs have chordal interference graphs. For example, 95% of the methods in the Java 1.5 library have chordal interference graphs when compiled with the JoeQ compiler. Pereira and Palsberg have shown how to add simple heuristics for spilling and coalescing to the greedy coloring algorithm. The result is a simple and efficient Register Allocation algorithm which compares well with the iterated Register coalescing algorithm of George and Appel.
-
CATS - Register Allocation via coloring of chordal graphs
2007Co-Authors: Jens PalsbergAbstract:A revolution in Register Allocation is going on right now: compared to the state of the art, the new Register allocators are much simpler and generate code of equivalent quality. In 2005, Bouchez, Brisk et al., and Hack independently discovered that if a program is in static-single assignment (SSA) form, then a core Register Allocation problem can be solved in polynomial time. Many compilers and virtual machines represent programs in SSA form internally, including gcc version 4.0, Sun Microsystem's Java HotSpot Virtual Machine, and IBM's Java virtual machine Jikes RVM. A compiler can translate any program to SSA form in polynomial time, and amazingly thereby change the Register Allocation problem from NP-complete to polynomial time; the reason is that the transformation may change the need for Registers. Luckily, the program in SSA form needs a number of Registers which is the same or smaller than that of the original program. We have a win-win situation here: translating to SSA form changes an NP-complete problem to a simple polynomial-time problem and it decreases the need for Registers. The key insight is that a program in SSA form has a chordal interference graph. A greedy algorithm can optimally color a chordal graph in linear time in the number of edges. Completing the picture, Hack et al. have presented a new SSA-elimination algorithm that does not demand extra Registers. The combined result is a simple, optimal, polynomial-time algorithm for a core Register Allocation problem. The new SSA-elimination algorithm of Hack et al. is essential because classical SSA elimination leads to an NP-complete core Register Allocation problem, as shown by Palsberg and Pereira. Register Allocation with spilling for SSA form remains NP-complete. We can get most of the advantages of SSA form without actually transforming to SSA form: a vast majority of realistic benchmark programs have chordal interference graphs. For example, 95% of the methods in the Java 1.5 library have chordal interference graphs when compiled with the JoeQ compiler. Pereira and Palsberg have shown how to add simple heuristics for spilling and coalescing to the greedy coloring algorithm. The result is a simple and efficient Register Allocation algorithm which compares well with the iterated Register coalescing algorithm of George and Appel.
-
Register Allocation after classical ssa elimination is np complete
Foundations of Software Science and Computation Structure, 2006Co-Authors: Fernando Magno Quintão Pereira, Jens PalsbergAbstract:Chaitin proved that Register Allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do Register Allocation in polynomial time by first transforming the program to SSA form, then performing Register Allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: Register Allocation after classical SSA elimination is NP-complete. Chaitin's proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.
-
FoSSaCS - Register Allocation after classical SSA elimination is NP-Complete
Lecture Notes in Computer Science, 2006Co-Authors: Fernando Magno Quintão Pereira, Jens PalsbergAbstract:Chaitin proved that Register Allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do Register Allocation in polynomial time by first transforming the program to SSA form, then performing Register Allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: Register Allocation after classical SSA elimination is NP-complete. Chaitin's proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.
Guang R. Gao - One of the best experts on this subject based on the ideXlab platform.
-
Register Allocation for software pipelined multidimensional loops
ACM Transactions on Programming Languages and Systems, 2008Co-Authors: Hongbo Rong, Alban Douillet, Guang R. GaoAbstract:This article investigates Register Allocation for software pipelined multidimensional loops where the execution of successive iterations from an n-dimensional loop is overlapped. For single loop software pipelining, the lifetimes of a loop variable in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology [Rau 1992]) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multidimensional loop is considerably more complex and so are the vector lifetimes in it. In this article, we develop a way to normalize and represent the vector lifetimes, which captures their complexity, while exposing their regularity that enables a simple solution. The problem is formulated as bin-packing of the multidimensional vector lifetimes on the surface of a space-time cylinder. A metric, called distance, is calculated either conservatively or aggressively to guide the bin-packing process, so that there is no overlapping between any two vector lifetimes, and the Register requirement is minimized. This approach subsumes the classical Register Allocation for software pipelined single loops as a special case. The method has been implemented in the ORC compiler and produced code for the IA-64 architecture. Experimental results show the effectiveness. Several strategies for Register Allocation are compared and analyzed.
-
Register Allocation for software pipelined multi-dimensional loops
2005Co-Authors: Hongbo Rong, Alban Douillet, Guang R. GaoAbstract:Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates Register Allocation for software pipelined multi-dimensional loops. For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau’s terminology) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it. In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the Register Allocation of the vector lifetimes under a bin-packing algorithm framework. The classical Register Allocation for software pipelined single loops is subsumed by our method as a special case. The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for Register Allocation are compared and analyzed
-
PLDI - Register Allocation for software pipelined multi-dimensional loops
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation - PLDI '05, 2005Co-Authors: Hongbo Rong, Alban Douillet, Guang R. GaoAbstract:Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates Register Allocation for software pipelined multi-dimensional loops.For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it.In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the Register Allocation of the vector lifetimes under a bin-packing algorithm framework. The classical Register Allocation for software pipelined single loops is subsumed by our method as a special case.The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for Register Allocation are compared and analyzed.
Hongbo Rong - One of the best experts on this subject based on the ideXlab platform.
-
MICRO - Tree Register Allocation
Proceedings of the 42nd Annual IEEE ACM International Symposium on Microarchitecture - Micro-42, 2009Co-Authors: Hongbo RongAbstract:This paper presents tree Register Allocation, which maps the lifetimes of the variables in a program into a set of trees, colors each tree in a greedy style, which is optimal when there is no spilling, and connects dataflow between and within the trees afterward. This approach generalizes and subsumes as special cases SSA-based, linear scan, and local Register Allocation. It keeps their simplicity and low throughput cost, and exposes a wide solution space beyond them. Its flexibility enables control flow structure and/or profile information to be better reflected in the trees. This approach has been prototyped in the Phoenix production compiler framework. Preliminary experiments suggest this is a promising direction with great potential. Register Allocation based on two special kinds of trees, extended basic blocks and the maximal spanning tree, are found to be competitive alternatives to SSA-based Register Allocation, and they all tend to generate better code than linear scan.
-
Register Allocation for software pipelined multidimensional loops
ACM Transactions on Programming Languages and Systems, 2008Co-Authors: Hongbo Rong, Alban Douillet, Guang R. GaoAbstract:This article investigates Register Allocation for software pipelined multidimensional loops where the execution of successive iterations from an n-dimensional loop is overlapped. For single loop software pipelining, the lifetimes of a loop variable in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology [Rau 1992]) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multidimensional loop is considerably more complex and so are the vector lifetimes in it. In this article, we develop a way to normalize and represent the vector lifetimes, which captures their complexity, while exposing their regularity that enables a simple solution. The problem is formulated as bin-packing of the multidimensional vector lifetimes on the surface of a space-time cylinder. A metric, called distance, is calculated either conservatively or aggressively to guide the bin-packing process, so that there is no overlapping between any two vector lifetimes, and the Register requirement is minimized. This approach subsumes the classical Register Allocation for software pipelined single loops as a special case. The method has been implemented in the ORC compiler and produced code for the IA-64 architecture. Experimental results show the effectiveness. Several strategies for Register Allocation are compared and analyzed.
-
Register Allocation for software pipelined multi-dimensional loops
2005Co-Authors: Hongbo Rong, Alban Douillet, Guang R. GaoAbstract:Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates Register Allocation for software pipelined multi-dimensional loops. For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau’s terminology) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it. In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the Register Allocation of the vector lifetimes under a bin-packing algorithm framework. The classical Register Allocation for software pipelined single loops is subsumed by our method as a special case. The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for Register Allocation are compared and analyzed
-
PLDI - Register Allocation for software pipelined multi-dimensional loops
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation - PLDI '05, 2005Co-Authors: Hongbo Rong, Alban Douillet, Guang R. GaoAbstract:Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates Register Allocation for software pipelined multi-dimensional loops.For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective Register Allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology) and map it to rotating Registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it.In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the Register Allocation of the vector lifetimes under a bin-packing algorithm framework. The classical Register Allocation for software pipelined single loops is subsumed by our method as a special case.The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for Register Allocation are compared and analyzed.
M. S. Schlansker - One of the best experts on this subject based on the ideXlab platform.
-
Register Allocation for software pipelined loops
ACM SIGPLAN Notices, 1992Co-Authors: B. R. Rau, M. Lee, P. P. Tirumalai, M. S. SchlanskerAbstract:Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This paper studies the task of Register Allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register Allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This paper formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best Register Allocation strategy, both with and without the hardware support for software pipelining.
-
PLDI - Register Allocation for software pipelined loops
Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation - PLDI '92, 1992Co-Authors: B. R. Rau, M. Lee, P. P. Tirumalai, M. S. SchlanskerAbstract:Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This paper studies the task of Register Allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register Allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This paper formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best Register Allocation strategy, both with and without the hardware support for software pipelining.
-
Register Allocation for Modulo Scheduled Loops: Strategies, Algorithms and Heuristics
1992Co-Authors: B. R. Rau, M. Lee, P. P. Tirumalai, M. S. SchlanskerAbstract:Register Allocation, modulo scheduling, software pipelining, instruction scheduling, code generation, instruction-level parallelism, multiple operation issue, VLIW processors, "ery Long Instruction word processors Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This technical report studies the task of Register Allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register Allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This technical report formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best Register Allocation strategy, both with and without the hardware support for software pipelining.
Fernando Magno Quintão Pereira - One of the best experts on this subject based on the ideXlab platform.
-
CC - SSA Elimination after Register Allocation
Lecture Notes in Computer Science, 2009Co-Authors: Fernando Magno Quintão Pereira, Jens PalsbergAbstract:Compilers such as gcc use static-single-assignment (SSA) form as an intermediate representation and usually perform SSA elimination before Register Allocation. But the order could as well be the opposite: the recent approach of SSA-based Register Allocation performs SSA elimination after Register Allocation. SSA elimination before Register Allocation is straightforward and standard, while previously described approaches to SSA elimination after Register Allocation have shortcomings; in particular, they have problems with implementing copies between memory locations. We present spill-free SSA elimination , a simple and efficient algorithm for SSA elimination after Register Allocation that avoids increasing the number of spilled variables. We also present three optimizations of the core algorithm. Our experiments show that spill-free SSA elimination takes less than five percent of the total compilation time of a JIT compiler. Our optimizations reduce the number of memory accesses by more than 9% and improve the program execution time by more than 1.8%.
-
Register Allocation after classical ssa elimination is np complete
Foundations of Software Science and Computation Structure, 2006Co-Authors: Fernando Magno Quintão Pereira, Jens PalsbergAbstract:Chaitin proved that Register Allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do Register Allocation in polynomial time by first transforming the program to SSA form, then performing Register Allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: Register Allocation after classical SSA elimination is NP-complete. Chaitin's proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.
-
FoSSaCS - Register Allocation after classical SSA elimination is NP-Complete
Lecture Notes in Computer Science, 2006Co-Authors: Fernando Magno Quintão Pereira, Jens PalsbergAbstract:Chaitin proved that Register Allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do Register Allocation in polynomial time by first transforming the program to SSA form, then performing Register Allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: Register Allocation after classical SSA elimination is NP-complete. Chaitin's proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.