Itanium

The experts below are selected from a list of 1623 experts worldwide ranked by the ideXlab platform.

D. Lavery - One of the best experts on this subject based on the ideXlab platform.

  • Optimization for the Intel® Itanium® architecture register stack
    International Symposium on Code Generation and Optimization (CGO 2003), 2003
    Co-Authors: A. Settle, D.A. Connors, G. Hoflehner, D. Lavery
    Abstract:

    The Intel® Itanium® architecture contains a number of innovative compiler-controllable features designed to exploit instruction-level parallelism. New code generation and optimization techniques are critical to the application of these features to improve processor performance. For instance, the Itanium® architecture provides a compiler-controllable virtual register stack to reduce the penalty of memory accesses associated with procedure calls. The Itanium® Register Stack Engine (RSE) transparently manages the register stack and saves and restores physical registers to and from memory as needed. Existing code generation techniques for the register stack aggressively allocate virtual registers without regard to the register pressure on different control-flow paths. As such, applications with large data sets may stress the RSE, and cause substantial execution delays due to the high number of register saves and restores. Since the Itanium® architecture is developed around Explicitly Parallel Instruction Computing (EPIC) concepts, solutions to increasing the register stack efficiency favor code generation techniques rather than hardware approaches.
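
The effect described in this abstract can be illustrated with a toy model. The sketch below is not the paper's technique: it only counts how many registers a simplified RSE would spill and fill when every procedure on a deep call chain allocates a large frame, compared with a small one. The 96 stacked physical registers (r32 to r127) come from the Itanium architecture; the class `RegisterStackModel`, the helper `simulate`, and the chosen frame sizes are hypothetical, and the model ignores overlapping input/output frames and any eager RSE behaviour.

```python
from dataclasses import dataclass, field

STACKED_PHYSICAL_REGS = 96  # r32-r127; everything else in this model is simplified


@dataclass
class RegisterStackModel:
    """Toy model that counts registers a simplified RSE would spill and fill."""
    capacity: int = STACKED_PHYSICAL_REGS
    frames: list = field(default_factory=list)    # frame sizes, oldest first
    resident: list = field(default_factory=list)  # registers of each frame still on chip
    spills: int = 0
    fills: int = 0

    def call(self, frame_size: int) -> None:
        """Allocate a new frame; spill the oldest resident registers on overflow."""
        if frame_size > self.capacity:
            raise ValueError("frame larger than the stacked register file")
        self.frames.append(frame_size)
        self.resident.append(frame_size)
        overflow = sum(self.resident) - self.capacity
        i = 0
        while overflow > 0:
            take = min(overflow, self.resident[i])
            self.resident[i] -= take
            self.spills += take
            overflow -= take
            i += 1

    def ret(self) -> None:
        """Drop the top frame, then fill whatever part of the caller was spilled."""
        self.frames.pop()
        self.resident.pop()
        if self.frames:
            missing = self.frames[-1] - self.resident[-1]
            self.fills += missing
            self.resident[-1] = self.frames[-1]


def simulate(depth: int, frame_size: int) -> tuple:
    """Call `depth` procedures deep, return all the way, report (spills, fills)."""
    model = RegisterStackModel()
    for _ in range(depth):
        model.call(frame_size)
    for _ in range(depth):
        model.ret()
    return model.spills, model.fills


greedy = simulate(depth=10, frame_size=40)  # every procedure allocates 40 registers
frugal = simulate(depth=10, frame_size=8)   # every procedure allocates only 8
```

In this model the 40-register frames overflow the register file after the second call, so `greedy` reports 304 spilled and 304 filled registers, while `frugal` reports none. That gap is the kind of RSE traffic a pressure-aware allocation strategy would try to avoid.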

T. Shpeisman - One of the best experts on this subject based on the ideXlab platform.

  • Just-in-time Java compilation for the Itanium® processor
    Proceedings. International Conference on Parallel Architectures and Compilation Techniques (PACT), 2002
    Co-Authors: T. Shpeisman, Guei-Yuan Lueh, A.-R. Adl-Tabatabai
    Abstract:

    This paper describes a just-in-time (JIT) Java compiler for the Intel® Itanium® processor. The Itanium processor is an example of an Explicitly Parallel Instruction Computing (EPIC) architecture and thus relies on aggressive and expensive compiler optimizations for performance. Static compilers for Itanium use aggressive global scheduling algorithms to extract instruction-level parallelism. In a JIT compiler, however, the additional overhead of such expensive optimizations may offset any gains from the improved code. In this paper, we describe lightweight code generation techniques for generating efficient Itanium code. Our compiler relies on two basic methods to generate efficient code. First, the compiler uses inexpensive scheduling heuristics to model the Itanium microarchitecture. Second, the compiler uses the semantics of the Java virtual machine to extract instruction-level parallelism.
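
As an illustration of what an inexpensive scheduling heuristic can look like, the sketch below makes a single forward pass over a basic block and places each instruction in the earliest cycle where its operands are ready and an execution unit of the right type is free. It is a generic greedy scheduler, not the heuristic from the paper. The `Instr` tuple, the `UNIT_LIMITS` table, the latencies, and the instruction names are all hypothetical, and only true data dependences are modelled.

```python
from collections import namedtuple

# name: label, unit: execution unit type, deps: producers, latency: cycles to result
Instr = namedtuple("Instr", "name unit deps latency")

# Hypothetical per-cycle issue limits, chosen only for illustration.
UNIT_LIMITS = {"M": 2, "I": 2, "F": 2}


def schedule(instrs):
    """One forward pass: earliest cycle with operands ready and a free unit."""
    ready_at = {}   # instruction name -> cycle in which its result is available
    used = {}       # (cycle, unit) -> number of instructions already issued
    placement = {}  # instruction name -> issue cycle
    for ins in instrs:  # program order must list producers before consumers
        cycle = max((ready_at[d] for d in ins.deps), default=0)
        while used.get((cycle, ins.unit), 0) >= UNIT_LIMITS[ins.unit]:
            cycle += 1  # unit type is full in this cycle, try the next one
        used[(cycle, ins.unit)] = used.get((cycle, ins.unit), 0) + 1
        placement[ins.name] = cycle
        ready_at[ins.name] = cycle + ins.latency
    return placement


block = [
    Instr("ld_a", "M", (), 2),
    Instr("ld_b", "M", (), 2),
    Instr("ld_c", "M", (), 2),
    Instr("add", "I", ("ld_a", "ld_b"), 1),
    Instr("sub", "I", ("add", "ld_c"), 1),
]
placement = schedule(block)
```

With these assumed limits the third load slips to cycle 1 because both memory slots of cycle 0 are taken, `add` issues in cycle 2, and `sub` in cycle 3. The pass is linear in the number of instructions apart from the slot search, which is the kind of cost profile a JIT compiler can afford.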

Y. Zemach - One of the best experts on this subject based on the ideXlab platform.

  • IA-32 execution layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium®-based systems
    36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36), 2003
    Co-Authors: L. Baraz, T. Devor, A. Skaletsky, Opher Etzion, Yun Wang, Shalom Goldenberg, Y. Zemach
    Abstract:

    IA-32 execution layer (IA-32 EL) is a new technology that executes IA-32 applications on Intel Itanium processor family systems. Currently, support for IA-32 applications on Itanium-based platforms is achieved using hardware circuitry on the Itanium processors. This capability will be enhanced with IA-32 EL, software that will ship with Itanium-based operating systems and will convert IA-32 instructions into Itanium instructions via dynamic translation. In this paper, we describe aspects of the IA-32 execution layer technology, including the general two-phase translation architecture and the usage of a single translator for multiple operating systems. The paper provides details of some of the technical challenges, such as precise exceptions, emulation of FP, MMX, and Intel Streaming SIMD Extensions instructions, and misalignment handling. Finally, the paper presents some performance results.
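
The abstract does not spell out the two phases, so the sketch below assumes a common design for dynamic translators: a fast first translation that also counts executions, followed by an optimizing retranslation once a block becomes hot. The class `TwoPhaseTranslator`, the threshold value, and the string-returning translate methods are hypothetical placeholders, not IA-32 EL interfaces.

```python
HOT_THRESHOLD = 3  # hypothetical execution count that triggers retranslation


class TwoPhaseTranslator:
    """Toy two-phase translator: quick first translation, optimized second one."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.cache = {}   # block address -> (phase, translated code)
        self.counts = {}  # block address -> number of executions so far

    def _cold_translate(self, source):
        # Placeholder for a fast, lightly optimized translation.
        return f"cold({source})"

    def _hot_translate(self, source):
        # Placeholder for a slower, optimizing retranslation.
        return f"hot({source})"

    def execute(self, addr, source):
        """Return (phase, code) for the translation that runs this time."""
        if addr not in self.cache:
            self.cache[addr] = ("cold", self._cold_translate(source))
        self.counts[addr] = self.counts.get(addr, 0) + 1
        phase, code = self.cache[addr]
        # Promote the block for future executions once it is hot enough.
        if phase == "cold" and self.counts[addr] >= self.threshold:
            self.cache[addr] = ("hot", self._hot_translate(source))
        return phase, code


translator = TwoPhaseTranslator()
phases = [translator.execute(0x401000, "mov eax, [ebx]")[0] for _ in range(5)]
```

Here `phases` is `["cold", "cold", "cold", "hot", "hot"]`: the third execution still runs the first-phase code but crosses the threshold, so later executions pick up the optimized version. The design choice is that blocks executed only a few times never pay for the expensive translation.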

A. Settle - One of the best experts on this subject based on the ideXlab platform.

  • Optimization for the Intel® Itanium® architecture register stack
    International Symposium on Code Generation and Optimization (CGO 2003), 2003
    Co-Authors: A. Settle, D.A. Connors, G. Hoflehner, D. Lavery
    Abstract:

    The Intel® Itanium® architecture contains a number of innovative compiler-controllable features designed to exploit instruction-level parallelism. New code generation and optimization techniques are critical to the application of these features to improve processor performance. For instance, the Itanium® architecture provides a compiler-controllable virtual register stack to reduce the penalty of memory accesses associated with procedure calls. The Itanium® Register Stack Engine (RSE) transparently manages the register stack and saves and restores physical registers to and from memory as needed. Existing code generation techniques for the register stack aggressively allocate virtual registers without regard to the register pressure on different control-flow paths. As such, applications with large data sets may stress the RSE, and cause substantial execution delays due to the high number of register saves and restores. Since the Itanium® architecture is developed around Explicitly Parallel Instruction Computing (EPIC) concepts, solutions to increasing the register stack efficiency favor code generation techniques rather than hardware approaches.

G. Lowney - One of the best experts on this subject based on the ideXlab platform.

  • Ispike: a post-link optimizer for the Intel® Itanium® architecture
    International Symposium on Code Generation and Optimization (CGO 2004), 2004
    Co-Authors: R. Muth, Harish Patil, R. Cohn, G. Lowney
    Abstract:

    Ispike is a post-link optimizer developed for the Intel® Itanium Processor Family (IPF) processors. The IPF architecture poses both opportunities and challenges to post-link optimizations. IPF offers a rich set of performance counters to collect detailed profile information at a low cost, which is essential to post-link optimization being practical. At the same time, the predication and bundling features on IPF make post-link code transformation more challenging than on other architectures. In Ispike, we have implemented optimizations like code layout, instruction prefetching, data layout, and data prefetching that exploit the IPF advantages, and strategies that cope with the IPF-specific challenges. Using SPEC CINT2000 as benchmarks, we show that Ispike improves performance by as much as 40% on the Itanium® 2 processor, with average improvement of 8.5% and 9.9% over executables generated by the Intel® Electron compiler and by the GCC compiler, respectively. We also demonstrate that statistical profiles collected via IPF performance counters and complete profiles collected via instrumentation produce equal performance benefit, but the profiling overhead is significantly lower for performance counters.
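
Profile-guided code layout, one of the optimizations the abstract lists, can be sketched as greedy chaining of basic blocks along the hottest control-flow edges so that frequent transitions become fall-throughs. The code below is a generic bottom-up chaining pass in the spirit of classic profile-guided positioning, not Ispike's actual algorithm. The function `layout_blocks`, the block names, and the edge counts are hypothetical.

```python
def layout_blocks(blocks, edge_counts):
    """Greedily chain blocks so the hottest edges become fall-throughs."""
    chain_of = {b: [b] for b in blocks}  # block -> the chain (list) holding it
    hottest_first = sorted(edge_counts.items(), key=lambda kv: -kv[1])
    for (src, dst), _count in hottest_first:
        a, b = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts a different one.
        if a is not b and a[-1] == src and b[0] == dst:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Emit chains in the order their first block appears in `blocks`.
    seen, order = set(), []
    for blk in blocks:
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            order.extend(chain)
    return order


blocks = ["entry", "check", "rare", "common", "exit"]
edge_counts = {
    ("entry", "check"): 100,
    ("check", "rare"): 5,
    ("check", "common"): 95,
    ("rare", "exit"): 5,
    ("common", "exit"): 95,
}
layout = layout_blocks(blocks, edge_counts)
```

For this assumed profile the result is `["entry", "check", "common", "exit", "rare"]`: the hot path is laid out contiguously and the rarely executed block moves out of line. The edge counts could come either from sampled performance counters or from instrumentation, which is the comparison the abstract reports.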