The experts below are selected from a list of 231 experts worldwide, ranked by the ideXlab platform.
William J Dally - One of the best experts on this subject based on the ideXlab platform.
-
PACT - A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08, 2008
Co-Authors: Ji Young Park, Mike Houston, Alex Aiken, William J Dally
Abstract: Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires precise tuning of programs to the machine's particular characteristics. A large program on a multi-level machine can easily expose tens or hundreds of inter-dependent parameters which require tuning, and manually searching the resultant large, non-linear space of program parameters is a tedious process of trial-and-error. In this paper we present a general framework for automatically tuning general applications to machines with software-managed memory hierarchies. We evaluate our framework by measuring the performance of benchmarks that are tuned for a range of machines with different memory hierarchy configurations: a cluster of Intel P4 Xeon processors, a single Cell processor, and a cluster of Sony PlayStation 3s.
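As a rough illustration of the kind of parameter search this abstract describes, the sketch below tunes two inter-dependent tile sizes against a made-up cost model. Everything here is an assumption for illustration: the parameter names, the local-memory capacity, the cost function, and the coordinate-descent strategy are hypothetical, and the abstract does not say which search algorithm the framework actually uses.

```python
from itertools import product

# Hypothetical capacity (in elements) of a software-managed local memory.
LOCAL_MEM_CAPACITY = 4096
# Hypothetical problem size along each dimension.
PROBLEM_SIZE = 1024


def cost(tile_m, tile_n):
    """Made-up cost model (an assumption, not a measured quantity).

    A configuration is infeasible if its working set does not fit in local
    memory; otherwise each bulk transfer pays a fixed overhead plus a
    per-element cost, so fewer, larger tiles tend to be cheaper.
    """
    working_set = tile_m * tile_n + tile_m + tile_n
    if working_set > LOCAL_MEM_CAPACITY:
        return float("inf")
    transfers = (PROBLEM_SIZE // tile_m) * (PROBLEM_SIZE // tile_n)
    return transfers * (100 + working_set)


def exhaustive_search(candidates):
    """Try every pair of tile sizes; only practical for tiny spaces."""
    return min(product(candidates, repeat=2), key=lambda p: cost(*p))


def coordinate_descent(candidates, start):
    """Improve one parameter at a time until no single change helps."""
    best = start
    improved = True
    while improved:
        improved = False
        for axis in (0, 1):
            for value in candidates:
                trial = (value, best[1]) if axis == 0 else (best[0], value)
                if cost(*trial) < cost(*best):
                    best = trial
                    improved = True
    return best


candidates = [8, 16, 32, 64, 128]
tuned = coordinate_descent(candidates, start=(8, 8))
reference = exhaustive_search(candidates)
```

Because the two tile sizes share one memory budget, the best value for one depends on the other, which is why the sketch keeps cycling over both axes rather than tuning each parameter once. With tens or hundreds of parameters the exhaustive reference search stops being practical, which is the situation the abstract is concerned with.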
-
PACT - Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy
2007
Co-Authors: William J Dally
Abstract: Summary form only given. Recently, streaming architectures such as Imagine, Merrimac and Cell were demonstrated to achieve significantly higher performance and efficiency than traditional architectures by introducing an explicitly managed on-chip storage in the memory hierarchy. This software-managed memory serves as a staging area for bulk amounts of data, making all functional unit references short and predictable, while data is asynchronously transferred from external memory. The decoupling of computation from memory accesses allows the software to statically optimize the execution pipeline, transferring the onus of latency tolerance from hardware to software.
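The staging-area idea in this abstract (compute on one buffer while the next one is fetched asynchronously) can be sketched as software double buffering. The following is a minimal host-side analogy in Python, not code for any of the architectures named: `load_chunk`, `process_stream`, and the use of a worker thread as a stand-in for an asynchronous bulk transfer are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def load_chunk(external, start, size):
    """Stand-in for an asynchronous bulk transfer from external memory."""
    return list(external[start:start + size])


def process_stream(external, chunk_size, kernel):
    """Apply `kernel` to successive chunks of `external`.

    The load of chunk i is issued before chunk i-1 is processed, so the
    transfer and the computation can overlap (double buffering).
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        pending = None  # future for the chunk that has been requested
        for start in range(0, len(external), chunk_size):
            upcoming = transfer_engine.submit(
                load_chunk, external, start, chunk_size
            )
            if pending is not None:
                # Compute on the previous chunk while `upcoming` loads.
                results.extend(kernel(pending.result()))
            pending = upcoming
        if pending is not None:
            # Drain the last chunk once no further loads remain.
            results.extend(kernel(pending.result()))
    return results


doubled = process_stream(list(range(10)), 4, lambda chunk: [2 * x for x in chunk])
```

The kernel only ever touches a chunk that has already arrived in the local buffer, which is the property the abstract describes as making references short and predictable; the schedule of loads is fixed by the loop structure rather than discovered at run time.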
-
PPoPP - Compilation for explicitly managed memory hierarchies
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07, 2007
Co-Authors: Timothy James Knight, Mike Houston, Mattan Erez, Kayvon Fatahalian, William J Dally, Alex Aiken, Ji Young Park, Pat Hanrahan
Abstract: We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.
Ji Young Park - One of the best experts on this subject based on the ideXlab platform.
-
PACT - A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08, 2008
Co-Authors: Ji Young Park, Mike Houston, Alex Aiken, William J Dally
Abstract: Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires precise tuning of programs to the machine's particular characteristics. A large program on a multi-level machine can easily expose tens or hundreds of inter-dependent parameters which require tuning, and manually searching the resultant large, non-linear space of program parameters is a tedious process of trial-and-error. In this paper we present a general framework for automatically tuning general applications to machines with software-managed memory hierarchies. We evaluate our framework by measuring the performance of benchmarks that are tuned for a range of machines with different memory hierarchy configurations: a cluster of Intel P4 Xeon processors, a single Cell processor, and a cluster of Sony PlayStation 3s.
-
PPoPP - Compilation for explicitly managed memory hierarchies
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07, 2007
Co-Authors: Timothy James Knight, Mike Houston, Mattan Erez, Kayvon Fatahalian, William J Dally, Alex Aiken, Ji Young Park, Pat Hanrahan
Abstract: We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.
Choonki Jang - One of the best experts on this subject based on the ideXlab platform.
-
CGO - An automatic code overlaying technique for multicores with explicitly-managed memory hierarchies
Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12, 2012
Co-Authors: Choonki Jang
Abstract: Explicitly-managed memory hierarchies, where a hierarchy of distinct memories is exposed to the programmer and managed explicitly by software, are found not only in typical embedded processors but also in a class of high-performance multicore architectures. Code overlay techniques have been widely used to execute a program whose code is bigger than the available code memory in the system. To generate an efficient overlaid executable with maximum storage savings as well as minimum performance overhead, the overlay structure should be designed carefully. In this paper, we propose an efficient code overlay technique that automatically generates an overlay structure for a given memory size for multicores with explicitly-managed memory hierarchies. We observe that finding an efficient overlay structure with minimum memory copying and run-time check overhead is similar to the problem of finding a code placement with minimum conflict misses in the instruction cache. Our algorithm exploits the temporal-ordering information between functions during program execution. The information is obtained from profiling the program. Experimental results with 11 parallel applications on the Cell BE processor indicate that our approach is effective and promising.
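To make the link between temporal ordering and overlay structure more concrete, the sketch below places functions into overlay regions using a profiled call trace. It is a simplified greedy heuristic written for illustration and is not the algorithm from the paper: the function names `interleave_counts`, `assign_regions`, and `count_loads` are hypothetical, and the sketch ignores function sizes and region capacities, which a real overlay generator would have to respect.

```python
from collections import Counter


def interleave_counts(trace):
    """Count how often execution switches directly between each pair of
    functions in the profiled trace (a crude temporal-ordering measure)."""
    counts = Counter()
    for prev, cur in zip(trace, trace[1:]):
        if prev != cur:
            counts[frozenset((prev, cur))] += 1
    return counts


def assign_regions(trace, num_regions):
    """Greedily place each function in the region where it interleaves
    least with the functions already placed there."""
    counts = interleave_counts(trace)

    def weight(func):
        return sum(c for pair, c in counts.items() if func in pair)

    # Place the most frequently interleaved functions first; ties are
    # broken by name so the result is deterministic.
    order = sorted(set(trace), key=lambda f: (-weight(f), f))
    regions = [[] for _ in range(num_regions)]
    for func in order:
        def conflict(index):
            return sum(counts[frozenset((func, g))] for g in regions[index])
        regions[min(range(num_regions), key=conflict)].append(func)
    return regions


def count_loads(trace, regions):
    """Replay the trace and count how many times a function must be
    copied into its region because another function currently occupies it."""
    region_of = {f: i for i, region in enumerate(regions) for f in region}
    resident = {}
    loads = 0
    for func in trace:
        index = region_of[func]
        if resident.get(index) != func:
            loads += 1
            resident[index] = func
    return loads


trace = ["a", "b", "a", "b", "c", "a", "b"]
placement = assign_regions(trace, num_regions=2)
```

Functions that share a region evict each other, much as two code blocks mapped to the same instruction-cache set do, so the heuristic keeps functions that alternate often in separate regions. Comparing `count_loads(trace, placement)` against a placement that puts "a" and "b" in the same region shows the effect on copy overhead.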
-
ICS - SRC: an automatic code overlaying technique for multicores with explicitly-managed memory hierarchies
Proceedings of the international conference on Supercomputing - ICS '11, 2011
Co-Authors: Choonki Jang
Abstract: In this paper, we propose an efficient code overlay technique that automatically generates an overlay structure for a given memory size for multicores with explicitly-managed memory hierarchies. We observe that finding an efficient overlay structure with minimum memory copying overhead is similar to the problem of finding a code placement with minimum conflict misses in the instruction cache. Our algorithm exploits the temporal-ordering information between functions during program execution. Experimental results on the Cell BE processor indicate that our approach is effective and promising.
Mike Houston - One of the best experts on this subject based on the ideXlab platform.
-
PACT - A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08, 2008
Co-Authors: Ji Young Park, Mike Houston, Alex Aiken, William J Dally
Abstract: Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires precise tuning of programs to the machine's particular characteristics. A large program on a multi-level machine can easily expose tens or hundreds of inter-dependent parameters which require tuning, and manually searching the resultant large, non-linear space of program parameters is a tedious process of trial-and-error. In this paper we present a general framework for automatically tuning general applications to machines with software-managed memory hierarchies. We evaluate our framework by measuring the performance of benchmarks that are tuned for a range of machines with different memory hierarchy configurations: a cluster of Intel P4 Xeon processors, a single Cell processor, and a cluster of Sony PlayStation 3s.
-
PPoPP - Compilation for explicitly managed memory hierarchies
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07, 2007
Co-Authors: Timothy James Knight, Mike Houston, Mattan Erez, Kayvon Fatahalian, William J Dally, Alex Aiken, Ji Young Park, Pat Hanrahan
Abstract: We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.
Alex Aiken - One of the best experts on this subject based on the ideXlab platform.
-
PACT - A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08, 2008
Co-Authors: Ji Young Park, Mike Houston, Alex Aiken, William J Dally
Abstract: Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires precise tuning of programs to the machine's particular characteristics. A large program on a multi-level machine can easily expose tens or hundreds of inter-dependent parameters which require tuning, and manually searching the resultant large, non-linear space of program parameters is a tedious process of trial-and-error. In this paper we present a general framework for automatically tuning general applications to machines with software-managed memory hierarchies. We evaluate our framework by measuring the performance of benchmarks that are tuned for a range of machines with different memory hierarchy configurations: a cluster of Intel P4 Xeon processors, a single Cell processor, and a cluster of Sony PlayStation 3s.
-
PPoPP - Compilation for explicitly managed memory hierarchies
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07, 2007
Co-Authors: Timothy James Knight, Mike Houston, Mattan Erez, Kayvon Fatahalian, William J Dally, Alex Aiken, Ji Young Park, Pat Hanrahan
Abstract: We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.