Multithreading

The experts below are selected from a list of 360 experts worldwide ranked by the ideXlab platform.

John Shen - One of the best experts on this subject based on the ideXlab platform.

Helper threads via virtual multithreading on an experimental Itanium 2 processor-based platform

Architectural Support for Programming Languages and Operating Systems, 2004

Co-Authors: Perry Wang, Jamison D Collins, Hong Wang, Bill Greene, Kaiming Chan, Aamir B Yunus, Terry Sych, Stephen F Moore, John Shen

Abstract:

Helper threading is a technology to accelerate a program by exploiting a processor's multithreading capability to run "assist" threads. Previous experiments on hyper-threaded processors have demonstrated significant speedups by using helper threads to prefetch hard-to-predict delinquent data accesses. In order to apply this technique to processors that do not have built-in hardware support for multithreading, we introduce virtual multithreading (VMT), a novel form of switch-on-event user-level multithreading, capable of fly-weight multiplexing of event-driven thread executions on a single processor without additional operating system support. The compiler plays a key role in minimizing synchronization cost by judiciously partitioning register usage among the user-level threads. The VMT approach makes it possible to launch dynamic helper thread instances in response to long-latency cache miss events, and to run helper threads in the shadow of cache misses when the main thread would be otherwise stalled. The concept of VMT is prototyped on an Itanium® 2 processor using features provided by the Processor Abstraction Layer (PAL) firmware mechanism already present in currently shipping processors. On a 4-way MP physical system equipped with VMT-enabled Itanium 2 processors, helper threading via the VMT mechanism can achieve significant performance gains for a diverse set of real-world workloads, ranging from single-threaded workstation benchmarks to heavily multithreaded large scale decision support systems (DSS) using the IBM DB2 Universal Database. We measure a wall-clock speedup of 5.8% to 38.5% for the workstation benchmarks, and 5.0% to 12.7% on various queries in the DSS workload.
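
The idea of running a helper thread "in the shadow" of a cache miss can be illustrated with a toy cycle count. The sketch below is not the VMT mechanism itself: it assumes an idealized helper that already knows the next few addresses and whose prefetches complete before the main thread resumes. The function name `run_trace`, the latency of 100 cycles, and the trace are all hypothetical.

```python
def run_trace(trace, miss_latency, helper_depth):
    """Count cycles for a main thread walking `trace` (a list of addresses).

    A hit costs 1 cycle, a miss costs `miss_latency` cycles. On each miss,
    an idealized helper thread uses the stall to prefetch the next
    `helper_depth` addresses (assumption: the prefetches finish in time).
    """
    cache = set()
    cycles = 0
    for i, addr in enumerate(trace):
        if addr in cache:
            cycles += 1
        else:
            cycles += miss_latency
            cache.add(addr)
            # Helper thread runs while the main thread is stalled.
            for upcoming in trace[i + 1 : i + 1 + helper_depth]:
                cache.add(upcoming)
    return cycles


trace = list(range(8))                 # eight distinct, never-reused addresses
baseline = run_trace(trace, 100, 0)    # no helper: every access misses
assisted = run_trace(trace, 100, 3)    # helper prefetches three ahead
print(baseline, assisted)
```

In this toy model the helper only removes misses it can predict, so the gain depends entirely on how far ahead it can run during each stall. Real gains, such as those reported in the abstract, also depend on switch overhead and prefetch accuracy.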

A realistic study on multithreaded superscalar processor design

European Conference on Parallel Processing, 1997

Co-Authors: Yuan C Chou, Daniel P. Siewiorek, John Shen

Abstract:

Simultaneous multithreading is a recently proposed technique in which instructions from multiple threads are dispatched and/or issued concurrently in every clock cycle. This technique has been claimed to improve the latency of multithreaded programs and the throughput of multiprogrammed workloads with a minimal increase in hardware complexity. This paper presents a realistic study on the case for simultaneous multithreading by using extensive simulations to determine balanced configurations of a multithreaded version of the PowerPC 620, measuring their performance on multithreaded benchmarks written using the commercial Pthreads API, and estimating their hardware complexity in terms of increases in die area. Our results show that a balanced 2-threaded 620 achieves a 41.6% to 71.3% speedup over the original 620 on five multithreaded benchmarks with an estimated 36.4% increase in die area and no impact on single thread performance. The balanced 4-threaded 620 achieves a 46.9% to 111.6% speedup over the original 620 with an estimated 70.4% increase in die area and a detrimental impact on single thread performance.
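
One way to read the speedup and die-area figures together is as performance per unit of area relative to the original 620. This ratio is not a metric reported in the abstract; it is a derived figure computed here only from the percentages quoted above, and the helper name `perf_per_area` is hypothetical.

```python
def perf_per_area(speedup_pct, area_increase_pct):
    """Relative performance divided by relative die area (baseline = 1.0)."""
    return (1 + speedup_pct / 100) / (1 + area_increase_pct / 100)


# Figures quoted in the abstract: (speedup range in %, die-area increase in %).
configs = {
    "2-threaded": ((41.6, 71.3), 36.4),
    "4-threaded": ((46.9, 111.6), 70.4),
}

for name, ((low, high), area) in configs.items():
    print(name,
          round(perf_per_area(low, area), 3),
          round(perf_per_area(high, area), 3))
```

Under this reading the 2-threaded design stays above 1.0 across its whole range (roughly 1.04 to 1.26), while the 4-threaded design ranges from about 0.86 to 1.24, so its extra area pays off only on the better-scaling benchmarks.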


Perry Wang - One of the best experts on this subject based on the ideXlab platform.

Helper threads via virtual multithreading on an experimental Itanium 2 processor-based platform

Architectural Support for Programming Languages and Operating Systems, 2004

Co-Authors: Perry Wang, Jamison D Collins, Hong Wang, Bill Greene, Kaiming Chan, Aamir B Yunus, Terry Sych, Stephen F Moore, John Shen


Susan J Eggers - One of the best experts on this subject based on the ideXlab platform.

An analysis of database workload performance on simultaneous multithreaded processors

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235), 1998

Co-Authors: J.L. Lo, Susan J Eggers, K. Gharachorloo, H.M. Levy, L.A. Barroso, S.S. Parekh

Abstract:

Simultaneous multithreading (SMT) is an architectural technique in which the processor issues multiple instructions from multiple threads each cycle. While SMT has been shown to be effective on scientific workloads, its performance on database systems is still an open question. In particular, database systems have poor cache performance, and the addition of multithreading has the potential to exacerbate cache conflicts. This paper examines database performance on SMT processors using traces of the Oracle database management system. Our research makes three contributions. First, it characterizes the memory-system behavior of database systems running on-line transaction processing and decision support system workloads. Our data show that while DBMS workloads have large memory footprints, there is substantial data reuse in a small, cacheable "critical" working set. Second, we show that the additional data cache conflicts caused by simultaneous-multithreaded instruction scheduling can be nearly eliminated by the proper choice of software-directed policies for virtual-to-physical page mapping and per-process address offsetting. Our results demonstrate that with the best policy choices, D-cache miss rates on an 8-context SMT are roughly equivalent to those on a single-threaded superscalar. Multithreading also leads to better interthread instruction cache sharing, reducing I-cache miss rates by up to 35%. Third, we show that SMT's latency tolerance is highly effective for database applications. For example, using a memory-intensive OLTP workload, an 8-context SMT processor achieves a 3-fold increase in instruction throughput over a single-threaded superscalar with similar resources.
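
The effect of per-process address offsetting on cache conflicts can be sketched with a toy direct-mapped cache. This is a simplified illustration, not the policy evaluated in the paper: the cache geometry (`LINE_BYTES`, `NUM_SETS`), the offset size, and the "hot" address are hypothetical, and the address is used directly for indexing without modeling page translation.

```python
LINE_BYTES = 64      # hypothetical cache line size
NUM_SETS = 512       # hypothetical direct-mapped cache: 512 sets of one line
OFFSET_BYTES = 4096  # hypothetical per-process offset


def set_index(addr):
    """Cache set that an address maps to in the toy direct-mapped cache."""
    return (addr // LINE_BYTES) % NUM_SETS


def distinct_sets(addresses):
    """Number of different cache sets touched by the given addresses."""
    return len({set_index(a) for a in addresses})


HOT_ADDR = 0x7FFF0000  # same hot location in every process's layout
contexts = range(8)    # eight hardware contexts, one process each

no_offset = [HOT_ADDR for _ in contexts]
with_offset = [HOT_ADDR + pid * OFFSET_BYTES for pid in contexts]

print(distinct_sets(no_offset), distinct_sets(with_offset))
```

Without offsetting, all eight contexts compete for a single set; with the offset, each context's hot line lands in its own set, which is the kind of conflict reduction the abstract attributes to software-directed placement policies.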

Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

International Symposium on Computer Architecture, 1996

Co-Authors: Dean M Tullsen, Susan J Eggers, Joel Emer, Henry M Levy, Rebecca L Stamm

Abstract:

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
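
Favoring the threads that are "most efficiently using the processor" can be sketched as a per-cycle priority choice. The abstract does not specify the heuristic, so the sketch below assumes one plausible rule for illustration: prefer the threads with the fewest instructions currently in flight, breaking ties by thread id. The function name `pick_fetch_threads` and the sample counts are hypothetical.

```python
def pick_fetch_threads(in_flight, max_threads=2):
    """Return the ids of the threads to fetch from this cycle.

    `in_flight[t]` is the number of instructions thread `t` currently has
    in the front end and issue queues. Threads with fewer in-flight
    instructions are assumed to be moving through the machine efficiently,
    so they get priority (assumed heuristic, ties broken by thread id).
    """
    by_priority = sorted(range(len(in_flight)),
                         key=lambda t: (in_flight[t], t))
    return by_priority[:max_threads]


# Four hardware contexts; threads 1 and 3 have the least work queued up.
print(pick_fetch_threads([12, 3, 7, 3]))
```

A thread that is clogging the queues (thread 0 here) is passed over until its backlog drains, which keeps fetch bandwidth for threads more likely to supply issuable instructions.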

Simultaneous multithreading: maximizing on-chip parallelism

International Symposium on Computer Architecture, 1995

Co-Authors: Dean M Tullsen, Susan J Eggers, Henry M Levy

Abstract:

This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. Our results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue processor. Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. We evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them. We also show that simultaneous multithreading is an attractive alternative to single-chip multiprocessors; simultaneous multithreaded processors with a variety of organizations outperform corresponding conventional multiprocessors with similar execution resources. While simultaneous multithreading has excellent potential to increase processor utilization, it can add substantial complexity to the design. We examine many of these complexities and evaluate alternative organizations in the design space.
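
The difference between the three organizations can be seen in how many issue slots each one fills. The sketch below is a toy model, not the paper's simulator: `READY[t][c]` is a made-up count of instructions thread `t` could issue in cycle `c`, unissued instructions do not carry over, and the fine-grain model simply rotates through threads.

```python
def issued_superscalar(ready, width):
    """Single-threaded superscalar: only thread 0 can fill issue slots."""
    return sum(min(width, n) for n in ready[0])


def issued_fine_grain(ready, width):
    """Fine-grain multithreading: one thread per cycle, round robin."""
    threads, cycles = len(ready), len(ready[0])
    return sum(min(width, ready[c % threads][c]) for c in range(cycles))


def issued_smt(ready, width):
    """Simultaneous multithreading: all threads share each cycle's slots."""
    cycles = len(ready[0])
    return sum(min(width, sum(t[c] for t in ready)) for c in range(cycles))


# Hypothetical per-thread ready counts over four cycles (rows are threads).
READY = [
    [2, 0, 0, 3],
    [1, 2, 0, 2],
    [0, 3, 2, 1],
    [2, 1, 1, 2],
]
WIDTH = 4  # issue slots per cycle, 16 slots in total over four cycles

print(issued_superscalar(READY, WIDTH),
      issued_fine_grain(READY, WIDTH),
      issued_smt(READY, WIDTH))
```

Fine-grain multithreading recovers cycles in which one thread has nothing to issue, but still leaves slots empty within a cycle; sharing slots across threads in the same cycle removes both kinds of waste in this toy example.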


Ralf S Engelschall - One of the best experts on this subject based on the ideXlab platform.

Portable multithreading: the signal stack trick for user-space thread creation

USENIX Annual Technical Conference, 2000

Co-Authors: Ralf S Engelschall

Abstract:

This paper describes a pragmatic but portable fallback approach for creating and dispatching between the machine contexts of multiple threads of execution on Unix systems that lack a dedicated user-space context switching facility. Such a fallback approach for implementing machine contexts is a vital part of a user-space multithreading environment, if it has to achieve maximum portability across a wide range of Unix flavors. The approach is entirely based on standard Unix system facilities and ANSI-C language features and especially does not require any assembly code or platform specific tricks at all. The most interesting issue is the technique of creating the machine context for threads, which this paper explains in detail. The described approach closely follows the algorithm as implemented by the author for the popular user-space multithreading library GNU Portable Threads (GNU Pth, [25]) which this way quickly gained the status of one of the most portable user-space multithreading libraries.
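
The dispatching half of a user-space threading environment can be illustrated without any kernel threads. The sketch below deliberately swaps in a different technique: Python generators stand in for machine contexts, so it shows only cooperative round-robin dispatch and not the signal-stack method of creating contexts in C that the paper describes. The names `worker` and `run_round_robin` are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative 'thread': does one step of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the dispatcher


def run_round_robin(threads):
    """Dispatch between user-space threads until all of them have finished."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # switch into the thread's saved context
        except StopIteration:
            continue               # thread finished, drop it
        ready.append(current)      # still runnable, requeue at the back


log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

Here the language runtime saves and restores each generator's state; the portability problem addressed by the paper is how to obtain that same save-and-restore ability for C stacks using only standard Unix facilities.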


Bill Greene - One of the best experts on this subject based on the ideXlab platform.

Helper threads via virtual multithreading on an experimental Itanium 2 processor-based platform

Architectural Support for Programming Languages and Operating Systems, 2004

Co-Authors: Perry Wang, Jamison D Collins, Hong Wang, Bill Greene, Kaiming Chan, Aamir B Yunus, Terry Sych, Stephen F Moore, John Shen

