Barrier Synchronization

The Experts below are selected from a list of 2361 Experts worldwide ranked by ideXlab platform

Dhabaleswar K Panda - One of the best experts on this subject based on the ideXlab platform.

A reliable hardware Barrier Synchronization scheme

International Parallel Processing Symposium, 1997

Co-Authors: Rajeev Sivaram, Craig B Stunkel, Dhabaleswar K Panda

Abstract:

Barrier Synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast Barrier Synchronization through software, hardware, or a combination of these mechanisms. However, few of these schemes emphasize fault-tolerant Barrier operations. In this paper, we describe inexpensive support that can be added to network switches for achieving reliable hardware-based Barrier Synchronization while recovering from lost or corrupted messages. Necessary modifications to the switch architecture and the associated fault-tolerant message-passing protocols are presented. The protocols are optimized for the no-fault case while providing means to detect the failure of any step of the operation and to recover from it. The proposed scheme shows significant potential for use in parallel systems, especially the emerging systems based on networks of workstations.

Fast Barrier Synchronization in wormhole k-ary n-cube networks with multidestination worms

High-Performance Computer Architecture, 1995

Co-Authors: Dhabaleswar K Panda

Abstract:

This paper presents a new approach to implement fast Barrier Synchronization in wormhole k-ary n-cubes. The novelty lies in using multidestination messages instead of the traditional single-destination messages. Two different multidestination worm types, gather and broadcasting, are introduced to implement the report and wake-up phases of Barrier Synchronization, respectively. Algorithms for complete and arbitrary set Barrier Synchronization are presented using these new worms. It is shown that complete Barrier Synchronization in a k-ary n-cube system with e-cube routing can be implemented with 2n communication start-ups, as compared to the 2n log_2 k start-ups needed with unicast-based message passing. For arbitrary set Barrier, an interesting trend is observed where the Synchronization cost keeps decreasing beyond a certain number of participating nodes.
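
As a rough illustration of the start-up counts quoted in the abstract, the sketch below simply evaluates the two formulas, 2n and 2n log_2 k, for a few network sizes. The function names are illustrative and do not come from the paper, and the sketch assumes k is a power of two so that log_2 k is an integer. It does not model the worms or the routing.

```python
import math


def multidestination_startups(n: int) -> int:
    """Start-ups quoted for a complete barrier with multidestination worms: 2n."""
    return 2 * n


def unicast_startups(n: int, k: int) -> int:
    """Start-ups quoted for unicast-based message passing: 2n * log2(k)."""
    # Assumption of this sketch: k is a power of two, so log2(k) is an integer.
    log_k = int(math.log2(k))
    if 2 ** log_k != k:
        raise ValueError("this sketch assumes k is a power of two")
    return 2 * n * log_k


if __name__ == "__main__":
    # A few example k-ary n-cube sizes, chosen only for illustration.
    for k, n in [(4, 2), (8, 2), (16, 3)]:
        print(
            f"{k}-ary {n}-cube: "
            f"multidestination={multidestination_startups(n)}, "
            f"unicast={unicast_startups(n, k)}"
        )
```

Under these formulas the gap between the two approaches grows with log_2 k, which is the factor the multidestination scheme removes.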

Barrier Synchronization in distributed memory multiprocessors using rendezvous primitives

International Parallel Processing Symposium, 1993

Co-Authors: S K S Gupta, Dhabaleswar K Panda

Abstract:

This paper deals with Barrier Synchronization in wormhole-routed distributed-memory multiprocessors. New rendezvous and multirendezvous Synchronization primitives are proposed to implement a Barrier between two and multiple processors, respectively. These primitives reduce the number of communication steps required to implement a Barrier, thus significantly reducing the Synchronization overhead for networks with high communication start-up cost. Two algorithms for Barrier Synchronization on k-ary n-cube networks are presented. The rendezvous primitive allows one to synchronize all processors in n log_2(k) steps. The multirendezvous primitive allows one to synchronize an arbitrary subset of processors in an optimal number of communication steps, depending on the ratio of the communication start-up cost (t_s) to the link-propagation cost (t_p).


Mikel Lujan - One of the best experts on this subject based on the ideXlab platform.

Effective Barrier Synchronization on Intel Xeon Phi Coprocessor

European Conference on Parallel Processing, 2015

Co-Authors: Andrey Rodchenko, Andy Nisbet, Antoniu Pop, Mikel Lujan

Abstract:

Barriers are a fundamental Synchronization primitive, underpinning the parallel execution models of many modern shared-memory parallel programming languages such as OpenMP, OpenCL or Cilk, and are one of the main challenges to scaling. State-of-the-art Barrier Synchronization algorithms differ in tradeoffs between critical path length, communication traffic patterns and memory footprint. In this paper, we evaluate the efficiency of five such algorithms on the Intel Xeon Phi coprocessor. In addition, we present a novel hybrid Barrier implementation that exploits the topology, the memory hierarchy and streaming stores of the Xeon Phi architecture to achieve a 3× lower overhead than the Intel OpenMP Barrier implementation (ICC 14.0.0), thus outperforming, to the best of our knowledge, all other implementations. We evaluate it on the CG and MG kernels from the NAS Parallel Benchmarks, the direct N-body simulation kernel and the EPCC Barrier OpenMP microbenchmark. The optimized Barriers presented in the paper are available at https://github.com/arodchen/cBarriers, released as free software.


Chungta King - One of the best experts on this subject based on the ideXlab platform.

Designing tree-based Barrier Synchronization on 2D mesh networks

IEEE Transactions on Parallel and Distributed Systems, 1998

Co-Authors: Jenqshyan Yang, Chungta King

Abstract:

In this paper, we consider a tree-based routing scheme for supporting Barrier Synchronization on scalable parallel computers with a 2D mesh network. Based on the characteristics of a standard programming interface, the scheme builds a collective Synchronization (CS) tree among the participating nodes using a distributed algorithm. When the routers are set up properly with the CS tree information, Barrier Synchronization can be accomplished very efficiently by passing simple messages. Performance evaluations show that our proposed method performs better than previous path-based approaches and is less sensitive to variations in group size and startup delay. However, our scheme has the extra overhead of building the CS tree. Thus, it is more suitable for parallel iterative computations in which the same Barrier is invoked repetitively.
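
To make the tree-based idea concrete, here is a minimal sketch of the cost of one barrier over an already-built tree: every node reports to its parent, then the root's release travels back down. This is a generic tree barrier model, not the paper's CS tree construction or its router setup. The `parent` map, the function name, and the assumption that each tree level costs one step (a parent hears from all of its children in the same step) are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def tree_barrier_cost(parent: Dict[int, Optional[int]]) -> Tuple[int, int]:
    """Return (messages, steps) for one barrier over a rooted tree.

    `parent` maps each participating node to its parent; the root maps to None.
    Assumption: one step per tree level in each direction.
    """
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)
    if root is None:
        raise ValueError("the tree needs a root (a node whose parent is None)")

    def height(node: int) -> int:
        # Number of edges on the longest path from `node` down to a leaf.
        kids = children[node]
        return 0 if not kids else 1 + max(height(c) for c in kids)

    edges = len(parent) - 1
    messages = 2 * edges        # one report up and one release down per edge
    steps = 2 * height(root)    # report phase plus release phase
    return messages, steps


if __name__ == "__main__":
    # Hypothetical 6-node tree: 0 is the root, 1 and 2 are its children.
    example = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
    print(tree_barrier_cost(example))
```

Because the tree is built once and reused, the per-barrier cost modelled here is what repeated invocations would pay, which matches the abstract's remark that the scheme suits iterative computations.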

Efficient Barrier Synchronization in wormhole-routed mesh networks supporting turn model

Parallel Computing, 1998

Co-Authors: Kuo-pao Fan, Chungta King

Abstract:

Barrier is an important Synchronization operation. On scalable parallel computers, it is often implemented as a collective communication. A typical Barrier Synchronization operation consists of a reduction operation followed by a distribution operation. In this paper, we introduce a systematic way of generating efficient algorithms to perform Barrier Synchronization in mesh networks. The scheme works with any base routing algorithm that is derivable from the turn model (C.J. Glass, L.M. Ni, in: Proc. Intl. Symp. Computer Architecture, pp. 278–297). It extends the turn grouping method proposed by Fan and King (K.P. Fan, C.T. King, Turn grouping for supporting efficient multicast in wormhole mesh networks, in: Proc. 6th Symp. on the Frontiers of Massively Parallel Computing (Frontiers '96), October 1996) with two new algorithms, Tail_to_Central and Central_to_Tail. These two algorithms schedule the transmissions of Synchronization messages in the reduction and distribution phases, respectively. Performance of the proposed method is evaluated using four typical turn-model based algorithms. The simulation results show that our approach can take advantage of the adaptivity of the base routing algorithms and outperforms methods proposed previously.
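
The "reduction followed by distribution" structure mentioned in the abstract can be sketched in shared memory with threads. This is only an analogy: it is not the Tail_to_Central or Central_to_Tail algorithm and involves no mesh routing. The class name `TwoPhaseBarrier` and the helper `run_demo` are made up for this example.

```python
import threading


class TwoPhaseBarrier:
    """Reusable barrier: arrivals are reduced into a count, then the release is distributed."""

    def __init__(self, parties: int) -> None:
        self._parties = parties
        self._arrived = 0
        self._generation = 0
        self._cond = threading.Condition()

    def wait(self) -> None:
        with self._cond:
            generation = self._generation
            self._arrived += 1  # reduction phase: fold this arrival into the count
            if self._arrived == self._parties:
                # Last arrival: reset for reuse and release everyone.
                self._arrived = 0
                self._generation += 1
                self._cond.notify_all()  # distribution phase
            else:
                # Wait until the generation changes, guarding against spurious wake-ups.
                while generation == self._generation:
                    self._cond.wait()


def run_demo(parties: int = 4, rounds: int = 3) -> list:
    """Run `parties` threads through `rounds` barriers and return the (round, thread) log."""
    barrier = TwoPhaseBarrier(parties)
    log = []
    log_lock = threading.Lock()

    def worker(tid: int) -> None:
        for r in range(rounds):
            with log_lock:
                log.append((r, tid))
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

In the returned log, no entry for round r+1 can appear before every thread has logged round r, since each thread logs before it reaches the barrier.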

ICDCS - Hardware supports for efficient Barrier Synchronization on 2-D mesh networks

Proceedings of 16th International Conference on Distributed Computing Systems, 1996

Co-Authors: Jeng-shyan Yang, Chungta King

Abstract:

In this paper, we consider a hardware scheme for supporting Barrier Synchronization on scalable systems with a 2D mesh network. Our design takes into account the program execution path in such systems, from programming interfaces down to routers. The hardware router design is based on the MPI-1 standard. A distributed algorithm is proposed to construct a collective Synchronization tree (CS tree) from the nodes participating in the Barrier. Based upon the CS tree, the status registers in the routers are set up, and Synchronization messages are transmitted along the paths set by the status registers. Performance evaluations show that our proposed method has better performance for Barrier Synchronization and is less sensitive to variations in group size and startup delay than previous approaches. However, our scheme has the extra overhead of building the CS tree. Thus, it is more suitable for parallel iterative computations in which the same Barrier is invoked repetitively.


Jean-philippe Diguet - One of the best experts on this subject based on the ideXlab platform.

ASP-DAC - Broadcast Mechanism Based on Hybrid Wireless/Wired NoC for Efficient Barrier Synchronization in Parallel Computing

2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), 2020

Co-Authors: Hemanta Kumar Mondal, Navonil Chatterjee, Rodrigo Cataldo, Jean-philippe Diguet

Abstract:

Parallel computing is essential to achieve the performance potential of manycore architectures, since it exploits the parallelism provided by the hardware. Parallel applications inevitably have to synchronize their execution, for instance through broadcast operations for Barrier Synchronization. Conventional network-on-chip architectures limit the performance of broadcast operations: Synchronization is affected significantly by critical path communications, which increase the network latency and degrade the performance drastically. A Wireless network-on-chip offers a promising solution to reduce the critical path communication bottlenecks of such conventional architectures by providing hardware broadcast support. We propose efficient Barrier Synchronization support using hybrid wireless/wired NoC to reduce the cost of broadcast operations. The proposed architecture reduces the Barrier Synchronization cost by up to 42.79% in terms of network latency and saves up to 42.65% of communication energy consumption for a subset of applications from the PARSEC benchmark.

SoCC - Broadcast- and Power-Aware Wireless NoC for Barrier Synchronization in Parallel Computing

2018 31st IEEE International System-on-Chip Conference (SOCC), 2018

Co-Authors: Hemanta Kumar Mondal, Rodrigo Cataldo, Kevin Martin, Sujay Deb, Cesar Marcon, Jean-philippe Diguet

Abstract:

Efficient Synchronization is one of the basic requirements of effective parallel computing. A key operation of the POSIX Thread standard (PThread) is Barrier Synchronization, where multiple threads block on a user-specified point of execution until all of them have reached it. Conventional architectures for broadcast operations limit the achievable performance benefits, as Synchronization is significantly affected by critical path communications. This increases the network latency and degrades the performance dramatically. A Wireless Network-on-Chip (WiNoC) offers a promising solution to reduce the long-distance/critical path communication bottlenecks of conventional architectures by augmenting them with single-hop, long-range wireless links. In this paper, we propose a power-aware, broadcast-enabled WiNoC architecture to reduce the cost of broadcast operations for Barrier-based applications. The proposed architecture reduces the Barrier Synchronization cost by up to 43.97% in terms of network latency under the PARSEC benchmarks. It also saves up to 80.49% of idle-state power consumption in WIs for a 64-core system compared with the conventional WiNoC architecture, without incurring significant overhead.
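
The barrier semantics described in the abstract (threads block at a chosen point until all of them have reached it) can be illustrated with Python's standard `threading.Barrier`, used here as a stand-in for a PThread barrier. The helper name `run_phases` and the event log are assumptions made for the example. Nothing here models the WiNoC hardware.

```python
import threading


def run_phases(num_threads: int = 4) -> list:
    """Each thread logs 'before', waits at the barrier, then logs 'after'."""
    barrier = threading.Barrier(num_threads)
    events = []
    lock = threading.Lock()

    def worker(tid: int) -> None:
        with lock:
            events.append(("before", tid))
        # Blocks until all num_threads threads have called wait().
        barrier.wait()
        with lock:
            events.append(("after", tid))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for phase, tid in run_phases():
        print(phase, tid)
```

Thread order within each phase may vary from run to run, but every "before" entry precedes every "after" entry, which is the guarantee a barrier provides.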


Philip K. Mckinley - One of the best experts on this subject based on the ideXlab platform.

Efficient implementation of Barrier Synchronization in wormhole-routed hypercube multicomputers

International Conference on Distributed Computing Systems, 1992

Co-Authors: Philip K. Mckinley

Abstract:

Practical and efficient implementations of Barrier Synchronization for wormhole-routed hypercube multicomputers are presented. Both broadcast and multicast Barrier Synchronization are considered. For systems that do not support hardware broadcast or multicast, a software U-cube tree is proposed. This method generalizes to n-dimensional meshes. Performance measurements for several Barrier Synchronization techniques implemented on a 64-node nCUBE-2 are given.

Efficient implementation of Barrier Synchronization in wormhole-routed hypercube multicomputers

Journal of Parallel and Distributed Computing, 1992

Co-Authors: Philip K. Mckinley

Abstract:

Efficient implementation of Barrier Synchronization is important to the performance of many parallel algorithms. This paper addresses Barrier Synchronization in wormhole-routed hypercube multicomputers. A broadcast Barrier involves all nodes in a system, whereas the more general multicast Barrier involves an arbitrary subset of nodes. Although performance of Barrier Synchronization can benefit from hardware-supported broadcast and multicast operations, many systems support only single-destination, or unicast, communication in hardware. For such systems, a novel software tree approach, the U-cube tree, is proposed as the basis of Barrier Synchronization. An important feature of the U-cube tree is that all messages injected into the network are guaranteed to be contention-free. Performance measurements of several Barrier Synchronization techniques implemented on a 64-node nCUBE-2 are given.

