Graph Algorithms

The experts below are selected from a list of 159,432 experts worldwide, ranked by the ideXlab platform.

Lawrence Rauchwerger - One of the best experts on this subject based on the ideXlab platform.

  • An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms
    International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015
    Co-Authors: Harshvardhan, Adam Fidel, Nancy M Amato, Lawrence Rauchwerger
    Abstract:

    Graph algorithms on distributed-memory systems typically perform heavy communication, often limiting their scalability and performance. This work presents an approach that transparently (without programmer intervention) allows fine-grained graph algorithms to use algorithmic communication-reduction optimizations. In many graph algorithms, the same information is communicated by a vertex to its neighbors, which we coin algorithmic redundancy. Our approach exploits algorithmic redundancy to reduce communication between vertices located on different processing elements. We employ algorithm-aware coarsening of messages sent during vertex visitation, reducing both the number of messages and the absolute amount of communication in the system. To achieve this, the system structure is represented by a hierarchical graph, facilitating communication optimizations that can take the machine's memory hierarchy into consideration. We also present an optimization for small-world scale-free graphs wherein hub vertices (i.e., vertices of very large degree) are represented in a similar hierarchical manner, which is exploited to increase parallelism and reduce communication. Finally, we present a framework that transparently allows fine-grained graph algorithms to use our hierarchical approach without programmer intervention, while improving scalability and performance. Experimental results of our proposed approach on 131,000+ cores show improvements of up to 8x over the non-hierarchical version for various graph-mining and graph-analytics algorithms.
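The algorithmic redundancy described above can be sketched in a few lines: instead of sending one message per edge, outgoing updates are grouped by the owning processing element, so each remote PE receives the shared payload once together with the list of local targets. This is a hedged illustration of the idea, not the paper's implementation; `owner` is a hypothetical partition-assignment function.

```python
from collections import defaultdict

def coalesced_sends(adj, src, payload, owner):
    """Group outgoing neighbor updates by destination partition:
    one message per partition carries the shared payload plus the
    list of target vertices, instead of one message per edge."""
    by_part = defaultdict(list)
    for v in adj[src]:
        by_part[owner(v)].append(v)
    # each entry is one coarsened message to a remote partition
    return {part: (payload, targets) for part, targets in by_part.items()}
```

With four neighbors spread over two partitions this sends two messages instead of four; the saving grows with vertex degree and the locality of the partitioning.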

  • A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
    Co-Authors: Nancy M Amato, Lawrence Rauchwerger
    Abstract:

    Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance by the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this and reduces the communication performed by the algorithm, without changes to user code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information about neighboring vertices to reduce communication, both in message volume and in total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce communication costs by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL show improved scalability and performance over the traditional level-synchronous approach, with 2.5x to 8x improvement for a variety of graph algorithms at 12,000+ cores.
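The hierarchical model can be illustrated with a toy fan-out plan for a hub vertex: neighbors are grouped first by node and then by socket, so only one message per remote node crosses the network, and delivery inside each node follows the machine hierarchy. This is a sketch under an assumed two-level hierarchy; `node_of` and `socket_of` are hypothetical locale maps, not the STAPL GL API.

```python
from collections import defaultdict

def hub_fanout_plan(neighbors, node_of, socket_of):
    """Build a two-level delivery plan for a hub vertex: one
    inter-node message per node, then per-socket fan-out inside
    each node. Returns (inter_node_message_count, plan)."""
    plan = defaultdict(lambda: defaultdict(list))
    for v in neighbors:
        plan[node_of(v)][socket_of(v)].append(v)
    return len(plan), {node: dict(socks) for node, socks in plan.items()}
```

For a hub with eight neighbors on two nodes, only two messages cross the network instead of eight; local fan-out then costs only intra-node traffic.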

Andrew Lumsdaine - One of the best experts on this subject based on the ideXlab platform.

  • HiPC - Adaptive Runtime Features for Distributed Graph Algorithms
    2018 IEEE 25th International Conference on High Performance Computing (HiPC), 2018
    Co-Authors: Jesun Sahariar Firoz, Marcin Zalewski, Joshua Suetterlein, Andrew Lumsdaine
    Abstract:

    Performance of distributed graph algorithms can benefit greatly from a rapport between the algorithmic abstraction and the underlying runtime system responsible for scheduling work and exchanging messages. However, due to the dynamic and irregular nature of their computation, distributed graph algorithms written in different programming models impose varying degrees of workload pressure on the runtime. To cope with such vastly different workload characteristics, a runtime has to make several trade-offs. One such trade-off arises, for example, when the runtime scheduler must choose among alternatives such as executing algorithmic work, progressing the network by probing network buffers, or throttling message sends (termed flow control). This trade-off decides between optimizing the throughput of the runtime scheduler, by increasing the rate at which algorithmic work is executed, and reducing the latency of network messages. Another trade-off exists in deciding when to send aggregated messages in buffers (message coalescing); this decision trades latency against network bandwidth and vice versa. At any instant, such trade-offs emphasize either improving the quantity of work being executed (by maximizing scheduler throughput) or improving the quality of work (by prioritizing better work). However, encoding static policies for runtime features such as flow control and coalescing can prevent graph algorithms from achieving their full potential, and thus undermine the actual performance of a distributed graph algorithm. In this paper, we investigate runtime support for distributed graph algorithms in the context of two paradigms: variants of the well-known Bulk-Synchronous Parallel model, and the asynchronous programming model. We explore generic runtime features such as message coalescing (aggregation) and flow control, and show that the execution policies of these features need to be adjusted over time to have a positive impact on the execution time of a distributed graph algorithm. Since synchronous and asynchronous graph algorithms have different workload characteristics, not all such runtime features are good candidates for adaptation; each algorithmic paradigm may require a different set of features to be adapted over time. We demonstrate which features are useful in each case to achieve the right balance of work in the runtime layer. Existing implementations of graph algorithms can benefit from adopting dynamic policies in the underlying runtime.
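The coalescing trade-off discussed above can be sketched as a buffer with two flush triggers: a capacity threshold (bandwidth-oriented) and an age threshold (latency-oriented). An adaptive runtime would tune these knobs over time; this is a minimal sketch with illustrative names, not the authors' runtime.

```python
import time

class CoalescingBuffer:
    """Aggregate outgoing messages and flush either when the buffer
    fills (favoring bandwidth) or when the oldest queued message has
    waited longer than max_delay (favoring latency)."""

    def __init__(self, send, capacity=64, max_delay=0.01):
        self.send, self.capacity, self.max_delay = send, capacity, max_delay
        self.buf = []
        self.first_enqueue = None

    def push(self, msg):
        if not self.buf:
            self.first_enqueue = time.monotonic()
        self.buf.append(msg)
        if len(self.buf) >= self.capacity:
            self.flush()  # capacity trigger: bandwidth-oriented

    def poll(self):
        # age trigger: flush a partially full buffer that has waited too long
        if self.buf and time.monotonic() - self.first_enqueue >= self.max_delay:
            self.flush()

    def flush(self):
        if self.buf:
            self.send(list(self.buf))
            self.buf.clear()
```

A static (capacity, max_delay) pair is exactly the kind of fixed policy the paper argues against: the right values differ between synchronous and asynchronous workloads and over the lifetime of one run.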

  • HiPC - Synchronization-Avoiding Graph Algorithms
    2018 IEEE 25th International Conference on High Performance Computing (HiPC), 2018
    Co-Authors: Jesun Sahariar Firoz, Marcin Zalewski, Thejaka Amila Kanewala, Andrew Lumsdaine
    Abstract:

    Because they were developed for optimal sequential complexity, classical graph algorithms as found in textbooks have strictly defined orders of operations. Enforcing a prescribed order of operations, or even an approximate order, in a distributed-memory setting requires significant amounts of synchronization, which in turn can severely limit scalability. As a result, new algorithms are typically required to achieve scalable performance, even for solving well-known graph problems. Yet, even in these cases, parallel graph algorithms are written according to parallel programming models that evolved for, e.g., scientific computing, and that still have inherent, and scalability-limiting, amounts of synchronization. In this paper we present a new approach to parallel graph algorithms: synchronization-avoiding algorithms. To eliminate synchronization and its associated overhead, synchronization-avoiding algorithms perform work in an unordered, fully asynchronous fashion, in such a way that the result is constantly refined toward its final state. "Wasted" work is minimized by locally prioritizing tasks using problem-dependent task-utility metrics. We classify algorithms for graph applications into two broad categories: algorithms with monotonic updates (which evince global synchronization) and algorithms with non-monotonic updates (which evince vertex-centric synchronization). We apply our approach to both classes and develop novel synchronization-avoiding algorithms for exemplar problems: SSSP and connected components for the former, graph coloring for the latter. We demonstrate that eliminating synchronization, in conjunction with effective scheduling policies and optimizations in the runtime, results in improved scalability for both classes of algorithms.
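For the monotonic class, SSSP illustrates why ordering can be relaxed: distance labels only ever decrease, so relaxations may be applied in any order and still converge to the same fixed point; a priority (utility) heuristic merely limits wasted work. The sketch below is a sequential simulation of that label-correcting idea, not the paper's distributed algorithm.

```python
import heapq

def async_sssp(adj, source):
    """Label-correcting single-source shortest paths: updates are
    monotonic (distances only decrease), so tasks can run in any
    order; the heap is only a utility heuristic that reduces the
    number of wasted relaxations."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale task: skipping is an optimization, not needed for correctness
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

Replacing the heap with an unordered queue still yields correct distances, only with more re-relaxations, which is the quantity-versus-quality-of-work trade-off the paper targets.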

  • Importance of Runtime Considerations in Performance Engineering of Large-Scale Distributed Graph Algorithms
    European Conference on Parallel Processing, 2015
    Co-Authors: Jesun Sahariar Firoz, Marcin Zalewski, Thejaka Amila Kanewala, Martina Barnas, Andrew Lumsdaine
    Abstract:

    Due to the ever-increasing complexity of modern supercomputers, performance analysis of irregular applications has become an experimental endeavor. We show that runtime considerations are inseparable from algorithmic concerns in the performance engineering of large-scale distributed graph algorithms, and we argue that the whole system stack, from the algorithm at the top down to the low-level communication libraries, must be considered.

  • Generic Programming for Graph Algorithms
    2000
    Co-Authors: Jeremy G. Siek, Lie-quan Lee, Andrew Lumsdaine
    Abstract:

    The Standard Template Library has established a solid foundation for the development of reusable algorithms and data structures in C++. It has provided programmers with a way to think about designing reusable components (generic programming), and has demonstrated the programming techniques necessary to build efficient implementations. However, there are many problem domains beyond those addressed by the STL; consequently, there are many opportunities for applying generic programming. One particularly important domain is that of graph algorithms and data structures. The graph abstraction is widely used to model structures and relationships in many fields. Graph algorithms are extremely important in such diverse application areas as design automation, transportation, optimization, and databases. Our own interest in graph algorithms originates with our work on sparse matrix ordering algorithms for scientific computing. The domain of graph algorithms is ripe for the application of generic programming. There is a large existing body of useful algorithms, yet the number of ways that people use to represent graphs in memory almost matches the number of applications that use graphs. The ability to freely interchange graph algorithms with graph representations would be an important contribution to the field, and this is what generic programming has to offer. In January 1999, we did a survey of existing graph libraries. Some of the libraries we looked at were LEDA (by Kurt Mehlhorn and Stefan Naeher, http://www.mpi-sb.mpg.de/LEDA/leda.html), the Graph Template Library (GTL) (by Michael Forster, Andreas Pick, and Marcus Raitner, http://www.fmi.uni-passau.de/Graphlet/GTL/), and Combinatorica (see
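The interchangeability argument above can be sketched in a few lines: an algorithm written against a minimal graph interface (here, just `out_neighbors`) runs unchanged on different in-memory representations. The paper's setting is C++ templates; this Python sketch, with hypothetical class names, conveys the same decoupling via a shared protocol.

```python
from collections import deque

def bfs_distances(graph, source):
    """Generic BFS: depends only on the minimal interface
    graph.out_neighbors(v), not on how the graph is stored."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in graph.out_neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

class AdjListGraph:
    """Adjacency-list representation satisfying the interface."""
    def __init__(self, adj):
        self.adj = adj
    def out_neighbors(self, v):
        return self.adj.get(v, [])

class MatrixGraph:
    """Adjacency-matrix representation satisfying the same interface."""
    def __init__(self, matrix):
        self.matrix = matrix
    def out_neighbors(self, v):
        return [j for j, e in enumerate(self.matrix[v]) if e]
```

The same `bfs_distances` works on both classes, which is exactly the algorithm/representation interchange the abstract argues generic programming offers.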

Martin Russling - One of the best experts on this subject based on the ideXlab platform.

Ramin Zabih - One of the best experts on this subject based on the ideXlab platform.

  • Dynamic Programming and Graph Algorithms in Computer Vision
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
    Co-Authors: Pedro F. Felzenszwalb, Ramin Zabih
    Abstract:

    Optimization is a powerful paradigm for expressing and solving problems in a wide range of areas, and has been successfully applied to many vision problems. Discrete optimization techniques are especially interesting since, by carefully exploiting problem structure, they often provide nontrivial guarantees concerning solution quality. In this paper, we review dynamic programming and graph algorithms, and discuss representative examples of how these discrete optimization techniques have been applied to some classical vision problems. We focus on the low-level vision problem of stereo, the mid-level problem of interactive object segmentation, and the high-level problem of model-based recognition.
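As a concrete instance of the dynamic-programming side, 1D labeling problems (e.g. a single stereo scanline) can be solved exactly with a Viterbi-style recurrence over unary (data) and pairwise (smoothness) costs. The sketch below is a generic chain DP under that standard energy model, not a specific formulation from the survey.

```python
def chain_dp(unary, pairwise):
    """Exact MAP labeling of a chain: minimize
    sum_i unary[i][l_i] + sum_i pairwise(l_{i-1}, l_i)
    via a forward pass plus backtracking."""
    n, L = len(unary), len(unary[0])
    cost = list(unary[0])  # best total cost of each label at position 0
    back = []              # backpointers for positions 1..n-1
    for i in range(1, n):
        new_cost, bp = [], []
        for l in range(L):
            best = min(range(L), key=lambda k: cost[k] + pairwise(k, l))
            new_cost.append(cost[best] + pairwise(best, l) + unary[i][l])
            bp.append(best)
        cost = new_cost
        back.append(bp)
    labels = [min(range(L), key=lambda l: cost[l])]
    for bp in reversed(back):
        labels.append(bp[labels[-1]])
    return list(reversed(labels))
```

The forward pass costs O(n L^2), which is why DP is attractive for per-scanline stereo where L is the disparity range.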

John Feo - One of the best experts on this subject based on the ideXlab platform.

  • On the Architectural Requirements for Efficient Execution of Graph Algorithms
    International Conference on Parallel Processing, 2005
    Co-Authors: David A Bader, Guojing Cong, John Feo
    Abstract:

    Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters, due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers, and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can achieve speedup on SMPs, the systems' reliance on cache-based microprocessors limits performance. The MTA's latency-tolerant processors and hardware support for fine-grained synchronization make performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affect algorithm development, ease of programming, performance, and scalability.
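Of the two algorithms studied, list ranking is a canonical example of irregular access: Wyllie's pointer-jumping algorithm halves each element's distance to the tail every round, so performance depends on tolerating many scattered reads rather than on cache locality. Below is a sequential simulation of the synchronous parallel rounds, offered as an illustration rather than the paper's implementation.

```python
import math

def list_rank(succ):
    """Wyllie's pointer jumping: succ[i] is i's successor (the tail
    points to itself); returns each element's distance to the tail.
    Each synchronous round doubles the jump length, so only
    O(log n) rounds are needed."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    for _ in range(max(1, math.ceil(math.log2(n)) + 1)):
        # read the old arrays, write new ones: one fully parallel round
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

Note the reads `rank[nxt[i]]` and `nxt[nxt[i]]`: every round touches essentially random memory locations, which is exactly the access pattern that favors the MTA's latency-tolerant threads over cache-based SMPs.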
