Graph Algorithms

The experts below are selected from a list of 159,432 experts worldwide, ranked by the ideXlab platform.

Lawrence Rauchwerger - One of the best experts on this subject based on the ideXlab platform.

  • An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms
    International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015
    Co-Authors: Harshvardhan, Adam Fidel, Nancy M Amato, Lawrence Rauchwerger
    Abstract:

    Graph algorithms on distributed-memory systems typically perform heavy communication, often limiting their scalability and performance. This work presents an approach that transparently (without programmer intervention) allows fine-grained graph algorithms to use algorithmic communication-reduction optimizations. In many graph algorithms, the same information is communicated by a vertex to its neighbors, which we coin algorithmic redundancy. Our approach exploits algorithmic redundancy to reduce communication between vertices located on different processing elements. We employ algorithm-aware coarsening of messages sent during vertex visitation, reducing both the number of messages and the absolute amount of communication in the system. To achieve this, the system structure is represented by a hierarchical graph, facilitating communication optimizations that can take the machine's memory hierarchy into consideration. We also present an optimization for small-world scale-free graphs wherein hub vertices (i.e., vertices of very large degree) are represented in a similar hierarchical manner, which is exploited to increase parallelism and reduce communication. Finally, we present a framework that transparently allows fine-grained graph algorithms to use our hierarchical approach without programmer intervention, while improving scalability and performance. Experimental results of our proposed approach on 131,000+ cores show improvements of up to 8x over the non-hierarchical version for various graph-mining and graph-analytics algorithms.
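The algorithmic redundancy described above can be sketched in a few lines: instead of sending one message per edge, outgoing updates are grouped by the owning processing element, so each remote PE receives the shared payload once together with the list of local targets. This is a hedged illustration of the idea, not the paper's implementation; `owner` is a hypothetical partition-assignment function.

```python
from collections import defaultdict

def coalesced_sends(adj, src, payload, owner):
    """Group outgoing neighbor updates by destination partition:
    one message per partition carries the shared payload plus the
    list of target vertices, instead of one message per edge."""
    by_part = defaultdict(list)
    for v in adj[src]:
        by_part[owner(v)].append(v)
    # each entry is one coarsened message to a remote partition
    return {part: (payload, targets) for part, targets in by_part.items()}
```

With four neighbors spread over two partitions this sends two messages instead of four; the saving grows with vertex degree and the locality of the partitioning.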

  • A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
    Co-Authors: Nancy M Amato, Lawrence Rauchwerger
    Abstract:

    Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance by the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this and reduces the communication performed by the algorithm, without changes to user code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information about neighboring vertices to reduce communication, both in message volume and in total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce communication costs by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL show improved scalability and performance over the traditional level-synchronous approach, with 2.5x to 8x improvement for a variety of graph algorithms at 12,000+ cores.
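The hierarchical model can be illustrated with a toy fan-out plan for a hub vertex: neighbors are grouped first by node and then by socket, so only one message per remote node crosses the network, and delivery inside each node follows the machine hierarchy. This is a sketch under an assumed two-level hierarchy; `node_of` and `socket_of` are hypothetical locale maps, not the STAPL GL API.

```python
from collections import defaultdict

def hub_fanout_plan(neighbors, node_of, socket_of):
    """Build a two-level delivery plan for a hub vertex: one
    inter-node message per node, then per-socket fan-out inside
    each node. Returns (inter_node_message_count, plan)."""
    plan = defaultdict(lambda: defaultdict(list))
    for v in neighbors:
        plan[node_of(v)][socket_of(v)].append(v)
    return len(plan), {node: dict(socks) for node, socks in plan.items()}
```

For a hub with eight neighbors on two nodes, only two messages cross the network instead of eight; local fan-out then costs only intra-node traffic.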

Andrew Lumsdaine - One of the best experts on this subject based on the ideXlab platform.

  • HiPC - Adaptive Runtime Features for Distributed Graph Algorithms
    2018 IEEE 25th International Conference on High Performance Computing (HiPC), 2018
    Co-Authors: Jesun Sahariar Firoz, Marcin Zalewski, Joshua Suetterlein, Andrew Lumsdaine
    Abstract:

    Performance of distributed graph algorithms can benefit greatly from a rapport between the algorithmic abstraction and the underlying runtime system responsible for scheduling work and exchanging messages. However, due to the dynamic and irregular nature of their computation, distributed graph algorithms written in different programming models impose varying degrees of workload pressure on the runtime. To cope with such vastly different workload characteristics, a runtime has to make several trade-offs. One such trade-off arises, for example, when the runtime scheduler must choose among alternatives such as executing algorithmic work, progressing the network by probing network buffers, or throttling message sends (termed flow control). This trade-off decides between optimizing the throughput of the runtime scheduler, by increasing the rate at which algorithmic work is executed, and reducing the latency of network messages. Another trade-off exists in deciding when to send aggregated messages in buffers (message coalescing); this decision trades latency against network bandwidth and vice versa. At any instant, such trade-offs emphasize either improving the quantity of work being executed (by maximizing scheduler throughput) or improving the quality of work (by prioritizing better work). However, encoding static policies for runtime features such as flow control and coalescing can prevent graph algorithms from achieving their full potential, and thus undermine the actual performance of a distributed graph algorithm. In this paper, we investigate runtime support for distributed graph algorithms in the context of two paradigms: variants of the well-known Bulk-Synchronous Parallel model, and the asynchronous programming model. We explore generic runtime features such as message coalescing (aggregation) and flow control, and show that the execution policies of these features need to be adjusted over time to have a positive impact on the execution time of a distributed graph algorithm. Since synchronous and asynchronous graph algorithms have different workload characteristics, not all such runtime features are good candidates for adaptation; each algorithmic paradigm may require a different set of features to be adapted over time. We demonstrate which features are useful in each case to achieve the right balance of work in the runtime layer. Existing implementations of graph algorithms can benefit from adopting dynamic policies in the underlying runtime.
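The coalescing trade-off discussed above can be sketched as a buffer with two flush triggers: a capacity threshold (bandwidth-oriented) and an age threshold (latency-oriented). An adaptive runtime would tune these knobs over time; this is a minimal sketch with illustrative names, not the authors' runtime.

```python
import time

class CoalescingBuffer:
    """Aggregate outgoing messages and flush either when the buffer
    fills (favoring bandwidth) or when the oldest queued message has
    waited longer than max_delay (favoring latency)."""

    def __init__(self, send, capacity=64, max_delay=0.01):
        self.send, self.capacity, self.max_delay = send, capacity, max_delay
        self.buf = []
        self.first_enqueue = None

    def push(self, msg):
        if not self.buf:
            self.first_enqueue = time.monotonic()
        self.buf.append(msg)
        if len(self.buf) >= self.capacity:
            self.flush()  # capacity trigger: bandwidth-oriented

    def poll(self):
        # age trigger: flush a partially full buffer that has waited too long
        if self.buf and time.monotonic() - self.first_enqueue >= self.max_delay:
            self.flush()

    def flush(self):
        if self.buf:
            self.send(list(self.buf))
            self.buf.clear()
```

A static (capacity, max_delay) pair is exactly the kind of fixed policy the paper argues against: the right values differ between synchronous and asynchronous workloads and over the lifetime of one run.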

  • HiPC - Synchronization-Avoiding Graph Algorithms
    2018 IEEE 25th International Conference on High Performance Computing (HiPC), 2018
    Co-Authors: Jesun Sahariar Firoz, Marcin Zalewski, Thejaka Amila Kanewala, Andrew Lumsdaine
    Abstract:

    Because they were developed for optimal sequential complexity, classical graph algorithms as found in textbooks have strictly defined orders of operations. Enforcing a prescribed order of operations, or even an approximate order, in a distributed-memory setting requires significant amounts of synchronization, which in turn can severely limit scalability. As a result, new algorithms are typically required to achieve scalable performance, even for solving well-known graph problems. Yet, even in these cases, parallel graph algorithms are written according to parallel programming models that evolved for, e.g., scientific computing, and that still have inherent, and scalability-limiting, amounts of synchronization. In this paper we present a new approach to parallel graph algorithms: synchronization-avoiding algorithms. To eliminate synchronization and its associated overhead, synchronization-avoiding algorithms perform work in an unordered, fully asynchronous fashion, in such a way that the result is constantly refined toward its final state. "Wasted" work is minimized by locally prioritizing tasks using problem-dependent task-utility metrics. We classify algorithms for graph applications into two broad categories: algorithms with monotonic updates (which evince global synchronization) and algorithms with non-monotonic updates (which evince vertex-centric synchronization). We apply our approach to both classes and develop novel synchronization-avoiding algorithms for exemplar problems: SSSP and connected components for the former, graph coloring for the latter. We demonstrate that eliminating synchronization, in conjunction with effective scheduling policies and optimizations in the runtime, results in improved scalability for both classes of algorithms.
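For the monotonic class, SSSP illustrates why ordering can be relaxed: distance labels only ever decrease, so relaxations may be applied in any order and still converge to the same fixed point; a priority (utility) heuristic merely limits wasted work. The sketch below is a sequential simulation of that label-correcting idea, not the paper's distributed algorithm.

```python
import heapq

def async_sssp(adj, source):
    """Label-correcting single-source shortest paths: updates are
    monotonic (distances only decrease), so tasks can run in any
    order; the heap is only a utility heuristic that reduces the
    number of wasted relaxations."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale task: skipping is an optimization, not needed for correctness
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

Replacing the heap with an unordered queue still yields correct distances, only with more re-relaxations, which is the quantity-versus-quality-of-work trade-off the paper targets.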

  • Importance of Runtime Considerations in Performance Engineering of Large-Scale Distributed Graph Algorithms
    European Conference on Parallel Processing, 2015
    Co-Authors: Jesun Sahariar Firoz, Marcin Zalewski, Thejaka Amila Kanewala, Martina Barnas, Andrew Lumsdaine
    Abstract:

    Due to the ever-increasing complexity of modern supercomputers, performance analysis of irregular applications has become an experimental endeavor. We show that runtime considerations are inseparable from algorithmic concerns in the performance engineering of large-scale distributed graph algorithms, and we argue that the whole system stack, from the algorithm at the top down to the low-level communication libraries, must be considered.

  • Generic Programming for Graph Algorithms
    2000
    Co-Authors: Jeremy G. Siek, Lie-quan Lee, Andrew Lumsdaine
    Abstract:

    The Standard Template Library has established a solid foundation for the development of reusable algorithms and data structures in C++. It has provided programmers with a way to think about designing reusable components (generic programming), and has demonstrated the programming techniques necessary to build efficient implementations. However, there are many problem domains beyond those addressed by the STL; consequently, there are many opportunities for applying generic programming. One particularly important domain is that of graph algorithms and data structures. The graph abstraction is widely used to model structures and relationships in many fields. Graph algorithms are extremely important in such diverse application areas as design automation, transportation, optimization, and databases. Our own interest in graph algorithms originates with our work on sparse matrix ordering algorithms for scientific computing. The domain of graph algorithms is ripe for the application of generic programming. There is a large existing body of useful algorithms, yet the number of ways that people use to represent graphs in memory almost matches the number of applications that use graphs. The ability to freely interchange graph algorithms with graph representations would be an important contribution to the field, and this is what generic programming has to offer. In January 1999, we did a survey of existing graph libraries. Some of the libraries we looked at were LEDA (by Kurt Mehlhorn and Stefan Naeher, http://www.mpi-sb.mpg.de/LEDA/leda.html), the Graph Template Library (GTL) (by Michael Forster, Andreas Pick, and Marcus Raitner, http://www.fmi.uni-passau.de/Graphlet/GTL/), and Combinatorica (see
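The interchangeability argument above can be sketched in a few lines: an algorithm written against a minimal graph interface (here, just `out_neighbors`) runs unchanged on different in-memory representations. The paper's setting is C++ templates; this Python sketch, with hypothetical class names, conveys the same decoupling via a shared protocol.

```python
from collections import deque

def bfs_distances(graph, source):
    """Generic BFS: depends only on the minimal interface
    graph.out_neighbors(v), not on how the graph is stored."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in graph.out_neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

class AdjListGraph:
    """Adjacency-list representation satisfying the interface."""
    def __init__(self, adj):
        self.adj = adj
    def out_neighbors(self, v):
        return self.adj.get(v, [])

class MatrixGraph:
    """Adjacency-matrix representation satisfying the same interface."""
    def __init__(self, matrix):
        self.matrix = matrix
    def out_neighbors(self, v):
        return [j for j, e in enumerate(self.matrix[v]) if e]
```

The same `bfs_distances` works on both classes, which is exactly the algorithm/representation interchange the abstract argues generic programming offers.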

Martin Russling - One of the best experts on this subject based on the ideXlab platform.

Ramin Zabih - One of the best experts on this subject based on the ideXlab platform.

  • Dynamic Programming and Graph Algorithms in Computer Vision
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
    Co-Authors: Pedro F. Felzenszwalb, Ramin Zabih
    Abstract:

    Optimization is a powerful paradigm for expressing and solving problems in a wide range of areas, and has been successfully applied to many vision problems. Discrete optimization techniques are especially interesting since, by carefully exploiting problem structure, they often provide nontrivial guarantees concerning solution quality. In this paper, we review dynamic programming and graph algorithms, and discuss representative examples of how these discrete optimization techniques have been applied to some classical vision problems. We focus on the low-level vision problem of stereo, the mid-level problem of interactive object segmentation, and the high-level problem of model-based recognition.
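As a concrete instance of the dynamic-programming side, 1D labeling problems (e.g. a single stereo scanline) can be solved exactly with a Viterbi-style recurrence over unary (data) and pairwise (smoothness) costs. The sketch below is a generic chain DP under that standard energy model, not a specific formulation from the survey.

```python
def chain_dp(unary, pairwise):
    """Exact MAP labeling of a chain: minimize
    sum_i unary[i][l_i] + sum_i pairwise(l_{i-1}, l_i)
    via a forward pass plus backtracking."""
    n, L = len(unary), len(unary[0])
    cost = list(unary[0])  # best total cost of each label at position 0
    back = []              # backpointers for positions 1..n-1
    for i in range(1, n):
        new_cost, bp = [], []
        for l in range(L):
            best = min(range(L), key=lambda k: cost[k] + pairwise(k, l))
            new_cost.append(cost[best] + pairwise(best, l) + unary[i][l])
            bp.append(best)
        cost = new_cost
        back.append(bp)
    labels = [min(range(L), key=lambda l: cost[l])]
    for bp in reversed(back):
        labels.append(bp[labels[-1]])
    return list(reversed(labels))
```

The forward pass costs O(n L^2), which is why DP is attractive for per-scanline stereo where L is the disparity range.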

John Feo - One of the best experts on this subject based on the ideXlab platform.

  • On the Architectural Requirements for Efficient Execution of Graph Algorithms
    International Conference on Parallel Processing, 2005
    Co-Authors: David A Bader, Guojing Cong, John Feo
    Abstract:

    Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters, due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers, and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can achieve speedup on SMPs, the systems' reliance on cache-based microprocessors limits performance. The MTA's latency-tolerant processors and hardware support for fine-grained synchronization make performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affect algorithm development, ease of programming, performance, and scalability.
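Of the two algorithms studied, list ranking is a canonical example of irregular access: Wyllie's pointer-jumping algorithm halves each element's distance to the tail every round, so performance depends on tolerating many scattered reads rather than on cache locality. Below is a sequential simulation of the synchronous parallel rounds, offered as an illustration rather than the paper's implementation.

```python
import math

def list_rank(succ):
    """Wyllie's pointer jumping: succ[i] is i's successor (the tail
    points to itself); returns each element's distance to the tail.
    Each synchronous round doubles the jump length, so only
    O(log n) rounds are needed."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    for _ in range(max(1, math.ceil(math.log2(n)) + 1)):
        # read the old arrays, write new ones: one fully parallel round
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

Note the reads `rank[nxt[i]]` and `nxt[nxt[i]]`: every round touches essentially random memory locations, which is exactly the access pattern that favors the MTA's latency-tolerant threads over cache-based SMPs.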
