Abstract:A directed acyclic hypergraph is a generalized concept of a directed acyclic graph, where each hyperedge can contain an arbitrary number of tails and heads. Directed hypergraphs can be used to model data flow and execution dependencies in streaming applications. Thus, hypergraph partitioning algorithms can be used to obtain efficient parallelizations for multiprocessor architectures. However, an acyclicity constraint on the partition is necessary when mapping streaming applications to embedded multiprocessors due to resource restrictions on this type of hardware. The acyclic hypergraph partitioning problem is to partition the hypernodes of a directed acyclic hypergraph into a given number of blocks of roughly equal size such that the corresponding quotient graph is acyclic while minimizing an objective function on the partition. Here, we contribute the first n-level algorithm for the acyclic hypergraph partitioning problem. Our focus is on acyclic hypergraphs where hyperedges can have one head and arbitrary many tails. Based on this, we engineer a memetic algorithm to further reduce communication cost, as well as to improve scheduling makespan on embedded multiprocessor architectures. Experiments indicate that our algorithm outperforms previous algorithms that focus on the directed acyclic graph case which have previously been employed in the application domain. Moreover, our experiments indicate that using the directed hypergraph model for this type of application yields a significantly smaller makespan.
Abstract:Directed graphs are widely used to model data flow and execution dependencies in streaming applications. This enables the utilization of graph partitioning algorithms for the problem of parallelizing computation for multiprocessor architectures. However due to resource restrictions, an acyclicity constraint on the partition is necessary when mapping streaming applications to an embedded multiprocessor. Here, we contribute a multi-level algorithm for the acyclic graph partitioning problem. Based on this, we engineer an evolutionary algorithm to further reduce communication cost, as well as to improve load balancing and the scheduling makespan on embedded multiprocessor architectures.
Abstract:Graphs are widely used to model execution dependencies in applications. In particular, the NP-complete problem of partitioning a graph under constraints receives enormous attention by researchers because of its applicability in multiprocessor scheduling. We identified the additional constraint of acyclic dependencies between blocks when mapping computer vision and imaging applications to a heterogeneous embedded multiprocessor. Existing algorithms and heuristics do not address this requirement and deliver results that are not applicable for our use-case. In this work, we show that this more constrained version of the graph partitioning problem is NP-complete and present heuristics that achieve a close approximation of the optimal solution found by an exhaustive search for small problem instances and much better scalability for larger instances. In addition, we can show a positive impact on the schedule of a real imaging application that improves communication volume and execution time.