Design of A Multithreaded Barnes-Hut Algorithm for Multicore Clusters

TitleDesign of A Multithreaded Barnes-Hut Algorithm for Multicore Clusters
Publication TypeReport
Year of Publication2013
AuthorsZhang, J, Behzad, B, Snir, M
Other NumbersANL/MCS-P4055-0313

We describe in this paper an implementation of the Barnes-Hut algorithm on multicore clusters. Based on a partitioned global address space (PGAS) library, the design integrates intranode multithreading and internode one-sided communication, exemplifying a PGAS + X programming style. Within a node, the computation is decomposed into tasks (subtasks), and multitasking is used to hide network latency. We study the tradeoffs between locality in private caches and locality in shared caches and bring into the design the insights gained. As a result, our implementation consumes less memory per core, invokes less internode communication, and enjoys better load balancing strategies. The final code achieves up to 41% performance improvement over a non-multithreaded counterpart from our previous work. Through detailed comparison, we also show its advantage over other well-known Barnes-Hut implementations, in programming complexity and performance.