J. M. Wozniak, T. G. Armstrong, K. Maheshwari, E. L. Lusk, D. S. Katz, M. Wilde, I. T. Foster, "Turbine: A Distributed-Memory Dataflow Engine for Extreme-Scale Many-Task Applications," Proceedings SWEET 2012, Scottsdale, AZ, 2012. Also Preprint ANL/MCS-P2057-0312, March 2012.
Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely coupled, coarse-grained tasks, each comprising a tightly coupled parallel function or program. "Many-task" programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly coupled parallelism at the lower level via multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and inter-task data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution, and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.