DIY: Do-it-Yourself Analysis
An open-source package of scalable building blocks for data movement tailored to the needs of large-scale parallel analysis workloads
Installation (Linux, Mac, IBM and Cray supercomputers):
Download DIY with the following command:
git clone https://github.com/diatomic/diy2
and follow the instructions in the README.
Documentation can be found here.
Scalable, parallel analysis of data-intensive computational science relies on the decomposition
of the analysis problem into a large number of data-parallel subproblems, efficient data
exchange among them, and data transport between them and the memory/storage hierarchy. The
abstraction enabling these capabilities is block-based parallelism; blocks and their message
queues are mapped onto processing elements (MPI processes or threads) and are migrated between
memory and storage by the DIY runtime. Configurable data partitioning, scalable data exchange,
and efficient parallel I/O are the main components of DIY. The current version, DIY2, has been
completely rewritten to support distributed- and shared-memory parallel algorithms that can run
both in- and out-of-core with the same code. The same program can be executed with one or more
threads per MPI process and with one or more data blocks resident in main memory.
Computational scientists, data analysis researchers, and visualization tool builders can all
benefit from these tools.
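
As a rough illustration of the block-based model described above, the following sketch splits a 2D integer domain into regular blocks and assigns the blocks to processes. It is written in plain Python and does not use DIY itself; the function names (`split_extent`, `decompose_2d`, `contiguous_assignment`, `round_robin_assignment`) and the row-major block numbering are assumptions made for this example, not DIY's API.

```python
def split_extent(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into `parts` near-equal pieces."""
    n = hi - lo + 1
    if parts < 1 or parts > n:
        raise ValueError("parts must be between 1 and the extent size")
    base, extra = divmod(n, parts)
    bounds, start = [], lo
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` pieces get one more cell
        bounds.append((start, start + size - 1))
        start += size
    return bounds


def decompose_2d(domain, divisions):
    """Return block bounds as ((xmin, xmax), (ymin, ymax)), indexed by block id.

    Block ids are assigned in row-major order (an assumption of this sketch).
    """
    (x0, x1), (y0, y1) = domain
    xs = split_extent(x0, x1, divisions[0])
    ys = split_extent(y0, y1, divisions[1])
    return [(xb, yb) for yb in ys for xb in xs]


def contiguous_assignment(num_blocks, num_procs):
    """owner[gid] = rank, with each rank owning a contiguous run of block ids."""
    base, extra = divmod(num_blocks, num_procs)
    owner = []
    for rank in range(num_procs):
        owner.extend([rank] * (base + (1 if rank < extra else 0)))
    return owner


def round_robin_assignment(num_blocks, num_procs):
    """owner[gid] = rank, dealing block ids to ranks in turn."""
    return [gid % num_procs for gid in range(num_blocks)]


if __name__ == "__main__":
    blocks = decompose_2d(((0, 9), (0, 5)), (2, 2))
    owner = contiguous_assignment(len(blocks), 3)
    for gid, (bounds, rank) in enumerate(zip(blocks, owner)):
        print(f"block {gid}: bounds={bounds} -> rank {rank}")
```

Because the number of blocks is independent of the number of processes, the same decomposition can be run with one block per process or with several, which is the flexibility the paragraph above describes.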
The figure above shows the overall structure of DIY and its use in higher-level libraries and applications. The I/O module provides efficient parallel algorithms for reading datasets from storage as well as writing analysis results to storage. The decomposition module supports block-structured and unstructured domain decomposition in 2D, 3D, and 4D, with flexible numbers of data blocks assigned to MPI processes. The communication module supports three configurable communication algorithms: nearest neighbor exchange, merge-based reduction, and swap-based reduction. The utilities module includes tools for creating DIY data types, lossless parallel compression, and parallel sorting.
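
To make the idea of a merge-based reduction more concrete, here is a minimal serial simulation in plain Python. It is only a sketch under stated assumptions: the function name `merge_reduce`, the group size parameter `k`, and the choice of the first block in each group as the receiver are hypothetical and are not taken from DIY. No actual message passing takes place; each list entry stands in for the data held by one block.

```python
def merge_reduce(values, k=2, op=lambda a, b: a + b):
    """Simulate a merge-based reduction over per-block values.

    In every round the active blocks are grouped k at a time, and the first
    block of each group combines the data of the whole group. The rounds
    repeat until one block holds the final result.
    Returns (result, number_of_rounds).
    """
    if not values:
        raise ValueError("need at least one block value")
    if k < 2:
        raise ValueError("group size k must be at least 2")

    active = list(values)
    rounds = 0
    while len(active) > 1:
        merged = []
        for start in range(0, len(active), k):
            group = active[start:start + k]
            acc = group[0]
            for item in group[1:]:  # the group's first block absorbs the rest
                acc = op(acc, item)
            merged.append(acc)
        active = merged
        rounds += 1
    return active[0], rounds


if __name__ == "__main__":
    per_block_sums = [1, 2, 3, 4, 5, 6, 7, 8]
    print(merge_reduce(per_block_sums, k=2))
    print(merge_reduce(per_block_sums, k=4))
    print(merge_reduce(per_block_sums, k=2, op=max))
```

A larger `k` trades fewer rounds for more data arriving at each receiving block per round. A swap-based reduction differs in that the result would stay partitioned across the blocks rather than ending up in a single one; that variant is not simulated here.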
DIY has been tested in numerous analysis applications, including parallel particle tracing, parallel
information theory, and parallel topological analysis, across several science domains, including
fluid dynamics, astrophysics, and combustion. Results such as those above highlight a 2x performance
improvement in particle tracing, 59% strong-scaling efficiency in information theory, and
35% end-to-end strong-scaling efficiency in topological analysis. This work also marks the
first time that the information entropy and Morse-Smale algorithms have been parallelized.
More information can be found in these slides and in this paper.
Please cite our LDAV'11 paper (pdf).
DIY is a collaboration between Tom Peterka of Argonne National Laboratory
and Dmitriy Morozov of Lawrence Berkeley National Laboratory.