Argonne National Laboratory

Can Exascale Computers Handle Extreme-Scale Real-Time Science Workflows?

As the volume and velocity of data generated by scientific experiments increase, research success in a growing number of scientific fields depends on the ability to analyze data rapidly. In many situations, scientists and engineers want quasi-instant feedback, so that results from one experiment can guide selection of the next—or even influence the course of a single experiment.

Complicating matters, real-time computing activities at experimental facilities such as light sources and fusion tokamaks are tightly scheduled, with timing driven by factors ranging from the physical processes involved in an experiment to the travel schedules of on-site researchers. Thus, computing must often be available at a specific time, for a specific period, with a high degree of reliability.

The question thus arises: Will exascale computers run anticipated experimental science workloads effectively?

To answer this question, we will perform simulations to understand the impact of real-time science jobs on batch jobs and system utilization, explore key scheduling techniques, and investigate new system-level capabilities required at the exascale hardware and system-software levels to support such jobs.

The objective is to provide useful input to four groups of stakeholders: (1) experimental scientists and user communities, on the benefits of communicating detailed priorities and preferences for real-time jobs to the scheduler; (2) policy makers, on how supporting real-time jobs affects batch jobs and system utilization, and on the tradeoffs involved in different allocation and charging policies; (3) developers of scheduling algorithms, on the tradeoffs involved in adopting various scheduling techniques and optimizations; and (4) exascale system builders, on system-level capabilities that can help accommodate more real-time jobs and/or reduce the negative impact of accommodating them on batch jobs and the overall system.