Publications
K. Iskra, K. Yoshii, R. Gupta, P. Beckman, "Advanced Virtual Memory for Exascale," Preprint ANL/MCS-P3010-0712, July 2012. [pdf]
By nearly every metric (cost, power consumption, capacity, bandwidth, and latency), memory is emerging as one of the most constrained resources on compute nodes of HPC systems [12]. These constraints will force future systems to abandon a flat SMP memory space in favor of a more distributed NUMA design, even within a socket. With exponentially increasing numbers of cores per node, the overhead of global cache coherence is likely to outweigh its usefulness, at which point cache coherence will be limited to multiple separate coherence domains within a CPU. Experimental designs such as Intel SCC [13] or NVIDIA Echelon [6] are early examples of this trend.
