P. Beckman, K. Iskra, K. Yoshii, and S. Coghlan, "The Influence of Operating Systems on the Performance of Collective Operations at Extreme Scale," Preprint ANL/MCS-P1345-0506, May 2006.
We investigate noise introduced by the operating system, which we identify as one of the main reasons for a lack of synchronicity in parallel applications. Using a micro-benchmark, we measure the noise on several contemporary platforms and find that, even with a general-purpose operating system, noise can be quite limited. We then inject artificially generated noise into a massively parallel system and measure its influence on the performance of collective operations. Our experiments indicate that on extreme-scale platforms, performance is correlated with the largest interruption to the application, even if the probability of such an interruption is extremely small. We demonstrate that synchronizing the noise can significantly reduce its negative influence.
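The claim that the largest interruption dominates at scale, and that synchronizing noise helps, can be illustrated with a small analytical model. The sketch below is an assumption for illustration only. It does not reproduce the paper's benchmark, injected noise patterns, or measurements. It assumes that during one collective operation each process independently suffers a detour of length d_i with probability p_i. Because the collective completes only when the slowest process does, the added delay is the maximum detour over all processes. Its expectation follows from the standard tail-sum identity E[max] = sum_k (d_k - d_{k+1}) * P(max >= d_k), taken over detour lengths sorted in descending order. In the synchronized case, all processes are interrupted together, so they share a single detour. The function names and the noise values are hypothetical.

```python
from typing import List, Tuple

# Hypothetical noise model (an assumption, not the paper's data): on each
# process, during one collective operation, a detour of length d_i occurs with
# probability p_i (levels are mutually exclusive); otherwise there is no detour.
NoiseLevels = List[Tuple[float, float]]  # (detour length in us, probability)


def expected_max_detour(levels: NoiseLevels, n_procs: int) -> float:
    """Expected delay added to a collective when noise is independent per process.

    The collective finishes when the slowest process does, so the delay is the
    maximum detour over n_procs processes, computed with the tail-sum identity
    E[max] = sum_k (d_k - d_{k+1}) * P(max >= d_k), lengths sorted descending.
    """
    ordered = sorted(levels, key=lambda lv: lv[0], reverse=True)
    expected = 0.0
    p_at_least = 0.0  # P(a single process has a detour >= the current length)
    for k, (length, prob) in enumerate(ordered):
        p_at_least += prob
        next_length = ordered[k + 1][0] if k + 1 < len(ordered) else 0.0
        # P(at least one of n_procs processes has a detour >= length)
        p_max_at_least = 1.0 - (1.0 - p_at_least) ** n_procs
        expected += (length - next_length) * p_max_at_least
    return expected


def expected_sync_detour(levels: NoiseLevels) -> float:
    """Expected delay when all processes are interrupted at the same moment.

    Every process shares one detour, so the cost does not grow with n_procs.
    """
    return sum(length * prob for length, prob in levels)


if __name__ == "__main__":
    # Illustrative values only: frequent 10 us detours plus a rare 1 ms detour.
    levels = [(10.0, 1e-2), (1000.0, 1e-5)]
    for n in (1, 1024, 65536):
        print(f"{n:>6} procs: independent {expected_max_detour(levels, n):8.2f} us, "
              f"synchronized {expected_sync_detour(levels):6.2f} us")
```

With these made-up values, a single process loses only about 0.11 us per operation on average. At 65,536 processes the expected independent-noise delay rises to roughly 486 us. Nearly all of that increase comes from the 1 ms detour, even though it strikes a given process with probability only 1e-5. The synchronized cost stays at about 0.11 us at every scale, which is consistent with the abstract's two conclusions.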