Data corruption may arise from a wide variety of sources, from aging hardware to ionizing radiation, and the risk of corruption increases with
the scale of the computation. Corruptions may cause failures, when the execution crashes, or they may remain silent, when the corruption goes undetected.
I studied solutions to silent data corruptions for numerical integration solvers, which are particularly sensitive to corruptions. Numerical integration
solvers are step-by-step methods that approximate the solution of a differential equation. A corruption not only propagates through all subsequent steps of the
integration, but it can even cause the solution to diverge. In numerical integration solvers, the approximation error can be estimated at a low cost. I used these error estimates to detect silent data corruptions in two high-performance applications in fault tolerance. On the one hand, I demonstrated a new lightweight detector for solvers with a
fixed integration step size. I showed mathematically that all corruptions affecting the accuracy of a simulation are detected by this method. On the other hand, solvers with a variable integration step size can naturally reject silent data corruptions during the selection of the next step size. I showed that this mechanism alone can miss too many corruptions, and I developed a mechanism to improve it.
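
As an illustration of the general idea only, the following minimal Python sketch shows how a cheap error estimate could flag a corrupted step in a fixed-step solver. It is not the detector developed in this work. It assumes an embedded Heun-Euler pair, a scalar state, a hand-picked threshold, and a transient fault injected into one stage. The names heun_euler_step, integrate_with_check, threshold, faulty_step and fault are all hypothetical.

```python
import math


def heun_euler_step(f, t, y, h, fault=0.0):
    """One Heun step (order 2) with an embedded Euler step (order 1).

    `fault` is a hypothetical additive corruption of the second stage.
    Returns the order-2 result and the local error estimate.
    """
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1) + fault  # assumed transient corruption of k2
    y_high = y + 0.5 * h * (k1 + k2)   # Heun (order 2)
    y_low = y + h * k1                 # Euler (order 1)
    return y_high, abs(y_high - y_low)


def integrate_with_check(f, t0, y0, h, n_steps, threshold,
                         faulty_step=None, fault=0.0):
    """Fixed-step integration that recomputes any step whose error
    estimate exceeds `threshold` (an assumed, user-chosen bound)."""
    t, y = t0, y0
    detected = []
    for step in range(n_steps):
        injected = fault if step == faulty_step else 0.0
        y_new, err = heun_euler_step(f, t, y, h, injected)
        if err > threshold:
            # Suspected silent corruption: record it and redo the step,
            # assuming the fault does not repeat on recomputation.
            detected.append(step)
            y_new, err = heun_euler_step(f, t, y, h)
        t, y = t + h, y_new
    return y, detected


if __name__ == "__main__":
    decay = lambda t, y: -y  # test problem y' = -y, y(0) = 1
    y_end, flagged = integrate_with_check(
        decay, 0.0, 1.0, 0.01, 100, threshold=1e-3,
        faulty_step=50, fault=10.0)
    print("flagged steps:", flagged)
    print("error at t = 1:", abs(y_end - math.exp(-1.0)))
```

In this sketch a corruption that keeps the estimate below the threshold goes unnoticed, so the choice of threshold decides which corruptions are caught. A fixed, hand-picked threshold is a simplification made here for brevity.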