An Inversion-Free Estimating Equation Approach for Gaussian Process Models

TitleAn Inversion-Free Estimating Equation Approach for Gaussian Process Models
Publication TypeJournal Article
Year of Publication2013
AuthorsAnitescu, M, Chen, J, Stein, ML
Other NumbersANL/MCS-P5078-0214
AbstractGaussian Processes are a common analysis tool in statistics and uncertainty quan- tification. The covariance function of the process is generally unknown and often assumed to fall into some parameteric class. One of the scalability bottlenecks for their large-scale usage is the computation of the maximum likelihood estimates of the parameters of the covariance matrix. In a classical approach this requires a Cholesky factorization of the dense covariance matrix for each optimization iteration. Recent approaches with stochastic approximations of the score equations [1, 30, 29] require only solving linear systems with the covariance matrix, which is a significant improve- ment but continues to be a nontrivial expense. In this work, we present an estimating equation approach for the maximum likelihood estimation of parameters. The distin- guishing feature of this approach is that no linear system needs to be solved with the covariance matrix. As a result, this approach requires only a small fraction of the computational effort of maximum likelihood calculations; for certain commonly used covariance models and data configurations, this approach results in fast and scalable calculations. We prove that when the covariance matrix has a bounded condition num- ber, our approach has the same convergence rate as does maximum likelihood in that the Godambe information matrix of the resulting estimator is at least as large as a fixed fraction of the Fisher information matrix. Moreover, our approach presents additional advantages compared with the previous ones [1, 30, 29], namely, the preservation of an optimization structure and the guarantee of finding global optima for covariance mod- els that are linear in the parameters. We demonstrate the effectiveness of the proposed approach on two synthetic examples of up to one million data points.