J. W. Larson, "Can We Define Climate Using Information Theory?," Preprint ANL/MCS-P1738-0310, March 2010. [pdf]
The standard definition of climate is, by convention, based on a thirty-year sample. But why? One way to define the sampling period for constructing climatologies is to ask: What is a sufficient sample to construct probability density functions (PDFs) for key meteorological variables? One method for judging the sufficiency of a sample to construct a PDF is to use information theory. I propose a framework for evaluating climatic sampling periods based on the level of detail and associated uncertainties in the estimated PDF, the Shannon entropy growth curve and its discrete derivative, and Kullback-Leibler divergence-based statistics for quantifying the information gain as the sampling period is expanded by a specified amount. I apply this approach to daily data from the Central England Temperature (CET) record spanning the period 1772-2006. PDF estimation is performed by using an optimal binning technique derived from Bayesian principles to determine a uniform binning strategy that maximizes the posterior probability given the data sample; this technique identifies the known heavy truncation of the CET data and yields insight into the PDF structure, with estimated uncertainties, for sampling periods spanning 1-235 years. Ensemble-generated statistics from windowed resampling and Monte Carlo calculations of neighboring estimated PDFs are computed, resulting in confidence intervals for all the structural quantities in the framework. I use these statistics to compare the relative confidence associated with a number of popular sampling periods.
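The core quantities in the abstract's framework can be illustrated with a minimal sketch: a histogram-based PDF estimate, its Shannon entropy, and the KL-divergence "information gain" as the sampling window grows. This is not the paper's Bayesian optimal-binning method (a fixed uniform binning is assumed here) and the synthetic data are hypothetical, standing in for a daily temperature record such as the CET.

```python
import numpy as np

def estimated_pdf(sample, bins):
    """Histogram-based PDF estimate: probability mass in each bin."""
    counts, _ = np.histogram(sample, bins=bins)
    return counts / counts.sum()

def shannon_entropy(p):
    """Shannon entropy in bits, skipping empty bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in bits; eps regularizes empty bins in q."""
    q = np.asarray(q, dtype=float) + eps
    q = q / q.sum()
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Synthetic "daily temperatures" (Gaussian; purely illustrative).
rng = np.random.default_rng(0)
data = rng.normal(9.5, 5.0, size=365 * 50)           # 50 notional years
edges = np.linspace(data.min(), data.max(), 31)      # fixed uniform binning

# Entropy growth curve and information gain as the window expands.
prev = None
for years in (1, 5, 10, 30, 50):
    p = estimated_pdf(data[: 365 * years], edges)
    h = shannon_entropy(p)
    gain = kl_divergence(p, prev) if prev is not None else float("nan")
    print(f"{years:2d} yr window: H = {h:.3f} bits, D(new||prev) = {gain:.4f} bits")
    prev = p
```

As the window lengthens, the estimated PDF stabilizes and the KL divergence between successive estimates shrinks toward zero, which is the intuition behind using information gain to judge when a sampling period is "long enough."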