Prangle, Dennis and Fearnhead, Paul
(2011)
*Summary statistics and sequential methods for approximate Bayesian computation.*
PhD thesis, Lancaster University.

## Abstract

Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data to summary statistics of the observed data. This thesis looks at two related methodological issues for ABC. Firstly, a method is proposed to construct appropriate summary statistics for ABC in a semi-automatic manner. The aim is to produce summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that, in some sense, optimal summary statistics are the posterior means of the parameters. While these cannot be calculated analytically, an extra stage of simulation is used to estimate how the posterior means vary as a function of the data, and these estimates are then used as summary statistics within ABC. Empirical results show that this is a robust method for choosing summary statistics, which can result in substantially more accurate ABC analyses than previous approaches in the literature. Secondly, ABC inference for multiple independent data sets is considered. If there are many such data sets, it is hard to choose summary statistics which capture the available information and are appropriate for general ABC methods. An alternative sequential ABC approach is proposed in which simulated and observed data are compared for each data set and combined to give overall results. Several algorithms are proposed and their theoretical properties studied, showing that exploiting ideas from the semi-automatic ABC theory produces consistent parameter estimation. Implementation details are discussed and illustrated with several simulation examples, together with applications to substantive inference problems.
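
The two ideas summarised above, rejection-style ABC and a regression-based estimate of the posterior mean used as a summary statistic, can be sketched on a toy problem. This is a minimal illustration under stated assumptions and not the thesis's implementation: the Normal(theta, 1) model, the Uniform(-5, 5) prior, the single regression feature (the sample mean), the tolerance, and all function names are hypothetical choices made for the example.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy model (an assumption): n i.i.d. draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def fit_linear_summary(n, n_pilot, rng):
    """Pilot stage: regress theta on the sample mean of simulated data.

    The fitted value a + b * mean(x) is a rough estimate of the posterior
    mean of theta, and is returned as the summary statistic function.
    """
    thetas = [rng.uniform(-5.0, 5.0) for _ in range(n_pilot)]  # prior draws
    feats = [statistics.fmean(simulate(t, n, rng)) for t in thetas]
    f_bar, t_bar = statistics.fmean(feats), statistics.fmean(thetas)
    sxx = sum((f - f_bar) ** 2 for f in feats)
    sxy = sum((f - f_bar) * (t - t_bar) for f, t in zip(feats, thetas))
    b = sxy / sxx            # least-squares slope
    a = t_bar - b * f_bar    # least-squares intercept
    return lambda x: a + b * statistics.fmean(x)


def rejection_abc(observed, summary, n_sims, tol, rng):
    """Keep prior draws whose simulated summary lies within tol of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)            # draw from the prior
        x = simulate(theta, len(observed), rng)   # artificial data set
        if abs(summary(x) - s_obs) <= tol:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # stand-in for real data, true theta = 2
summary = fit_linear_summary(n=50, n_pilot=500, rng=rng)
posterior_sample = rejection_abc(observed, summary, n_sims=5000, tol=0.1, rng=rng)
```

The accepted values in `posterior_sample` form an approximate posterior sample for `theta`. In this toy model the sample mean is already a sufficient statistic, so the regression step adds little here; it is included only to show where an estimated posterior mean would enter as the summary statistic.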