Baker, Jack and Fearnhead, Paul and Nemeth, Christopher and Fox, Emily B (2019) Large-scale Bayesian computation using Stochastic Gradient Markov Chain Monte Carlo. PhD thesis, Lancaster University.
Abstract
Markov chain Monte Carlo (MCMC), one of the most popular methods for inference on Bayesian models, scales poorly with dataset size because it requires one or more calculations over the full dataset at each iteration. Stochastic gradient Markov chain Monte Carlo (SGMCMC) is a popular class of MCMC methods designed to be more scalable to large datasets: it requires only a subset of the full data at each iteration. This thesis builds on the SGMCMC literature with contributions that improve the efficiency of SGMCMC, provide software that improves its ease of use, and remove large biases in the method for an important class of models.

While SGMCMC improves the per-iteration computational cost over traditional MCMC, empirical results suggest that its overall computational cost (i.e. the cost for the algorithm to reach a given level of accuracy) is still $O(N)$, where $N$ is the dataset size. In light of this, we show how control variates can be used to develop an SGMCMC algorithm with $O(1)$ computational cost, subject to two one-off preprocessing steps that each require a single pass through the dataset (a sketch of the estimator is given below).

While SGMCMC has gained significant popularity in the machine learning community, uptake in the statistics community has been slower. We suggest this may be due to a lack of software, so as part of the contributions of this thesis we provide an R package that automates much of the procedure required to build SGMCMC algorithms.

Finally, we show that current SGMCMC algorithms for sampling from the simplex have inherent biases, especially when some of the parameter components are close to zero. To get around this, we develop an algorithm that is provably asymptotically unbiased. We demonstrate its performance empirically on a latent Dirichlet allocation model and a Dirichlet process model.
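As a concrete illustration of the control-variate construction, here is a minimal sketch of the two gradient estimators involved, written in standard SGMCMC notation; the symbols below ($\theta$ for the parameter, $\hat{\theta}$ for a fixed point near the posterior mode, $S$ for a minibatch of size $n$) are our own notational choices for illustration. The plain stochastic gradient estimator rescales a minibatch sum, while the control-variate estimator recentres it at $\hat{\theta}$:

$$\hat{g}(\theta) = \nabla \log p(\theta) + \frac{N}{n} \sum_{i \in S} \nabla \log f(x_i \mid \theta),$$

$$\hat{g}_{cv}(\theta) = g(\hat{\theta}) + \nabla \log p(\theta) - \nabla \log p(\hat{\theta}) + \frac{N}{n} \sum_{i \in S} \left[ \nabla \log f(x_i \mid \theta) - \nabla \log f(x_i \mid \hat{\theta}) \right],$$

where $p(\theta)$ is the prior, $f(x_i \mid \theta)$ is the likelihood contribution of observation $i$, and $g(\hat{\theta}) = \nabla \log p(\hat{\theta}) + \sum_{i=1}^{N} \nabla \log f(x_i \mid \hat{\theta})$ is the full-data gradient at $\hat{\theta}$. Both estimators are unbiased for the full-data gradient; the two one-off $O(N)$ preprocessing steps mentioned above correspond to finding $\hat{\theta}$ and computing $g(\hat{\theta})$. Because the variance of $\hat{g}_{cv}(\theta)$ shrinks as $\theta$ approaches $\hat{\theta}$, the minibatch size needed per iteration for a fixed accuracy no longer needs to grow with $N$.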
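The following is a short runnable sketch of stochastic gradient Langevin dynamics (SGLD) with this control-variate estimator, on a toy Gaussian model. The model, function names and tuning constants here are illustrative assumptions for the sketch; they are not the thesis's own code, nor the API of its R package.

import numpy as np

# Hypothetical toy model for illustration: data x_i ~ N(theta, 1),
# prior theta ~ N(0, 10). The gradients below are with respect to theta.

def grad_log_prior(theta):
    # Gradient of log N(theta | 0, 10).
    return -theta / 10.0

def grad_log_lik(theta, x):
    # Per-observation gradient of log N(x | theta, 1); vectorised over x.
    return x - theta

def sgld_cv(data, theta_hat, stepsize=1e-4, n_iters=5000, batch_size=32, seed=0):
    """SGLD with control variates.

    theta_hat is a fixed point near the posterior mode; the full-data
    gradient at theta_hat is computed once here (preprocessing step 2).
    """
    rng = np.random.default_rng(seed)
    N = len(data)
    # One-off O(N) pass: full-data gradient g(theta_hat).
    full_grad_hat = grad_log_prior(theta_hat) + grad_log_lik(theta_hat, data).sum()
    theta = theta_hat
    samples = np.empty(n_iters)
    for t in range(n_iters):
        idx = rng.choice(N, size=batch_size, replace=False)
        # Control-variate gradient estimate: unbiased, with variance that
        # shrinks as theta approaches theta_hat.
        correction = (grad_log_lik(theta, data[idx]) -
                      grad_log_lik(theta_hat, data[idx])).sum()
        grad_est = (full_grad_hat
                    + grad_log_prior(theta) - grad_log_prior(theta_hat)
                    + (N / batch_size) * correction)
        # Langevin update: half-step gradient plus injected Gaussian noise.
        theta = theta + 0.5 * stepsize * grad_est + np.sqrt(stepsize) * rng.normal()
        samples[t] = theta
    return samples

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(2.0, 1.0, size=100_000)
    # Cheap stand-in for a mode estimate (preprocessing step 1); in practice
    # this would come from an optimisation run such as SGD.
    theta_hat = data.mean()
    samples = sgld_cv(data, theta_hat)
    print(samples[1000:].mean(), samples[1000:].std())

Note that each iteration touches only batch_size observations, regardless of $N$; only the two preprocessing steps (estimating theta_hat and computing full_grad_hat) require a full pass through the data.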