Putcha, Srshti and Fearnhead, Paul and Nemeth, Christopher (2024) Scalable Bayesian Inference Using Stochastic Gradient Markov Chain Monte Carlo. PhD thesis, Lancaster University.
Abstract
Bayesian inference offers a flexible framework for accounting for uncertainty across all unobserved quantities in a model. Markov chain Monte Carlo (MCMC) is a class of sampling algorithms that simulate from the Bayesian posterior distribution, and these methods are generally regarded as the go-to computational technique for practical Bayesian modelling. MCMC is well understood, offers (asymptotically) exact inference, and can be implemented intuitively. Samplers built upon the Metropolis-Hastings algorithm benefit from strong theoretical guarantees under reasonable conditions. Derived from discrete-time approximations of Itô diffusions, gradient-based samplers (Roberts and Rosenthal, 1998; Neal, 2011) leverage local gradient information in their proposals, allowing for efficient exploration of the posterior. The most widely used of these diffusion processes are the overdamped Langevin diffusion and Hamiltonian dynamics.

In large-data settings, standard MCMC can falter: the per-iteration cost of calculating the log-likelihood in the Metropolis-Hastings acceptance step scales with the size of the dataset. Gradient-based samplers are doubly afflicted here, since a full-data gradient must also be computed at each iteration. These issues have prompted considerable interest in developing approaches for scalable Bayesian inference. This thesis proposes novel contributions to stochastic gradient MCMC (Welling and Teh, 2011; Ma et al., 2015; Nemeth and Fearnhead, 2021), which uses data subsampling to construct a noisy but unbiased estimate of the gradient of the log-posterior.

The first two chapters review key background from the literature. Chapter 3 presents our first paper contribution, in which we extend stochastic gradient MCMC to time series via non-linear, non-Gaussian state space models. Chapter 4 presents the second paper contribution of this thesis.
Here, we examine the use of a preferential subsampling distribution to reweight the stochastic gradient and improve variance control. Chapter 5 evaluates the feasibility of using determinantal point processes (Kulesza and Taskar, 2012) for data subsampling in stochastic gradient Langevin dynamics (SGLD). We conclude and propose directions for future work in Chapter 6.
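To make the central idea concrete, the following is a minimal sketch of stochastic gradient Langevin dynamics (Welling and Teh, 2011), the canonical stochastic gradient MCMC algorithm. The toy Gaussian model, flat prior, step size, and subsample size are illustrative assumptions, not values taken from the thesis; the point is that the minibatch gradient, rescaled by N over the batch size, is an unbiased estimate of the full-data gradient of the log-posterior.

```python
import math
import random

random.seed(0)

# Synthetic data: y_i ~ N(theta_true, 1), with a flat prior on theta
# (an illustrative assumption, not the thesis's model).
theta_true = 2.0
N = 1000
data = [theta_true + random.gauss(0.0, 1.0) for _ in range(N)]


def grad_log_post_estimate(theta, batch):
    """Unbiased estimate of grad log p(theta | y).

    With a flat prior, the full-data gradient is sum_i (y_i - theta).
    The minibatch sum is rescaled by N / |batch| so the estimate is
    unbiased for the full-data gradient.
    """
    scale = N / len(batch)
    return scale * sum(y - theta for y in batch)


def sgld(theta0, step=1e-4, n_iters=5000, batch_size=32):
    """SGLD: Euler-Maruyama discretisation of the Langevin diffusion,
    with the exact gradient replaced by a subsampled estimate."""
    theta = theta0
    samples = []
    for _ in range(n_iters):
        batch = random.sample(data, batch_size)
        grad = grad_log_post_estimate(theta, batch)
        # Half-gradient step plus Gaussian noise with variance `step`.
        theta += 0.5 * step * grad + math.sqrt(step) * random.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


samples = sgld(theta0=0.0)
burn_in = samples[1000:]
posterior_mean = sum(burn_in) / len(burn_in)
```

After burn-in, the chain fluctuates around the posterior mean (close to the data mean here); no Metropolis-Hastings correction is applied, so the discretisation and subsampling noise introduce a controlled bias.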
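The preferential subsampling idea of Chapter 4 can be illustrated with a generic importance-weighted estimator: draw index i with probability p_i and reweight its gradient term by 1 / (m * p_i), which keeps the estimator unbiased for any valid p; choosing p_i proportional to the magnitude of the i-th term reduces variance. The toy "gradient terms" and the specific choice of p below are illustrative assumptions, not the thesis's actual scheme.

```python
import random

random.seed(1)

# Toy per-datum "gradient terms" g_i; the exact full-data gradient is
# their sum (an illustrative stand-in for sum_i grad log p(y_i | theta)).
data = [float(i) for i in range(1, 11)]
full_sum = sum(data)  # 55.0


def weighted_estimate(probs, m=4):
    """Unbiased estimate of sum_i g_i: sample m indices from `probs`
    (with replacement) and reweight each term by 1 / (m * p_i)."""
    idx = random.choices(range(len(data)), weights=probs, k=m)
    return sum(data[i] / (m * probs[i]) for i in idx)


def sample_var(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


# Preferential distribution: p_i proportional to |g_i| (the idealised
# variance-minimising choice for this toy problem).
total = sum(abs(g) for g in data)
pref = [abs(g) / total for g in data]
unif = [1.0 / len(data)] * len(data)

pref_draws = [weighted_estimate(pref) for _ in range(2000)]
unif_draws = [weighted_estimate(unif) for _ in range(2000)]
pref_mean = sum(pref_draws) / len(pref_draws)
unif_mean = sum(unif_draws) / len(unif_draws)
```

Both sampling distributions give (empirically) unbiased estimates of the full sum, but the preferential weights collapse the variance; in this toy case p_i proportional to |g_i| makes every reweighted term equal, so the estimator is essentially exact.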