Sperrin, Matthew (2010) Statistical methodology motivated by problems in genetics. PhD thesis, .
Sequencing the human genome has made vast amounts of potentially useful genetic data accessible. An important challenge in statistics is to develop methodology to extract information from this data. In this thesis, developments are made in two methodological areas that have wide applications in genetics. First, probabilistic methods to deal with the label switching problem in Bayesian mixture models are introduced. Mixture models are used in situations where populations may consist of a number of sub-populations, or as a semi-parametric modelling tool. The label switching problem can prevent meaningful interpretation of the output of Markov Chain Monte Carlo samplers. Specifically, inference on attributes specific to sub-populations can be difficult. Such attributes play an important role in understanding genetic effects. We introduce probabilistic relabelling strategies as a natural way of overcoming the label switching problem, and compare with existing strategies. The comparisons demonstrate that the advantages offered by probabilistic strategies come without loss in parameter estimation ability. Second, we introduce direct effect testing (DET), which is a novel method that distinguishes direct from indirect effects between binary predictors and a binary response. DET consists of two stages: the first stage finds effects, the second stage infers the uncertainty in determining which predictors cause which effects. The method is useful when it is of interest to recover direct effects between a large number of predictors and the response. This is a common goal in genetics, where we are interested in the effects of variations in the genome on the prevalence of a phenotype. This work includes detailed simulations, comparing the ability of a number of methods at recovering direct effects. DET outperforms existing methods at recovering direct effects in situations where there is high correlation between predictors, and matches their performance when the correlation is moderate or small.