Statistical Methods for Modelling Complex Longitudinal Data with Applications in Cancer Pharmacogenetics and Ageing

Koukouli, Evanthia and Park, Juhyun and Titman, Andrew and Doebler, Stefanie (2022) Statistical Methods for Modelling Complex Longitudinal Data with Applications in Cancer Pharmacogenetics and Ageing. PhD thesis, Lancaster University.

Text (2022koukouliphd)
2022koukouliphd.pdf - Published Version



Technological and scientific advancements have promoted data gathering across multiple disciplines, emphasising the need for rigorous statistical methods to draw conclusions. Longitudinal data are a key tool for studying temporal changes; however, with increasing data complexity, existing methodologies are often unable to capture non-linear or non-stationary trends. Additionally, irregularly collected, non-continuous or high-dimensional data make statistical analysis even more challenging. In this work, we develop three statistical models to analyse complex longitudinal data from two real-world databases, the Genomics of Drug Sensitivity in Cancer and the English Longitudinal Study of Ageing.

The first part of this work is motivated by the Genomics of Drug Sensitivity in Cancer project and focuses on the prediction and detection of biomarkers associated with anti-cancer drug dose-response. Here, the longitudinal data available are characterised by completely observed trajectories of drug response over multiple drug dosages, which are potentially associated with high-dimensional covariates (including expression profiles of tens of thousands of genes) in a non-stationary manner. These trends are not easily amenable to analysis by classic parametric or semi-parametric mixed models, especially when the covariates are high-dimensional. We build a dose-varying regression model combined with a two-stage variable selection algorithm (variable screening followed by penalised regression) to identify genetic factors associated with drug response and estimate their effects over the varying dosages.

The second part of this work is motivated by the English Longitudinal Study of Ageing data set. The longitudinal data available in this study are characterised by irregularly collected and often incomplete trajectories, and by many response variables of ordinal type which measure only a small number of ageing domains (the data are derived from multiple questionnaires measuring multiple aspects of older people's lives). The ultimate aim is to understand the dynamics of ageing and to study the interrelationships between the factors associated with it. To do so, we first explore the theoretical foundations of ageing and the data set itself. Next, we adopt and extend the methodological framework of \cite{dawson2018} to estimate the quantile dynamics and derive predictions for a common surrogate of ageing, frailty, addressing the problem of incomplete individual responses over the age interval of interest. Finally, we develop a bivariate Gaussian process framework for ordinal and potentially irregularly sampled data, which allows the available questionnaire responses to be modelled directly. Here, the unobserved ageing domains are assumed to be smooth functions of age. This method allows the interrelationships between several ageing domains to be assessed after adjusting for individual variation across the observed longitudinal trajectories.
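
The two-stage variable selection mentioned for the first part of this work (variable screening followed by penalised regression) can be illustrated with a minimal sketch. This is not the thesis's dose-varying model: it assumes an ordinary linear model with a scalar response, uses marginal correlation as the screening statistic and a lasso penalty fitted by coordinate descent, and the function name `screen_then_lasso` and the parameters `n_keep`, `lam` and `n_iter` are hypothetical choices made for illustration only.

```python
import numpy as np


def screen_then_lasso(X, y, n_keep=20, lam=0.1, n_iter=200):
    """Illustrative two-stage selection (assumed setup, not the thesis's method).

    Stage 1: keep the n_keep covariates with the largest absolute marginal
             correlation with the response.
    Stage 2: fit a lasso on the retained covariates by coordinate descent,
             minimising (1 / 2n) * ||y - Z b||^2 + lam * ||b||_1.
    Assumes no column of X is constant.
    """
    n = len(y)
    # Standardise columns so that each satisfies z_j' z_j / n = 1.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()

    # Stage 1: marginal screening on |x_j' y| / n.
    score = np.abs(Xs.T @ yc) / n
    keep = np.argsort(score)[::-1][:n_keep]
    Z = Xs[:, keep]

    # Stage 2: lasso via cyclic coordinate descent with soft-thresholding.
    beta = np.zeros(Z.shape[1])
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(Z.shape[1]):
            # Correlation of covariate j with the partial residual.
            rho = Z[:, j] @ (resid + Z[:, j] * beta[j]) / n
            new = np.sign(rho) * max(abs(rho) - lam, 0.0)
            resid += Z[:, j] * (beta[j] - new)
            beta[j] = new

    nonzero = beta != 0
    return keep[nonzero], beta[nonzero]


# Simulated example: 1000 covariates, only columns 0 and 5 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.5 * rng.standard_normal(100)
selected, coefs = screen_then_lasso(X, y)
print(sorted(selected.tolist()))
```

Screening first reduces the problem from thousands of covariates to a few dozen, so the penalised fit in the second stage is cheap; in a dose-varying setting each coefficient would instead be a function of dose rather than a single number.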

Item Type:
Thesis (PhD)
Uncontrolled Keywords:
longitudinal data analysis, repeated measurements, non-parametric estimation, Gaussian processes, quantile regression, English Longitudinal Study of Ageing, Genomics of Drug Sensitivity in Cancer, varying coefficients model, high-dimensional problems, multivariate longitud...
ID Code:
Deposited By:
Deposited On:
17 Oct 2022 09:40
Last Modified:
16 Jul 2024 06:00