Multivariate response predictor selection methods : with applications to telecommunications time series data

Lowther, Aaron and Fearnhead, Paul and Nunes, Matthew and Jensen, Kjeld (2020) Multivariate response predictor selection methods : with applications to telecommunications time series data. PhD thesis, Lancaster University.

2019lowtherphd.pdf - Published Version (3MB)
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.

Abstract

This thesis develops a semi-automated approach to estimating multiple sparse linear regression models simultaneously. We are motivated by a telecommunications application and aim to produce interpretable models. Firstly, we generalise the best-subset problem, which is often used to estimate sparse linear regression models. We call our problem the Simultaneous Best-Subset (SBS) problem and use it to estimate multiple linear regression models simultaneously. The resulting SBS approach produces models that perform more favourably than models estimated individually using the best-subset approach. We solve the SBS problem by formulating it as a Mixed Integer Quadratic Optimisation (MIQO) program, which can often be solved quickly using an optimisation solver. The MIQO framework gives us some control over the regression models estimated, which is desirable in an automated setting. Secondly, we propose a simultaneous shrinkage operator, which shrinks coefficients across models towards a common value. We show that this operator can further improve parameter estimation when estimating multiple linear regression models simultaneously; it was found to be particularly useful when noisy predictors entered the models. Thirdly, we show how the SBS approach can be integrated into a two-step semi-automated procedure for fitting REGression Seasonal AutoRegressive Integrated Moving Average (Reg-SARIMA) models. We apply this procedure to estimate models for a telecommunications dataset and compare it with the approach currently employed by our industrial collaborator. We show that the Reg-SARIMA models fit the data better, are more interpretable, and perform more favourably for short-term forecasting. In addition, the two-step procedure requires much less human intervention in the modelling process than the procedures currently used in industry.
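For reference, the classical single-response best-subset problem that the SBS problem generalises, and the standard big-M way of writing it as an MIQO program, are sketched below. Here y is an n-vector response, X an n-by-p design matrix, k the maximum number of predictors, z a vector of binary indicators, and M an assumed upper bound on the size of any coefficient. This is the textbook single-model formulation, not the thesis's SBS program, whose exact form is not given in this abstract.

```latex
% Best-subset problem (single response)
\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad \lVert \beta \rVert_0 \le k

% Equivalent big-M MIQO formulation, assuming |\beta_j| \le M at the optimum
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \; \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad
-M z_j \le \beta_j \le M z_j \;\; (j = 1,\dots,p), \qquad \sum_{j=1}^{p} z_j \le k
```

Setting z_j = 0 forces beta_j = 0, so the constraint on the sum of the z_j limits the number of active predictors to at most k. The objective is quadratic in beta and the z_j are integer, which is what makes this a mixed integer quadratic program that an optimisation solver can handle.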
Finally, we propose fast approaches to estimating multiple sparse linear regression models simultaneously. Using a simulation study, we show that these approaches often produce models that perform as favourably as those from the SBS approach, while taking far less time to compute.
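Exact best-subset selection is combinatorial. With p candidate predictors and subset size k, there are C(p, k) candidate models per response. This is why solver-based MIQO formulations and faster approximate approaches matter. The sketch below is a minimal brute-force version of the "models estimated individually using the best-subset approach" baseline. It assumes NumPy, fits each response separately by exhaustive enumeration, and is feasible only for small p. The function names and the synthetic data are hypothetical and for illustration only. This is not the thesis's SBS method or any of its fast approaches.

```python
import itertools

import numpy as np


def best_subset(X, y, k):
    """Exhaustive best-subset for one response: the k-predictor least-squares
    fit with the smallest residual sum of squares (hypothetical helper)."""
    n, p = X.shape
    assert 1 <= k <= p, "subset size must be between 1 and p"
    best_rss, best_support, best_coef = np.inf, None, None
    # Enumerate all C(p, k) supports; cost grows combinatorially in p.
    for support in itertools.combinations(range(p), k):
        Xs = X[:, list(support)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_support, best_coef = rss, support, coef
    # Embed the selected coefficients in a length-p vector (zeros elsewhere).
    beta = np.zeros(p)
    beta[list(best_support)] = best_coef
    return best_support, beta


def fit_individually(X, Y, k):
    """Apply best_subset separately to each column (response) of Y."""
    return [best_subset(X, Y[:, m], k) for m in range(Y.shape[1])]


# Hypothetical synthetic data: 3 responses that share predictors 0 and 2.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.standard_normal((n, p))
B = np.zeros((p, 3))
B[0, :] = [2.0, 1.5, 1.8]
B[2, :] = [-1.0, -1.2, -0.8]
Y = X @ B + 0.1 * rng.standard_normal((n, 3))

for m, (support, beta) in enumerate(fit_individually(X, Y, k=2)):
    print(m, support, np.round(beta[list(support)], 2))
```

Each response here is fitted in isolation, so nothing links the three models even though they share predictors. Coupling the models, and replacing enumeration with a solver or a faster approximation, is the territory of the SBS problem and the fast approaches described above. Neither is reproduced in this sketch.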

Item Type:
Thesis (PhD)
Subjects:
predictor selection, time series, multivariate analysis, multivariate linear regression, sarima, optimization
ID Code:
141405
Deposited By:
Deposited On:
18 Feb 2020 16:15
Refereed?:
No
Published?:
Published
Last Modified:
15 Jan 2024 00:01