Methods for missing time-series data and large spatial data

Duncan, Rachael and Young, Paul and Nemeth, Christopher (2024) Methods for missing time-series data and large spatial data. PhD thesis, Lancaster University.

Text (2024duncanphd)
2024duncanphd.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.

Download (4MB)

Abstract

Performing accurate statistical inference requires high-quality datasets. However, real-world datasets often contain missing values, to varying degrees, both spatially and temporally. Alternatively, modelled datasets can provide complete coverage, but these are often biased. This thesis derives a simplified approach to the skew Kalman filter that tackles the computational issues present in the existing skew Kalman filter by using a secondary dataset to estimate the skewness parameter. In application, this thesis implements the skew Kalman filter on surface-level ozone data to bias-correct the modelled ozone data, and then uses the bias-corrected data to infill missing data in the observed dataset. Further, this thesis explores working with large spatial datasets. When carrying out spatial inference, using all of the available data allows for more accurate inference. However, spatial models such as Gaussian processes scale cubically with the number of data points and thus quickly become computationally infeasible for moderate to large datasets. Divide-and-conquer methods split the data into subsets, carry out inference on each subset, and then recombine the results. While well documented in the independent setting, these methods are less popular in the spatial setting. This thesis evaluates how well divide-and-conquer methods in the spatial setting approximate the results of carrying out inference on the full dataset. Finally, this is demonstrated using USA temperature data.
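
As an illustration of the divide-and-conquer idea described in the abstract, the following is a minimal sketch and not the thesis's own method. It assumes a squared-exponential kernel with fixed hyperparameters, random partitioning of the data, and a product-of-experts rule (precision-weighted averaging) to recombine the subset predictions. All function names, parameter values, and the synthetic data are hypothetical. Each subset of size n/k costs O((n/k)^3) to fit, compared with O(n^3) for the full Gaussian process.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance between two sets of locations (one row per point).
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict(x, y, x_new, noise=0.1):
    # Exact GP posterior mean and variance; the Cholesky factorisation is the O(n^3) step.
    k = rbf_kernel(x, x) + noise**2 * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    k_star = rbf_kernel(x, x_new)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    v = np.linalg.solve(chol, k_star)
    mean = k_star.T @ alpha
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 1e-12)

def divide_and_conquer_predict(x, y, x_new, n_subsets=4, seed=0):
    # Assumption: random partition and product-of-experts recombination,
    # chosen for illustration only.
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), n_subsets)
    precision = np.zeros(len(x_new))
    weighted_mean = np.zeros(len(x_new))
    for idx in parts:
        m, v = gp_predict(x[idx], y[idx], x_new)
        precision += 1.0 / v        # accumulate subset precisions
        weighted_mean += m / v      # accumulate precision-weighted means
    return weighted_mean / precision, 1.0 / precision

# Synthetic spatial data (hypothetical, for demonstration only).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(400, 2))
y = np.sin(x[:, 0]) + np.cos(x[:, 1]) + 0.1 * rng.standard_normal(400)
x_new = rng.uniform(0, 10, size=(5, 2))

full_mean, full_var = gp_predict(x, y, x_new)
dc_mean, dc_var = divide_and_conquer_predict(x, y, x_new, n_subsets=4)
```

Comparing `dc_mean` with `full_mean` gives a rough sense of the approximation error introduced by splitting. The product-of-experts rule is known to produce overconfident variances, so other recombination rules may be preferable in practice.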

Item Type:
Thesis (PhD)
ID Code:
214397
Deposited By:
Deposited On:
13 Feb 2024 16:45
Refereed?:
No
Published?:
Published
Last Modified:
25 Feb 2024 00:25