Kourentzes, Nikolaos (2009) Input Variable Selection for Time Series Forecasting with Artificial Neural Networks: An Empirical Evaluation across Varying Time Series Frequencies. PhD thesis, Lancaster University.
Available under License Creative Commons Attribution-NoDerivs.
Abstract
Over the last two decades there has been an increase in research on the application of artificial neural networks (ANNs) to forecasting problems. In both theoretical and empirical work, ANNs have shown evidence of good performance, in many cases outperforming established statistical benchmarks. This thesis starts by reviewing the advances in ANNs for time series forecasting, assessing their performance in the literature and analysing the current state of the art, the modelling issues that have been solved and those that remain critical for forecasting with ANNs, thereby indicating future research directions. The specification of the input vector is identified as the most crucial unresolved modelling issue for ANNs' accuracy. Notably, there is no rigorous empirical evaluation of the multiple published input variable selection methodologies. This problem is addressed from four different perspectives. First, a rigorous evaluation of several published methodologies, along with newly proposed variations, is performed on low frequency data, exploring which input variable selection methodologies perform best. This analysis concludes that regression based methodologies outperform other linear and nonlinear ones. Second, the best way to code deterministic seasonality in the inputs of the ANNs is explored, a topic overlooked in the ANN literature, and a parsimonious encoding based on seasonal indices is proposed. Third, the effect of the frequency of the time series on specifying the inputs of ANNs for forecasting is evaluated, revealing several challenges in modelling high frequency time series and providing evidence that the performance of several input variable specification methodologies is not consistent across data frequencies. Finally, this leads to an evaluation of methodologies to select input variables for ANNs solely for high frequency data.
Regression based methodologies are found to perform best, in agreement with the evaluation on low frequency data, while the ranking of the remaining methodologies is found to be inconsistent across data frequencies.