Rodriguez-Calderon, Carlos-Eduardo and Crone, Sven (2018) Catalogue Retail Forecasting – An Empirical Evaluation of Linear Regression and K-Nearest Neighbours sensitivity to preprocessing. Masters thesis, Lancaster University.
2018RodriguezCalderonMRes.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.
Abstract
This study investigates demand forecasting in catalogue retailing, a significant yet under-researched domain within the broader retail sector. Despite the continued economic relevance of catalogue-based sales, particularly in direct selling industries such as cosmetics, existing forecasting literature remains limited in both methodological diversity and empirical rigour. This dissertation addresses these gaps by conducting a systematic empirical evaluation of two forecasting approaches, linear regression and k-nearest neighbours (k-NN), with a particular focus on their sensitivity to different data preprocessing techniques. Using a real-world dataset from a multinational cosmetics company, comprising 1,765 product-level observations across three catalogue campaigns, the study develops and compares forecasting models under varying preprocessing conditions, including logarithmic transformation, feature encoding, scaling, and variable selection. The analysis employs out-of-sample testing and multiple error metrics (RMSE, MAE, and sMAPE) to ensure robust evaluation and comparability. The findings reveal that data preprocessing has a substantial impact on forecasting accuracy, with transformations such as log-scaling significantly improving the performance of linear regression models. While k-NN offers competitive results under certain preprocessing configurations, it is notably sensitive to feature scaling and parameter selection. Overall, linear regression demonstrates more stable and interpretable performance, particularly when combined with appropriate preprocessing techniques. This research contributes to the literature by providing one of the first rigorous empirical comparisons of statistical and machine learning methods in catalogue retail forecasting, and by highlighting the critical role of preprocessing in predictive modelling. The results offer practical implications for demand planners and open avenues for future research incorporating more advanced machine learning methods and extended datasets.
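For readers unfamiliar with the terms used in the abstract, the following is a minimal illustrative sketch in Python, not the thesis's actual code or data. It assumes one common definition of sMAPE (the thesis may use a different variant), and the feature names (price, discount), the demand values, and all helper names are hypothetical. It shows the three error metrics, how min-max scaling can change which neighbours k-NN selects, and a log transform of the target with back-transformation.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE in percent (assumed variant: 2|a-f| / (|a|+|f|))."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom > 0:  # a term with a == f == 0 contributes zero error
            total += 2.0 * abs(a - f) / denom
    return 100.0 * total / len(actual)

def min_max_params(rows):
    """Per-feature minimum and range, estimated on the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [(max(c) - min(c)) or 1.0 for c in cols]

def min_max_apply(row, mins, spans):
    """Rescale one row so each training feature lies in [0, 1]."""
    return tuple((v - lo) / s for v, lo, s in zip(row, mins, spans))

def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k nearest training rows (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return sum(train_y[i] for i in order[:k]) / k

# Hypothetical toy data: (price, discount fraction) -> units sold.
train_X = [(10.0, 0.0), (12.0, 0.5), (30.0, 0.5), (28.0, 0.0)]
train_y = [100.0, 400.0, 300.0, 80.0]
query = (20.0, 0.5)

# Unscaled: price dominates the distance because of its larger numeric range.
pred_raw = knn_predict(train_X, train_y, query, k=2)

# Scaled: both features contribute comparably, so different neighbours are chosen.
mins, spans = min_max_params(train_X)
scaled_X = [min_max_apply(r, mins, spans) for r in train_X]
scaled_q = min_max_apply(query, mins, spans)
pred_scaled = knn_predict(scaled_X, train_y, scaled_q, k=2)

# Log-transformed target: predict in log space, then back-transform.
log_y = [math.log1p(y) for y in train_y]
pred_log_back = math.expm1(knn_predict(scaled_X, log_y, scaled_q, k=2))

print(pred_raw, pred_scaled, round(pred_log_back, 2))
print(rmse([100, 200], [110, 180]), mae([100, 200], [110, 180]), smape([100, 200], [110, 180]))
```

On this toy data the unscaled and scaled k-NN forecasts differ (240.0 versus 350.0), which illustrates the kind of scaling sensitivity the abstract refers to. It says nothing about the magnitude of the effect in the thesis's dataset.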