Catalogue Retail Forecasting – An Empirical Evaluation of Linear Regression and K-Nearest Neighbours sensitivity to preprocessing

Rodriguez-Calderon, Carlos-Eduardo and Crone, Sven (2018) Catalogue Retail Forecasting – An Empirical Evaluation of Linear Regression and K-Nearest Neighbours sensitivity to preprocessing. Masters thesis, Lancaster University.

Text (2018RodriguezCalderonMRes)
2018RodriguezCalderonMRes.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.
Download (2MB)

Abstract

This study investigates demand forecasting in catalogue retailing, a significant yet under-researched domain within the broader retail sector. Despite the continued economic relevance of catalogue-based sales, particularly in direct selling industries such as cosmetics, the existing forecasting literature remains limited in both methodological diversity and empirical rigour. This dissertation addresses these gaps through a systematic empirical evaluation of two forecasting approaches, linear regression and k-nearest neighbours (k-NN), with a particular focus on their sensitivity to different data preprocessing techniques. Using a real-world dataset from a multinational cosmetics company, comprising 1,765 product-level observations across three catalogue campaigns, the study develops and compares forecasting models under varying preprocessing conditions, including logarithmic transformation, feature encoding, scaling, and variable selection. The analysis employs out-of-sample testing and multiple error metrics (RMSE, MAE, and sMAPE) to ensure robust evaluation and comparability. The findings reveal that data preprocessing has a substantial impact on forecasting accuracy, with transformations such as log-scaling significantly improving the performance of linear regression models. While k-NN offers competitive results under certain preprocessing configurations, it is notably sensitive to feature scaling and parameter selection. Overall, linear regression demonstrates more stable and interpretable performance, particularly when combined with appropriate preprocessing techniques. This research contributes to the literature by providing one of the first rigorous empirical comparisons of statistical and machine learning methods in catalogue retail forecasting, and by highlighting the critical role of preprocessing in predictive modelling. The results offer practical implications for demand planners and open avenues for future research incorporating more advanced machine learning methods and extended datasets.
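As a rough illustration of the three error metrics named in the abstract, the sketch below computes RMSE, MAE, and sMAPE for a small set of made-up values. The function names and the sample numbers are hypothetical, and the sMAPE variant shown (mean of 2|a - f| / (|a| + |f|), expressed in percent) is an assumption; the thesis may define it differently.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error: penalises large errors more heavily."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE in percent (assumed variant, see the note above).

    Pairs where actual and forecast are both zero contribute zero error.
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom > 0:
            total += 2.0 * abs(a - f) / denom
    return 100.0 * total / len(actual)

# Hypothetical unit sales for three products, not taken from the thesis.
actual = [120.0, 80.0, 200.0]
forecast = [100.0, 100.0, 200.0]

print(f"RMSE:  {rmse(actual, forecast):.2f}")
print(f"MAE:   {mae(actual, forecast):.2f}")
print(f"sMAPE: {smape(actual, forecast):.2f}%")
```

RMSE and MAE are scale-dependent, whereas sMAPE is a relative measure, which is one common reason for reporting them together when products sell at very different volumes.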

Item Type:

Thesis (Masters)

Uncontrolled Keywords:

Research Output Funding/no_not_funded

Subjects:

forecasting and prediction; k-nearest neighbours; linear regression; machine learning (ml); regression algorithms; direct selling model; retail forecasting; open source software; empirical evaluation; r software; decision trees; no - not funded; no; management science and operatio

Departments:

Lancaster University Management School > Management Science

ID Code:

236661

Deposited By:

ep_importer_pure

Deposited On:

21 Apr 2026 09:55

Refereed?:

Published?:

Published

Last Modified:

03 Jun 2026 23:30

URI:

https://eprints.lancs.ac.uk/id/eprint/236661