Spearing, Harry and Tawn, Jonathan
(2023)
*A Ubiquitous Framework for Statistical Ranking Systems.*
PhD thesis, Lancaster University.

2023SpearingPhD.pdf - Published Version

Available under License Creative Commons Attribution-NonCommercial-NoDerivs.


## Abstract

Ranking systems are everywhere. The thesis often uses sports as its motivating applications, given their accessibility. However, schools and universities, the harms of drugs and the quality of wines are all ranked too, arguably with far greater importance. For this reason, the methodology is necessarily kept general throughout. This thesis proposes a novel conceptual framework for statistical ranking systems, which separates ranking methodology into two distinct classes: absolute systems and relative systems.

Part I of the thesis deals with absolute systems, with much of the methodology centred on extreme value theory. The methodology is applied to elite swimming, and a statistical ranking system is developed that ranks swimmers across different swimming events, based initially on their personal best times. A challenge when using extreme value theory in practice is the small number of extreme data, which are by definition rare. By introducing a continuous, data-driven covariate, swim times can be adjusted for distance, gender category and stroke, allowing the data from all 34 individual events to be pooled into a single model. This results in more efficient inference and therefore more precise estimates of physical quantities, such as the fastest possible time for a particular event. To increase inference efficiency further, the model is then extended to include every performance of each swimmer, rather than just personal bests. The data therefore have a longitudinal structure (also known as panel data), containing repeated measurements from multiple independent subjects. This work is the first attempt at statistical modelling of the extremes of longitudinal data in general, and of the unique forms of dependence that arise naturally from the structure of such data.
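The "fastest time possible" is a quantity that extreme value theory can estimate. The sketch below shows the generic idea under stated assumptions; it is not the thesis's model, and the covariate pooling across events is not shown. Swim times are negated so that faster swims become larger values. Exceedances above a threshold are assumed to follow a generalised Pareto distribution (GPD) with scale `sigma` and shape `xi`. When `xi < 0`, the GPD has a finite endpoint, which maps back to a lower bound on swim time. The function names and the parameter values in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def gpd_survival(y: float, sigma: float, xi: float) -> float:
    """P(Y > y) for a generalised Pareto exceedance Y >= 0 (scale sigma > 0, shape xi)."""
    if y <= 0.0:
        return 1.0
    if xi == 0.0:
        return math.exp(-y / sigma)  # exponential limit as xi -> 0
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        return 0.0  # y lies beyond the finite upper endpoint (only possible when xi < 0)
    return base ** (-1.0 / xi)


def fastest_possible_time(threshold_time: float, sigma: float, xi: float):
    """Finite lower bound on swim time implied by the GPD, or None if xi >= 0.

    Times t are negated (x = -t) so faster swims are larger values. Negated times
    exceeding u = -threshold_time are modelled as GPD(sigma, xi). With xi < 0 the
    upper endpoint of x is u - sigma / xi, i.e. a time of threshold_time + sigma / xi.
    """
    if xi >= 0.0:
        return None  # no finite endpoint: no physical lower bound on time
    return threshold_time + sigma / xi


def prob_faster_than(t: float, threshold_time: float, sigma: float, xi: float) -> float:
    """P(T < t | T < threshold_time): chance that a swim beating the threshold also beats t."""
    return gpd_survival(threshold_time - t, sigma, xi)


# Hypothetical parameters for illustration only (not estimates from the thesis).
u_time, sigma, xi = 48.0, 0.5, -0.2
print(fastest_possible_time(u_time, sigma, xi))
print(prob_faster_than(47.0, u_time, sigma, xi))
```

With these illustrative values, the implied bound is 48.0 + 0.5 / (-0.2) = 45.5 seconds. Any time faster than the bound has zero probability under the fitted tail.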
The longitudinal model can capture a range of extremal dependence structures (asymptotic dependence and asymptotic independence), with the data determining which applies. With this model, inference can be made about the careers of individual swimmers, such as the probability that a given swimmer will break the world record or swim the fastest time next year.

In Part II, the thesis addresses relative systems, focusing on incorporating intransitivity into statistical ranking systems. In a transitive system, if object A is ranked higher than object B, then A is expected to be preferred over B. This does not hold in intransitive systems, where pairwise relationships can differ from what the underlying rankings alone would imply. In some intransitive systems, a single unambiguous underlying ranking may not even exist. The seminal Bradley-Terry model is extended to allow for intransitivity and then applied to baseball data as a motivating example. Baseball is found to contain intransitive elements, and the pairs of teams exhibiting the greatest degree of intransitivity are identified. Including intransitivity improves predictive performance for future pairwise comparisons. The thesis concludes by harmonising the two parts, acknowledging that in reality there is always some relative element to an absolute system. Forging an armistice between these system types could spark research into the areas connecting them, which until now have remained barren.
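To make intransitivity concrete, here is a minimal sketch of one common way to extend Bradley-Terry. It is an assumption for illustration; the abstract does not give the thesis's exact parameterisation. Each item has a strength `theta_i`, and each pair has an antisymmetric term `gamma_ij = -gamma_ji`. The model is P(i beats j) = logistic(theta_i - theta_j + gamma_ij). With every `gamma` equal to zero, this is the standard Bradley-Terry model. Non-zero `gamma` values can produce preference cycles that no single ranking explains. All function names, teams and parameter values below are hypothetical.

```python
import math
from itertools import combinations


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pair_effect(i, j, intrans) -> float:
    """Antisymmetric intransitivity term gamma_ij (gamma_ji = -gamma_ij).

    Each pair is stored once in `intrans`; missing pairs have no intransitive effect.
    """
    if (i, j) in intrans:
        return intrans[(i, j)]
    if (j, i) in intrans:
        return -intrans[(j, i)]
    return 0.0


def win_probability(i, j, strength, intrans) -> float:
    """P(i beats j) = logistic(theta_i - theta_j + gamma_ij).

    With all gamma zero this is standard Bradley-Terry: pi_i / (pi_i + pi_j), pi_i = exp(theta_i).
    """
    return logistic(strength[i] - strength[j] + pair_effect(i, j, intrans))


def preference_cycles(items, strength, intrans):
    """Triples (x, y, z) where x is favoured over y, y over z, and z over x."""
    cycles = []
    for a, b, c in combinations(items, 3):
        for x, y, z in ((a, b, c), (a, c, b)):  # the two cyclic orientations of a triple
            if all(win_probability(p, q, strength, intrans) > 0.5
                   for p, q in ((x, y), (y, z), (z, x))):
                cycles.append((x, y, z))
    return cycles


# Hypothetical example: equal strengths, but a rock-paper-scissors style cycle.
teams = ["A", "B", "C"]
theta = {"A": 0.0, "B": 0.0, "C": 0.0}
gamma = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "A"): 1.0}
print(preference_cycles(teams, theta, gamma))
```

Here each team is favoured over one opponent and disfavoured against the other, so no ordering of A, B and C agrees with every pairwise probability. Setting `gamma` to an empty dict recovers a transitive Bradley-Terry model, in which strengths alone determine every comparison.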