Foundations of expected points in rugby union: a methodological approach

Martinez-Arastey, G., Datson, N., Smith, N. A. and Robins, M. T. (2025) Foundations of expected points in rugby union: a methodological approach. Journal of Sports Analytics. pp. 1-14. ISSN 2215-020X

Text (Martinez-Arastey, G., Datson, N., Smith, N., & Robins, M. (2025). Foundations of expected points in rugby union: A methodological approach. Journal of Sports Analytics, 11. Copyright © The Author(s) 2025. Reprinted by permission of SAGE Publications.)
martinez-arastey-et-al-2025-foundations-of-expected-points-in-rugby-union-a-methodological-approach.pdf - Published Version
Available under License Creative Commons Attribution 4.0.


Abstract

This study explores the feasibility of an Expected Points metric for rugby union, aiming to shift performance analysis from descriptive indicators to a predictive metric of possession quality. Notational analysis was conducted on 132 Premiership Rugby matches, producing a dataset of 35,199 unique phases of play containing variables such as team in possession, pitch location, play type, score differences, time remaining and scoring outcomes. Four machine learning algorithms were explored to predict scoring outcomes: multinomial logistic regression, random forest, support vector machine and k-nearest neighbors. After extensive feature engineering and hyperparameter optimisation, the best-performing model achieved 39.7% accuracy, below a literature-derived baseline for practical usability (44.3%), making it unsuitable for applied contexts. A key challenge was predicting minority scoring outcomes due to severe class imbalance. SMOTE was explored to address this imbalance, resulting in a lower accuracy (35.7%) but an improved 34.4% F1-score. This study highlights the limitations of modelling scoring outcomes in open-play team sports, challenging the predominant positivist paradigm in sports performance analysis. The methodology provides critical foundational groundwork and a benchmark for future research to build upon. It recommends exploring advanced samplers for minority classes, expanded feature sets and alternative modelling techniques, such as recurrent neural networks.
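
The trade-off reported in the abstract, where rebalancing lowers accuracy but raises the F1-score, can be illustrated with a minimal sketch. The labels and predictions below are hypothetical and are not drawn from the study's dataset, and the sketch assumes a macro-averaged F1 (the abstract does not state which averaging was used). It only shows why a classifier that favours the majority outcome can score well on accuracy while scoring poorly on F1.

```python
def accuracy(y_true, y_pred):
    """Share of phases whose predicted outcome matches the actual outcome."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced outcomes: most phases end without a score.
y_true = ["no_score"] * 8 + ["try"] + ["penalty_goal"]

# Classifier A (hypothetical): always predicts the majority class.
pred_majority = ["no_score"] * 10

# Classifier B (hypothetical): finds both minority outcomes,
# at the cost of mislabelling three no-score phases.
pred_rebalanced = (["no_score"] * 5 + ["try"] * 2 + ["penalty_goal"]
                   + ["try"] + ["penalty_goal"])

for name, pred in [("majority", pred_majority), ("rebalanced", pred_rebalanced)]:
    print(name, round(accuracy(y_true, pred), 3), round(macro_f1(y_true, pred), 3))
```

In this toy case the majority-class predictor has the higher accuracy and the rebalanced predictor has the higher macro F1, the same direction of change the abstract reports after applying SMOTE.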

Publication Type: Articles
Uncontrolled Keywords: sports performance analysis, key performance indicators, machine learning, predictive modelling, match analysis
Subjects: G Geography. Anthropology. Recreation > GV Recreation Leisure > GV557 Sports
G Geography. Anthropology. Recreation > GV Recreation Leisure > GV557 Sports > GV711 Coaching
Q Science > Q Science (General)
Divisions: Academic Areas > Institute of Sport
Research Entities > Centre for Health and Allied Sport and Exercise Science Research (CHASER)
Depositing User: Neal Smith
Date Deposited: 13 Oct 2025 09:18
Last Modified: 13 Oct 2025 09:18
URI: https://eprints.chi.ac.uk/id/eprint/8282
