I built a predictive model for football match stats (shots, corners, fouls) across 20,000 matches. The strongest predictor ended up being ELO from chess. [OC] Visualization

May 12, 2026
By Alex Cartwright

Data Analysis

What This Visualization Shows

For the past few months I've been working on a personal project: a predictive model for per-match football statistics. Not the final score, but the behaviors: how many shots each team will take, corners, fouls, cards. The dataset covers around 20,000 matches across five seasons and the top 5 European leagues.

I started with hundreds of variables: rolling shot averages, foul rates, corner frequencies, home/away splits, opponent profiles. Everything you'd expect. The first results were decent, but the model was essentially regressing toward each team's historical mean without any real understanding of match context. It could see that Team A averages 14 shots and Team B averages 11, but it had no concept of the gap between the two sides. It didn't know that tonight Team A is so much stronger they'll pin Team B in their own half for 70 minutes and probably end up with 19 shots while Team B scrapes together 6.

Historical averages are built against opponents of all quality levels. They encode nothing about the specific match being played, and that contextual read is exactly what every football fan processes automatically before kick-off. The hard part is giving a model a number for something so intuitive.

I ended up turning to chess. ELO ratings were invented in the 1960s by Arpad Elo to classify players more precisely than tournament standings alone. Beat someone stronger and your score rises significantly; lose to someone weaker and it drops. It updates after every game, with the only inputs being the result and the relative strength of the two players — no performance quality, no expected goals, just who won and against whom.
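For readers who want the mechanics, here is a minimal sketch of the standard Elo update. The 400-point scale is the conventional chess value; the K-factor of 20 and the function names are assumptions for illustration, not the settings used in this project.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return both ratings after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k (assumed value) controls how fast ratings move.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the rating pool stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # A 1400-rated side beating a 1600-rated side gains far more
    # than the 1600-rated side would gain for beating the 1400-rated one.
    print(update(1400.0, 1600.0, 1.0))
    print(update(1600.0, 1400.0, 1.0))
```

With these assumed settings, the upset win moves each rating by roughly 15 points, while the expected win moves them by roughly 5.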

I built an ELO system for all clubs across the top 5 leagues, initialized from external sources and updated match by match through five seasons. When I added the ELO gap between the two teams as a predictor, things shifted immediately.
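A rough sketch of the match-by-match loop follows. The key detail is recording the gap before the result is applied, so the feature never sees the outcome it is meant to predict. The team names, the default rating of 1500, the K-factor, and the absence of any home-advantage term are all assumptions for illustration rather than this project's actual choices.

```python
from collections import defaultdict


def add_elo_gap(matches, initial_ratings, k=20.0, default_rating=1500.0):
    """Walk matches in chronological order and attach the pre-match Elo gap.

    matches: iterable of (home, away, home_score), with home_score being
             1.0 for a home win, 0.5 for a draw, 0.0 for an away win.
    initial_ratings: dict of starting ratings (e.g. from an external source).
    """
    ratings = defaultdict(lambda: default_rating, initial_ratings)
    rows = []
    for home, away, home_score in matches:
        gap = ratings[home] - ratings[away]  # pre-match gap, no leakage
        rows.append({"home": home, "away": away, "elo_gap_home": gap})

        # Standard Elo update, applied only after the feature is recorded.
        expected_home = 1.0 / (1.0 + 10 ** (-gap / 400.0))
        delta = k * (home_score - expected_home)
        ratings[home] += delta
        ratings[away] -= delta
    return rows, dict(ratings)


if __name__ == "__main__":
    # Hypothetical clubs and results, purely for illustration.
    seed = {"Team A": 1600.0, "Team B": 1400.0}
    fixtures = [("Team A", "Team B", 1.0), ("Team B", "Team A", 0.5)]
    features, final_ratings = add_elo_gap(fixtures, seed)
    for row in features:
        print(row)
```

The gap for the away side is just the negative of `elo_gap_home`, so one column is enough to feed both teams' shot predictions.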

**Bivariate Spearman correlation with shots:**

|Predictor|Correlation|
|:-|:-|
|ELO gap|**0.377**|
|Rolling shot average|0.273|
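For reference, a Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below uses only the standard library; the function names are illustrative and the sample numbers are made up, not taken from the dataset.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up (elo_gap, shots) pairs, purely for illustration.
    gaps = [-250, -120, -30, 10, 80, 150, 260]
    shots = [8, 11, 10, 13, 12, 15, 18]
    print(round(spearman(gaps, shots), 3))
```

Because it works on ranks, the measure only asks whether shots tend to rise as the gap rises, not whether the relationship is a straight line.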

The chess number outperformed every football-specific variable in the model. And when you break it down by bucket, it's obvious why:

|ELO gap|Avg shots|
|:-|:-|
|< −200 (much weaker)|9.2|
|−200 to −100|10.5|
|−100 to −50|11.0|
|±50 (balanced)|12.8|
|+50 to +100|13.0|
|+100 to +200|14.4|
|> +200 (much stronger)|**17.4**|

*Global average: 12.7 shots*
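A bucket breakdown like the one above can be sketched as follows. The bucket edges come from the table, but which side of each boundary a value such as exactly +50 falls on is an assumption here, as are the function names and the sample rows.

```python
from bisect import bisect_right
from collections import defaultdict

# Bucket edges taken from the table; boundary handling is an assumption.
EDGES = [-200, -100, -50, 50, 100, 200]
LABELS = [
    "< -200",
    "-200 to -100",
    "-100 to -50",
    "-50 to +50",
    "+50 to +100",
    "+100 to +200",
    "> +200",
]


def bucket_label(gap):
    """Map an Elo gap to its bucket label (a value on an edge goes to the upper bucket)."""
    return LABELS[bisect_right(EDGES, gap)]


def mean_shots_by_bucket(rows):
    """rows: iterable of (elo_gap, shots). Returns {label: mean shots} in bucket order."""
    totals = defaultdict(lambda: [0.0, 0])
    for gap, shots in rows:
        entry = totals[bucket_label(gap)]
        entry[0] += shots
        entry[1] += 1
    return {
        label: totals[label][0] / totals[label][1]
        for label in LABELS
        if label in totals
    }


if __name__ == "__main__":
    # Made-up rows, purely for illustration.
    sample = [(-260, 8), (-20, 12), (30, 14), (240, 18), (310, 17)]
    for label, mean in mean_shots_by_bucket(sample).items():
        print(f"{label}: {mean:.1f}")
```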

Average shots climb from 9.2 to 17.4 as the strength gap moves from one extreme to the other, and no rolling average captures it, because rolling averages don't know who those shots were taken against. A team that faced three weak sides in a row will have inflated numbers; the ELO gap adjusts for that automatically.

200 variables, five years of data, five leagues, and the most important feature had nothing to do with football.

Happy to get into the methodology or the initialization choices in the comments.


