[OC] I trained old school ML to predict Australian Open 2026 semis & finals with some interesting results!
![[OC] I trained old school ML to predict Australian Open 2026 semis & finals with some interesting results! Visualization](/api/images/reddit-maps/1qwsdyq_1770314402126.jpg)
Hi everyone,
I generally don't follow tennis super closely, but I started following the AO this year for some reason. I was watching Novak's match with Musetti and wondering why prediction markets had Novak winning by a significant margin. Notably, Novak hadn't faced many ranked players before the QF, so it didn't feel like he'd been tested much on his run. That made me feel Novak had a good chance of losing.
Anyway, this inspired me to run an experiment predicting the rest of the AO matches from historical data. I wanted to test whether generative AI (think ChatGPT/Claude) could predict outcomes for the semis & finals. While looking into this, I came across research suggesting that LLMs (e.g. ChatGPT/Claude) are supposedly *worse* at predicting outcomes from tabular data than traditional machine learning algos like XGBoost.
So I figured I'd test it out as a fun little experiment (obviously, don't read any conclusions into it beyond entertainment value).
If you prefer the video version of this experiment, here it is: [https://youtu.be/w38lFKLsxn0](https://youtu.be/w38lFKLsxn0)
I trained an XGBoost model on 10K+ historical matches (2015-2025) and compared it head-to-head against Claude Opus 4.5 (Anthropic's latest LLM) at predicting AO 2026 outcomes.
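The post doesn't spell out how the 2015-2025 matches were split for the 2024 backtest. Here's a minimal sketch of a chronological split, assuming each match record carries its year; `Match` and `chronological_split` are hypothetical names for illustration, not the author's code:

```python
from dataclasses import dataclass


@dataclass
class Match:
    year: int
    player_a: str
    player_b: str
    a_won: bool  # label: True if player_a won (hypothetical record layout)


def chronological_split(matches, test_year):
    """Train on every match before test_year, test on test_year only.

    Splitting by date (not randomly) keeps later results out of training,
    which is what a backtest like "train on 2015-2023, test on 2024" needs.
    """
    train = [m for m in matches if m.year < test_year]
    test = [m for m in matches if m.year == test_year]
    return train, test


if __name__ == "__main__":
    # Hypothetical toy records, not the author's data.
    history = [
        Match(2022, "Player A", "Player B", True),
        Match(2023, "Player C", "Player D", False),
        Match(2024, "Player A", "Player C", True),
        Match(2025, "Player B", "Player D", False),
    ]
    train, test = chronological_split(history, test_year=2024)
    print(len(train), len(test))  # 2 1
```

Splitting by date rather than at random matters here: a random split would let the model train on matches played after the ones it's being tested on.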
**Experiment setup**
* XGBoost features: rankings, H2H, surface win rates, recent form, age, opponent quality
* Claude Opus 4.5 was given the same features + access to its training knowledge
* Test set: Round of 16 through Finals (Men's + Women's), plus some backtesting on 2024 data
* Real test: Semis & Finals for both the men's and women's tourney
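To make the feature list concrete, here's a rough sketch of turning per-player stats into one row per match, plus the same row serialized as text for an LLM prompt. The field names, the "A minus B" difference encoding, and using opponents' average ranking as the opponent-quality measure are all assumptions for illustration; the post doesn't say how the features were encoded or how Claude was prompted.

```python
from dataclasses import dataclass


@dataclass
class PlayerStats:
    rank: int                # current ranking (lower number = better)
    surface_win_rate: float  # win rate on this surface, 0..1
    recent_form: float       # e.g. win rate over the last N matches, 0..1
    age: float
    opp_avg_rank: float      # mean ranking of opponents faced so far (higher = weaker draw)


def match_features(a: PlayerStats, b: PlayerStats, h2h_a_wins: int, h2h_b_wins: int) -> dict:
    """Encode each feature as 'player A minus player B' (hypothetical encoding)."""
    h2h_total = h2h_a_wins + h2h_b_wins
    return {
        "rank_diff": a.rank - b.rank,
        # Share of past meetings won by A; 0.5 when they've never played.
        "h2h_share_a": h2h_a_wins / h2h_total if h2h_total else 0.5,
        "surface_diff": a.surface_win_rate - b.surface_win_rate,
        "form_diff": a.recent_form - b.recent_form,
        "age_diff": a.age - b.age,
        "opp_quality_diff": a.opp_avg_rank - b.opp_avg_rank,
    }


def features_to_prompt(name_a: str, name_b: str, feats: dict) -> str:
    """Serialize the same row as plain text, one feature per line, for an LLM prompt."""
    lines = [f"- {k}: {v:.3f}" if isinstance(v, float) else f"- {k}: {v}" for k, v in feats.items()]
    return f"Match: {name_a} (A) vs {name_b} (B)\n" + "\n".join(lines) + "\nWho wins, A or B?"


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    a = PlayerStats(rank=2, surface_win_rate=0.82, recent_form=0.90, age=24.0, opp_avg_rank=35.0)
    b = PlayerStats(rank=5, surface_win_rate=0.78, recent_form=0.75, age=30.0, opp_avg_rank=80.0)
    row = match_features(a, b, h2h_a_wins=2, h2h_b_wins=1)
    print(features_to_prompt("Player A", "Player B", row))
```

The `opp_avg_rank` field is one way to capture the "hasn't faced many ranked players" intuition from earlier: a higher average ranking number means weaker opposition so far.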
**Results**
* Both models: 72.7% accuracy (identical)
* Upsets predicted: 0/5 (both missed all of them)
* Biggest miss: Sinner vs Djokovic SF. Both picked Sinner, Kalshi had him at 91%, and Djokovic won
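One way to compute both headline numbers from per-match records, sketched with hypothetical names (`Result`, `score`) and treating an upset as the pre-match underdog winning (the post doesn't define "upset"). The usage only covers the three matches in the Kalshi comparison table, not the full test set, with Kalshi's favorite taken as the favorite:

```python
from dataclasses import dataclass


@dataclass
class Result:
    favorite: str   # pre-match favorite (e.g. market favorite or better-ranked player)
    underdog: str
    predicted: str  # the model's pick
    actual: str     # real winner


def score(results):
    """Overall accuracy, number of upsets, and how many upsets the model called."""
    correct = sum(r.predicted == r.actual for r in results)
    upsets = [r for r in results if r.actual == r.underdog]
    upsets_called = sum(r.predicted == r.actual for r in upsets)
    return {
        "accuracy": correct / len(results),
        "upsets": len(upsets),
        "upsets_called": upsets_called,
    }


if __name__ == "__main__":
    rows = [
        Result("Sinner", "Djokovic", predicted="Sinner", actual="Djokovic"),
        Result("Sinner", "Zverev", predicted="Sinner", actual="Sinner"),
        Result("Sabalenka", "Keys", predicted="Sabalenka", actual="Keys"),
    ]
    print(score(rows))  # accuracy = 1/3, upsets = 2, upsets_called = 0
```

If a model always sides with the favorite, `upsets_called` is zero by construction, which is one possible reading of the 0/5 result.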
**Comparison vs Kalshi**
| Match | XGBoost | Claude | Kalshi | Actual |
|---|---|---|---|---|
| Sinner vs Djokovic | Sinner | Sinner | 91% Sinner | Djokovic |
| Sinner vs Zverev | Sinner | Sinner | 65% Sinner | Sinner |
| Sabalenka vs Keys | Sabalenka | Sabalenka | 78% Sabalenka | Keys |
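Accuracy treats a 91% miss the same as a 51% miss. A proper scoring rule like the Brier score (mean squared error between the forecast probability and the 0/1 outcome) penalizes confident misses more. A minimal sketch on the three rows in the table, assuming Kalshi's percentage is the market-implied probability for the listed favorite:

```python
def brier(probs, outcomes):
    """Mean squared error between forecast probability and 0/1 outcome (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Kalshi's probability for the favorite in each table row, and whether the favorite won.
kalshi_fav = [0.91, 0.65, 0.78]  # Sinner, Sinner, Sabalenka
fav_won = [0, 1, 0]              # Djokovic won, Sinner won, Keys won

market = brier(kalshi_fav, fav_won)
coin_flip = brier([0.5] * len(fav_won), fav_won)
print(round(market, 3), coin_flip)  # 0.52 0.25
```

Three matches are far too few to judge anything; this only shows how the metric behaves. On these three rows the market scores worse than a coin flip, driven entirely by the two confident misses.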
Takeaways:
1. Even though Claude had some unfair advantages, like its pre-training biases and knowing the players' names, it still did not outperform XGBoost, which is a comparatively simple tree-based model
2. Neither approach handles upsets well (the tail risk problem)
3. When Kalshi is at 91% and still wrong, maybe the edge isn't in better models but in identifying when consensus is overconfident
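Takeaway 3 boils down to a calibration question: do 90%+ favorites actually win about 90% of the time? A rough sketch that bins forecasts by probability and compares each bin's average forecast with its observed hit rate. The bin count, function name, and toy data are arbitrary choices for illustration, and you'd need many matches per bin before this says anything real:

```python
from collections import defaultdict


def calibration_table(probs, outcomes, n_bins=5):
    """Group forecasts into equal-width probability bins and compare the mean
    forecast in each bin with how often the event actually happened.
    A bin whose mean forecast sits well above its hit rate is overconfident."""
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the top bin
        bins[idx].append((p, o))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(o for _, o in pairs) / len(pairs)
        # (bin lower edge, bin upper edge, count, mean forecast, observed rate)
        table.append((idx / n_bins, (idx + 1) / n_bins, len(pairs), mean_p, hit_rate))
    return table


if __name__ == "__main__":
    # Hypothetical toy forecasts and outcomes (1 = favorite won), not real data.
    probs = [0.91, 0.93, 0.95, 0.62, 0.58]
    outcomes = [1, 0, 0, 1, 1]
    for lo, hi, n, mean_p, hit in calibration_table(probs, outcomes):
        print(f"{lo:.1f}-{hi:.1f}: n={n}, forecast={mean_p:.2f}, actual={hit:.2f}")
```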
The video goes into more detail on the results and my methodology if you're interested in checking it out! [https://youtu.be/w38lFKLsxn0](https://youtu.be/w38lFKLsxn0)
Would love your feedback on the experiment/video, and I'm curious if anyone here has had better luck with upset detection or with incorporating market odds as a feature rather than a benchmark.
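On that last question, here's a minimal sketch of folding market odds in as a feature instead of a benchmark. It assumes you have a "Yes" price in cents for each player's contract (each paying $1, so the price reads roughly as a probability); the helper names and example prices are hypothetical:

```python
def implied_probs(price_a_cents: float, price_b_cents: float) -> tuple:
    """Turn the 'Yes' prices of two mutually exclusive contracts into probabilities.

    The two sides usually sum to a bit more than 100 (spread/fees), so divide
    by the total to get probabilities that sum to 1.
    """
    total = price_a_cents + price_b_cents
    return price_a_cents / total, price_b_cents / total


def add_market_feature(features: dict, price_a_cents: float, price_b_cents: float) -> dict:
    """Return a copy of a feature row with the normalized market probability for player A."""
    p_a, _ = implied_probs(price_a_cents, price_b_cents)
    return {**features, "market_prob_a": p_a}


if __name__ == "__main__":
    # Hypothetical prices: 91 cents on one side, 11 on the other.
    print(implied_probs(91, 11))
    print(add_market_feature({"rank_diff": -3}, 91, 11))
```

The main caveat is timing: the price has to be snapshotted before the match starts (for training rows too), or the feature leaks information about the result.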
About the Author

Alex Cartwright
Senior Data Visualization Expert
Alex Cartwright is a renowned data visualization specialist and infographic designer with over 15 years of experience in...