Down the xG Rabbit Hole: Powering Data to Empower You
I decided to jump straight into xG for my first article for a couple of reasons. If you've watched a game recently, chances are you've seen this advanced data used in one way or another. If not, xG, also known as expected goals, relies on machine learning to provide a probability that a shot will result in a goal based on a variety of factors (distance, angle, clarity, etc.). This has been the most popular stat in football (soccer) analytics over the past decade. I've seen some really awesome implementations of it. I've also seen it consistently misused, so I decided to start by providing a practical use for xG in your day-to-day analysis. Let's jump into it.
xG Timelines
We'll only be focused on working with timelines in this article, but I'm sure I'll circle back to xG again at some point. An xG timeline is simply the running sum of a team's expected goals over the course of a match. Watching the 2023 Women's World Cup, I decided to take a deeper look at Australia. They made some impressive changes towards the end of their England match that caused a lot of problems. I've highlighted the period we'll look at in the cover image.
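To make the idea of a running sum concrete, here is a minimal Python sketch of how a timeline could be built from a list of shots. The shot minutes and xG values are made up purely for illustration (they are not from the Australia vs England match), and the function name xg_timeline is my own, not part of any data provider's API.

```python
from itertools import accumulate

# Hypothetical shots for one team as (minute, xG).
# These numbers are invented for illustration only.
shots = [(58, 0.10), (12, 0.05), (77, 0.25), (31, 0.30)]


def xg_timeline(shots):
    """Return (minute, cumulative xG) pairs ordered by minute."""
    ordered = sorted(shots)                      # sort by minute
    minutes = [minute for minute, _ in ordered]
    running = accumulate(xg for _, xg in ordered)  # running total of xG
    return list(zip(minutes, running))


timeline = xg_timeline(shots)
for minute, total in timeline:
    print(f"{minute:>3}' cumulative xG: {total:.2f}")
```

Plotting these pairs as a step chart for each team gives the familiar timeline graphic: flat stretches are quiet periods, and steep jumps mark spells of sustained pressure or a single big chance.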
Australia's Changes vs England
We'll start our analysis with Australia's final substitution of the match. Australia is trailing 2-1, and Gustavsson can be seen shouting multiple instructions while van Egmond runs on and passes on even more information. All these changes seem to be working well as Australia gets on the front foot and begins to apply pressure, especially down the left. First they're able to get into an advanced position and serve a cross onto Kerr's head inside the six-yard box, but she flicks it over. Then changing the point from the left creates a clean look for Vine in the right assist zone, but Earps makes a big save and England clear. Pushing down the right creates another opportunity, but Fowler's shot is blocked out for a corner. The corner is served on top of the England goalkeeper and is only cleared to the edge of the six, where Kerr slashes it wide. Australia push forward again with a long ball into the 18, but it's cleared. Hemp picks it up and drives 30 yards before sliding in Russo, who gets on it quickly and strikes it with her first touch. Just like that it's 3-1 and England have put the match to bed. The excitement that this sport brings is something else, but let's dig into this 5-minute spell and gain some insight into it.
Traditional Scouting vs Data-Driven Scouting
Looking at the xG timeline, Australia generated 0.71 xG in this spell, while England only had the one shot, which measured 0.23 xG. So what can we actually take away from xG if it can lack correlation with goals and the overall result? xG, like any other analytical tool, provides context. It allows us to understand the flow and momentum of a match and enables a more efficient approach to scouting. Before anyone gets started, this is not a replacement for watching full matches when scouting opponents, but it can be used in addition to traditional scouting.
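Comparing two teams over a short spell like this comes down to summing each team's shot xG inside a time window. The sketch below shows one way to do that in Python. The team labels, minutes, and xG values are placeholders I invented, not the actual shot data behind the 0.71 and 0.23 figures, and window_xg is a hypothetical helper name.

```python
# Hypothetical shots as (minute, team, xG). Values are invented placeholders.
shots = [
    (40, "Team A", 0.5),
    (85, "Team A", 0.25),
    (87, "Team A", 0.125),
    (89, "Team B", 0.5),
]


def window_xg(shots, start, end):
    """Sum xG per team for shots taken between start and end (inclusive)."""
    totals = {}
    for minute, team, xg in shots:
        if start <= minute <= end:
            totals[team] = totals.get(team, 0.0) + xg
    return totals


late_spell = window_xg(shots, 84, 90)
for team, total in late_spell.items():
    print(f"{team}: {total:.3f} xG")
```

A lopsided window total that does not match the scoreline, as in the Australia vs England spell, is exactly the kind of stretch worth pulling up on video.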
I'll give you my approach to traditional scouting first, so we can talk about how you can implement xG timelines into your scouting approach. I apply a pretty typical 3-match approach. First, I watch the most recent match, noting any trends. Next, I typically jump to the previous week and watch another match, noting any larger trends and paying special attention to trends from their most recent match. The reason is that our college schedule only has a single rest day between matches, so a repeated weakness between Thursday/Saturday or Friday/Sunday has a good chance of being corrected in training before we even see that opponent. If I can find weaknesses that span a week or more, there's a better chance we can exploit them on matchday. Finally, I find the most recent match where they played an opponent with a similar style/shape to what our team will use, and I watch that and note the same things. Skipping past the time the ball is out of play usually means I'm spending about 3 hours per opponent on a more traditional scouting approach. I could use my additional time on watching a couple of extra matches, but I've found that I can gain more insight by taking a half hour to identify data trends that show me where to spend this time. As a bonus, I am able to back up my subjective analysis with other forms of data, and I gain insight I wouldn't have from video analysis alone.
Taking a look back at scouting Australia, you may be wondering why I picked out these 5 minutes from the staggering 749 they played at the World Cup. Simply put, it comes down to the structure I use to scout. We all know how important the last 15 minutes of a match are. Statistically, this is the period when the most goals are scored, and that's for a good reason. Teams that are trailing are more willing to push forward to level the score, which increases the chances they will score, and as we saw in this match, it also opens them up to counterattacks. This is why gaining an understanding of how your opponent will react in either of these situations can be critical to the final result. So how do we take advantage of xG to simplify this process? Searching for the last time Australia came from behind to score multiple late goals lands us in April 2022, over a year before this match. Taking advantage of xG, I'm quickly able to scan through Australia's World Cup matches and find more film to add to this analysis. The highlighted section of the Australia vs Nigeria group stage match provides us with more positive changes that Australia made while trailing. Now we have 20 minutes of footage to study and use to understand what we should expect from Australia if we're in front in the last 15 minutes. This same strategy can be applied equally to any other time-related scouting, for example when Australia is leading towards the end of the match, or numerous scenarios coming out of the halftime break.
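If you have shot and goal data for several matches, the scan described above can be roughly automated. The following Python sketch ranks matches by how much xG a team generated after the 75th minute while trailing. Everything here is an assumption for illustration: the match names, team labels, and numbers are invented, the data layout is one I made up, and late_trailing_xg is a hypothetical helper rather than a function from any real library.

```python
# Hypothetical match data. "goals" are (minute, team); "shots" are
# (minute, team, xG). All names and values are invented for illustration.
matches = [
    {
        "name": "Match A",
        "goals": [(30, "Opponent A")],
        "shots": [(80, "Team X", 0.25), (85, "Team X", 0.5),
                  (88, "Opponent A", 0.125)],
    },
    {
        "name": "Match B",
        "goals": [(10, "Team X")],
        "shots": [(82, "Team X", 0.25)],
    },
]


def late_trailing_xg(match, team, from_minute=75):
    """Sum the team's xG from shots at or after from_minute while trailing."""
    total = 0.0
    for minute, shooter, xg in match["shots"]:
        if minute < from_minute or shooter != team:
            continue
        # Score just before this shot (goals in earlier minutes only).
        own = sum(1 for m, t in match["goals"] if t == team and m < minute)
        opp = sum(1 for m, t in match["goals"] if t != team and m < minute)
        if own < opp:
            total += xg
    return total


# Matches with the most late xG while trailing go to the top of the film list.
ranked = sorted(matches, key=lambda m: late_trailing_xg(m, "Team X"),
                reverse=True)
for match in ranked:
    print(match["name"], late_trailing_xg(match, "Team X"))
```

Flipping the score check to own > opp would cover the leading-late scenario, and changing from_minute would adapt the same scan to the spell after halftime. The output is only a shortlist of matches to watch; the tactical detail still comes from the video.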
Analysis of Australia's Tactics When Trailing
We'll follow up with a tactical analysis of this match in the near future with an article focused on video analysis and scouting.