Getting Started with Data: Creating and Visualizing Shot Data
What to Track
We're going to take a look at tracking event data manually in this guide. I'll be working with shot data, using Dartfish for coding/tagging the events and ggplot2 in R for the visualizations, with the goal of providing insight into the performance of a college team out of possession last season. I'll keep this guide primarily focused on the process, so you can implement it with your preferred choice of tools. Just make sure you can get your data into a csv format when you're finished. You should even be able to track events with pen and paper during a live match with a bit of practice.
Let's take a look at what data we need to gather for quality analysis later on. Our dashboards are going to track goalkeeper and team performance out of possession based on five things: shot outcomes, goalkeeper actions, the location the shot was taken from, the placement of the shot, and the quality of the chance. The first two are simple. We're watching the film and selecting the outcome and goalkeeper action based on our criteria. I set up automated scripts to create hierarchical groups for the outcomes, to save a bit of time, and I can jump to the Auto Key tab to manually make adjustments when something out of the ordinary happens.
Location Data & Chance Quality
Since we're going to want location based info for our analysis, we're going to track the location of every shot using x;y coordinates of 80;120, and the placement using x;y of 100;50. This will allow us to feed the data onto heatmaps and pitch plots with ease later on. If you want to learn more about building a pitch with ggplot2 for displaying your data, you can find my guide on the topic in the data analysis section of the blog.
Next up we're going to add a few more measurable details about the shots. We'll be using the shot clarity and pressure on the ball in conjunction with shot distance and angle to give us some insight into chance quality. The most important thing is to stay consistent when you start adding in more advanced metrics. The best way to do this is by defining clear criteria for your metrics. For me, I determine shot clarity based on the number of players between the shot and the goal when it's taken, and I use circles skewed towards goal with 5, 3, and 1 yard radii to determine shot pressure. After looping through our shots a few times, we're done with the manual part of the process. Export your data as a csv, so we can get it plugged into a dashboard and start to learn how to interpret it.
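Shot distance and angle don't have to be tagged by hand, since they can be derived from the location coordinates. The sketch below is one way to do that. It assumes the 80 x 120 yard pitch used here, a goal centred at x = 40 with an 8 yard goal mouth, and a y value measured in yards from the goal line being defended. The function name shot_geometry and those goal coordinates are my own assumptions for illustration, not something that comes out of Dartfish.

```r
# Hypothetical helper: distance to the goal centre and the angle
# subtended by the goal mouth, for shots on an 80 x 120 yard pitch.
# Assumes y is measured in yards from the defended goal line.
shot_geometry <- function(x, y, pitch_width = 80, goal_width = 8) {
  goal_x <- pitch_width / 2
  left_post <- goal_x - goal_width / 2
  right_post <- goal_x + goal_width / 2

  # Straight-line distance from the shot to the centre of the goal
  distance <- sqrt((x - goal_x)^2 + y^2)

  # Angle between the lines from the shot to each post, in degrees
  angle <- abs(atan2(right_post - x, y) - atan2(left_post - x, y)) * 180 / pi

  data.frame(distance = distance, angle = angle)
}

shot_geometry(x = c(40, 22), y = c(12, 14))
```

A shot from the penalty spot (x = 40, y = 12) comes out at 12 yards, and wider or deeper shots get a smaller angle, which is the behaviour you would want from a rough chance quality input.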
Data Plots
Python, Tableau, R, the old and reliable Excel. There are countless good options for visualizing data today, but we'll be jumping back into R and building a Shiny application. There are pros and cons to everything, but for a quick and clean approach R is a solid choice. It's built for statistical analysis, it provides fine control over visualizations, and it's easy to get up and running as a web app. We'll walk through how to get this data into a nice pitch plot, and then we'll go over how to get this into a dynamic dashboard for further analysis. First up let's load the packages we'll need for the plot, and then set some static variables.
shotmap.R
library(tidyverse)
library(showtext)
font_add_google("Source Sans 3")
showtext_auto()
font_family <- "Source Sans 3"
font_color <- "#aaaabb"
background_color <- "#17161b"
background_line_color <- "#505055"
foreground_line_color <- "#707075"
inside_lines <- "#aaaabb"
outside_lines <- "#707080"
pitch_line_size <- 0.6
pitchWidth <- 80
pitchLength <- 120
Next let's pull in our data and see what it contains. I'm simply reading the csv, and wrapping it with colnames in the console, so I can get the headers for each column.
shotmap.R
colnames(read.csv("data/shots.csv"))
Output
[1] "Name" "Position" "Duration" "Event" "Outcome" "Outcome_Sub" "Assist_Type"
[8] "Assist_Sub" "Movement" "Assist_Location" "Shot_Placement" "Shot_Location" "Goalkeeper_Actions" "Team"
[15] "Opponent" "Shot.Clarity" "Shot.Pressure" "Details"
Since I only need a few columns for the shot map, I'm going to go ahead and select those columns with the dplyr package included with tidyverse to keep things efficient. I'm going to group the shots by Outcome and plot the Shot.Location first. I'll also use the Outcome_Sub category and Opponent later to build dynamic filters when we build our app, so I'll grab those now.
shotmap.R
shot_data <- read.csv("data/shots.csv") %>%
  select(Outcome, Outcome_Sub, Shot.Location, Opponent)
head(shot_data)
Once we have the four columns of data we need for the plot, I run the head() function on the data, to give me the first few rows, so we can decide the next steps.
Output
Outcome Outcome_Sub Shot.Location Opponent
1 Off Target High 58;106 IUPUI
2 Off Target Near 32;112 IUPUI
3 Goal Conceeded 26;107 IUPUI
4 Goal Conceeded 19;112 North Dakota
5 Save Save & Deflect 48;106 Purdue
6 Goal Conceeded 37;105 Wright State
Everything looks good for our strings, but we're going to need to make some changes to our Shot.Location column. First we need numerical values instead of x;y, so we're going to add a pipe operator after our select statement. We're using separate() to break the location data into two separate columns. We'll specify the data column Shot.Location first, then into = c("x_column_name", "y_column_name"), then the separator, which is ; in this case, and finally we'll set convert = TRUE so the values are automatically converted to numeric. You may have noticed that my y values are high for shots conceded. This is because Dartfish has an inverted y-axis by default. You may run into something similar, and you can take the same approach to flip one of your axes. I'm naming the temporary column "y_inverse" and subtracting it from our pitchLength variable in a mutate() call to give us our y values.
shotmap.R
shot_data <- read.csv("data/shots.csv") %>%
  select(Outcome, Outcome_Sub, Shot.Location, Opponent) %>%
  separate(Shot.Location,
           into = c("x_shot_location", "y_inverse"),
           sep = ";", convert = TRUE) %>%
  mutate(y_shot_location = pitchLength - y_inverse)
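To see what this step does without the full file, here is a small sketch that runs the same separate() and mutate() calls on two of the locations from the head() output. The inline example_shots data frame is made up for illustration and only loads dplyr and tidyr rather than the whole tidyverse.

```r
library(dplyr)
library(tidyr)

pitchLength <- 120

# Two illustrative rows in the same "x;y" format as Shot.Location
example_shots <- data.frame(
  Outcome = c("Off Target", "Off Target"),
  Shot.Location = c("58;106", "32;112")
)

example_clean <- example_shots %>%
  # Split "x;y" into two columns and convert them to numbers
  separate(Shot.Location,
           into = c("x_shot_location", "y_inverse"),
           sep = ";", convert = TRUE) %>%
  # Flip the inverted y-axis so y is measured from the other goal line
  mutate(y_shot_location = pitchLength - y_inverse)

example_clean
```

The y values of 106 and 112 become 14 and 8, which puts both shots inside the penalty area at the bottom of the plot.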
Next up I'm going to use the ggplot() function to set our shot_data. Then I'll use aes() to set x and y equal to the correct columns. After that, I'm drawing the lines of our pitch. You can learn more about how to create your own here: Pitch Lines in R, or just copy them in and we'll jump to the next step.
shotmap.R
ggplot(shot_data, aes(x = x_shot_location, y = y_shot_location)) +
#Annotate halfway Line
annotate("segment",
x = 0, xend = pitchWidth,
y = (pitchLength/2), yend = (pitchLength/2),
color = inside_lines, size = pitch_line_size
) +
#Annotate center dot
annotate("point",
x = (pitchWidth/2) , y = (pitchLength/2),
color = inside_lines, size = (pitch_line_size*3)
) +
#Annotate center circle
annotate("path",
x=(pitchWidth/2)+10*cos(seq(0,2*pi,length.out=2000)),
y=(pitchLength/2)+10*sin(seq(0,2*pi,length.out=2000)),
color = inside_lines, size = pitch_line_size
) +
#Annotate goal areas
annotate("rect",
xmin = (pitchWidth/2)-10, xmax = (pitchWidth/2)+10,
ymin = 0, ymax = 6,
fill = NA, color = inside_lines, size = pitch_line_size
) +
annotate("rect",
xmin = (pitchWidth/2)-10, xmax = (pitchWidth/2)+10,
ymin = (pitchLength-6), ymax = pitchLength,
fill = NA, color = inside_lines, size = pitch_line_size) +
#Annotate penalty area
annotate("rect",
xmin = (pitchWidth/2)-22, xmax = (pitchWidth/2)+22,
ymin = 0, ymax = 18,
fill = NA, color = inside_lines, size = pitch_line_size) +
annotate("rect",
xmin = (pitchWidth/2)-22, xmax = (pitchWidth/2)+22,
ymin = (pitchLength-18), ymax = pitchLength,
fill = NA, color = inside_lines, size = pitch_line_size
) +
#Annotate penalty spots
annotate("point",
x = (pitchWidth/2), y = 12,
color = inside_lines, size = (pitch_line_size*3)
) +
annotate("point",
x = (pitchWidth/2), y = (pitchLength-12),
color = inside_lines, size = (pitch_line_size*3)
) +
#Annotate Penalty Arcs
annotate("path",
x=(pitchWidth/2)+10*cos(seq(0.205*pi,0.795*pi,length.out=600)),
y=12+10*sin(seq(0.205*pi,0.795*pi,length.out=600)),
size = pitch_line_size, color=inside_lines
) +
annotate("path",
x=(pitchWidth/2)+10*cos(seq(1.205*pi,1.795*pi,length.out=600)),
y=(pitchLength-12)+10*sin(seq(1.205*pi,1.795*pi,length.out=600)),
size = pitch_line_size, color=inside_lines
) +
#Annotate Pitch Border
annotate("rect",
xmin=0, xmax = pitchWidth,
ymin = 0, ymax = pitchLength,
fill = NA, color = outside_lines,
size = pitch_line_size
)+
#Annotate Goals
annotate("rect",
xmin = (pitchWidth/2)-4 , xmax = (pitchWidth/2)+4,
ymin =0, ymax = -0.5,
fill = outside_lines, color = outside_lines, size = pitch_line_size
) +
annotate("rect",
xmin = (pitchWidth/2)-4 , xmax = (pitchWidth/2)+4,
ymin =pitchLength, ymax = (pitchLength+0.5),
fill = outside_lines, color = outside_lines, size = pitch_line_size
)
Now we're going to add a scatter plot to our graph. I'm calling the geom_point() and making two changes to it. First I'm using the aesthetic mapping to color by the Outcome, and then second I'm setting the size statically to 5 for better readability.
shotmap.R
geom_point(aes(
color = Outcome
), size=5)
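If you want to check the scatter layer on its own before pasting in every pitch line, a stripped-down version is sketched below. It assumes a tiny made-up data frame and draws only the pitch border. It also adds coord_fixed() so a yard is the same length on both axes, which is my addition rather than part of the plot built in this guide.

```r
library(ggplot2)

pitchWidth <- 80
pitchLength <- 120

# Made-up shots, already converted to plot coordinates
demo_shots <- data.frame(
  x_shot_location = c(58, 32, 26),
  y_shot_location = c(14, 8, 13),
  Outcome = c("Off Target", "Off Target", "Goal")
)

demo_plot <- ggplot(demo_shots, aes(x = x_shot_location, y = y_shot_location)) +
  # Pitch border only, to keep the sketch short
  annotate("rect",
           xmin = 0, xmax = pitchWidth,
           ymin = 0, ymax = pitchLength,
           fill = NA, color = "#707080") +
  # Shots colored by outcome at a fixed size
  geom_point(aes(color = Outcome), size = 5) +
  # Keep the pitch proportions true to scale
  coord_fixed()

demo_plot
```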
Now that we have the data where it needs to be, we'll focus on cleaning things up so we can make sense of it. First I'm adding alpha = 0.7 to geom_point(), so we can see through overlapping points and get a better idea of what's going on. Next up I'm going to set a scale_colour_manual, so I can use custom colors with the plot. It works by simply setting the outcome strings equal to a color code. I'm using a dark green for blocked shots, dark blue for saves, orange for goals, and grey for off target. Next I'm going to make some theme adjustments. I'm removing grid lines and axes and setting the plot background and font colors, so the data has better visibility.
shotmap.R
geom_point(
  aes(color = Outcome),
  alpha = 0.7, #Fixed value, so it goes outside aes()
  size = 5
) +
scale_colour_manual(
values = c("Blocked" = "#047857",
"Save" = "#1D4ED8",
"Goal" = "#F97316",
"Off Target"="#A1A1AA")
) +
theme(
rect = element_blank(), #Remove inner background
line = element_blank(), #Remove grid lines
axis.title = element_blank(), #Remove axis titles
axis.text = element_blank(), #Remove axis values
plot.background = element_rect(
fill = background_color, color = background_color
),
legend.title = element_blank(),
legend.text = element_text(size=18,
family=font_family,
hjust=0.5, vjust=0.5,
face="bold",
colour = inside_lines),
legend.key.size = unit(.75, "cm")
)
Since this shot map already has a ton of information, we're not going to try to visualize anything else, but we can take advantage of the alpha and shape attributes to better visualize the outcomes. We'll set alpha and shape both equal to Outcome. Then we'll build a manual mapping for each. For the alpha, we want our off target shots to be fairly transparent, so their alpha is at 30%. For Blocked, Save, and Goal, I'm using a range of 60-80%. Next I'm grouping our outcomes into two categories: shots that made it on target, and shots that didn't. Our goals and saves will be mapped to a circle, so we use the number 16 for our shape. I'll then use shape 18 (diamonds) for the blocked and off target shots.
shotmap.R
geom_point(aes(
color = Outcome, alpha = Outcome, shape = Outcome
), size=5) +
scale_alpha_manual(
values = c("Blocked" = 0.6, "Save" = 0.7, "Goal" = 0.8, "Off Target"=0.3)
) +
scale_shape_manual(
values = c("Blocked" = 18, "Save" = 16, "Goal" = 16, "Off Target"=18)
)
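One thing to watch with manual scales is that they match on the exact strings in the column, so a value without an entry will not get the styling you intended. A quick check like the sketch below can catch that before plotting. The vectors are illustrative, and "Post" is a hypothetical unmapped value included only to show a mismatch being caught.

```r
# Strings mapped in the manual color, alpha, and shape scales
mapped_outcomes <- c("Blocked", "Save", "Goal", "Off Target")

# Illustrative Outcome column; "Post" is a hypothetical unmapped value
outcome_column <- c("Goal", "Save", "Off Target", "Post", "Blocked", "Goal")

# Values present in the data that the scales do not cover
unmapped <- setdiff(unique(outcome_column), mapped_outcomes)
unmapped
```

With real data you would swap outcome_column for shot_data$Outcome and expect an empty result.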
Next up we're going to quickly filter the data and have a look at anything that stands out, then we'll build a dashboard where we can do this step and any others using select inputs in a Shiny application. For now I'm jumping back up to our raw data, adding a filter for blocked shots, and sourcing the code to give me a plot with only blocked shots. Then I'll do the same thing, but this time I want shots that resulted in goals or saves.
shotmap.R
shot_data <- read.csv("data/shots.csv") %>%
filter(Outcome == "Blocked") %>%
...
shot_data <- read.csv("data/shots.csv") %>%
filter(Outcome == "Goal" | Outcome == "Save") %>%
...
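Alongside the filtered plots, a quick count can back up what you see on the pitch. The sketch below uses a made-up data frame with the same Outcome and Opponent columns, and shows %in% as an equivalent way to write the goal-or-save filter. The demo_shots data and the shots_on_target column name are assumptions for illustration.

```r
library(dplyr)

# Made-up rows with the same column names as the shot data
demo_shots <- data.frame(
  Outcome = c("Goal", "Save", "Blocked", "Off Target", "Goal", "Save"),
  Opponent = c("IUPUI", "IUPUI", "Purdue", "Purdue", "Wright State", "Purdue")
)

# Same result as Outcome == "Goal" | Outcome == "Save"
on_target <- demo_shots %>%
  filter(Outcome %in% c("Goal", "Save"))

# Number of on-target shots conceded per opponent
on_target_counts <- on_target %>%
  count(Opponent, name = "shots_on_target")

on_target_counts
```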
Here's an example based on my data. Once I split the data, I'm able to identify significantly fewer blocked shots and more chances in Zone 4 straight away. I'll be able to dive into these 10 clips and find what is causing this trend fairly efficiently. For now, let's go ahead and build out a dashboard so we can load in new data in the future and visualize it in seconds. I'm going to save our shotmap.R file into a folder called components, and start building out the application.
If this is your first time creating a Shiny application, you can create a demo app from the menu that will highlight the file structure. For this we'll create a simple app using the server.R and ui.R file structure. For our server file, we'll load the required shiny library and create our server function, which needs input, output, and session parameters. Inside we're only going to create an output variable called shotPlot and use renderPlot to source our file from our components folder.
server.R
library(shiny)
function(input, output, session) {
output$shotPlot <- renderPlot({
source("components/shotmap.R", local = TRUE)$value
})
}
Now we'll create our ui file by loading in the required shiny library, and building our page with the fluidPage function. We'll pass our mainPanel in with our plotOutput from the server file. If we did everything correctly, we should be able to run our app and generate our static plot in it.
ui.R
library(shiny)
fluidPage(
mainPanel(
plotOutput("shotPlot")
)
)
Now we're going to build out our first interactive element. Let's add a sidebar layout and panel to hold it. After that we'll define our selectInput with the id outcome, the label Outcome:, and choices set to NULL, since we'll jump over to the server component and create the options dynamically.
ui.R
library(shiny)
fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("outcome",
                  "Outcome:",
                  choices = NULL)
    ),
    mainPanel(
      plotOutput("shotPlot")
    )
  )
)
Now in the server file, we're going to pull in all the libraries and static variables so we can have access to them in other components. This includes defining our data and in this case the values we'll use for our selectInput. For this we are simply getting the unique strings in the outcome column.
server.R
library(shiny)
library(showtext)
library(tidyverse)
font_add_google("Source Sans 3")
showtext_auto()
font_family <- "Source Sans 3"
font_color <- "#aaaabb"
background_color <- "#17161b"
background_line_color <- "#505055"
foreground_line_color <- "#707075"
inside_lines <- "#aaaabb"
outside_lines <- "#707080"
pitch_line_size <- 0.6
pitchWidth <- 80
pitchLength <- 120
shot_data <- read.csv("data/shots.csv") %>%
select (Outcome, Outcome_Sub, Shot.Location, Opponent) %>%
separate(Shot.Location, into = c("x_inverse", "y_inverse"), sep = ";", convert = TRUE) %>%
mutate(y_shot_location = y_inverse, x_shot_location = pitchWidth-x_inverse)
outcomes <- unique(shot_data$Outcome) #Character vector of the unique outcome strings
Now we just need to define two things in our server function. First we'll make our shot data reactive, so we can adjust it dynamically. Second we're going to update our selectInput with the outcomes we load in from the data.
server.R
function(input, output, session) {
shot_data_reactive <- reactive(shot_data)
observe({
updateSelectInput(session, "outcome", choices = outcomes)
})
output$shotPlot <- renderPlot({
source("components/shotmap.R", local = TRUE)$value
})
}
Back in our plot file, we can remove most of the clutter. The main change is renaming the variable to shotmap_data and setting it from our reactive shot data. I can then filter this data with our selectInput by setting the Outcome column equal to input$outcome. This will grab our input value and refilter the data whenever it changes. That's it. Pass in the rest of our ggplot, and we should have a dynamic shotmap filtered by the selected outcome. Next we'll tidy up the ui, and then we're ready to add more dashboards or components for any future analysis.
shotmap.R
shotmap_data <- shot_data_reactive() %>%
filter(Outcome == input$outcome)
ggplot(shotmap_data, aes(x = x_shot_location, y = y_shot_location)) +
...
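Because the select input starts with choices = NULL, input$outcome can be empty for a moment when the app loads. One common way to handle that is req(), which pauses a reactive until its input has a value. The sketch below is a minimal, standalone server with made-up data. The demo_server, filtered_shots, and shotCount names are hypothetical, and this is not the exact structure used in the app above.

```r
library(shiny)

# Made-up data with the same Outcome column
demo_shots <- data.frame(
  Outcome = c("Goal", "Save", "Goal", "Blocked"),
  x_shot_location = c(26, 48, 19, 40),
  y_shot_location = c(13, 14, 8, 20)
)

demo_server <- function(input, output, session) {
  filtered_shots <- reactive({
    # Wait until the select input has a value before filtering
    req(input$outcome)
    demo_shots[demo_shots$Outcome == input$outcome, ]
  })

  # Simple text output so the filter result is easy to inspect
  output$shotCount <- renderText({
    paste(nrow(filtered_shots()), "shots")
  })
}
```

The same req(input$outcome) line could sit at the top of the filtering step in shotmap.R if you see a brief error in the plot area on startup.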
Back in the ui I'm going to use tags$style to create custom CSS styles for the dashboard, and I'm setting the height of the plot to 800px.
ui.R
library(shiny)
fluidPage(
tags$style(
HTML(
'
body {
font-family: "Source Sans 3", sans-serif;
font-weight: bold;
background-color: #17161b;
color: #aaaabb;
height: 100%;
}
h2 {
text-align: center; /* Set text alignment to center */
font-weight: bold; /* Set font weight to bold */
padding-bottom: 0px;
}
h3 {
text-align: center; /* Set text alignment to center */
font-weight: bold; /* Set font weight to bold */
padding-bottom: 0px;
}
.selectize-input {
background-color: #aaaabb !important;
color: #17161b !important; /* Set text color for selectInput */
font-weight:bold;
}
.selectize-dropdown-content .option {background-color: #17161b !important;
color: #aaaabb !important; /* Set text color for selectInput */
font-weight:bold;
padding: 4px 0 !important; /* Adjust top and bottom padding */
margin: 0 !important; /* Reset margins */
}
.well {background-color: #17161b !important;
border: none !important;
color: #aaaabb !important; /* Set text color for selectInput */
}
'
)
),
sidebarLayout(
sidebarPanel(
selectInput("outcome",
"Outcome:",
choices = NULL)
),
mainPanel(
plotOutput("shotPlot",
height = "800px")
)
)
)
Now that we have a functional dashboard, it seems like a good place to break. Keep an eye out for the second part of this guide, where we'll build out additional dashboards in a single app using the tabsetPanel. This will allow us to build in several more components for tracking shot data. Feel free to let me know what types of plots you want to learn how to build in the next part.