Projecting 2018-19 Goaltender Performance

Goaltenders are important, but volatile. How can statistical forecasts overcome considerable uncertainty in future performance? By embracing it.

Why Project Performance?

The first instinct upon seeing goaltending projections should be: why? Goaltending is a notoriously volatile position that makes little sense even to those with a deep understanding of the game. In any given season, high-pedigree and also-ran goaltenders are seemingly equally likely to deliver top performances. A perceived star having a poor year can sink a promising season.

But that magnitude of impact makes it an interesting and useful exercise. Goaltending, though volatile, exerts an outsized influence on games and seasons, for better or worse. If your goaltender is on, the game is easy, and if they are off, everyone invested in the team is just waiting for something to go wrong.

Importantly, volatility is something statistics can capture and quantify along with the potential impact on the team. In a league where true skill from team to team can be tight, that impact is relatively large. Last regular season, goalies made up 11 of the top 30 WAR (Wins Above Replacement) contributors, according to

And that’s the crux: a volatile but important position is still important. It is often useful to use data to project future results, no matter how difficult and frustrating the process can be.

In the Business of Results

It’s important to preface that this analysis deals with a goaltender’s statistical profile rather than true goaltender ability. In a perfect world, we could successfully derive a metric that aligned the two in a meaningful way, and current methods do their best to isolate goaltender performance by adjusting for the quality of shots their team allows. However, there are latent variables characteristic of certain teams. For example, some teams may allow a higher rate of screened shots or cross-ice passes than the recorded shot attributes might suggest. I’ve estimated that team-level latent effects on shot quality can be 0.2% at even strength and about 0.6% on the powerplay.
Therefore, all projections suggest which goalies will likely return the best results rather than which goalies are definitively better. Results are influenced by ability, health, age, opportunity, coaching, and team-effects, all contributing to the difficulty of prediction.

What Result Do We Care About?

In order to create a projection, we first must decide what to measure over the upcoming season. Some fantasy websites might project games played or wins, or standard save percentage. However, we want a metric that best isolates goaltender performance, given the available data. Publicly, this is currently best done by adjusting save percentage by expected save percentage using an expected goal (xG) model. Each shot is weighted by the probability of it being a goal, given what we know about the shot, and the total is measured against actual goals against.
Metric stability
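
To make the save % over expected idea concrete, here is a minimal sketch. It assumes each shot already carries an xG value from some expected goal model; the function name and the five sample shots are hypothetical and only illustrate the arithmetic.

```python
def save_pct_lift(shots):
    """Return actual save % minus expected save %.

    shots: list of (xg, is_goal) pairs, where xg is the modelled
    probability that the shot becomes a goal.
    """
    n = len(shots)
    if n == 0:
        raise ValueError("need at least one shot")
    expected_goals = sum(xg for xg, _ in shots)
    actual_goals = sum(1 for _, is_goal in shots if is_goal)
    expected_save_pct = 1 - expected_goals / n
    actual_save_pct = 1 - actual_goals / n
    return actual_save_pct - expected_save_pct


# Hypothetical shots: (xG, did it go in?)
sample_shots = [(0.05, False), (0.30, True), (0.10, False), (0.02, False), (0.08, False)]
lift = save_pct_lift(sample_shots)
print(f"save % lift over expected: {lift:+.3f}")
```

A positive lift means the goalie allowed fewer goals than the shot quality suggested, and a negative lift means more.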

Rebound Control

However, rebound shots weigh heavily in expected save percentage calculations, and rightfully so. Rebound shots are about 4 times more dangerous than initial shots (shooting percentages of 26% and 6%, respectively). But rebounds are not necessarily independent of the goaltender; in theory, the goaltender has some control over rebound opportunities against them.
Determining whether rebound prevention is a repeatable ‘skill’ is tricky – unlike a goal against, there is no solid definition of a rebound. The shooter, goalie, rebounder, defender, and record keeper (rebounds are designated when a follow-up shot comes within 2 seconds of the previous one) all have some impact on the outcome. However, we can remove credit for rebound chances against and replace it with an expected goal value: the expected probability of a rebound multiplied by the probability of a goal on a rebound shot (about 25% of rebound shots end up as goals). This helps remove some of the noise rebound xG creates, leading to more stable predictions.
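
One way to read the rebound adjustment is sketched below. It assumes each shot record carries its own xG, a modelled probability that it produces a rebound, and a rebound flag. The field names and sample values are hypothetical, and how goals scored on rebounds are charged is not shown here.

```python
P_GOAL_GIVEN_REBOUND = 0.25  # from the text: about 25% of rebound shots score


def rebound_adjusted_xg(shots):
    """Sum xG over initial shots, swapping raw rebound xG for an expected value.

    shots: list of dicts with keys 'xg', 'p_rebound', 'is_rebound'.
    """
    total = 0.0
    for shot in shots:
        if shot["is_rebound"]:
            continue  # drop the raw xG credit for the rebound shot itself
        # initial shot xG plus the expected cost of a rebound it might create
        total += shot["xg"] + shot["p_rebound"] * P_GOAL_GIVEN_REBOUND
    return total


# Hypothetical sequence: initial shot, its rebound, then another initial shot
sample = [
    {"xg": 0.06, "p_rebound": 0.04, "is_rebound": False},
    {"xg": 0.26, "p_rebound": 0.00, "is_rebound": True},
    {"xg": 0.10, "p_rebound": 0.08, "is_rebound": False},
]
raw_xg = sum(s["xg"] for s in sample)
adjusted_xg = rebound_adjusted_xg(sample)
print(f"raw xG {raw_xg:.2f} vs rebound-adjusted xG {adjusted_xg:.2f}")
```

The goalie who gave up the rebound no longer receives the large xG credit that the rebound shot would otherwise add to their expected goals.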

Non-Naive Bayes

Results are often complicated by sample size. More shots mean more information. While we could only include goalie-seasons with X number of shots, these cut-offs can be arbitrary and can be fiddled with to create spurious results (a 1000-shot minimum looks like this, but 1200 looks like this). My approach is to add a regressor to observed results that pulls the goaltender back toward a single-number prediction heading into the season, based on a simple linear model. The model inputs are last season’s results, shots against, partner performance, age, and whether it was a rookie season. This prediction acts as the Prior (prior probability distribution, red line below), our best guess of how the goaltender will perform that particular season before allowing evidence (results) to pile up.
If a 25-year-old rookie is brought up from the AHL, we would probably expect below-average results, say an extra goal against every 100 shots (-1%). If they post a 30-save shutout in their first game, the evidence (the shutout) wouldn’t necessarily overwhelm the prior. Combining our prior beliefs and the evidence (realized save percentage) gives a posterior (posterior probability distribution, blue line below): our updated estimate of their results will be better than the prior of -1%, but not by much. However, after 10 games of superb results, the estimate will begin to move into positive territory.
Piling on the evidence

How quickly does the evidence overwhelm the prior? That depends on the prior strength. We can imagine the prior as a synthetic goalie put in net for a set number of shots recording the same results as the prior expectation of them. So if we have a strong prior, we might ‘simulate’ close to a season of data before considering actual results. A weak prior might only be a hundred shots. The weaker the prior, the quicker the actual results and posterior results converge, as seen above.
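
The ‘synthetic goalie’ description can be sketched as a shot-weighted average of the prior and the observed results. This is a simplified point-estimate version and not necessarily the exact calculation used here. The function name is hypothetical, and the shutout example assumes an expected save % of 92% on those 30 shots (a lift of +8%).

```python
def posterior_lift(prior_lift, prior_shots, observed_lift, observed_shots):
    """Blend prior and evidence, treating the prior as prior_shots synthetic shots."""
    total_shots = prior_shots + observed_shots
    if total_shots <= 0:
        raise ValueError("need a positive number of shots")
    return (prior_lift * prior_shots + observed_lift * observed_shots) / total_shots


# Rookie with a -1% prior posts a 30-save shutout (assumed +8% lift on 30 shots)
strong_prior = posterior_lift(-0.01, 1000, 0.08, 30)  # prior worth 1000 shots
weak_prior = posterior_lift(-0.01, 100, 0.08, 30)     # prior worth 100 shots
print(f"strong prior: {strong_prior:+.4f}, weak prior: {weak_prior:+.4f}")
```

With the strong prior the estimate stays below average after one shutout, while the weak prior lets the same 30 shots pull the estimate into positive territory.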

What prior strength best stabilizes results over the season in order to best use in prediction? We will test that out later.

Target Data

The metric I’m choosing to measure is:
  1. Save % Lift Over Expected – consider actual save % relative to the expected save % derived from an expected goal model (which considers shot location, shot type, strength, shooter, and the time and location of the event prior to the shot).
  2. Regressed – using a Bayesian approach we will test various prior strengths in order to create a metric with a good balance between efficiency and workload.
  3. Rebound Adjusted – Removing some of the noise that rebounds can add when using expected goal models to measure shot quality faced by a goalie.
This metric satisfies both philosophically and statistically. Philosophically, we are measuring goaltender performance based on what they do with each initial (non-rebound) shot against, based on the features we know about it, and indexing their results to league average.
Statistically, when trying to predict future results, this metric performs better than raw save %, save % over expected unadjusted for rebounds, and unregressed save % over expected adjusted for rebounds. Though this isn’t always a high bar to clear.
Prior work from RITHAC 2017

The Marcel Framework

The easiest way to forecast the future is to look at the past. But how far into the past? Is yesterday more relevant than the day before it, and by how much?
A standard method to forecast athlete performance uses the Marcel framework, which has its roots in baseball and has been adapted for hockey numerous times. Results from prior seasons are aggregated and given less weight the further in the past they are.
A two-season marcel projecting 2018-19 results might weight 2017-18 results by 75% and 2016-17 results by 25%, totalling 100%. If we wanted a feature to represent goaltender shots faced, and in 2017-18 they faced 2,000 and in 2016-17 they faced 1,000 shots, using the 75-25 weights, our representation of shots faced would be 1,750 ((2000 * 0.75) + (1000 * 0.25)).
Like our prior strength parameter, the parameters that best capture history (lookback seasons) and recency (how to weight each season) can be tested.
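
The two-season example above reduces to a weighted average. A minimal sketch follows; the function name is hypothetical, and the weights are normalized so they do not have to sum to exactly 1.

```python
def marcel_weighted(values, weights):
    """Weighted average of per-season values, most recent season first."""
    if len(values) != len(weights):
        raise ValueError("need one weight per season")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Shots faced in 2017-18 and 2016-17, weighted 75-25 as in the text
shots_feature = marcel_weighted([2000, 1000], [0.75, 0.25])
print(shots_feature)
```

The same function can be applied to any per-season input, such as save % over expected, with a longer list of seasons and weights.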

Building the Grid

The goal of the analysis is to best predict future performance, and we have a few parameters we want to test to best generate model inputs and targets – prior strength, Marcel lookback seasons, and relative weighting of lookback seasons. For each parameter, we can test various values (e.g. 100, 400… 3000 prior shots, 1…5 lookback seasons, 10 different weighting configurations) and then test model performance for each of the 350 unique combinations of parameters.
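
A grid like this can be enumerated with a Cartesian product. In the sketch below, the intermediate prior-shot values and the weighting-scheme ids are assumptions: the text only gives 100, 400 and 3000 as prior-shot values, so seven values are assumed here to arrive at 350 combinations.

```python
from itertools import product

# Assumed values: only 100, 400 and 3000 appear in the text
prior_shots_options = [100, 400, 700, 1000, 1500, 2000, 3000]
lookback_options = [1, 2, 3, 4, 5]
weighting_scheme_ids = list(range(10))  # placeholders for 10 weighting configurations

grid = [
    {"prior_shots": p, "lookback_seasons": n, "weighting_scheme": w}
    for p, n, w in product(prior_shots_options, lookback_options, weighting_scheme_ids)
]
print(len(grid), "parameter combinations")
```

Each dictionary in `grid` would then drive one pass of building the target, building the inputs, and fitting the models.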

Under the Hood

Each parameter combination is used to create the:
  1. Target variable – regressed, rebound adjusted save % over expected
  2. Input features
    1. Marcel-weighted regressed, rebound adjusted save % over expected
    2. Marcel-weighted shots against
    3. Marcel-weighted even-strength rebound adjusted save % over expected
    4. Marcel-weighted rebound adjusted save % over expected of partner goaltenders
    5. Age

For each test season, we calculate the target variable and aggregate the input metrics from prior seasons. We can then train a few different models exploring the relationship between Marcel-weighted prior metrics and unseen future results.

Each model trains on 80% of the 576 goalie-seasons since 2010-11. The caret package is used to create a cross-validated model by splitting the training data into 5 folds and repeating the process 5 times in order to find the optimal tuning parameters. The remaining 20% of the data is held out, and model performance is measured on that unseen data. Four models are fit.
  1. Random Forest Model (4 inputs) – input features of regressed results, shots against, prior even-strength results, and age. This ensemble of decision trees looks for splits in the data that might be useful in predicting future performance.
  2. Linear Model (3 inputs)  – input features of regressed results, shots against, and prior even-strength results. Simple model solely based on prior results.
  3. Linear Model (4 inputs)  – input features of regressed results, shots against, prior even-strength results, and age. The model hopes to balance performance with age.
  4. Linear Model (5 inputs)  – input features of regressed results, shots against, prior even-strength results, age, and performance of partner goalies.
Each model is then applied to the roughly 60 goaltenders with NHL experience likely to be on opening day rosters. Each of the 4 models produces predictions under 350 different parameter combinations, and only those with good out-of-sample testing scores are kept. Those out-of-sample scores are then used to take a weighted average of the predictions, along with each of their confidence intervals. Finally, the 4 model predictions and confidence intervals are averaged together to represent a reasonable forecast for the upcoming season.
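
The final blending step might look like the sketch below for a single goaltender. It assumes the out-of-sample score is a higher-is-better number (something like R-squared) and that ‘good’ means clearing a cut-off. The cut-off, function name, and all numbers are hypothetical, and confidence intervals are left out for brevity.

```python
from statistics import mean


def score_weighted_prediction(runs, min_score):
    """Average predictions across parameter combinations, weighted by test score.

    runs: list of (prediction, out_of_sample_score) pairs for one model.
    """
    kept = [(pred, score) for pred, score in runs if score >= min_score]
    if not kept:
        raise ValueError("no runs cleared the score cut-off")
    total_score = sum(score for _, score in kept)
    return sum(pred * score for pred, score in kept) / total_score


# Hypothetical runs for one goalie under one model: (predicted lift, test score)
linear_runs = [(0.004, 0.10), (0.002, 0.30), (-0.001, 0.01)]
linear_pred = score_weighted_prediction(linear_runs, min_score=0.05)

# Hypothetical blended predictions from the other three models
other_model_preds = [0.0035, 0.0015, 0.0025]
forecast = mean([linear_pred] + other_model_preds)
print(f"model-level: {linear_pred:+.4f}, final forecast: {forecast:+.4f}")
```

The third run is dropped for its poor test score, so it never influences the forecast.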


Each goaltender has a forecast presented with a range of results, given their statistical profile and the modelling process. A lower peak and wider plot distribution represent a more uncertain prediction. It appears that age and prior inconsistency generally increase the uncertainty, which makes intuitive sense. However, due to the nature of the modelling process, the exact relationship is a bit obfuscated.

It’s also important to note that this metric represents both efficiency (per shot) and workload. Goaltenders that have demonstrated the ability to handle a heavy schedule, like Frederik Andersen, are given more credit since their above average results will likely be across more shots (overcoming the regressor). Taking extra starts from a back-up or replacement-level goaltender will likely benefit the team.

Thinking About Uncertainty

There’s obviously a lot of overlap between many goalies, which might make it unclear how exactly a decision-maker might glean information from the analysis. It might be more helpful to simulate seasons by ‘drawing’ results from the calculated distribution and comparing results to peers like we would in the card game ‘War.’ If we sample from the distributions of Braden Holtby and Peter Budaj 1000 times, Budaj would post superior results about 3% of the time.
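
The ‘War’ comparison can be sketched as repeated paired draws. The version below assumes each forecast is an independent normal distribution. The means and standard deviations are made-up placeholders, not the fitted values for any real goaltender.

```python
import random


def share_b_beats_a(mean_a, sd_a, mean_b, sd_b, n_sims=10_000, seed=2018):
    """Share of simulated seasons in which goalie B outperforms goalie A."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    wins = 0
    for _ in range(n_sims):
        result_a = rng.gauss(mean_a, sd_a)
        result_b = rng.gauss(mean_b, sd_b)
        if result_b > result_a:
            wins += 1
    return wins / n_sims


# Hypothetical forecasts: a strong starter (A) against a below-average veteran (B)
underdog_share = share_b_beats_a(0.004, 0.003, -0.004, 0.003)
print(f"underdog posts the better season in {underdog_share:.1%} of simulations")
```

Wider distributions or closer means raise the underdog's share, which is how overlapping forecasts translate into a concrete probability.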

This exercise can be done for each team with veteran goalies in their system against 2 veteran free-agent goalies, Kari Lehtonen and Steve Mason. While goalies like Greiss and Darling are projected to only outplay Steve Mason in about 20% of simulated seasons, this apparent gamble could also factor in things like contract status, age, or injury risk. In any event, we can capture the uncertainty and provide the opportunity to make a calculated decision.

Calculated risks

Bottom Line

An alternative calculation is to simulate absolute goals prevented over expected for each team. Based on rostered goaltenders’ forecasted outcomes, we can create a distribution of possible outcomes by simulating their season thousands of times. As a point of reference, last season that range was about +/- 40 goals, representing about a 15 point swing in the standings. There are no certain outcomes, but you can maximize the probability of ending up in positive territory.

Simulated Seasons
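
The team-level simulation can be sketched by drawing a save % lift for each rostered goalie and multiplying by projected shots faced. Everything below is an assumption for illustration: the normal distributions, the independence between goalies, the fixed shot counts, and the tandem's numbers.

```python
import random


def simulate_team_goals_prevented(goalies, n_sims=10_000, seed=2018):
    """Simulate season totals of goals prevented over expected for one team."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    totals = []
    for _ in range(n_sims):
        season_total = 0.0
        for goalie in goalies:
            lift = rng.gauss(goalie["mean_lift"], goalie["sd_lift"])
            season_total += lift * goalie["shots"]  # lift per shot times shots faced
        totals.append(season_total)
    return totals


# Hypothetical tandem: an above-average starter and a below-average backup
tandem = [
    {"mean_lift": 0.003, "sd_lift": 0.004, "shots": 1700},
    {"mean_lift": -0.002, "sd_lift": 0.005, "shots": 800},
]
totals = simulate_team_goals_prevented(tandem)
p_positive = sum(t > 0 for t in totals) / len(totals)
ordered = sorted(totals)
k = len(ordered) // 20  # index of the 5th percentile
low, high = ordered[k], ordered[-k - 1]
print(f"P(positive) = {p_positive:.0%}, 90% range {low:+.1f} to {high:+.1f} goals")
```

Shifting starts between the two goalies changes the `shots` values, so the same function can compare workload splits by their probability of finishing in positive territory.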


Every season brings its own hard lessons on how difficult it can be to predict goaltender performance. Therefore it makes sense any forecast shouldn’t avoid uncertainty, but rather try to embrace it.

Teams and decision-makers are best aided by understanding that future performance is only probabilistic. Carey Price might be one of the most talented goaltenders in the league, but how likely was his poor performance last season? Unlikely, but certainly not zero. That’s true of every goalie heading into the 2018-19 season.

The universe of goaltenders is more talented than ever, so it’s no surprise that the top talents in the world, when indexed to each other, are not separated by much. That means that as the upcoming season unfolds, the results we observe will quickly deviate from what is expected in many cases. In some of those, they will reconverge, but others might see that opportunity lost to injury or an opportunistic teammate.

But it is important to know what to expect from goaltenders. Evaluators might have an easier time forecasting bottom-6 skater performance, but the impact on the outcome of the season is considerably less.

Teams only get to bet a few chips a season on goaltenders. The edge might be small, but the payoffs compound over the course of the season and are often season-defining. A statistical forecasting approach that incorporates uncertainty can help them quantify that bet.

Thanks for reading! Any custom requests ping me at @crowdscoutsprts or Code for this and other analyses can be found on my Github.
