It’s All Relative
In a salary cap league, how teams spend their finite budget is critical to present and future success. The relative value of a contract is often more important than its absolute value. Within a very strict set of contract rules, teams devote a share of their allotted cap space to a player at a price dependent on a number of market forces. The goal of this study is to estimate what that price should be, given some of those market forces, and to compare it to the actual salary.
So, how do we go about determining the market rate? First, it helps to make some simplifying assumptions – we expect the cap-hit or AAV (Annual Average Value of the contract) to probably be a function of:
- Position – different positions are valued slightly differently. Any contract negotiation anchor would consist of comparables playing the same position.
- Age – the NHL’s not-so-free labor market puts significant restrictions and limitations on young players’ earnings. Thus, any analysis looking at market rate should factor in age.
- Skill / ability / comprehensive contribution to winning – the player’s perceived ability will determine market value. Unlike age and position, skill is extremely difficult to accurately gauge and forecast (since many deals are multi-year). This will pose the biggest obstacle to a clean quantitative analysis. Across all sports, teams consistently misvalue player ability, most notoriously over-valuing their ability and overpaying them.
- Contract Length (Term) – There are different interactions between age, term, and AAV. A short contract length might signal less money (a ‘show me’ bridge contract) for a young RFA or more money (player trading longer term for higher AAV) for an older UFA. Data courtesy of generalfanager.com.
- Projected Salary Cap at Contract Date – A $5M AAV contract signed in the summer of 2009 is not the same as a $5M AAV contract signed in the summer of 2016. Managers are forward-looking, allocating a set percentage of their expected salary cap to a player rather than an absolute amount. Data courtesy of generalfanager.com.
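To make the last point concrete, here is a minimal sketch of the cap-share arithmetic. The cap ceilings below are illustrative inputs in $M (assumed values, not taken from the dataset used in this study), and the helper names are hypothetical.

```python
# Cap ceilings in $M. These are illustrative assumptions, not sourced data.
CAP_BY_SEASON = {"2009-10": 56.8, "2016-17": 73.0}


def cap_share(aav, cap):
    """Return the AAV as a fraction of the cap in force when the deal was signed."""
    return aav / cap


def equivalent_aav(aav, cap_then, cap_now):
    """Return the AAV that would take up the same cap share under a different cap."""
    return cap_share(aav, cap_then) * cap_now


share_2009 = cap_share(5.0, CAP_BY_SEASON["2009-10"])
share_2016 = cap_share(5.0, CAP_BY_SEASON["2016-17"])
restated = equivalent_aav(5.0, CAP_BY_SEASON["2009-10"], CAP_BY_SEASON["2016-17"])

print(f"$5M under the earlier cap: {share_2009:.1%} of the cap")
print(f"$5M under the later cap:   {share_2016:.1%} of the cap")
print(f"Same share restated under the later cap: ${restated:.2f}M")
```

Under these assumed ceilings the same nominal $5M takes up a visibly larger share of the earlier cap, which is why the model carries projected cap as a feature rather than comparing raw dollars.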
To determine how each player’s cap-hit stacks up against what we would expect, we must create a formula or algorithm to return each player’s expected AAV. The difference between the expected AAV and actual AAV – the residual – signals the relative value of their cap-hit. Spending a million less than market forces would expect (or, more specifically, than our model would predict) allows the team to either save that money or invest it elsewhere.
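As a small illustration of the residual, the sketch below uses the sign convention adopted later in the article (expected AAV less actual AAV, so a positive number is surplus value to the team). The players and dollar figures are hypothetical.

```python
def surplus_value(expected_aav, actual_aav):
    """Expected AAV less actual AAV, in $M.

    Positive: the team pays less than the modelled market rate.
    Negative: the team pays more than the modelled market rate.
    """
    return expected_aav - actual_aav


# Hypothetical contracts: (player, expected AAV, actual AAV), all in $M.
contracts = [
    ("Player A", 4.5, 3.5),
    ("Player B", 2.0, 3.25),
]

for name, expected, actual in contracts:
    print(f"{name}: surplus of {surplus_value(expected, actual):+.2f}M")
```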
A model can be built using the features discussed above, predicting AAV as a function of age, position, and ability – the catch-all for talent or skill or whatever. But how do we comprehensively quantify ability, the age old question?
One Feature to Rule Them All
My baseline method will be to use GAR (Goals Above Replacement) from war-on-ice.com to help predict salary. GAR is a notable attempt to assign numerical credit to players based on their team winning, which proves a decent proxy for ability. However, GAR or any ‘be all, end all’ stat has limitations – injuries interrupt accumulation of goals above replacement and defensive contributions are very difficult to quantify, among other things. No algorithm is omnipotent, but GAR is a very helpful attempt at answering this question.
In addition to GAR, I will use data collected from my project, CrowdScout Sports, designed to smartly aggregate user judgment. It has been in beta over the course of the 2015-16 season, with over 100 users making over 32,000 judgments on players relative to each other. With advanced metrics provided, a diversity of users, and the best forecasters gaining influence, I hope the data provides an increasingly reliable, comprehensive player rating metric. The rating is intended to answer the question posed to the user as they are prompted to rank two randomly chosen players – if the season started today, which player would you choose if the goal were to win a championship?
Both metrics will be used as a proxy for ability when trying to explain AAV, data courtesy of generalfanager.com. Both metrics are designed not to be influenced by cap-hit, a necessity for the model to properly explain cap-hit.
GAR Linear Model
First, let’s explore the relationship between AAV and term, salary cap expectations, position, age, and ability using the GAR metric. Using 2014-15 data from war-on-ice.com and their GAR model, a dataset containing player features at the onset of the 2014-15 season was assembled. The AAV of the upcoming 2015-16 season (where the player was signed prior to the season) was targeted. Any incomplete records were removed. The age variable was transformed into a bucketed variable since there isn’t a linear relationship between age and AAV, but rather different levels of pay by age. The natural bucketing of age in relation to cap-hit is:
- 18-21 – Entry Level Contract (ELC) players
- 22-24 – A mix of ELCs, bridge contracts, and a few high fliers who get paid
- 25-27 – RFA controlled, second contract players in their early prime
- 28-31 – UFA contract years (likely higher cap-hit) but players likely to still be in their prime
- 32-35 – UFA contract years with some expected decline in ability
- Over 35 – Declining ability compounded with specific contract rules for 35 plus players
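The bucketing above can be sketched as a small lookup that also produces the 1/0 indicator columns a linear model would consume. The bucket boundaries come from the list above; the function names are hypothetical.

```python
# (lowest age, highest age, label) for each bucket listed above.
AGE_BUCKETS = [
    (18, 21, "18-21"),
    (22, 24, "22-24"),
    (25, 27, "25-27"),
    (28, 31, "28-31"),
    (32, 35, "32-35"),
]
ALL_LABELS = [label for _, _, label in AGE_BUCKETS] + ["Over 35"]


def age_bucket(age):
    """Map an integer age to its bucket label."""
    for lowest, highest, label in AGE_BUCKETS:
        if lowest <= age <= highest:
            return label
    if age > 35:
        return "Over 35"
    raise ValueError(f"age {age} is below the youngest bucket")


def age_indicators(age):
    """Return a 1/0 indicator per bucket, with exactly one bucket set to 1."""
    chosen = age_bucket(age)
    return {label: int(label == chosen) for label in ALL_LABELS}


print(age_bucket(24))
print(age_indicators(29))
```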
The 924 remaining players were then split into 10 folds to cross-validate the Generalized Linear Model (GLM) – iteratively training on 90% of the data and testing out of sample on the remaining unseen 10% of data, then combining the 10 models. The cross-validated model is then used to score the original dataset – the coefficients from the GLM are multiplied by each player’s individual variables – age (1/0 for each bucket), position (1/0 for each position), contract length, projected cap, and GAR. The outcome is the expected AAV.
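The fold mechanics can be sketched in a few lines. This is a simplified stand-in rather than the model used here: it fits a one-feature least-squares line (AAV on GAR only) on synthetic data, and every number in it is made up for illustration. The point is only the train-on-nine-folds, test-on-the-tenth loop.

```python
import random
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_rmse(xs, ys, k=10, seed=0):
    """Out-of-sample RMSE: each fold is predicted by a model that never saw it."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend(
            (ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out
        )
    return sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic stand-in data: AAV in $M loosely rising with GAR, plus noise.
rng = random.Random(1)
gar = [rng.uniform(-5, 25) for _ in range(200)]
aav = [1.0 + 0.2 * g + rng.gauss(0, 0.5) for g in gar]

cv_rmse = k_fold_rmse(gar, aav)
print(f"10-fold out-of-sample RMSE: {cv_rmse:.2f}")
```

Because every prediction is made by a model that did not train on that player, the resulting error is a fairer estimate of how the model would fare on unseen contracts than an in-sample fit.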
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 836, 837, 838, 838, 838, 836, ...
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Our simple GLM explains about two-thirds of cap-hit. GAR, Contract Length, and Projected Cap are all strong positive predictors. Each age bucket is subsequently paid more. Of note, the 22-24 age bucket is the weakest age coefficient, since at that age some players are on their ELC while others have earned legitimate star contracts. In this model, position wasn’t a significant predictor, although it signals defensemen and goaltenders probably go at a premium to centers, while wingers take a discount.
The player-level residuals (expected AAV less actual AAV, a positive value representing surplus value to the team) are plotted below. The model would be stronger but for some significant outliers – Jonathan Toews, Patrick Kane, Thomas Vanek, and Tyler Myers were all paid about $4M more than the model expected. Conversely, Duncan Keith, Roberto Luongo, and Marian Hossa were all underpaid by at least an expected $4M. Like most linear models, it had trouble predicting a non-normal target. That is, the distribution of AAV values was skewed to the right, and the model struggled to pick up ‘extreme’ values. Transforming AAV into a log of AAV did not increase predictive power.
The next iteration of the GLM was run using the CrowdScout score as a proxy for ability. A few notes on the inclusion of this data:
- What is this metric? It represents the relative strength of that player’s Elo rating compared to the entire population at the time of analysis. The Elo rating is the cumulative result of over 100 scouts selecting between two randomly generated (but generally similar) players some 32,000 times. Each of these selections feeds into an algorithm that adjusts each player’s score based on the prior probability of the match-up and the k-factor given to the user – the more active and accurate that user has been historically, the greater their influence.
- I think skepticism should be applied to any analysis performed on data acquired through some level of effort of the owner. That said, the CrowdScout data is the result of my own engineering project and is intended to aid (fantasy) managerial decision-making, rather than provide advanced analytical insight. Any clean, methodologically tight analysis would be a bonus.
- There is a concern of collinearity in this analysis, since it is possible a subset of users associated higher salary with better ability, as opposed to the reverse. Conversely, an obviously overpaid player can be under-rated due to an emotional discounting of their ability. For the purpose of this analysis, we will assume the effects neutralize each other and that, in aggregate, AAV did not significantly impact the CrowdScout score. There will obviously be a correlation between player score and AAV, but that does not imply causation.
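For readers unfamiliar with Elo, the update described in the first note can be sketched as follows. This uses the textbook Elo formula with a 400-point scale, which is an assumption here; the actual CrowdScout scale, starting ratings, and k-factor schedule are not described in this article, so the numbers below are hypothetical.

```python
def win_probability(rating_a, rating_b):
    """Prior probability that player A is chosen over player B (textbook Elo)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(chosen, passed_over, k_factor):
    """Return the new (chosen, passed_over) ratings after one user selection.

    k_factor is per user: a more active, more accurate user gets a larger
    value, so their selection moves both ratings further.
    """
    shift = k_factor * (1.0 - win_probability(chosen, passed_over))
    return chosen + shift, passed_over - shift


# An evenly matched pair judged by a low-influence and a high-influence user.
print(elo_update(1500.0, 1500.0, k_factor=10))
print(elo_update(1500.0, 1500.0, k_factor=30))

# An upset (lower-rated player chosen) moves ratings more than an expected pick.
print(elo_update(1400.0, 1600.0, k_factor=20))
print(elo_update(1600.0, 1400.0, k_factor=20))
```

The two design levers are visible here: the prior probability scales the shift by how surprising the selection was, and the per-user k-factor scales it by how much that user's judgment is trusted.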
With the CrowdScout data, I kept all players from the 2015-16 season who had been judged at least 70 times, effectively dropping players who did not spend a significant amount of time on an NHL roster or didn’t receive many implied ratings from a diverse set of users. A dataset containing position, age bucket (same buckets as the GAR Linear Model) as of 10/1/2015, and CrowdScout score as of 5/25/2016 was constructed for 548 players. A model was then built by cross-validating 10 folds from the data, testing each model on unseen, out-of-sample subsets.
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 494, 494, 493, 494, 492, 492, ...
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The same model methodology using CrowdScout score as a proxy for ability explains about three-quarters of AAV. Like the GAR model, ‘ability’ has a strong positive relationship with AAV. Pay increases with age, with significant jumps in expected pay from 21-24 to 24-28 and then again as players hit unrestricted free agency at around 28. Goaltenders and wingers are likely expected to have their AAV discounted, all else equal, although the relationship isn’t significant.
Using CrowdScout as a proxy for ability creates a better-fitting model compared to using GAR. This is consistent with what we would expect to see, since CrowdScout data doesn’t have to worry about players missing games due to injury. This is a study into what we would expect players to be paid – rather than what players should be paid – therefore the CrowdScout score is very likely baking in some reputational assessments, leading to a stronger relationship with cap-hit. It’s also possible that crowd wisdom is able to determine the impact defensive prowess has on comprehensive ability better than most public data.
This analysis also measures spending efficiency based on the 2015-16 AAV and ability, because the CrowdScout Score was not available at the start of the season. However, we can create a predicted CrowdScout Score from the 2014-15 season to hold up against 2015-16 AAV, since teams can only act on past performance and project out.
Paid Against the Machine
The original goal of the analysis was to compare player cap-hit to the expected cap-hit. A simple linear model explaining AAV as a function of age, position, term, projected cap at the time of the deal, and CrowdScout score does a good job predicting cap-hit. However, we can also explore additional modeling methods, increasing the depth of interactions between variables (e.g. age and draft year) and strengthening the predictive power. I will make an adjustment to the CrowdScout Score and use a machine learning model which will be able to handle the additional interactions between features:
- Predicted CrowdScout Score – As outlined here, the CrowdScout Score can be reliably predicted using on-ice metrics. I will score each player’s 2014-15 statistics from puckalytics.com with the GLM and Random Forest model and take the average of the predicted scores. This will replace the actual CrowdScout Score in the model, which can be biased.
- Age (as of season start, 10/1/2015) – Move from a strictly bucketed age to a continuous age variable, to help capture the different interactions. This would not work in a linear model; Jagr would mess everything up.
- Contract Length – Length has proved to be a key explanatory variable. Data courtesy of generalfanager.com.
- Projected Salary Cap at Contract Date – Also a key explanatory variable. Data courtesy of generalfanager.com.
- Drafted Boolean – The interaction between whether the player was drafted or not, term, and age should help the model work out whether the player is on an ELC, a 2nd contract, or a UFA contract.
In order to handle interactions between the new variables in the model, an ensemble of regression trees will be used – the Random Forest algorithm. A Random Forest is an ensemble model that builds decision trees from randomized subsets of variables and observations; every ‘tree’ is then considered when scoring or predicting an observation. The advantage of this algorithm is that it is extremely powerful. The disadvantage is that it is basically a black box: there are no clean, interpretable parameters to say ‘when all else is equal we expect a player moving from the 31-35 age group to over 35 to be paid about $500k more’ like in a GLM.
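To show the moving parts, here is a deliberately tiny Random Forest written from scratch: bootstrap a sample of players, grow a shallow regression tree that considers a random subset of variables at each split, repeat, and average the trees. It is a toy sketch on synthetic data, not the model fitted in this study; the feature layout (score, term, age), tree depth, and tree count are all assumptions chosen to keep it short.

```python
import random


def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, ys, features):
    """Return the (feature, threshold) pair that most reduces SSE, or None."""
    best, best_sse = None, sse(ys)
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            total = sse(left) + sse(right)
            if total < best_sse:
                best, best_sse = (f, t), total
    return best


def grow(rows, ys, rng, depth, n_try):
    """Grow one regression tree; a leaf is simply the mean target value."""
    leaf = sum(ys) / len(ys)
    if depth == 0 or len(rows) < 4:
        return leaf
    # Only a random subset of the variables is considered at each split.
    split = best_split(rows, ys, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return (f, t,
            grow([r for r, _ in left], [y for _, y in left], rng, depth - 1, n_try),
            grow([r for r, _ in right], [y for _, y in right], rng, depth - 1, n_try))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's mean."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, ys, n_trees=25, depth=3, n_try=2, seed=0):
    """Each tree is trained on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow([rows[i] for i in idx], [ys[i] for i in idx], rng, depth, n_try))
    return forest


def predict(forest, row):
    """The forest's prediction is the average over all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Synthetic players: (predicted score, term in years, age); AAV in $M.
data_rng = random.Random(42)
rows = [(data_rng.randint(40, 95), data_rng.randint(1, 8), data_rng.randint(19, 38))
        for _ in range(80)]
ys = [0.6 + 0.05 * (s - 40) + 0.4 * term + data_rng.gauss(0, 0.3) for s, term, _ in rows]

forest = fit_forest(rows, ys)
star = predict(forest, (90, 8, 28))
depth_player = predict(forest, (45, 1, 22))
print(f"high score, long term: ${star:.2f}M; low score, short term: ${depth_player:.2f}M")
```

Nothing in the fitted forest reads like a coefficient, which is the black-box trade-off described above: the only way to learn what it thinks is to feed it a player and see what comes out.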
A 500-tree model was able to minimize the RMSE under 0.5, with an R² of close to 0.95.
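For reference, both fit statistics are simple to compute from actual and predicted AAV. The sketch below uses the standard definitions; the three-player example is hypothetical.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Share of the variance in the target that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


# Hypothetical actual and predicted AAV in $M for three players.
actual_aav = [1.0, 2.0, 3.0]
predicted_aav = [1.5, 2.0, 2.5]

print(f"RMSE: {rmse(actual_aav, predicted_aav):.3f}")
print(f"R^2:  {r_squared(actual_aav, predicted_aav):.2f}")
```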
Despite the lack of coefficients, we can also take a peek under the hood to check how important each variable is in the algorithm’s decision-making.
The CrowdScout Score and Term variables are the most important variables in the Random Forest model when explaining AAV. That is, when they are used to create a ‘tree’ or decision, they cumulatively reduce the sum of squared residuals more than the other variables. Age, which should work in tandem with draft history and term, was also important. Projected Cap had some influence, Draft History even less so. Team salary and position (consistent with the linear models) were the least important, having no influence in the enhanced model, and were dropped.
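The idea of ranking variables by how much they reduce the sum of squared residuals can be illustrated with a single split. A real forest accumulates these reductions over every split in every tree; the sketch below only compares the best single split per variable, on six hypothetical players.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_sse_reduction(xs, ys):
    """Largest drop in SSE achievable by one threshold split on xs."""
    parent = sse(ys)
    best = 0.0
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        best = max(best, parent - sse(left) - sse(right))
    return best


# Hypothetical players: a rating, a defenseman flag, and AAV in $M.
features = {
    "score": [50, 55, 60, 80, 85, 90],
    "is_defense": [0, 1, 0, 1, 0, 1],
}
aav = [1.0, 1.2, 1.1, 5.0, 5.5, 6.0]

importance = {name: best_sse_reduction(xs, aav) for name, xs in features.items()}
for name, reduction in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: SSE reduced by {reduction:.2f}")
```

In this made-up sample, splitting on the rating separates cheap from expensive contracts almost perfectly, while splitting on position barely helps, which mirrors the ordering reported above.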
Note, when the 2014-15 GAR was added to a dataset of non-rookie players and added to the Random Forest model, the importance of GAR was around that of age and did not increase the performance of the model.
The Random Forest model still has trouble predicting very high cap-hits. For example, Patrick Kane and Jonathan Toews and their AAV of $10.5M are considered to be overpaid by over $1M when compared to market value, Toews slightly more with a 78 predicted CrowdScout Score compared to Kane’s 86. With a predicted CrowdScout score of 88, Alex Ovechkin makes $1.2M more than the model would predict. On the flip side, Justin Abdelkader was underpaid by about $2M in the Random Forest model last season. Interestingly, this summer he received a raise of almost the same amount. Patrick Eaves was also underpaid last year by over a million. He was notably underpaid in both GLM models, using Elo and GAR – sporting a healthy predicted CrowdScout score of 58 and 2014-15 GAR of 13.8, he was a 31-year-old winger paid a paltry $1.15M. Other players making about a million less than predicted during the 2015-16 season were Morgan Rielly, Mattias Ekholm, and Kyle Okposo – all of whom received healthy raises this summer.
At a team level, the Islanders, Hurricanes, and Predators led the way in contracting players for less than market value last year. The Islanders received strong value from pending free agents Nielsen and Okposo. The Hurricanes had positive value across the board less Skinner. The Predators are frugal by design, extracting value from their young defense. Note that this analysis fails to include goaltending, where Rinne and Ward would move each team down.
The Avalanche, Flames, and Rangers had the worst value from their contracts. Colorado has very few good contracts when compared to the market. The Flames had a few bad contracts on defense and did not receive any sort of bonus from having top players on ELCs. The Rangers were also pulled down by an overpaid defense.
Also note that the error terms here are small and it wouldn’t take much to move a team up or down the rankings. It also demonstrates that the future is tough to predict and few managers can avoid making salary allocation errors every now and then.
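A team-level ranking of this kind can be sketched by rolling player residuals up to the team. Summing the residuals is an assumption made for illustration (an average per contract would be an equally reasonable choice), and the teams and figures are hypothetical.

```python
from collections import defaultdict


def team_surplus(contracts):
    """Sum (expected AAV - actual AAV) per team and rank from best to worst value."""
    totals = defaultdict(float)
    for team, expected_aav, actual_aav in contracts:
        totals[team] += expected_aav - actual_aav
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical contracts: (team, expected AAV, actual AAV), all in $M.
contracts = [
    ("Team A", 4.0, 3.0),
    ("Team A", 2.0, 2.5),
    ("Team B", 5.0, 6.0),
]

ranking = team_surplus(contracts)
for team, surplus in ranking:
    print(f"{team}: {surplus:+.2f}M versus modelled market value")
```

With totals this close to zero, a single contract moving by a few hundred thousand dollars is enough to reorder the list, which is the caution raised above.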
It is critical that NHL franchises effectively manage their salary cap in order to be viable. It appears a model can explain about 95% of the market for NHL talent. This feels about right: some deals are visibly off from the start and some valuations will change with time, but most of the time teams and agents are in line with what the market would expect as a function of the player’s age, draft year, position, term, team salary, and ability. In this study, it appears holding up data from the CrowdScout project to objective on-ice features provided a good proxy for ability.
The Random Forest model is quite strong, with about 5% of the variation in cap-hit left unexplained. Some share of this is mis-valuation of the player and market, some of it is inaccuracies of the CrowdScout rating and modeling, and some of it might be unexplainable (discount to stay close to family, injury or character concerns, etc.). We are specifically interested in quantifying the first term – how teams might misvalue certain players. With a relatively small error term, it is possible the majority of these residuals are made up of the unquantifiable and the majority of team-level differences is noise. Eye-balling teams in the top 5 and bottom 5 by spending efficiency passed the sniff test, but most managers and agents settle on deals that are in line with the league market.
Finally, it’s important to remember this is a study in what we expect a player’s cap-hit to be given market conditions, rather than what they should make in a free-market NHL. Players on ELCs often provide teams very good value relative to their contract, but in this analysis there is no bonus for production from ELCs, since the player age and contract length often signal when players are likely to be on an ELC. The expected AAV is also calculated with perfect information at the start of the 2015-16 season, whereas real deals have to project out future performance during contract discussions. This alternative analysis might be looked at in the near future, expecting considerably larger error terms – longer timelines introduce more uncertainty.
It’s also important to remember that this analysis leans on ever-maturing data from the CrowdScout project. As expected, it contains enough reputational information to help build a stronger model than using GAR from war-on-ice.com as a proxy for ability. It is possible that this data contains systemic bias – if a higher salary caused the CrowdScout Score to be higher, rather than them simply being correlated. A simple plot (below) suggests that the CrowdScout Score often differs from AAV, which is encouraging. Given that, I hope this unique dataset and model will prove helpful in evaluating contracts and cap management in the future.
Huge thanks to asmean for contributing to this study, specifically advising on machine learning methods.
 If a team can consistently acquire and retain talented players who consistently play above their expected contract, they will be operating with a significant advantage. If your 24-year old top 4 defenseman is signed at $4.5M AAV and most comparable players are averaging over $5M AAV, more depth or quality can be acquired elsewhere. If your mid-range starting goalie makes $6M and the goaltending market falls out and sees comparables average less than $5M, you are at a disadvantage. Easy enough.
 In absolute terms, that’s a very tough question. The NHL labor market is a long way from the economic-textbook-supply-meets-demand-free-efficient-market. There are salary floors, ceilings, team floors, team ceilings, bonuses, and rules regarding age and accrued seasons. Deals are often made with little certainty of future performance (read: teams are poor at forecasting individual player career arcs), and often see a trade-off in salary and duration. An efficient market this is not.
 A model is only as good as its target variable, and I believe any comprehensive analysis of ability should attempt to answer that question or one similar to it. Hockey is a goal-scoring contest first and foremost, but the ultimate goal (winning the championship) resembles a marathon of hockey games. This is a tricky distinction since it invites past winners to be overrated, when in alternative histories they did not win, thanks to luck. This is certainly a deeper philosophical question, but an analysis in market value should only care about results.
 2015-2016 GAR has not been posted and will not be.
 As opposed to simply over-rating a player due to reputation and other biases. The system is designed to reward those users who have the foresight to forecast declining ability of a player getting by on reputation alone. Some reputational bias will be present until the time a sizeable crowd of excellent forecasters exists.
 Presumably when most players were under contract for the 2015-16 season.