CMSAC '20: Clustering and Analyzing 5v5 NHL Shot Location Data

I presented my work on clustering NHL player 5v5 shot heatmaps at the Carnegie Mellon Sports Analytics Conference poster competition.


Heatmap-style visualizations are a popular and intuitive way to display shot location data in hockey. However, they do not lend themselves to large-scale player comparison. In this poster, I address this issue by clustering NHL players based on their even-strength shot locations over three seasons (2017-18 to 2019-20) and analyzing the results.

The clustering consists of two stages. First, I fit a 565-square shot density polygrid to each player’s shot locations using 2D Gaussian kernel density estimation. Then I apply hierarchical clustering with Ward's linkage to group players by their shot polygrids.
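As a rough illustration of the two stages, the Python sketch below evaluates a 2D Gaussian kernel density estimate on a fixed grid for each player and then runs Ward's linkage hierarchical clustering on the resulting vectors. It is not the code behind the poster. The function names, the 20 x 15 grid (300 cells rather than the 565-square polygrid), the rink coordinates, the per-player normalization, and the synthetic shot data are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.cluster.hierarchy import linkage, fcluster


def shot_density_grid(xs, ys, grid_x, grid_y):
    """Evaluate a 2D Gaussian KDE of one player's shots at every grid point."""
    kde = gaussian_kde(np.vstack([xs, ys]))
    gx, gy = np.meshgrid(grid_x, grid_y)
    density = kde(np.vstack([gx.ravel(), gy.ravel()]))
    # Assumption: rescale to sum to 1 so shot volume does not drive the clustering.
    return density / density.sum()


def cluster_players(shots_by_player, grid_x, grid_y, n_clusters):
    """Group players by their gridded shot densities using Ward's linkage."""
    names = sorted(shots_by_player)
    features = np.array(
        [shot_density_grid(*shots_by_player[n], grid_x, grid_y) for n in names]
    )
    tree = linkage(features, method="ward")
    cluster_ids = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(names, cluster_ids))


# Synthetic data only: three "net-front" and three "point" shooters (hypothetical).
rng = np.random.default_rng(0)
grid_x = np.linspace(25, 89, 20)       # hypothetical offensive-zone x range, in feet
grid_y = np.linspace(-42.5, 42.5, 15)  # hypothetical rink width, in feet
shots_by_player = {}
for i in range(3):
    shots_by_player[f"net_{i}"] = (rng.normal(80, 5, 200), rng.normal(0, 8, 200))
    shots_by_player[f"point_{i}"] = (rng.normal(35, 5, 200), rng.normal(0, 20, 200))

labels = cluster_players(shots_by_player, grid_x, grid_y, n_clusters=2)
```

With real data, `shots_by_player` would hold each player's 5v5 shot coordinates, and `n_clusters` would be chosen by inspecting the dendrogram rather than fixed in advance.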

This methodology produces 10 clusters that fall into three subgroups: home plate forwards, perimeter forwards, and defencemen. I then analyze the clusters by comparing their overall shot heatmaps, their top scorers, and the distributions of expected unblocked shooting percentage, actual unblocked shooting percentage, and player height within each cluster.
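A per-cluster comparison of this kind could be sketched in Python as follows. The example assumes that unblocked shooting percentage means goals divided by unblocked shot attempts (shots on goal plus misses), computed per player and then summarized by cluster with quartiles. The function name and the numbers are made up for illustration and are not results from the poster.

```python
import numpy as np


def shooting_pct_quartiles(cluster_ids, goals, unblocked_shots):
    """Return {cluster: (q25, median, q75)} of player-level unblocked shooting %."""
    cluster_ids = np.asarray(cluster_ids)
    # Assumption: unblocked shooting % = 100 * goals / unblocked shot attempts.
    pct = 100.0 * np.asarray(goals, dtype=float) / np.asarray(unblocked_shots, dtype=float)
    summary = {}
    for c in np.unique(cluster_ids):
        q25, q50, q75 = np.percentile(pct[cluster_ids == c], [25, 50, 75])
        summary[int(c)] = (float(q25), float(q50), float(q75))
    return summary


# Made-up numbers for four hypothetical players in two clusters.
summary = shooting_pct_quartiles(
    cluster_ids=[1, 1, 2, 2],
    goals=[10, 20, 2, 4],
    unblocked_shots=[100, 100, 100, 100],
)
```

The same pattern applies to expected unblocked shooting percentage or player height: swap in the relevant player-level values and summarize them within each cluster.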

Brendan Kumagai
Hockey Data Science Intern

Aspiring data scientist with a passion for hockey and science.