NBA Games Analysis

Antonio Hila
4 min read · Jul 3, 2020

Winning basketball games is hard, and every year 30 teams put in a lot of work from a lot of different people to do just that. With this project I’m trying to help by finding some of the factors in games that lead to those wins. I took team data from the last decade, ran models to predict wins from those stats, and compared the predictions to the actual results. After getting an error score I was happy with, I analyzed the features behind those predictions to see exactly which factors contributed the most, with the idea that teams can use them to build styles that are more conducive to winning.

The data

The data was gathered from basketball-reference.com: 10 seasons of data for all 30 teams. In the end there were 300 data points (one per team per season) with per-game stats like points, assists, rebounds, steals, blocks, shooting percentages, etc. Along with those were some miscellaneous stats such as true shooting percentage and defensive effective field goal percentage. Stats such as net rating can’t really be changed through style alone, since they are an indicator of how well a team is doing, so those were removed from the data. Instead I am just looking at stats like assists and rebounds, which a team can increase with more passing or effort. It can even be done with signings. For example, if rebounds are an important factor, then adding a center who can rebound well might be a good move for a team. So in the end there were 13 stats that I found important to use and that were possible to improve upon. I wanted to keep the number on the lower side, as more features with only 300 data points wouldn’t have been great for the model.

Exploring the Data and Modeling

Defensive field goal percentage did well in the model, and I found out why when I was exploring the data by finding the correlation of each feature with wins. The features with the strongest correlations were age, defensive FG %, and true shooting %.
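To make the correlation step concrete, here is a minimal sketch of ranking features by their Pearson correlation with wins. The rows, the column names (`age`, `def_fg_pct`, `ts_pct`) and the helper `rank_features` are all made up for illustration; they are not the actual basketball-reference data or the code used for this project.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical team-season rows, invented for illustration only.
team_seasons = [
    {"wins": 60, "age": 28.9, "def_fg_pct": 0.440, "ts_pct": 0.580},
    {"wins": 50, "age": 27.5, "def_fg_pct": 0.450, "ts_pct": 0.565},
    {"wins": 41, "age": 26.8, "def_fg_pct": 0.458, "ts_pct": 0.555},
    {"wins": 30, "age": 25.6, "def_fg_pct": 0.467, "ts_pct": 0.545},
    {"wins": 20, "age": 24.3, "def_fg_pct": 0.475, "ts_pct": 0.530},
]

def rank_features(rows, target="wins"):
    """Return (feature, correlation) pairs sorted by absolute correlation."""
    target_values = [row[target] for row in rows]
    features = [name for name in rows[0] if name != target]
    scores = {
        name: pearson([row[name] for row in rows], target_values)
        for name in features
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

ranking = rank_features(team_seasons)
for name, r in ranking:
    print(f"{name:12s} r = {r:+.3f}")
```

Sorting by absolute value matters here: a stat like defensive FG % helps a team when it is low, so its correlation with wins is negative even though it is a strong signal.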

Age was definitely the most surprising. While yes, older players are generally more experienced, I think it goes a little deeper than that: I think it has more to do with tanking in the NBA. Generally, teams on a downward trend stop trying to win games and instead trade away older players and lose games on purpose in the hopes of getting a good pick in the draft. Trading away older players makes them worse and younger, and the draft picks who come in at 18 or 19 keep the team young. Since they traded the older players, they will likely take a bit of time to build up again.

True shooting and defensive field goal percentage make more sense, as they are measures of efficiency on both sides of the ball. This is important to note, though, as points was a lower-strength feature in the model. So the actual act of scoring isn’t as important as scoring at an efficient rate.

In the end the model did a fairly good job, getting a root mean squared error of 5.6 wins. With the average win total at 41, being off by roughly 5 to 6 wins is good considering that only 13 features were used.
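For readers who haven’t worked with the metric, here is a small sketch of how root mean squared error is computed, next to the plain average miss (mean absolute error). The win totals below are hypothetical numbers chosen for illustration, not the model’s real predictions.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mean_absolute_error(actual, predicted):
    """Average size of the miss, ignoring direction."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical win totals, invented for illustration only.
actual_wins = [57, 48, 41, 33, 22]
predicted_wins = [52, 50, 46, 30, 29]

print(f"RMSE: {rmse(actual_wins, predicted_wins):.2f} wins")
print(f"MAE:  {mean_absolute_error(actual_wins, predicted_wins):.2f} wins")
```

Because the errors are squared before averaging, big misses count for more, so RMSE is never smaller than the mean absolute error. An RMSE of 5.6 wins is best read as a typical miss size rather than an exact average.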

Problems with the Model

The biggest and most obvious problem is that by only using team stats you leave out the most important part of the sport, which is the players. The NBA is ruled by its players, and stars have a very large effect on results. All the stats I have talked about get better simply by having better players, which isn’t that easy to do.

The other problem is the style of play.

3-pointers have completely changed the NBA compared to what it was 10 years ago. Looking at the chart above, you can see a really large change in the percentage of shots that are 3’s, going from 20 to almost 40 percent in just 10 years. So the stats that matter might have changed drastically from 10 years ago to today. But limiting the data to just 1 season only gives 30 data points, which is not enough to properly fit and train a model. The answer might be to classify wins and losses of individual games within a season, which would give a lot more data points with similar results.
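To put rough numbers on that trade-off, here is a quick back-of-the-envelope sketch of how many data points each setup gives. It assumes a full 82-game regular season for every team, which does not hold for shortened seasons, and the function names are just for illustration.

```python
TEAMS = 30
GAMES_PER_TEAM = 82  # assumption: full regular season, no shortened schedule

def season_level_rows(seasons):
    """One row per team per season (the setup used in this post)."""
    return TEAMS * seasons

def game_level_rows(seasons):
    """One row per game; each game involves two teams, so divide by 2."""
    return TEAMS * GAMES_PER_TEAM // 2 * seasons

print("10 seasons, team-season rows:", season_level_rows(10))
print("1 season, team-season rows:  ", season_level_rows(1))
print("1 season, individual games:  ", game_level_rows(1))
```

Under that assumption, a single season of individual games would give more rows than the whole decade of team-season totals, while only covering one era of play style.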
