Winning basketball games is hard, and every year 30 teams put in a lot of work from a lot of different people to do just that. With this project I’m trying to help by finding the in-game factors that lead to wins. I took team data from the last decade, ran models to predict wins from those stats, and compared the predictions to the actual results. After obtaining an error score I was happy with, I analyzed the model's features to see which factors contributed the most, with the idea that teams can use them to build styles that are more conducive to winning.

The data

Exploring the Data and Modeling

This was definitely the most surprising result. Yes, older players are generally more experienced, but I think it goes a little deeper than that and has more to do with tanking in the NBA. Teams on a downward trend generally stop trying to win games and instead trade away older players and lose on purpose in the hopes of getting a good pick in the draft. Trading away older players makes the team worse and younger, and the draft picks who come in at 18 or 19 keep it young. Since the older players are gone, the team will likely take a while to build up again.

True shooting and defensive field goals make more sense, as they are measures of efficiency on both sides of the ball. This is important to note, though, because points was a lower-strength feature of the model. So the actual act of scoring isn't as important as scoring at an efficient rate.
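To make the points-versus-efficiency distinction concrete, here is a small sketch using the standard true shooting percentage formula, TS% = PTS / (2 * (FGA + 0.44 * FTA)). The two box-score lines are made-up numbers for illustration only, not values from the dataset.

```python
def true_shooting_pct(points, fga, fta):
    """Standard true shooting percentage: points per scoring attempt, halved.

    fga = field goal attempts, fta = free throw attempts.
    The 0.44 factor is the conventional weight for free throw attempts.
    """
    return points / (2 * (fga + 0.44 * fta))


# Hypothetical example: both teams score 110 points,
# but team B needs 10 more field goal attempts to get there.
team_a = true_shooting_pct(points=110, fga=85, fta=25)
team_b = true_shooting_pct(points=110, fga=95, fta=25)

print(f"Team A TS%: {team_a:.3f}")
print(f"Team B TS%: {team_b:.3f}")
```

Both teams have the same point total, but team A gets there on fewer attempts, which is the kind of difference a raw points feature cannot see.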

In the end the model did a fairly good job, with a root mean squared error of 5.6 wins. With the average team winning 41 games, a typical miss of roughly 5 to 6 wins is good considering that only 13 features were used.
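For reference, this is roughly how a root mean squared error on win totals is computed. The win totals below are invented for illustration and are not the model's actual predictions. The mean absolute error is shown next to it because RMSE weights big misses more heavily, so it is not exactly the same thing as the average miss.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error, i.e. the plain average size of the miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical season win totals for four teams (not real data).
actual_wins = [60, 41, 22, 50]
predicted_wins = [55, 45, 30, 48]

print(f"RMSE: {rmse(actual_wins, predicted_wins):.2f} wins")
print(f"MAE:  {mae(actual_wins, predicted_wins):.2f} wins")
```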

Problems with the Model

The other problem is the style of play.

Three-pointers have completely changed the NBA compared to what it was 10 years ago. Looking at the chart above, you can see a really large change in the percentage of shots that are 3’s, going from 20 to almost 40 percent in just 10 years. So the stats that matter might have changed drastically over that span. But limiting the data to just one season gives only 30 data points, which is not enough to properly fit and train a model. The answer might be to classify wins and losses of individual games within a season, which would give a lot more data points and could lead to similar conclusions.
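A quick back-of-the-envelope sketch of how much more data the per-game approach would give. It assumes a full 82-game regular season for every team, which does not hold for shortened seasons, and the function names are just for illustration.

```python
TEAMS = 30
GAMES_PER_TEAM = 82  # assumption: full-length regular season


def team_season_rows(seasons):
    """One row per team per season (the setup used for the win-total model)."""
    return TEAMS * seasons


def game_rows(seasons):
    """One row per game; each game involves two teams, so divide by 2."""
    return TEAMS * GAMES_PER_TEAM // 2 * seasons


print("Team-season rows, 1 season:", team_season_rows(1))
print("Team-season rows, 10 seasons:", team_season_rows(10))
print("Game rows, 1 season:", game_rows(1))
```

Under that assumption, a single season of individual games has more rows than a full decade of team-season totals, while staying within one era of playing style.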