Until recently, a player’s position in the NBA was never something that was argued over or something people tried to change. About six or seven years ago, however, teams started to realize that a player’s skill set might be a more effective way to assign positions. Traditionally, a player’s position has been decided by size: a shorter player is a point guard, a bigger player is a center, and everyone else falls somewhere in between. That is why a player like PJ Tucker, who was a small forward just four years ago, now plays exclusively at power forward or center for the Houston Rockets: he is a great rebounder, and his weight makes him a good defender in the post. What I am aiming to do is see which players, both today and over the past 20 years, have been playing the wrong position.
I will do this using their statistics: box score stats like points and rebounds, as well as the shooting position stats. The shooting position stats are the number of shots a player makes from different locations on the court. There are 18 locations in total, but I will get more into that later. I will be training my model using those features on players over a 20-year span.
The five defined positions in the NBA are point guard, shooting guard, small forward, power forward, and center. Besides finding a good model to classify positions, I want a model that is sometimes wrong, but for good reason. This will help teams build based on the skills they need rather than on the predefined roles of their players. So if a big has the skills of a wing player, then the team could look to get a player with center skills to complement him.
Although the defined positions are the five I just named, I will be classifying players into three positions: point guards (PGs), wings, and bigs. The modern NBA does virtually the same thing in how teams scout talent and build their rosters. This is because shooting guards and small forwards have very similar skill sets, as do power forwards and centers.
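As a minimal sketch, the grouping of five listed positions into three classes could be written as a simple lookup. The abbreviations, the class labels, and the `to_group` helper are assumptions for illustration, not names taken from the project itself.

```python
# Map the five listed positions onto the three classes used for modelling.
# The abbreviations are assumed; a real dataset may label positions differently.
POSITION_GROUPS = {
    "PG": "PG",
    "SG": "Wing",
    "SF": "Wing",
    "PF": "Big",
    "C": "Big",
}


def to_group(listed_position: str) -> str:
    """Return the three-class label for a listed position such as 'SF'."""
    key = listed_position.strip().upper()
    if key not in POSITION_GROUPS:
        raise ValueError(f"Unknown position: {listed_position!r}")
    return POSITION_GROUPS[key]
```

Raising on unknown labels, rather than defaulting to one class, keeps mislabelled rows from silently ending up in the training data.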
I start by gathering the data: over 650,000 shots from about 2,000 players, as well as their yearly per-game averages over a 20-year span. Next I generate features, mainly the shot charts, and use exploratory data analysis (EDA) to understand them. Then I use those features to build baseline models to work from, and finally I tune those models to get the best accuracy and analyze their results.
The features I ended up using were primarily shot charts, based on this beautiful breakdown of the court I put together. This is how I ended up breaking down the shooting locations: corner 3s, deep 3s, short 2s, the restricted area, and everything in between, 18 locations in total. I also included rebounds and assists, as they are important skills for bigs and guards respectively, as well as the other general box score stats like field goal percentage, points, steals, and blocks. Next I wanted to look at how the shot charts of different players compare.
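A rough sketch of how raw shot records could be turned into per-zone counts of made shots for each player-season is below. The record layout, the `shot_features` helper, and the four zone names are assumptions for illustration; the full breakdown has 18 zones, and the sample records are made up.

```python
from collections import defaultdict

# Illustrative subset of zone names (assumed); the full breakdown has 18 zones.
ZONES = ["restricted_area", "short_2", "corner_3", "deep_3"]


def shot_features(shots):
    """Count made shots per zone for each (player, season).

    `shots` is an iterable of (player, season, zone, made) tuples.
    Returns {(player, season): [made-shot count for each zone in ZONES]}.
    """
    counts = defaultdict(lambda: dict.fromkeys(ZONES, 0))
    for player, season, zone, made in shots:
        by_zone = counts[(player, season)]  # ensures the player-season appears
        if made and zone in by_zone:
            by_zone[zone] += 1
    return {key: [by_zone[z] for z in ZONES] for key, by_zone in counts.items()}


# Made-up records, only to show the shape of the input and output.
example_shots = [
    ("Player A", 2020, "corner_3", True),
    ("Player A", 2020, "corner_3", False),
    ("Player A", 2020, "deep_3", True),
    ("Player B", 2020, "restricted_area", True),
    ("Player B", 2020, "restricted_area", True),
]
features = shot_features(example_shots)
```

Each player-season ends up as one fixed-length vector, which can then sit alongside the box score stats as model input.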
One thing I thought would be very important was comparing shot charts between players. I wanted to make sure my idea made sense before I moved on.
I started with two star players: one wing, James Harden, and one big, Anthony Davis. You can see some clear differences here, with Harden shooting much more from the outside. Then, when you take a look at two players at the same position, you see much more similarity between them. It’s also important to note that this was a comparison between a power forward and a center, which lends further credence to the three-position approach.
I ran many different models to try to get the best possible score. In the end, the two best were the random forest and the support vector machine. The support vector machine did the better of the two, with an accuracy of almost 85%.
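A comparison like this can be sketched with scikit-learn as follows. Everything specific here is an assumption: the data is synthetic stand-in data rather than the real player stats, and the hyperparameters, the scaling step, and the 75/25 split are illustrative choices, so the scores will not match the figures above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 300 "players", 6 features, 3 classes.
rng = np.random.default_rng(0)
y = np.repeat(["PG", "Wing", "Big"], 100)
centers = {"PG": 0.0, "Wing": 1.5, "Big": 3.0}
X = np.vstack([rng.normal(centers[label], 1.0, size=6) for label in y])

# Hold out a stratified test set so each class keeps its share.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVMs are sensitive to feature scale, so standardize before fitting.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

# Fit each model and record its accuracy on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
```

Scoring both models on the same held-out split keeps the comparison fair. Tuning would normally happen with cross-validation on the training portion only.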
We can see that even without balancing the data, the model does a very good job of not over-predicting any class. The feature importances show that rebounds and assists matter, which I mentioned earlier, since they are impactful skills for bigs and PGs respectively, and then a ton of different shooting areas come into play. Interestingly, short wing 2-point shots have a high importance score. One reason for this is that they differentiate bigs from everyone else. Short mid-range 2s are very common shots for wings and point guards, whereas bigs who are that close will likely try to get into the paint for layups and dunks, or post up from closer range.
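One way to check for over-prediction is to tabulate true classes against predicted classes and compare the column totals. The sketch below does that in plain Python. The `confusion_counts` helper and the labels are made up for illustration and are not results from the model.

```python
from collections import Counter

CLASSES = ["PG", "Wing", "Big"]


def confusion_counts(true_labels, predicted_labels):
    """Return a matrix of counts: rows are true classes, columns are predictions."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return [[pairs[(t, p)] for p in CLASSES] for t in CLASSES]


# Made-up labels, only to illustrate the layout.
y_true = ["PG", "PG", "Wing", "Wing", "Wing", "Big", "Big"]
y_pred = ["PG", "Wing", "Wing", "Wing", "Big", "Big", "Big"]

matrix = confusion_counts(y_true, y_pred)
# Column sums show how often each class was predicted overall.
predicted_totals = [sum(column) for column in zip(*matrix)]
```

If one column total is far larger than the number of players who truly belong to that class, the model is leaning on that class too heavily.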
Now that the classification was done, I wanted to look at using these same stats to compare players regardless of position. Some players fit very well with a team, and because contracts generally don’t run much longer than two or three years, being able to find similar players is really important for teams that have to rebuild every year while maintaining continuity in their style.
Using Euclidean distance as the measure of similarity, I found the most similar player for every player in the dataset. I want to note that I only used the 2019/20 season for this, because I wanted it to cover active NBA players only so that current teams would be able to use it.
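A minimal sketch of that nearest-player lookup is below. The `most_similar` helper, the player names, and the three-number feature vectors are all made up for illustration, and the sketch skips any feature scaling.

```python
import math


def most_similar(players):
    """Map each player to the nearest other player by Euclidean distance.

    `players` is {name: feature_vector}; all vectors must share one length,
    and there must be at least two players.
    """
    result = {}
    for name, vector in players.items():
        others = (other for other in players if other != name)
        result[name] = min(
            others, key=lambda other: math.dist(vector, players[other])
        )
    return result


# Made-up feature vectors, only to show the mechanics.
season_stats = {
    "Player A": [8.0, 2.0, 0.5],
    "Player B": [7.5, 2.5, 0.4],
    "Player C": [1.0, 9.0, 2.0],
}
nearest = most_similar(season_stats)
```

Note that the relationship is not symmetric: Player C's nearest match is Player B, but Player B's nearest match is Player A. With stats on very different scales, such as points and blocks, the larger numbers dominate the distance, so standardizing the features first is a common choice.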
Here I’m entering Bradley Beal, a wing player, and he is classified as a PG. That’s because the starting PG of the Washington Wizards missed the whole season with an injury, so Beal had to take on the responsibility of being the team’s creator. My model still compares him to another wing player, Jayson Tatum, because of his style of offense.
In the end, the models did a good job of classifying and comparing players, but their real value is in finding the players they misclassified for a good reason. One example of this is Dirk Nowitzki, a 7-foot big who was classified as a wing. His shot chart alone should tell you why, with such a versatile shot profile. Managers who know this about him should focus on getting another big man who can complement his wing skills, instead of more wings. That’s exactly what the Mavericks did in 2010, acquiring center Tyson Chandler and winning the championship that same season. This is the strength of the model: finding players who may fit into a different category and building around that.