Movies and shows based on video games have been very popular in general. Adaptations of games such as Warcraft or The Witcher launch with a large number of people already interested in the game. So which games out there now could follow in the footsteps of those adaptations and become extremely popular, blockbuster-type movies? To investigate this we looked at data from 3 main sites: Metacritic, which had critic and user scores as well as the number of reviews; Gamespot, which had critic reviews in its API; and VGChartz, which had sales of games. Since we are pitching this idea to Microsoft, we focused on Xbox One games: they are already on Microsoft's platform, and the Xbox One has the newest games.
To start, we first had to decide what data we would gather. We settled on the following: user and critic scores, count of user reviews, genre, publisher, content rating and sales. We might not use everything in the final dataset, but we wanted to gather it all and see which fields showed useful trends.
We started with VGChartz, which we could read directly into a dataframe using pandas. We then organized the columns, keeping only the important data such as the global sales, publisher and name of each game. Next, we imported all the data on Xbox One games from the Gamespot API, keeping just the game names and critic ratings. Finally, we used web scraping to bring in data from Metacritic. We only used games with an actual Metacritic rating, to narrow the set down to games that could contribute to a correlation in the data. From those we took the name, genres, critic score and user score.
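As a rough sketch of the VGChartz step, the snippet below reads a sales table with pandas and keeps only the Xbox One rows and the columns mentioned above. The column names (`Name`, `Platform`, `Publisher`, `Global_Sales`), the `XOne` platform label, the helper `load_xbox_one_sales` and the placeholder rows are all assumptions made for illustration; the real export may be laid out differently.

```python
import io

import pandas as pd

# Placeholder rows standing in for the VGChartz export (made-up values).
raw_csv = io.StringIO(
    "Name,Platform,Publisher,NA_Sales,EU_Sales,Global_Sales\n"
    "Game A,XOne,Publisher X,1.00,0.50,1.80\n"
    "Game B,PS4,Publisher Y,2.00,1.00,3.50\n"
    "Game C,XOne,Publisher X,0.20,0.10,0.40\n"
)


def load_xbox_one_sales(source):
    """Read the sales table and keep only Xbox One rows and key columns."""
    sales = pd.read_csv(source)
    xbox = sales[sales["Platform"] == "XOne"]
    # Regional sales columns are dropped; only global sales are kept.
    return xbox[["Name", "Publisher", "Global_Sales"]].reset_index(drop=True)


vgchartz = load_xbox_one_sales(raw_csv)
print(vgchartz)
```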
Cleaning the Data
The hardest part of this was merging the 3 datasets: some of the names differed slightly between the sites, and we wanted to match as many games as possible without ruining any of the names. To do this we tried several approaches. We started by putting all the names in lower case and stripping out special characters. This helped, but stripping special characters was not as useful as expected: it only added 15 games to the total after merging, and it ruined some of the names to the point that they no longer looked like the actual game (N++ just became N). So we stuck with putting the names in lower case and then merging.
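The trade-off between the two normalizations can be sketched as follows. The helper names, column names and the second placeholder title are hypothetical; only the N++ case comes from our data.

```python
import re

import pandas as pd


def lower_only(name):
    """Normalization we kept: trim whitespace and lower-case the name."""
    return name.strip().lower()


def lower_and_strip_special(name):
    """Normalization we dropped: also removes everything except a-z, 0-9 and spaces."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


# Placeholder frames standing in for two of the sites.
sales = pd.DataFrame(
    {"Name": ["N++", "Example Game: Remastered"], "Global_Sales": [0.1, 2.0]}
)
scores = pd.DataFrame(
    {"Name": ["n++", "Example Game Remastered"], "Metascore": [80, 90]}
)

# Build the merge key from the lower-cased name only.
for frame in (sales, scores):
    frame["key"] = frame["Name"].map(lower_only)

merged = sales.merge(scores, on="key", how="inner", suffixes=("_sales", "_scores"))

print(lower_only("N++"))               # the title stays recognizable
print(lower_and_strip_special("N++"))  # the title collapses to a single letter
print(len(merged))
```

Stripping punctuation would also match the second placeholder title, which is the kind of extra match we gave up in exchange for keeping names intact.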
After merging the data frames we began cleaning the combined table. We removed unnecessary data such as the duplicate publisher columns from the extra sites, the regional sales figures for individual countries, the number of critic scores, etc. We also converted the list of genres for each game into dummy columns for the top 5 genres. This would allow us to keep narrowing down the top-rated games later and make sure that they fell into the genres we identified as the most popular. We also converted all the numeric strings for sales, ratings, etc. into integers or floats. Along with that, we converted the user rating of every game that had none from the string ‘No reviews’ to NaN.
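A minimal sketch of these cleaning steps is shown below. The column names, the comma-separated genre format and the contents of `TOP_GENRES` are assumptions for illustration, and the rows are placeholders.

```python
import pandas as pd

# Placeholder rows; every value is still a string, as it was after scraping.
games = pd.DataFrame(
    {
        "name": ["game a", "game b", "game c"],
        "genres": ["Action, Shooter", "Sports", "Role-Playing, Action"],
        "user_score": ["8.1", "No reviews", "7.4"],
        "global_sales": ["1.20", "0.35", "2.00"],
    }
)

# Hypothetical stand-in for the list of top genres.
TOP_GENRES = ["Action", "Shooter", "Role-Playing"]

# One 0/1 dummy column per top genre.
for genre in TOP_GENRES:
    games[genre] = games["genres"].apply(
        lambda text: int(genre in [part.strip() for part in text.split(",")])
    )

# errors="coerce" turns 'No reviews' (and any other non-numeric text) into NaN.
games["user_score"] = pd.to_numeric(games["user_score"], errors="coerce")
games["global_sales"] = games["global_sales"].astype(float)

print(games.dtypes)
```

Note that `errors="coerce"` hides every unparseable value, not just ‘No reviews’, so it is worth checking the NaN count against the number of games known to have no reviews.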
Analyzing the Data
Now that we had the data ready, we could start analyzing it to find which games would be the best to turn into movies. We started by analyzing the correlations between the critic reviews, user reviews and user review count on one side and the sales of the game on the other. We wanted to see what would be the best starting indicator on which to base the rest of the analysis.
From the data we saw that the largest correlation was between the number of user reviews and the sales. The next highest was the Metacritic score. So 2 of the biggest indicators of sales were the user review count and the Metacritic score. From there we analyzed content rating, genre and publisher to see what other correlations we could find. Publisher turned out to be very sporadic, with very little correlation to sales, so we moved on to the content rating and the genre of the games.
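The correlation check can be sketched like this, assuming the cleaned table has numeric columns named as below. Both the names and the rows are placeholders, so the printed ranking says nothing about our actual results.

```python
import pandas as pd

# Placeholder numeric columns standing in for the merged, cleaned table.
games = pd.DataFrame(
    {
        "critic_score": [90, 75, 82, 60, 88],
        "user_score": [8.5, 7.0, 6.9, 5.5, 8.0],
        "user_review_count": [1200, 150, 400, 30, 900],
        "global_sales": [5.0, 0.8, 1.9, 0.2, 3.6],
    }
)

# Pearson correlation of each candidate indicator with sales, strongest first.
correlations = (
    games.corr()["global_sales"]
    .drop("global_sales")
    .sort_values(ascending=False)
)
print(correlations)
```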
Both showed a strong relationship with our data. For content rating, we saw that Mature games garnered more than twice as many reviews per game as any other rating. Since the gap was so large, we decided to use that as another parameter for our final metric. Then we looked at the genre data. Of the games that ranked in the top 20 by user review count, nearly all were in 1 or more of the top 5 genres. The only ones that weren't were 4 sports games, a genre we left out of the top genres because sports games would not make sense to turn into movies.
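The reviews-per-game comparison is a group-wise average. A small sketch, with hypothetical column names, rating labels and placeholder counts:

```python
import pandas as pd

# Placeholder rows; the rating labels and counts are made up.
games = pd.DataFrame(
    {
        "content_rating": ["M", "M", "T", "E", "T"],
        "user_review_count": [1000, 600, 200, 100, 300],
    }
)

# Average number of user reviews per game within each content rating.
reviews_per_game = (
    games.groupby("content_rating")["user_review_count"]
    .mean()
    .sort_values(ascending=False)
)
print(reviews_per_game)
```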
So now we had the parameters on which to base our final conclusions. The 4 parameters we used were: the user review count, the Metacritic score of the game, a content rating of Mature, and a genre that falls into one of the top 5 genres. We applied those parameters, keeping games in the top 5 percent of Metacritic scores and the top 2 percent of user review count (we used the stricter top 2 percent there because review count was the indicator most highly correlated with sales). We found 4 games that met the criteria to be turned into a successful movie.
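Put together, the four parameters amount to one boolean filter. The sketch below is one way to write it; the function name `shortlist`, the column names, the `"M"` rating label and the placeholder rows are assumptions, and the quantile cutoffs use pandas' default linear interpolation.

```python
import pandas as pd


def shortlist(games, genre_columns, score_q=0.95, reviews_q=0.98):
    """Keep Mature games in a top genre that clear both percentile cutoffs."""
    score_cut = games["critic_score"].quantile(score_q)
    reviews_cut = games["user_review_count"].quantile(reviews_q)
    mask = (
        (games["critic_score"] >= score_cut)
        & (games["user_review_count"] >= reviews_cut)
        & (games["content_rating"] == "M")
        & (games[genre_columns].sum(axis=1) > 0)  # at least one top genre
    )
    return games[mask]


# Placeholder table; the genre columns are 0/1 dummies.
games = pd.DataFrame(
    {
        "name": ["game a", "game b", "game c", "game d", "game e"],
        "critic_score": [60, 70, 80, 90, 95],
        "user_review_count": [10, 20, 30, 40, 1000],
        "content_rating": ["E", "T", "M", "T", "M"],
        "Action": [0, 1, 1, 0, 1],
        "Shooter": [0, 0, 0, 1, 0],
    }
)

picks = shortlist(games, ["Action", "Shooter"])
print(picks["name"].tolist())
```

Exposing the two cutoffs as arguments makes it easy to see how sensitive the shortlist is to loosening either percentile.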
Of the 4 games, 3 have not yet been made into any type of media content (movies or shows). The one that has, The Witcher, has received massive acclaim and garnered widespread appeal among fans as the most popular show on Netflix last year. This bodes very well for our approach, since the 1 game of the 4 that was adapted into a show did extremely well.