Most of us assume that the better a product is, the better it will sell. The same conventional wisdom holds for video games: the better a game is, the better it should sell. In a recent report, which I first learned about via a Joystiq post, SIG analysts Jason Kraft and Chris Kwak test this assumption.
To summarize the report: they collect data on 275 games published for the Sony PlayStation 2 console and, running a number of basic OLS regressions, find no statistically significant relationship between ratings and sales. For those wondering about the specific methods and sample, follow the link to the Joystiq post above for a copy of the report; my opinion is that the analysis passes on methodological grounds (which does count for something). It should also be noted that they do some clever testing and sample groupings, even if I am not going to discuss them here.
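For those who have not run this kind of test themselves, the core of it is an ordinary least squares regression of sales on ratings. Here is a minimal sketch in Python using statsmodels, with invented numbers standing in for SIG's data:

```python
# Minimal sketch of the kind of regression the report runs; the data
# here is invented, not SIG's.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
ratings = rng.uniform(40, 95, size=275)                # hypothetical review scores
sales = rng.lognormal(mean=11.0, sigma=1.0, size=275)  # hypothetical units sold

X = sm.add_constant(ratings)  # column of ones (intercept) plus the rating
fit = sm.OLS(sales, X).fit()
print(fit.params)             # [intercept, slope on rating]
print(fit.pvalues[1])         # the p-value the "no relationship" finding rests on
```

A statistically insignificant slope here is what "no relationship between ratings and sales" means in practice.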
But I do have a criticism of their analysis, one that I frequently raise with statistical analyses done in the social sciences. I do not want to tip my hand yet, because I am going to request their dataset and run some regressions myself, but you might be able to figure it out if you have been in my classes and/or study this chart taken from their report (the y-axis is millions of units sold).
For now, however, let’s assume that their findings are solid. What explains the insignificance of the relationship? Unfortunately, and this is particularly odd given that Kraft and Kwak are in the business of predicting sales patterns so they can make investment recommendations, the report lacks an answer.
Perhaps the answer can be found in the movie industry. In the past, I have seen research arguing that the older people become, the less they rely on press reviews of movies and the more they rely on their friends’ opinions. Might the same dynamic be found with video games? I suspect so.
Kraft and Kwak’s analysis does not let us test this hypothesis, although it is consistent with the argument. That is because their data cover the past five years and only games published on the PS2. As has been widely reported, video games are now dominated by players in their late twenties (the average age of players is 30), rather than teens. This is the age group that goes to the movies based on what their friends say, not what the newspaper writes. With this kind of sample group, and given the age-dependent argument, we would expect a small or non-existent relationship between video game ratings and sales.
This, however, does not allow for any firm conclusions. Instead, we need a larger dataset, ideally one that breaks the data down by the age of each game's primary player. A less satisfactory approach would be to collect data on the Nintendo GameCube or data from the 1990s (as opposed to the 2000s). The problem with the GameCube data is that even this group is old. The problem with the 1990s data is that reviews of video games were minimal back then, so there was a much smaller chance of buyers ever seeing a review.
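If a dataset recorded the primary player's age for each game, the age-dependence test would be straightforward to run. Here is a sketch of what it might look like; the age brackets and every number below are invented for illustration:

```python
# Hypothetical sketch: estimate the ratings-sales slope separately within
# each primary-player age bracket. All data below is invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "rating": rng.uniform(40, 95, 300),
    "sales": rng.lognormal(11.0, 1.0, 300),
    "age_group": rng.choice(["teen", "20s", "30s"], 300),  # invented brackets
})

# The age-dependence argument predicts a positive slope for teens and a
# flat (insignificant) one for the older groups.
for group, sub in df.groupby("age_group"):
    fit = sm.OLS(sub["sales"], sm.add_constant(sub["rating"])).fit()
    print(group, fit.params.iloc[1], fit.pvalues.iloc[1])
```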
Unfortunately, building that ideal dataset would take a lot of work and time, neither of which I can afford. Despite these problems, Kraft and Kwak’s research is interesting, fun, and well done. I will be sure to report the results of my own analysis if I am allowed access to their data.
I think there are huge measurement issues with this study. First of all, one cannot help but notice the genre issue. Grand Theft Auto is a very different game from World Soccer Winning Eleven (Pro Evolution Soccer to you non-Yanks; which brings up another point: these sales are US only, right?). World Soccer’s ratings are high because it is the most realistic sports game ever made. If you don’t like sports, or you like sports games that are fun but not realistic, then you are not really part of Konami’s target market. In other words, Konami may sit there and say there are X many people interested in owning a soccer game, and most of them want one that feels as close to the real thing as possible. If we make the game for Z dollars, we need to sell it to a certain percentage of X to really make it worth our while. Do you make Winning Eleven with its care for quality, or do you make a flaming piece of garbage like EA’s FIFA tends to be?

There’s your study: when games compete in the same year within the same genre, who wins and why? If one looks at EA’s battles with 2K Sports and Konami, we see that EA had to take longer to develop some of its games and had to figure out other ways to monopolize the market in order to compete. So quality does have some pull, or EA would just sit back and mash out the same crap it does when it is unchallenged.
I guess part of the question is going to be: who is doing these rankings? Seriously, Madden 2006 is somehow ranked lower than Madden 2005? It’s almost literally impossible for this to be the case, given that nothing was taken out between 2005 and 2006, while more features and better AI were added.
On top of that, rankings mean precisely dick. If someone reviews a movie, you can at least trust that they went and sat through it. But since most of the people who do rankings are 35-year-old white males sitting in a dark room being nostalgic about Contra, that doesn’t normally happen with video game ratings. Most rankings are done without actually playing through the games, and they rely heavily on the particular genre bias of the person reviewing. IGN, notably, tends to overrate RPGs and underrate sports games. They also horribly overvalue graphics and sound, which are nice but nominal with respect to actual gameplay. In computer games (which I assume show a similar trend), games like Counter-Strike were basically dismissed out of hand by most reviewers (despite its parent game, Half-Life, being ‘game of the year’), while more graphics-heavy shooters were routinely fellated despite laughable playability. Battlefield 1942 was touted as this amazing, amazing game by IGN for the entire year of its release and got almost a perfect rating, yet it was played by fewer people than Day of Defeat because it had issues that kept it from staying engaging for more than about an hour a day.
I suppose my point is: they’re conflating “ratings” with “quality,” which is a huge mistake.
Steve,
Good points.
The report does not specify US or worldwide sales, but it is safe to assume these are US-only figures. Focusing on US sales is probably the best approach, since it serves as a shortcut for holding many things constant. Including Japanese gamers in the mix, for example, might prove muddling because of the significant differences between their preferred video game experiences and those of US players. I could go on about this, both explaining it more and arguing that culture is only good as an error term, but you can probably do it in your head for me.
The analysts also tested the relationship between sales and ratings within genres. Here, too, the relationship did not hold. The only place where they sometimes found the relationship to exist was within a given franchise (e.g., Tiger Woods, NHL, NASCAR). Do not place too much weight on this, though, because (1) this is the only area where they found a correlation and (2) it is an incredibly limited test group (i.e., a small n). Still, it is worth noting in light of your comments.
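To make the small-n caveat concrete, here is a toy illustration (every number invented): with only a handful of titles in a franchise, even a large correlation coefficient comes with very wide uncertainty.

```python
# Toy illustration of the small-n caveat; the five "franchise entries"
# below are invented.
from scipy import stats

ratings = [78, 81, 85, 88, 90]       # hypothetical review scores
sales = [1.1, 1.3, 1.2, 1.8, 2.0]    # hypothetical millions of units sold

r, p = stats.pearsonr(ratings, sales)
print(f"r = {r:.2f}, p = {p:.3f}")   # r looks large, but with n = 5 it is fragile
```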
/Jason
Greg,
You are right to wonder about the rating data. They use Metacritic (http://www.metacritic.com/games/), which I heard about some time ago, but only talked about and visited recently. I know almost nothing about this site, so there may well be problematic biases with it. That said, its methodology of aggregating and averaging a large n of reviews is theoretically well-founded and better than relying on a single source (for the reasons you outline).
I know my response does not “answer” your questions and criticisms, but it does provide a bit more information.
Also, I am not here to defend the report, just to point it out, so do not shoot the messenger!
/Jason
In agreement with your words, I think they should discount, say, the first 3 months of sales. After 3 months, games are not purchased without being played first; they are purchased more on friends’ reviews and actual gameplay. The idea also needs adjustment for time on the market: we could create a value of games sold per unit of time on the market and plot that value against rating. Additionally, it looks like we may need to set a threshold rating b, remove everything rated under b, and then use a linear fit on what’s left; or better yet, maybe the relationship isn’t linear at all (rough sketch below). Lots of problems, and it wouldn’t be difficult to fiddle with different analyses until one supports a link between sales and ratings. Who are these guys?
are they fucking with us?
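Anyway, here is the rough sketch I had in mind for both adjustments; all of the data is made up and the threshold b is arbitrary:

```python
# Rough sketch of the two adjustments above, with made-up data:
# (1) normalize sales by months on the market, (2) drop games rated
# below a threshold b, then fit a line to what is left.
import numpy as np

rng = np.random.default_rng(2)
rating = rng.uniform(40, 95, 200)        # hypothetical review scores
months = rng.integers(1, 61, 200)        # hypothetical months on the market
sales = rng.lognormal(11.0, 1.0, 200)    # hypothetical units sold

rate = sales / months                    # units sold per month on the market
b = 70                                   # arbitrary rating threshold
keep = rating >= b
slope, intercept = np.polyfit(rating[keep], rate[keep], 1)
print(slope, intercept)                  # a nonlinear fit may suit the data better
```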
I’m most amused by the comments in response to this post. No one has read the analysis except me, and yet some, particularly Kirk, feel like attacking the analysts. Please, before getting nasty, read the report. And if you don’t have the background to make judgements regarding statistical analysis, then maybe it’s best if you either learn or just sit on the sidelines.
What is your theoretical reason for discounting sales after the first three months? All the analysts are doing is testing an assumption people have about sales. Therefore, it makes sense to include all sales in their analysis (although this does not fully happen, because they are using a five-year window).
In any event, 60 percent of a game’s sales are within the first three months, so the data does reflect your preferences to some degree.
The rest of your comments are a bit incoherent. But yes, it’s possible to fiddle with the analysis, which is exactly what you are suggesting. As with everything, it’s important to examine why and how the analysis is done to make sure it’s appropriate. But there’s no reason to trash the analysis or the analysts without reading the report or looking at the data first. Furthermore, there are more and less constructive ways of phrasing suggestions or questions. More constructive tends to be more useful.
/Jason
I guess reading would help some. I got thrown off by the idea that they were testing a belief held by others, not trying to build a working model themselves; all my ideas were based on trying to correlate game quality with sales. What background do you need to make statistical judgements? Mainly, though, I missed the point: I looked at that graph, thought the linear fit looked kind of bad, and jumped straight to “well, no wonder it doesn’t work.” Anyway, I guess I’m not interested enough in this to read any more, so I’ll concede I don’t know what I’m talking about. And I’ll probably stay out of future posts like this, because the real reason I posted is the same reason I am posting now: procrastination.