A significant news story these last many hours has been Google’s refusal to hand over search-related data to the Department of Justice. We can debate the methodological issues associated with the DOJ project for all of the five minutes it takes to realize it’s fundamentally flawed, but I want to focus on the news coverage.
In every press account I initially saw, the headline focused on Google’s refusal, not on the other search engines’ (e.g., MSN and Yahoo) compliance. Is that bias a result of the popularity of Google? Is it reflective of what I perceive to be a tendency to support enforcement over privacy issues?
Before asking those types of questions, however, it is important to make sure the facts are straight. Is Google receiving a disproportionate share of the attention in this story?
In support of this is the fact that the media did not pick up on the story until Google made its latest refusal, rather than when the other search engines complied. This, however, can be explained away by the fact that the press relies on the government to announce the news before the press itself reports on it.
A better method would be a large-n study of news headlines. Thanks to the Internet, we can do that with news aggregators. I began with News.Google, the news aggregator I use most often. At first, I searched for each of the major search engine companies in the titles of news stories. Doing so yields these results (as of 12:10 a.m., January 21):
Google: 7,210
Yahoo: 3,680
MSN: 241
There are two immediate problems with this procedure. First, it does not identify only news stories about the request for search engine data. While this applies equally to all three companies, today marked the single biggest decline in Google’s stock since its IPO–a development that is more the result of tech’s recent earnings floundering than the DOJ issue, and one that likely inflates Google’s count in particular. The second problem is that I was using a Google service, and it’s possible that the service biases its results toward Google.
Before abandoning News.Google, I counted the number of headlines that contained each of the three major search engines AND were related to the DOJ matter. I sorted by date so that articles would not be grouped together based on relevance and used the first three pages for each of the three search results mentioned above. Here are the results:
Google: 23
Yahoo: 9
MSN: 10
While doing so, I found a potential problem. Some of the news stories aggregated by Yahoo! News have “Yahoo! News” added to the end of the headline, so some stories in the Yahoo search were included even though they had nothing to do with Yahoo. That means Yahoo’s total for Measure 01 (the raw title counts) is probably too high (it includes results unrelated to Yahoo) and its total for Measure 02 (the DOJ-related headline counts) is probably too low (the same number of headlines were checked for content, but Yahoo’s results contained fewer Yahoo-related articles to count from).
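I did these counts by hand, but the same tally could be automated in a way that sidesteps the “Yahoo! News” suffix problem. What follows is only a rough sketch: the keyword list, the suffix pattern, and the sample headlines are all invented for illustration, and a real run would need the actual headline text from the aggregator.

```python
import re
from collections import Counter

ENGINES = ("Google", "Yahoo", "MSN")

# Hypothetical keywords for deciding whether a headline concerns the DOJ request.
DOJ_KEYWORDS = ("justice", "doj", "subpoena", "search data")

# Assumed pattern for aggregator branding at the end of a headline,
# e.g. " - Yahoo! News". Real suffixes may look different.
SOURCE_SUFFIX = re.compile(r"\s*[-|(]?\s*Yahoo!? News\)?\s*$", re.IGNORECASE)


def clean(headline):
    """Drop trailing aggregator branding so it is not counted as a mention."""
    return SOURCE_SUFFIX.sub("", headline)


def tally(headlines):
    """Count DOJ-related headlines that mention each search engine by name."""
    counts = Counter({engine: 0 for engine in ENGINES})
    for raw in headlines:
        text = clean(raw)
        lowered = text.lower()
        if not any(keyword in lowered for keyword in DOJ_KEYWORDS):
            continue  # not about the DOJ matter
        for engine in ENGINES:
            if re.search(rf"\b{engine}\b", text, re.IGNORECASE):
                counts[engine] += 1
    return counts


# Invented sample headlines, not taken from any real search.
sample = [
    "Google resists Justice Department subpoena",
    "Yahoo, MSN complied with DOJ request; Google refuses",
    "Storm hits coast - Yahoo! News",
    "Justice Dept. seeks search data - Yahoo! News",
]
result = tally(sample)
print(dict(result))
```

A headline that names two engines is counted once for each, which matches how I counted by hand. Keyword matching is much cruder than reading each headline, so the output would still need a manual check.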
The next step was to try again using a generic search term. I chose: “search engine” + justice. Using the first three pages of results, sorted by date, I tallied the number of headlines that had to do with the DOJ issue and included each search engine name. The results:
Google: 26
Yahoo: 2 (both of which also had Google in them)
MSN: 0
But this doesn’t address the possibility that each search engine biases its results toward stories with its own name in them. I am not aware of news aggregators that are not run by or based upon the major search engines, and a quick Google search only turned up RSS aggregators. So I tested each of the other two search engines to see if there is a common trend among all three.
Yahoo offers a more sophisticated news search tool than Google. Interesting. I can search the headline and search the body (among other things). For each of the three major search engines, I search the headlines for the search engine name (e.g., Google) and search “any part of the article” for “justice”. Here are the results:
Google: 701
Yahoo: 15
MSN: 3 (a search for Microsoft yielded 17)
Now counting the first three pages of date-sorted results for “search engine” + justice:
Google: 23
Yahoo: 2
MSN: 1 (the headline was actually “MS”)
Looks like Google still comes out on top. Now for MSN News. I could not figure out advanced search techniques (their “Learn more about our service” page was a contact form for problems). So I first did a general news search for each of the three major search engines:
Google: 125,227
Yahoo: 183,223
MSN: 130,412
Interesting in that they are all closer to each other, but there are too many problems with this method, some of which are outlined above. I then tried what I think is the best measure outlined so far, which is to count the number of times each search engine appears in a headline about the DOJ issue in the first three pages of a search for “search engine” + justice. Note that there is no option to sort by date.
Google: 13
Yahoo: 1
MSN: 0
While each of these measures has problems, the results suggest that the coverage was skewed toward focusing on Google’s non-compliance rather than the other companies’ compliance. This may be a result of the fact that the event that was being covered involved Google, but this does not explain a lack of earlier or multi-dimensional coverage. Furthermore, I learned some interesting techniques when using news aggregators.
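To put the four headline tallies on a common footing, each can be converted into Google’s share of the search-engine mentions. This is a small sketch using only the counts reported above; the measure labels are my own shorthand. Because a single headline can name more than one engine, these are shares of mentions, not of distinct headlines.

```python
# DOJ-related headline counts copied from the tallies above,
# in the order (Google, Yahoo, MSN).
measures = {
    "Google News, engine name in title": (23, 9, 10),
    "Google News, generic query": (26, 2, 0),
    "Yahoo News, generic query": (23, 2, 1),
    "MSN News, generic query": (13, 1, 0),
}


def google_share(counts):
    """Return Google's fraction of all engine mentions in one measure."""
    total = sum(counts)
    return counts[0] / total if total else 0.0


shares = {name: google_share(counts) for name, counts in measures.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

By this arithmetic Google accounts for roughly 55 percent of mentions in the first measure and about 88 to 93 percent in the three generic-query measures, on all three aggregators.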
When companies like Yahoo or MSN are compelled by the government to hand over information, they’re required to do so in a secret manner — thus nobody knows it happened, especially not news reporters. The only reason this story is now public is because Google resisted. We have no idea how common this practice is but we know it’s happened a lot more since Sept. 11, as gov’t investigators have scooped up huge amounts of travel records, etc.
I am not aware of any rule requiring companies to maintain secrecy when asked for data and I did not see any mention of a secrecy clause in the reports on this story.
More likely, Yahoo and MSN either did not think it would be an issue or did not want to discuss the turning over of possibly private data. Neither of those two possibilities is reassuring.
There was an interesting Wall Street Journal–I think–article on this recently that goes into detail about each side’s arguments. Google’s legal defense is built entirely on the premise of trade secrets, which, apparently, carries more weight in these types of cases. It could also mean that they do not actually believe the privacy argument.