Blogging thesis progress

After discussing it with my advisor, I’ve decided to start blogging about my work on my master’s thesis. I’ll start things off with a post about my research questions.

The Internet, particularly the World Wide Web, is an increasingly important part of how people seek out political information. According to results from a 2004 Pew/Michigan survey, 53% of Internet users had gotten news about the Iraq war online, 35% had gotten news about gay marriage online, and 26% had gotten news about the debate over free trade online. Early theorists of the Internet championed it as an egalitarian medium: since the cost of producing a web site is much lower than that of traditional publishing, and the potential reach of that web site is much greater, the Internet would expand the political voice and knowledge of the average citizen. As Howard Dean’s campaign manager Joe Trippi effused, “The Internet is the most democratizing innovation we’ve ever seen, more so even than the printing press.” Others have taken a more pessimistic view of the same phenomenon. Sunstein and Putnam, for example, fear that with public attention diffused across millions of web sites, political discourse will become more polarized.

Another possibility is that the Internet might not be so egalitarian after all. To understand why this would be, it’s necessary to reflect on the structure of the web. The element tying one web page to another is the hyperlink. Clicking a hyperlink is what allows an Internet user to “browse” from one web page to another. Across the web, the number of hyperlinks pointing to each site follows a power law distribution. A power law distribution is highly inegalitarian: a small number of web sites are the destination of the vast majority of hyperlinks.
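
To get a feel for how lopsided a power law is, here is a rough sketch in Python. Everything in it is an assumption made for illustration, not real web data: the function name, the 10,000 hypothetical sites, and the exponent of 1.0 (so the site ranked r gets links in proportion to 1/r).

```python
def link_share_of_top(n_sites, exponent, top_fraction):
    """Share of all inbound links held by the top `top_fraction` of sites.

    Assumes (for illustration only) that the site ranked r receives
    inbound links in proportion to r ** -exponent.
    """
    weights = [rank ** -exponent for rank in range(1, n_sites + 1)]
    top_n = max(1, round(n_sites * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


# Hypothetical numbers: 10,000 sites, exponent 1.0, top 1% of sites.
share = link_share_of_top(n_sites=10_000, exponent=1.0, top_fraction=0.01)
print(f"Top 1% of sites receive {share:.0%} of all inbound links")
```

Setting the exponent to 0 gives every site the same number of links, which makes a handy egalitarian baseline to compare against.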

The distribution of traffic to web sites also follows a power law. To understand why this should be related to the hyperlink structure, it’s necessary to think about the ways Internet users discover web sites. If a user already knows about a web site, they can visit it directly. If they don’t, they can discover it via a hyperlink from a site they already know about or by using a search engine like Google. Both of these methods favor the discovery of highly linked-to sites. When browsing the web, the more hyperlinks there are to a site, the more likely a user is to come across one of them. When using a search engine, most users only visit web sites on the first page of results. The release of search data for over 600,000 AOL users showed that 90% of clicks went to results on the first page, 74% of clicks went to the first 5 results, and 42% of clicks went to the first result. This is significant because search engines’ ranking algorithms give heavy weight to the number of hyperlinks a site receives. Although the exact algorithms vary from search engine to search engine and are often secret, search engine result ordering is barely distinguishable from simply ordering web sites based on the number of hyperlinks to them.

Using a data set that meshed data from an Internet service provider about the sites their users visited with data on the number of hyperlinks to those sites, Matthew Hindman found a .704 correlation between the amount of traffic a site received and the number of hyperlinks to it. Hindman also found that the power law distribution of hyperlinks on the web as a whole applies to political content as well. Using techniques I’ll discuss in future posts, Hindman examined communities of web sites dealing with abortion, the death penalty, gun control, the presidency, Congress, and politics in general. In all of these cases, a power law fit the distribution of hyperlinks with an R² greater than .90.
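
For readers wondering what “a power law fit with an R²” means in practice, here is a minimal sketch of one common approach: rank the sites by inbound links, take logarithms of both rank and link count, and fit a straight line by ordinary least squares. I am not claiming this is the procedure Hindman used. The function name and the link counts below are made up for illustration, and a log-log regression like this is only a rough check, not a rigorous test for a power law.

```python
import math


def fit_power_law(link_counts):
    """Fit log(count) = intercept + slope * log(rank) by least squares.

    Returns (slope, r_squared). Sites with zero links are dropped
    because their logarithm is undefined.
    """
    counts = sorted((c for c in link_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # For a one-variable linear fit, R^2 is the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, r_squared


# Made-up inbound-link counts for 200 hypothetical sites (not real data).
example_counts = [1000 / rank for rank in range(1, 201)]
slope, r_squared = fit_power_law(example_counts)
print(f"slope = {slope:.2f}, R^2 = {r_squared:.3f}")
```

The closer the ranked data sit to a straight line on log-log axes, the closer R² gets to 1.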

Despite the Internet’s importance, little research has been done examining the sources of political information to which Internet users are most readily exposed. Hindman’s research tells us that the visibility of web sites on at least some political issues follows a power law, but it does not tell us anything about the characteristics of the most visible web sites relative to the rest. What kinds of organizations are behind the most visible web sites on an issue? What kinds of information are presented by the most visible web sites? Are the viewpoints of the most visible web sites representative of the entire set of web sites on an issue? Do the web sites about an issue cluster together based on ideology, type of source, or some other factor? These are the questions my thesis is designed to address.