Poe in Cyberspace (Spring 2009)

Google's First Trillion Pages: Web 2.0 and Beyond

Last summer, the number of Web pages in the Google index quietly passed the one trillion mark. Unimpressed, some information experts regard this first trillion as only the tip of the iceberg. The “Deep Web,” believed to be several times larger, cannot be detected by Google’s scanning programs, which are known as crawlers or spiders. Poe researchers already know that Google does not search proprietary Web sites, including subscription databases such as the MLA International Bibliography, Project Muse, and JSTOR. Nevertheless, in apparent violation of some law of compensation, as Google grows larger it may also be growing better. When Google began, it pioneered the use of human judgment in search rankings, treating the links that Webmasters placed to other sites as votes of relevance, at a time when its rivals were still relying on machine analysis of site content alone. In the spirit of the new two-way, interactive Web 2.0, Google is now developing several fascinating features that can improve Poe (or any other) Web research.

1. Google knows about subjects. Just as academic researchers face a long learning curve in mastering their trade, so Google is programmed to learn about subjects. We take it for granted that Google knows that the string of characters in the search phrase “Edgar Allan Poe” signifies a particular author. We may not realize, however, that Google is constantly learning that Poe research is likely to involve particular subheadings, such as poetry, short stories, criticism, biography, or the titles of particular works, all of which it may now offer as clickable options.

2. Google knows about your computer. If you are reading this, surely you have already used Google to search the Web for Poe information. Some of this information may be stored temporarily on your computer in a file called a “cookie,” which is how Google “remembers.” If you have access to two computers (at home, at the office, or on the road) on which you have previously run different Poe Web searches, try this test. Run a new but identical Google search on both machines, and examine the results carefully. They will probably vary in some details, perhaps considerably, depending on the history of the two computers.

3. Google knows about you. If you have an account on Google or Gmail (both are free), some of your search history will be stored in the Google database whenever you log in, no matter where you are or which computer you use. Ask another user to duplicate your search with identical details but on a different machine: the results may differ unexpectedly.

4. Google knows about your Poe research on your local desktop computer. No, it’s not magic, and it's not an invasion of your privacy. If you activate Google Desktop (also free), it will index all the files on your local computer so that you can find every file containing the distinctive words or phrases you request. This can be extraordinarily useful in retrieving quotes, commentaries, and references that you know you typed into the computer but can't quite remember where or when.

5. Google can combine information from Google Desktop with your Google searches. Google Integration displays your Web search results on the same screen as the search results from your own computer. Although anyone can obtain Web search results, your personal search results via Google Desktop are never shared, thus protecting your privacy. If you use several computers, as I do (at home, at the office, and while traveling), the unshared personal search information will be specific to each computer. On one older desktop machine I've used for quite a while, I’m told that I have 15,587 “Edgar Allan Poe” results indexed on Google Desktop, whereas my newer laptop has only 204 results in its Google Desktop index. Google services are always learning: on yet a third computer, no Poe Desktop results appeared at first, but after several minutes Google apparently updated itself and reported 4,709 Desktop Poe results. These sample Desktop results are displayed chronologically by default. If the initial samples are inadequate, ask to see all the results as a local Web page that only you can see.

6. Google contains multitudes. Since Google is active in many different areas, such as its massive book scanning project, its huge collection of videos (it owns YouTube), and its indexing of the news, don't be surprised if your Poe Web search results go beyond Web pages to include books, videos, and news. (We'll discuss only Google Books here.) Although the Google Books project now embraces an estimated 7 million volumes, its main weaknesses are the lack of a general catalog (see The New York Times, 2 February 2009), the fact that many texts are available only in "snippet" form, and the woefully inadequate descriptors for many items, especially in multi-volume sets. In addition to its extensive repertory of works in the public domain, Google Books hoped to make many copyrighted works available under "fair use" provisions. However, a legal challenge was brought by the Authors Guild and the Association of American Publishers (AAP) on behalf of the rights holders. In October 2008 Google agreed to provide a fund of $125 million as part of a settlement, and preliminary court approval followed in November 2008. Further hearings, scheduled for June 2009, may lead to announcements of agreements pertaining to personal access to books, institutional subscriptions, terminals in American libraries, and purchase arrangements for printed and online books (http://books.google.com/googlebooks/agreement/). Meanwhile, Google Books already provides a remarkable tool for historical research in the large body of printed matter published before about 1920, the beginning of current copyright restrictions. The contents of the Google Books project are fully searchable and can even be limited by date, which is useful, for example, in studying Poe's vocabulary and his possible sources. Some Google Books results are displayed in Full Text mode, but most still appear in Limited View as snippets.

7. Google creates automatic timelines. Google can assemble random references to "Edgar Allan Poe" from 1809 to 2009 into a bar graph with peaks for Poe’s birth in 1809, his most productive decades, his death in 1849, and finally his 200th birthday celebrations in 2009. Clicking on any decade opens it into years, and clicking on any year opens it into months. It is a timeline of the dates mentioned in the sources, not of the dates when the sources were published, and biographical material dominates. For example, the most frequently mentioned year of the 1840s is 1849, the year of Poe’s death, and the most frequently mentioned month in that year is October, the month he died. Although random and computer-selected, the items do convey some sense of chronology. A small timeline may appear at the end of the first page of your matches for the search phrase "Edgar Allan Poe." The default display of 10 matches can be expanded to 100. To go directly to the expanded timeline, enter the search request "Edgar Allan Poe view:timeline."
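The decade-to-year-to-month drill-down behaves like a nested histogram. The Python sketch below illustrates only that idea and makes no claim about how Google built its timelines; the function name `bucket` and the short list of dates (Poe's birth, the day he was found in Baltimore, his death, and the bicentennial) are illustrative assumptions standing in for dates extracted from Web pages.

```python
from collections import Counter
from datetime import date

def bucket(dates, level, within=None):
    """Count dated mentions at one level: 'decade', 'year', or 'month'.
    `within` narrows 'year' to one decade (e.g. 1840) and 'month' to one year (e.g. 1849).
    Hypothetical helper for illustration only."""
    counts = Counter()
    for d in dates:
        if level == "decade":
            counts[d.year // 10 * 10] += 1          # 1849 -> 1840
        elif level == "year" and d.year // 10 * 10 == within:
            counts[d.year] += 1
        elif level == "month" and d.year == within:
            counts[d.month] += 1                    # 10 = October
    return counts

# Illustrative sample of dates "mentioned" in Web pages (assumed, not real search data).
mentions = [date(1809, 1, 19), date(1849, 10, 3), date(1849, 10, 7),
            date(1849, 10, 7), date(2009, 1, 19)]

decades = bucket(mentions, "decade")
years = bucket(mentions, "year", within=1840)       # open the 1840s into years
months = bucket(mentions, "month", within=1849)     # open 1849 into months
print(dict(decades), dict(years), dict(months))
```

With this toy input, the 1840s are the busiest decade, 1849 the busiest year within it, and October the busiest month of 1849, echoing the pattern described above.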

Having acquired a huge amount of information in recent years about how its search engine is used, Google has found a number of ways to analyze user habits. One of the most curious is Google Trends, which goes back to 2004. Warning: what follows may seem implausible, bringing to mind Mark Twain's remark (erroneously attributed to Disraeli) that there are “lies, damned lies, and statistics.” To begin with, according to Google Trends, Poe searches on the global Internet over the last five years have been highly seasonal, peaking in October and falling in the summer. That seems plausible enough, perhaps reflecting student demand as the fall semester gets into high gear. What is not expected, however, is Google's report that global requests for Poe are now mostly in languages other than English, which has fallen to fourth place in the Google statistics behind Tagalog, Spanish, and Swedish! (This pattern began several years before the Madrid Poe conference scheduled for May 2009.) A similar pattern appears whether the search is for “Edgar Allan Poe,” “Edgar Poe,” or just “Poe.” Among countries, the United States is unexpectedly in eighth place, behind the Philippines, Mexico, Colombia, Chile, Peru, Argentina, and Venezuela; the only American city in the top ten is Miami, in ninth place, perhaps because of its large Spanish-speaking student population. If this Internet "fact" about the national distribution of Poe Web searches is believable and reliable (I received similar results on several different dates), then Web searches for Poe may no longer originate primarily in the United States. The implications for Poe’s readership are monumental to contemplate. Go to http://trends.google.com and test it for yourself.

What seems less implausible in Google Trends is its analysis of the frequency of various Poe search phrases: combinations of “Poe raven” and “Poe toaster” (the secret annual decorator of Poe’s grave) were among the most popular. “Poe poems” was more popular than “Poe poetry,” his poems more than his tales, and his death more than his life. Demand for Poe criticism or Poe research was too low to produce a report. Here’s some distressing news, if true: no matter which search pattern is requested, Poe queries on Google have fallen each year since 2004, declining by as much as 50% by 2007 and a bit further in 2008 (offset somewhat in 2009 by his bicentennial).

Finally, let’s examine the Google interface. Although most users regard Google as a simple search engine, its rich interface is well worth studying. Controls are placed in three parts of the screen: above the search box, around the search box, and below the search results. At the top you can choose among Google's several domains, such as the Web, images, news, videos, and more. (Yes, "More" leads to "Still More"!) The search box has a definite syntax, with symbols for phrase (" "), site (site:xxx), thesaurus (~word), no synonym (+word), exclude (-word), OR (|), and complete the phrase (words *). The operator AND between words is assumed, and searches are not case sensitive. The options beside the search box are Advanced Search, which can limit the search to a particular Web location, such as eapoe.org, and Preferences, which selects the languages to search and sets the number of search results to display on a single page (up to 100). The search box itself may offer domain options, such as Web, Books, or Videos, depending on your past usage. In some interfaces, Google becomes a search portal by inviting you to try its search rivals: Yahoo, Ask, AllTheWeb, Live, Lycos, Technorati, Wikipedia, Bloglines, Altavista, A9, and GoodSearch.
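To see how these operators combine, here is a minimal Python sketch that assembles a query string and the corresponding search address. The helper names `build_query` and `search_url` are hypothetical, and the address format (`https://www.google.com/search?q=...`) is an assumption about how Google accepts typed queries; Google's operator set also changes over time, so treat this as an illustration rather than a reference.

```python
from urllib.parse import urlencode

def build_query(phrase=None, words=(), any_of=(), exclude=(), site=None):
    """Join search terms using the operator syntax described above.
    Hypothetical helper for illustration only."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')            # exact phrase
    parts.extend(words)                        # AND is implied between terms
    if any_of:
        parts.append(" | ".join(any_of))       # OR between alternatives
    parts.extend(f"-{w}" for w in exclude)     # exclude a word
    if site:
        parts.append(f"site:{site}")           # restrict to one site
    return " ".join(parts)

def search_url(query):
    """Percent-encode the query as the q= parameter (address format assumed)."""
    return "https://www.google.com/search?" + urlencode({"q": query})

# Poe plus "raven," without football pages, limited to one Poe site.
query = build_query(phrase="Edgar Allan Poe", words=["raven"],
                    exclude=["football"], site="eapoe.org")
print(query)
print(search_url(query))
```

Keeping the query-building step separate from the URL step makes it easy to paste the plain query into the search box by hand, which is how most readers will use it.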

To the right of the search box, the Definition option leads to basic information on the current search phrase at Answers.com. Although Google may report some 4.2 million results for “Edgar Allan Poe,” don't worry about how to see them all: whether you use Firefox or Internet Explorer, you will be shown no more than about 718 matches. (If you find this hard to believe, try it yourself.) To see other matches, narrow your search terms. At the bottom of the screen, Google suggests Related Search terms, videos you can view, and interactive features such as Add a Result and SearchWiki notes, which let you rank, remove, or annotate Web pages, with your input going only to the Google server.

As you try Google from time to time, you will notice that its interface constantly evolves in the spirit of Web 2.0, and it may differ in some details from the descriptions in this report. It is not too early to speculate about how Google will deal with Web 3.0, the future Semantic Web, in which hitherto incompatible computer systems will exchange information about Poe, and computers and people will be able to communicate about him with less difficulty. Google is already on the road to Web 3.0, developing systems that not only work well with one another but also communicate with us in a two-way, interactive manner. (For an introduction to the Semantic Web, see an article in the January 2009 Scientific American at http://www.sciam.com/article.cfm?id=semantic-web-in-actio&print=true.)

Heyward Ehrlich, Rutgers-Newark
Poe in Cyberspace columns are online at
http://eapoe.info.