Culturomics: Google Tracks Culture Trends Through Books

A screen shot of the Google Books Ngram Viewer

Want to know if the Beatles really were bigger than Jesus? Then head over to the Google Books Ngram Viewer.

Using this tool, you can compare words and phrases from several languages across centuries by charting their frequency in books, an exercise that can shed light on everything from rock stars and religious figures to gender studies and epidemiology. (An n-gram is a sequence of words or numbers, e.g., “Civil War” or “1600 Pennsylvania Avenue.”) Simply enter one word, or several. Then click the wildly understated “Search lots of books.” Observe and marvel.
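For the curious, the idea of charting an n-gram's frequency by year can be sketched in a few lines of Python. This is only an illustration, not Google's actual pipeline: the function names `ngrams` and `yearly_frequency` and the tiny `toy_corpus` are invented here, tokenization is a naive split on whitespace, and the assumption is that frequency means the share of a year's n-grams that match the phrase.

```python
from collections import Counter


def ngrams(text, n):
    """Return every n-gram in text as a lowercase, space-joined string."""
    tokens = text.lower().split()  # naive tokenizer: whitespace only
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_frequency(corpus, phrase):
    """Map each year to the share of that year's n-grams equal to phrase."""
    n = len(phrase.split())
    target = phrase.lower()
    result = {}
    for year, texts in corpus.items():
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text, n))
        total = sum(counts.values())
        # Guard against years with no n-grams of this length
        result[year] = counts[target] / total if total else 0.0
    return result


# Toy corpus, made up purely for illustration (year -> list of texts)
toy_corpus = {
    1915: ["the great war rages on", "news of the great war"],
    1975: ["world war i ended in 1918", "the great war is long over"],
}

print(yearly_frequency(toy_corpus, "great war"))
# {1915: 0.25, 1975: 0.1}
```

Dividing by each year's total matters because far more books are published in some years than in others; raw counts alone would mostly show that growth rather than any change in usage.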

“Men” meets “women” around 1985. “Science” began overtaking “religion” around 1930. There was a very brief period when “Virginia Woolf” appeared in more books than “Mark Twain.” Funny how a low point for “God” in the 20th century corresponds with a high point for “sex” (that would be around 1980).

The Ngram Viewer has debuted in conjunction with the publication of a research article in the journal Science (available after free registration). The study authors, among them Harvard scientists including Steven Pinker and researchers from Google, have written an intriguing paper on what they call the new science of culturomics and its many applications. Drawing on a database of more than 5 million books, about 4% of all those ever published (and only a selection of the total books that Google has scanned so far), they explain how we can quantitatively analyze the lexicon, changes in grammar, censorship, even fame. They speak of “lexical ‘dark matter’” (most English words, apparently, do not appear in dictionaries), the “quantifiable fingerprints” of suppression (just compare how frequently the Jewish painter Marc Chagall occurs in English versus German books while the Nazis were in power), and the reality of fleeting celebrity. According to their analysis, “People are getting more famous than ever before, but are being forgotten more rapidly than ever.” Similarly, they conclude, “We are forgetting our past faster with each passing year.”

Of course, the humanities cannot be reduced to statistics. Poetry is not a science. And you need to place results in context; you need to know, for example, that just because “Great War” appeared less often in the latter half of the 20th century doesn’t mean people stopped writing about it. They were just calling it “World War I” instead. Those behind the study say this tool is meant to supplement other methods of inquiry, not supplant them. “It’s not just an answer machine. It’s a question machine,” Erez Lieberman-Aiden, a co-author of the study and computational biologist at Harvard University, told Wired. “Think of this as a hypothesis-generating machine.”

So check out the machine and plug in what you’re most curious about. Notice that “Freud” occurs more frequently than “Galileo,” “Darwin,” and “Einstein.” See when “ain’t” was more common than “isn’t.” And resign yourself to the fact that, at least in books, the “Beatles” weren’t bigger than “Jesus.” (via Guardian)