Monday, December 20, 2010

Having fun with Ngram

One day last week, on my way to the gym, I heard an NPR story about Google's Ngram Viewer. It is a tool that anyone can use to search for the usage of words or phrases across four centuries of books (5 million books containing 500 billion words). According to the NPR piece, Google has digitized 15 million books, which is about 15% of all books ever published.

So I decided to give it a try. I typed in the word "MRSA," and this is what I found:

The axis labels are too small to read, but the x-axis starts at 1800, and you can see the blue line begin to emerge at about 1960. The gridline at the far right of the x-axis marks the year 2000, and the gridlines are 40 years apart.

Next, I searched two separate terms, "influenza" and "MRSA." In this graph, influenza is the red line and MRSA is the blue line. The big spike in the red line occurs at about 1920.

And here, I added "malaria" (shown in green) to the mix:

So looking at the big picture (at least as far back as 1800) gives us some perspective, perhaps more of a public health view.

Can't you tell that I'm now on vacation and have too much time on my hands?