The Google corpus

A story on NPR this afternoon made the incredibly slow drive home from Newark in the snow a little more bearable: “Google Books Tracks Cultural Change with Words.” As part of the Google Books project to take over the world … sorry, digitize all books ever written, Google has released an “n-gram viewer”. This is basically a way to search Google’s book collection as a corpus and track trends over time. Both the report and the website are very interesting, although I was amused to note that none of the experts quoted is a linguist (no-one at NPR thought linguists might have something to contribute to a story about, err, words).

So, I thought I’d run a parenting question through the site: apparently my toddler has magically turned into a preschooler (and not a pre-schooler) since he turned 3 last month. I’m not quite sure how that happened, since he definitely still toddles, and he has been and will continue to be pre-school for some time.

According to this graph, toddler has been around since at least the early 20th century, although its use took off in the 70s. Preschooler only emerged at that time, and its use appears to be waning. Of course, this corpus only contains books, which represent a very limited subset of language use. A very cursory glance at the sample concordance lines confirms my suspicion that preschooler is a much-loved term for parenting books — after all, if you’ve already bought the book on pregnancy, the book on newborns, and the book on toddlers, you’re clearly going to be in the market for a book about the next stage … oh, let’s call ’em preschoolers.

At least I have a new way to while away those long snowy evenings during winter break in Delaware.

(And if you too are the parent of a preschool toddling infant baby, check out Barefoot Books, as sold by the other parent of said three-year-old.)

Author: Nigel Caplan

Nigel Caplan, Ph.D., is an associate professor at the University of Delaware English Language Institute, as well as a textbook author, consultant, and speaker. Nigel holds a Ph.D. from the University of Delaware, a master's in TESOL from the University of Pennsylvania, and a bachelor's degree from Cambridge University. He is currently director of Project DELITE, a federal grant providing ESL certification to Delaware teachers. He also brews beer.
