Irregular Verbs
by Mignon Fogarty via Grammar Girl
Why do we say we saw a movie instead of we seed a movie, and did you know that the past tense of the verb “help” used to be “holp” instead of “helped”?
Regular Verbs Versus Irregular Verbs
Most of the time you add -ed to a verb to put it in the past tense; “slurp” becomes “slurped,” “scarf” becomes “scarfed,” and “offend” becomes “offended,” for example. When you make a verb past tense by tacking on an -ed, you’re dealing with a regular verb. It’s the regular way we make things past tense.
English also has verbs that don’t follow this pattern: verbs such as “am,” which becomes “was”; “tell,” which becomes “told”; and “sing,” which becomes “sang.” These are called irregular verbs because they don’t follow the regular pattern.
Think of irregular verbs as relics from Old English.
People who grew up speaking English just know the irregular verbs, but children and people who are learning English as adults struggle with them. As toddlers are learning the language they often say things such as “He breaked my doll,” instead of “He broke my doll,” and “Daddy goed to the store,” instead of “Daddy went to the store,” and adults who are learning English are faced with memorizing a long list of irregular verbs.
The Root: Old English Irregular Verbs
Irregular verbs are relics from the past. Believe it or not, the rules for conjugation (a fancy word for “working the verb”) were even more complicated in Old English. Our regular verbs are called “weak verbs” in Old English, but Old English also had at least seven different kinds of strong verbs. Many of our irregular verbs are holdovers from those seven types of strong verbs (1), which is why you can’t see any one pattern when you look at a list of irregular verbs. There are actually multiple sparsely represented patterns. For example, “teach” and “catch” become “taught” and “caught,” “choose” and “freeze” become “chose” and “froze,” and some verbs don’t change at all: “hit” and “quit” stay “hit” and “quit” in the past tense (2).
The Role of Foreigners Learning English Irregular Verbs
Over time, English became simpler and many verbs were regularized. Languages become simpler when a lot of foreigners learn the language as adults, especially when they’re just learning by listening to everyday interactions and don’t have formal books and classes (3) as would have been the case between Old English and Modern English.
And researchers noticed something really interesting about which verbs stayed irregular and which verbs changed to become regular: the more often a word is used, the more likely it is to stay irregular. In fact, every one of the 10 most common English verbs is irregular:
“I am” –> “I was”
“I have” –> “I had”
“Do you?” –> “Did you?”
These are all easy words: single-syllable words of Anglo-Saxon origin (4). Besides “be,” “have,” and “do,” they are “go,” “say,” “can,” “will,” “see,” “take,” and “get.”
Irregular Verb Evolution
Researchers at Harvard found a strong correlation between how often a verb is used and whether it regularized (1, 5). They think these 10 common verbs held on to their irregular form so firmly precisely because they’re so common. They actually compared the process to biological evolution, in which changes–mutations–in the most important genes are the least likely to propagate.
Think about how often you hear the verbs “am” and “have” in everyday conversation. “I have to go now. I am hungry, and I have a headache.” If you’re learning English just by listening, these are going to be the easiest verbs to learn properly because you hear them over and over again.
But if you were someone learning English in the Middle Ages dealing with words you don’t hear very often–“to chide,” for example–you may not be able to remember that the past tense is “chode,” and instead you’d just default to the regular rule and say “chided” on the rare occasions when you need the word; or you wouldn’t have learned the verb and wouldn’t know to correct your children when they defaulted to the regular form. Once enough children grew up thinking “chided” was the normal form of the verb, “chode” was doomed.
Strange Exceptions
Finally, there’s a nuance and a couple of exceptions to this verb-evolution process that are worth talking about just because they’re so strange and interesting.
“Burned” Versus “Burnt”
First, verbs don’t always evolve at the same rate in different countries. As far as I can tell, nobody knows why, but British English speakers have held on to irregular verbs more than American English speakers, which is why they say “dreamt,” “burnt,” and “learnt” in Britain, and we say “dreamed,” “burned,” and “learned” in America.
“Sneaked” and “Snuck”
Second, there are a few rare verbs that were regular but have taken on an irregular past tense. It’s like evolution going in reverse. “Sneaked” is the regular past tense form of the verb “to sneak,” but sometime in the late 19th or early 20th century, “snuck” started sneaking into English (6).
“Lighted” and “Lit”
Sometime after 1800, people began to prefer the irregular verb “lit” to the regular past tense “lighted” (7). “Lit” and “lighted” both currently exist as fully acceptable past-tense forms of “to light.” “Snuck” is still considered slightly less than acceptable, but according to the Harvard researchers, 1% of the English-speaking population switches from “sneaked” to “snuck” every year, with the shift being most powerful in America.
Summary
The bottom line is that you either know the irregular verbs because you absorbed them by growing up in an English-speaking country or you have to memorize them, which is a pain. But if you have to memorize them, I hope you at least find it more interesting now that you know you’re digging into the relics of English, and that one reason these irregular verbs still exist is that English learners in the past could remember them.
Mignon Fogarty is Grammar Girl and the author of the new book, Grammar Girl Presents the Ultimate Writing Guide for Students, available online, in bookstores, and this fall, through Scholastic book fairs.
References
1. Lieberman, E. et al. “Quantifying the Evolutionary Dynamics of Language,” Nature, vol. 449, no. 7163, p. 713-716, October 11, 2007. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2460562/ (accessed September 13, 2011).
2. “English Irregular Verbs,” Wikipedia. http://en.wikipedia.org/wiki/English_irregular_verbs (accessed September 13, 2011).
3. McWhorter, J. Our Magnificent Bastard Tongue, Gotham Books, 2008.
4. Pinker, S. “The Irregular Verbs,” http://pinker.wjh.harvard.edu/articles/media/2000_03_landfall.html (accessed September 13, 2011).
5. Michel, Jean-Baptiste et al. “Quantitative Analysis of Culture Using Millions of Digitized Books,” Science, vol. 331, p. 176-182, January 13, 2011. http://mfi.uchicago.edu/publications/papers/Science_Culturomics.pdf (accessed September 13, 2011).
6. Gellene, D. “How English adds the ‘-ed’,” Los Angeles Times, October 11, 2007. http://articles.latimes.com/2007/oct/11/science/sci-verbs11 (accessed September 13, 2011).
7. Yong, E. “The Cultural Genome: Google Books Reveals Traces of Fame, Censorship, and Changing Languages,” Discover Magazine. http://blogs.discovermagazine.com/notrocketscience/2010/12/16/the-cultural-genome-google-books-reveals-traces-of-fame-censorship-and-changing-languages/ (accessed September 13, 2011).
Facts about the language
The 20-volume historical Oxford English Dictionary is the largest record of words used in English, past and present. It contains words that are now obsolete or rare (such as xenagogue ‘a person who guides strangers’ and vicine ‘neighboring or adjacent’) in addition to the latest coinages such as phishing and podcast.
The second edition of the OED, published in 1989 and consisting of twenty volumes, contains more than 615,000 entries, and the third, available online, is expanding all the time, with batches of 2,500 new and revised words and phrases being added in regular quarterly updates.
How many words are there in English? It is a question often asked, but not so easily answered. Even the OED does not set out to include every specialized technical term or slang or dialect expression ever used. New words are constantly being invented, developed from existing words, or adopted from other languages. Most will be used rarely, or only by a small group of people. This means that an unlimited number of words may occur in speech and writing which will never be recorded in even the largest dictionary.
Furthermore, what exactly is a word? Clearly we should include single units such as cat and dog. But are the plurals cats and dogs separate words? Should we include compounds such as walking stick, which are made up of two existing words? There are an almost unlimited number of such two-word compounds, which can’t all be included in a dictionary. And what about abbreviations like BBC and Dr, or proper names such as London, Nelson, and Harry Potter: are they words? As you can see, the question is not a straightforward one.
Although it may be impossible to know the number of words in English, the Oxford English Corpus can help us assess the number of words in current use.
Instead of talking about words, it’s more useful in this context to talk about lemmas, a lemma being the base form of a word. For example, climbs, climbing, and climbed are all examples of the one lemma climb. Just ten different lemmas (the, be, to, of, and, a, in, that, have, and I) account for a remarkable 25% of all the words used in the Oxford English Corpus. If you were to read through the corpus, one word in four (ignoring proper names) would be an example of one of these ten lemmas. Similarly, the 100 most common lemmas account for 50% of the corpus, and the 1,000 most common lemmas account for 75%. But to account for 90% of the corpus you would need a vocabulary of 7,000 lemmas, and to get to 95% the figure would be around 50,000 lemmas.
The remaining 5% of the corpus consists of a very large number of lemmas which occur rarely: words like moidore or parados, which may occur only once every several million words. Like all natural languages, English consists of a small number of very common words, a larger number of intermediate ones, and then an indefinitely long ‘tail’ of very rare terms.
| Vocabulary size (no. of lemmas) | % of content in OEC | Example lemmas |
|---|---|---|
| 10 | 25% | the, of, and, to, that, have |
| 100 | 50% | from, because, go, me, our, well, way |
| 1,000 | 75% | girl, win, decide, huge, difficult, series |
| 7,000 | 90% | tackle, peak, crude, purely, dude, modest |
| 50,000 | 95% | saboteur, autocracy, calyx, conformist |
| >1,000,000 | 99% | laggardly, endobenthic, pomological |
The long tail means that to account for 99% of the Oxford English Corpus you would need a vocabulary of more than a million lemmas. This would include some words which may occur only once or twice in the whole corpus: highly technical terms like chondrogenesis or dicarboxylate, and one-off coinages like bootlickingly or unsurfworthy that people would probably understand but would be unlikely to use.
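The coverage figures above come down to simple arithmetic: sort the lemmas by frequency, add up their counts from the top, and see how many lemmas are needed to reach a given share of the corpus. The following is a minimal sketch of that calculation, not the method used for the Oxford English Corpus. The function names are hypothetical and the toy word list is invented for illustration.

```python
from collections import Counter


def coverage_curve(lemmas):
    """Return (vocabulary_size, cumulative_share) pairs, most frequent lemma first."""
    counts = Counter(lemmas)
    total = sum(counts.values())
    curve = []
    running = 0
    for size, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        curve.append((size, running / total))
    return curve


def vocabulary_needed(lemmas, target):
    """Smallest number of top lemmas whose occurrences reach `target` share of the corpus."""
    for size, share in coverage_curve(lemmas):
        if share >= target:
            return size
    return len(set(lemmas))


# Invented toy corpus of already-lemmatized tokens (NOT real corpus data).
toy_corpus = ["the"] * 5 + ["be"] * 3 + ["cat"] + ["moidore"]

for target in (0.5, 0.75, 0.85, 1.0):
    print(f"{target:.0%} coverage needs {vocabulary_needed(toy_corpus, target)} lemma(s)")
```

Even in this ten-token example the shape is the same as in the table: one lemma covers half the text, while the last rare word is needed only to close the final gap.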
If we decide that around 90-95% of the corpus gives a reasonable idea of an average vocabulary, we are left with a figure somewhere in the range of 7,000-50,000 lemmas: say, 25,000. What does a vocabulary of this size represent? It represents the set of most significant words in English: those which occur reasonably frequently and which account for all but a small part of everything we may encounter in speech or writing. It includes all the words that we actively use in general everyday life.
It’s interesting to note that most reasonably sized dictionaries contain significantly more than 25,000 lemmas. The 11th edition of the Concise Oxford English Dictionary, for example, lists more than 75,000 single-word lemmas, which means that the majority of its entries must belong to the long tail of extremely rare words. This makes good sense: such terms occur very infrequently, but when they do they are likely to be crucial to what’s being said, and the reader might well want to look them up. The idea of a quantifiable vocabulary should be seen in this light: the words we ignore for the purposes of the exercise may be very rare, but in context they may be very important.
Shtick, Pavilion and other great words
By Dr Sima Barmania via The Independent
On Tuesday, those shortlisted for the prestigious Man Booker prize gathered in West London with the great and good of the literary establishment. As discussions ensued regarding the merit of the judges’ decision, Howard Jacobson, winner of the 2010 prize, was succinct in highlighting that the main prerequisite is that they only be “good writers.”
Last week, Jacobson, who studied English at Cambridge, wrote in The Independent that he simply “wanted to make sentences, not win prizes,” stating that the sentences “were prize enough in themselves.”
Jacobson does indeed create undisputedly great sentences, but also, noticeably, uses great words; for example “shtick,” a word I had never previously encountered, or perhaps never noticed. Thus, not only has the reader been introduced to a beautiful sentence, but also a charming word.
That “you don’t have to be a writer to appreciate words” is something the 1 million followers of Anu Garg, founder of wordsmith.org, already take to heart, he tells me. I am one of them.
Each day, Garg e-mails his community of subscribers, who reside in over 200 countries, a word of the day, together with the etymology and an example of the word’s use in context.
He provides this service free of charge and has been praised by the New York Times as “the most welcomed, most enduring piece of daily mass e-mail in cyberspace.”
Perhaps what is most fascinating is Garg’s background. He is an immigrant, born and raised in Uttar Pradesh in northern India, who later moved to America. He did not begin learning English until the age of 9, in 6th grade; growing up he learnt the British version, and after moving to the United States he switched to American English.
Garg, despite his extensive lexicon, still considers himself a “lifelong student of the English Language.” Communicating with Garg, I was keen to understand how his Indian roots had influenced both him and others.
“In India there’s huge competition to succeed. People realize that good education is the key that opens doors to a better life.” This has resulted in India’s growing literacy levels and a burgeoning book market.
I asked Garg how best to encourage young people, from all social classes in the United Kingdom, to take an interest in words. He replied, rather endearingly, that it would “help if we shared the etymology. Once you see words as living beings: words are born, they grow and change, and sometimes they die out, they become much more than just words”.
To explain further:
“When you see that ‘pavilion’ is like a butterfly spreading its wings (from Latin papilio: butterfly) it’s easy to fall in love with words.”
It is apparent that Garg is more than just a purveyor of words; he is fervently enthusiastic and genuinely enthralled by them.
Furthermore, he does not underestimate the significance of words, but rather acknowledges them as valuable tools: “Words are the universal currency of humankind. The better we are with them, the better we can be in anything we do. With the right words we can do what money or power can’t.”
Some, like Jacobson, may prize great sentences, but others, myself included, are simply pleased with a great word.
Singapore’s language battle: American vs ‘the Queen’s English’
By reddotrevolver via Asian Correspondent
Known as a country in Southeast Asia with a highly educated workforce, Singapore is also one of the only countries in the region that uses English as a working language, and as a medium of instruction in schools. The ease of communication has established the country as the headquarters in Asia for many multinational companies.
A report by the Educational Testing Service (ETS) based on data from Jan-Dec 2010 shows that Singapore came in third in TOEFL (Test of English as a Foreign Language) scores out of 163 countries. It is the only Asian country in the top three.
However, students in Singapore are taught in British English, or ‘the Queen’s English’, from elementary school onward. To Singapore’s former Prime Minister, Lee Kuan Yew, this poses a serious and imminent challenge.

According to Channel NewsAsia, Lee said:
“There is an intense worldwide competition for talent, especially for English-speaking skilled professionals, managers and executives. Our English-speaking environment is one reason why Singapore has managed to attract a number of these talented individuals to complement our own talent pool.
“They find it easy to work and live in Singapore, and remain plugged into the global economy. Singapore is a popular educational choice for many young Asians who want to learn English, and they get a quality education. This has kept our city vibrant.”
Mr Lee said one of the challenges ahead is to decide whether to adopt British English or American English.
He said: “I think the increasing dominance of the American media means that increasingly our people, teachers and students will be hearing the American version, whether it is ‘potatoes’ or ‘tomatoes’. They will be the dominant force through sheer numbers and the dominance of their economy.
“I believe we will be exposed more and more to American English and so it might be as well to accept it as inevitable and to teach our students to recognise and maybe, to even speak American English.”
Lee added that “communication skills” will be one of the most valuable qualities to possess in the twenty-first century.
Be it fashion, music, food or movies, American popular culture has had a pervasive influence on Singaporean society, and American slang has made its way into the lexicon of Singapore English. However, in official documents, and even in text messages, British spelling is used. Yet a good command of English is a good command of English, regardless of whether it is written in British spelling or spoken with an American accent. Perhaps American investors will appreciate an American accent when speaking to Singaporean businessmen, but Americans have long done business with British counterparts who have thick British accents. Being understood by the other party remains what matters most.
Are There Hidden Messages in Pronouns?
By Juliet Lapidos via Slate
Some 110 years after the publication of the Psychopathology of Everyday Life, in which Sigmund Freud analyzed seemingly trivial slips of the tongue, it’s become common knowledge that we disclose more about ourselves in conversation—about our true feelings, or our unconscious feelings—than we strictly intend. Freud focused on errors, but correct sentences can betray us, too. We all have our signature tics. We may describe boring people as “nice” or those we dislike as “weird.” We may use archaisms if we’re trying to seem smart, or slang if we’d prefer to seem cool. Every time we open our mouths we send out coded, supplementary messages about our frame of mind.
Although much of this information is easy to decode (“nice” for “boring” won’t fool anyone), linguistic psychologist James Pennebaker suggests in The Secret Life of Pronouns that lots of data remain hidden from even the most astute human observers. “Nice” and “weird” are both content words; he’s concerned with function words such as pronouns (I, you, they), articles (a, an, the), prepositions (to, for, of), and auxiliary verbs (is, am, have). We hardly notice these bolts of speech because we encounter them so frequently. With the help of computer programs to count and scrutinize them, however, patterns emerge.
Sounds enticing; sounds, in fact, rather like a publisher’s fantasy pitch, combining the strangely long-lasting craze for language books laced with pop psychology, and the added hook, the modern touch, of a computer that observes and catalogs beyond measly human capacity: a Watson for the psychiatric establishment. To Pennebaker’s credit, his claims are fairly modest, especially when compared with those of Deborah Tannen and other practitioners of the word-sleuth genre. (He doesn’t promise that if we change our pronoun usage we’ll see tangible improvements in our social lives.) The problem is that much of what he turns up is even more modest than he seems to notice. Counting function words as they’re used in ordinary life often yields the opposite of what Freud detected in confessions from the couch: confirmation of the obvious.
The most ingenious application Pennebaker proposes for function-word analysis is lie-detection, something of a dark art. Several years ago, Pennebaker and a couple of colleagues recruited 200 students and asked them to write two essays about abortion, one espousing a true belief, the other a falsehood. They asked another group to state their true and false takes in front of a video camera. When judges were called in to figure out which was which, they were accurate 52 percent of the time. (50 percent is chance.) A computer, programmed to look for specific “markers of honesty” gleaned from previous studies, performed much better, with a 67 percent accuracy rate. Truth-tellers, Pennebaker explains, tend to use more words, bigger words, more complex sentences, more exclusive words (except, but, without, as in the sentence “I think this but not that”), and more I-words (I, me, my, etc.). Liars, apparently, trade in simple, straightforward statements lacking in specificity because—Pennebaker posits—it’s actually pretty difficult to make stuff up. They avoid self-reference because they don’t feel ownership of their expressed views.
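To make the idea of counting such markers concrete, here is a minimal sketch that tallies I-words and exclusive words in a passage and reports them as a share of all words, along with average word length. It is not Pennebaker’s program: the word lists are tiny, hypothetical samples taken from the examples in the paragraph above, the function names are invented, and the output says nothing on its own about whether a text is truthful.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists based on the examples in the text.
I_WORDS = {"i", "me", "my"}
EXCLUSIVE_WORDS = {"except", "but", "without"}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def marker_rates(text):
    """Count I-words and exclusive words and express each as a share of all words."""
    words = tokenize(text)
    total = len(words)
    counts = Counter(words)
    i_count = sum(counts[w] for w in I_WORDS)
    exclusive_count = sum(counts[w] for w in EXCLUSIVE_WORDS)
    return {
        "words": total,
        "i_words": i_count,
        "exclusive_words": exclusive_count,
        "i_rate": i_count / total if total else 0.0,
        "exclusive_rate": exclusive_count / total if total else 0.0,
        "mean_word_length": sum(len(w) for w in words) / total if total else 0.0,
    }


print(marker_rates("I think this but not that"))
```

A real analysis would need much fuller word lists, handling of contractions such as “I’m,” and a comparison against a baseline. The sketch only shows that the raw measurements are simple counts.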
When Pennebaker dips into the more general field of “emotion detection” (he calls it that), his word-counting feels a bit Rube Goldberg-ish. After Sept. 11, 2001, Pennebaker and a colleague saved the LiveJournal.com postings of over a thousand amateur bloggers. They found that “bloggers immediately dropped in their use of I-words” following the attacks, and that their use of we-words almost doubled. Pennebaker takes these fluctuations to mean that “shared traumas bring people together,” “shared traumas deflect attention away from the self,” and that “shared traumas, in many ways, are positive experiences” (because people feel more socially connected). The brute fact that Sept. 11 influenced pronoun usage may interest readers, but Pennebaker’s analysis merely reiterates long-held psychological dogma. (Try Googling “shared traumas bring people together.”) I can’t help but wonder if Pennebaker—albeit unconsciously—interpreted his results to match the conventional wisdom.
Perhaps that’s harsh: Certainly there’s nothing wrong with devising yet another way to elucidate common human responses, and Pennebaker’s experiments are always imaginative. Yet it’s often the case that his conclusions, especially the ones he draws from I-word usage, are heavily dependent on context and prior knowledge.
In one chapter, Pennebaker notes that Rudolph Giuliani demonstrated a dramatic increase in I-words during the late spring of 2000, when he was still mayor of New York. Pennebaker fills us in that “Giuliani’s life [was] turned upside down. … He was diagnosed with prostate cancer, withdrew from the senate race against Hillary Clinton, separated from his wife on national television … and, a few days later, acknowledged his ‘special friendship’ with Judith Nathan.” Pennebaker adds that “by early June, friends, acquaintances, old enemies, and members of the press all noticed that Giuliani seemed more genuine, humble, and warm.” So it’s reasonable to conclude that Giuliani’s ascending I-word usage reflected a “personality switch from cold and distanced to someone who [due to a few significant setbacks] was more warm and immediate.”
But we already knew that. If we didn’t, where would Pennebaker’s method leave us? He argues, at various points, that the following groups use I-words at higher rates:
1. Women
2. Followers (not leaders)
3. Truth-tellers (not liars)
4. Young
5. Poor
6. Depressed
7. Afraid (but not angry)
8. Sick
The common thread unifying these seemingly random clusters is, roughly, an enhanced focus on personal experience. Sick and depressed people dwell on their conditions and are thus more likely than their healthy counterparts to talk about themselves. Followers, in conversation with leaders, might be after something: “I was wondering if I could have a raise.” That’s pretty close to a tautology, though, and does nothing to solve the problem that, without insider information, it’s impossible to know which condition or attribute I-usage reflects. A word-count-wannabe presented with Giuliani’s speeches might deduce, erroneously, that the mayor had become more truthful, or less leaderly, or had lost money.
For obvious reasons, I’m unusually attuned to my pronoun usage at the moment, and I’ve noticed a thing or two. I start off this essay with lots of we-words (16 in the introduction), and sprinkle them throughout. With the exception of the section you’re currently reading, I drop only one self-referencing I (in the fifth paragraph). I don’t deny that this imbalance might mean something. Perhaps it indicates that, like politicians who drone on about what “we” expect from the president, or how “we” want a return to old-fashioned American values, I’m trying to imply audience agreement when, in truth, I have no clue what the audience thinks. But you don’t need to count pronouns to figure that out. You only need to know that you’re reading a book review.
What makes slang stick?
By Juliet Lapidos via Slate
Feeling nostalgic for a journalistic era I never experienced, I recently read Tom Wolfe’s 1968 The Electric Kool-Aid Acid Test. I’d been warned that the New Journalists slathered their prose with slang, so I wasn’t shocked to find nonstandard English on nearly every line: dig, trippy, groovy, grok, heads, hip, mysto and, of course, cool. This psychedelic time capsule led me to wonder about the relative stickiness of all these words—the omnipresence of cool versus the datedness of groovy and the dweeb cachet of grok, a Robert Heinlein coinage from Stranger in a Strange Land literally signifying to drink but implying profound understanding. Mysto, an abbreviation for mystical, seems to have fallen into disuse. It doesn’t even have an Urban Dictionary entry.
There’s no grand unified theory for why some slang terms live and others die. In fact, it’s even worse than that: The very definition of slang is tenuous and clunky. Writing for the journal American Speech, Bethany Dumas and Jonathan Lighter argued in 1978 that slang must meet at least two of the following criteria: It lowers “the dignity of formal or serious speech or writing,” it implies that the user is savvy (he knows what the word means, and knows people who know what it means), it sounds taboo in ordinary discourse (as in with adults or your superiors), and it replaces a conventional synonym. This characterization seems to open the door to words that most would not recognize as slang, including like in the quotative sense: “I was like … and he was like.” It replaces a conventional synonym (said), and certainly lowers seriousness, but is probably better categorized as a tic.
At least it’s widely agreed that young people, seeking to make a mark, are especially prone to generating such dignity-reducing terms. (The editor of The New Partridge Dictionary of Slang and Unconventional English, Tom Dalzell, told me that “every generation comes up with a new word for a marijuana cigarette.”) Oppressed people, criminals, and sports fans make significant contributions, too. There’s also a consensus that most slang, like mysto, is ephemeral. Connie Eble, a linguist at the University of North Carolina, has been collecting slang from her students since the early 1970s. (She asks them to write down terms heard around campus.) In 1996, when she reviewed all the submissions she’d received, she found that more than half were only turned in once. While many words made it from one year to the next, only a tiny minority lasted a decade.
When asked for an example of an expression that fizzled out quickly, Eble cited “a dangling modifier,” meaning a single earring. (As in, “you know that dude with the skateboard, the one with the dangling modifier?”) Eble guesses that “dangling modifier” didn’t survive because it was too clever. She also recalled that, in the 1970s and 1980s, she encountered a slew of drunkenness-related phrases that were similarly too complex, such as a pair of terms for vomiting into a toilet, “drive the porcelain bus” and “talk to Ralph on the big white phone.” (I’ve heard that last one, actually, but from a friend who’s fond of sounding odd.)
In that same 1996 review, Eble found that the 40 most frequently submitted slang words could often be classified as judgments of acceptance or rejection. There were several synonyms for excellent, including sweet, killer, bad, cool, and awesome. Conversely, she noted a few expressions meaning a “socially inept person”: dweeb, geek, turkey. Another positive indicator is brevity. Eble said short words fare well (cool, bad, sweet, geek), and that oohs and other back-of-the-mouth noises tend to crop up (cool, tool, groove, booze).
For a slang term to really succeed, it also helps to have influential proponents. Michael Adams, the editor of American Speech, reminded me of a recurring joke in Mean Girls: Gretchen wants to introduce fetch as slang (to mean, pretty much, awesome), but clique leader Regina won’t have it. “Stop trying to make ‘fetch’ happen,” she says, “It’s not going to happen.”
And it doesn’t happen, because Gretchen’s not the kind of girl who inspires imitation. If, however, someone with real social pull starts using a word, or if it’s thrown around approvingly in a film, it’s given a boost: Clueless helped disseminate whatever.
Even if it has a famous supporter, though, a slang word’s long-term survival is more the exception than the rule. Mysto, for one, died out swiftly despite being a short, easily understood word that was evidently tossed around by the Merry Pranksters before getting recorded by Tom Wolfe.
Once a word gets to the level of general understanding, it’s still subject to caprice. Groovy, which dates back to the 1930s, became fashionable in the 1940s, then unfashionable, then fashionable again in the 1960s. Now everyone knows what it means, but if you use it you either have long, gray hair and wear tie-dye or you’re mocking the sort of people who have long, gray hair and wear tie-dye. Groovy got stuck, and though it’s possible that it’ll make a comeback, for now it feels coupled to a particular time. Yet cool in the excellent sense—popularized by jazz musicians in the 1940s—isn’t tainted in this way. No one says cool with the expectation that Charlie Parker will come to mind.
Perhaps cool has been more durable than groovy because it’s an ordinary word in addition to a slang word. It’s unobtrusive, which Adams also mentioned as a positive indicator of slang tenacity. Maybe the gr and vee sounds in groovy, which are rather harsh, are what keep it from seeming natural, and association-less, in conversation.
The only way to test whether these theories are more than post-facto justifications is to apply them to newish slang words. Scrolling through newly added Urban Dictionary entries, I came across: La Slosha (“A woman whose awesomeness and attractiveness is only surpassed by her ability to consume copious quantities of vodka coke,” added Aug. 8); Txtnesia (“When you forget what you texted someone last,” added July 31); and Boones (“One or more hipsters that are idiotic and talk in hipster slang,” added Aug. 2). It seems to me that Boones has the best chance to survive: It’s short, contains an ooh, expresses a social judgment, and isn’t too complicated. It also strikes me as rather useful. Let’s see what happens.
In search of language’s missing link
By David Robson via New Scientist
Through the looking glass, Lewis Carroll’s Alice stumbles upon an enormous egg-shaped figure celebrating his un-birthday. She tries to introduce herself:
“It’s a stupid name enough!” Humpty Dumpty interrupted impatiently. “What does it mean?”
“Must a name mean something?” Alice asked doubtfully.
“Of course it must,” Humpty Dumpty said with a short laugh: “My name means the shape I am – and a good handsome shape it is, too. With a name like yours, you might be any shape, almost.”
PURE whimsy, you might think. Nearly 100 years of linguistics research has been based on the assumption that words are just collections of sounds – an agreed acoustic representation that has little to do with their actual meaning. There should be nothing in nonsense words such as “Humpty Dumpty” that would give away the character’s egg-like figure, any more than someone with no knowledge of English could be expected to infer that the word “rose” represents a sweet-smelling flower.
Yet a spate of recent studies challenge this idea. They suggest that we seem instinctively to link certain sounds with particular sensory perceptions. Some words really do evoke Humpty’s “handsome” rotundity. Others might bring to mind a spiky appearance, a bitter taste, or a sense of swift movement. And when you know where to look, these patterns crop up surprisingly often, allowing a monoglot English speaker to understand more Swahili or Japanese than you might imagine (see “Which sounds bigger?” at the bottom of this article). These cross-sensory connections may even open a window onto the first words ever uttered by our ancestors, giving us a glimpse of the earliest language and how it emerged.
More than 2000 years before Carroll suggested words might have some inherent meaning, Plato recorded a dialogue between two of Socrates’s friends, Cratylus and Hermogenes. Hermogenes argued that language is arbitrary and the words people use are purely a matter of convention. Cratylus, like Humpty Dumpty, believed words inherently reflect their meaning – although he seems to have found his insights into language disillusioning: Aristotle says Cratylus eventually became so disenchanted that he gave up speaking entirely.
The Greek philosophers never resolved the issue, but two millennia later the Swiss linguist Ferdinand de Saussure seemed to have done so. In the 1910s, using an approach based in part on a comparison of different languages, he set out a strong case for the arbitrariness of language. Consider, for instance, the differences between “ox” and “boeuf”, the English and French words for the same animal. With few similarities between these and other such terms, it seemed clear to Saussure that the sounds of words do not inherently reflect their meanings.
The world of linguistics was mostly convinced, but a few people still challenged the status quo. While the German psychologist Wolfgang Kohler was staying in Tenerife, he presented subjects with line drawings of two meaningless shapes – one spiky, the other curved – and asked them to label the pictures either “takete” or “baluba”. Most people chose takete for the spiky shape and baluba for the curvy one. Though Kohler didn’t say why this might be, the observation strongly suggested that some words really might fit the things they describe better than others. His work, first published in 1929, did not attract much attention, and though others returned to the subject every now and then, their findings were not taken seriously by the mainstream. “They were considered a curiosity and never properly explored,” says Gabriella Vigliocco, professor of the psychology of language at University College London.
The turning point came in 2001, when Vilayanur S. Ramachandran and Edward Hubbard, both then at the University of California, San Diego, published their investigations into a condition known as synaesthesia, in which people seem to blend sensory experiences, including certain sounds and certain images (Journal of Consciousness Studies, vol 8, p 3). As many as 1 in 20 people have this condition, but Ramachandran suspected that cross-sensory connections are in fact a feature of the human brain, so that in practice we all experience synaesthesia at least to a limited extent. To explore this idea, he and Hubbard revisited Kohler’s experiment to find out whether average people, and not just synaesthetes, might automatically link two different sensations.
Using similar shapes to those in the original experiment, but changing the names of the invented terms slightly, they found that an astonishing 95 per cent of people labelled the spiky object as “kiki” and the curvy one as “bouba”. One possible explanation is that this might be down to the shapes of the lips as we form the vowels in these words; in “bouba” they are more curved than in “kiki”.
The work turned out to be hugely influential, helping sound symbolism to finally get off the ground as numerous studies explored the kiki/bouba phenomenon. Chris Westbury at the University of Alberta in Edmonton, Canada, for instance, has shown that consonants as well as vowels may elicit a similar association. He found that continuants, like “m” sounds, are associated with curvy shapes, while “plosives” that break up the airflow to make a more jarring sound – like a “k” – are considered spikier (Brain and Language, vol 93, p 10).
With the renaissance of the idea that the sound of a word could be linked to some kind of inherent meaning, the obvious next step was to investigate whether sound symbolism extends beyond this one intriguing example.
Cross-sensory connections
Building on the idea that certain words might elicit cross-sensory connections in our brain, a team at the University of Edinburgh, UK, decided to explore the links between sounds and tastes. Christine Cuskley, Simon Kirby and Julia Simner dropped bitter, sweet, salty and sour drops of solution into their subjects’ mouths. Then they asked them to manipulate a computer synthesiser to produce different kinds of vowel sounds that seemed to best match the taste on their tongues. The results were not random. Sweet tastes were associated with high vowel sounds, in which the tongue is placed nearer to the roof of the mouth, and back vowels, where the tongue is placed towards the throat rather than the lips. The “oo” in boot demonstrates both of these traits. Low, front vowel sounds, meanwhile – something like the “a” in “cat” has these qualities – were associated with sour tastes (Perception, vol 39, p 553).
Others have been looking for evidence of sound symbolism in everyday speech. Although examples of onomatopoeia – words truly formed from a sound associated with what is named – are rare, it is possible that more subtle instances of sound symbolism have been lurking, almost literally, right under our noses. English words that begin with “sn” are often associated with our organ of olfaction: think “snout”, “sniff”, “snot”, “snore” and “snorkel”. Sceptics had argued that these “phonaesthemes” are pure coincidence, but research by Benjamin Bergen at the University of California, San Diego, suggests otherwise. He found that the brain processes meanings of pairs of phonaesthemes such as “snore” and “sniff” more quickly than other pairs related simply by their meaning (such as “cord” and “rope”) or their sounds (such as “druid” and “drip”). That is exactly what you would expect if olfaction and the “sn” sound are somehow linked in the brain, says Bergen.
That’s not all. At a recent workshop on sound symbolism in Atlanta, Georgia, he reported that “wh” words that describe the production of noises, such as “whisper”, “whine” or “whirr”, and “fl” words that tend to signal movement in the air, such as “fly” or “flail”, also enjoyed this fast track in the brain’s processing. Bergen concludes that these may all be forms of sound symbolism.
Indeed, it now looks as if sound symbolism may be present in many languages. Japanese, for example, contains a large grammatical group called “mimetic” words, which by definition are particularly evocative of sensual experiences. Gorogoro roughly translates as “large object rolling”, while nurunuru is meant to evoke the feel of a slimy substance. “If you ask a speaker of Japanese, they will say they evoke an image of an expression,” says Sotaro Kita at the University of Birmingham, UK. He is convinced that this group of words contains some sort of sound symbolism, having discovered that both Japanese and English-speaking children learn made-up mimetic verbs more quickly when they follow the sound-meaning associations found in Japanese than when they contravene them (Cognitive Science, vol 35, p 575).
Suspecting that sound symbolism might also help adults to understand a foreign tongue, Lynne Nygaard at Emory University in Atlanta, Georgia, recently presented English speakers with pairs of antonyms (such as fast/slow) recorded in 10 different languages – including Albanian, Dutch, Gujarati, Mandarin and Yoruba. When given the corresponding pair of English words, and asked to match the foreign words to them, subjects performed better than they would by chance – suggesting the words’ sounds must give clues to their meaning.
What could these clues be? A subsequent analysis hinted at some answers. Words that indicate general movement tend to have more vowels, for instance, and they are more likely to have glottal consonants (the “h” in “behind”, for example). Sounds might also reflect the speed of movement: slow movement tends to be represented by sonorant sounds such as “l” or “w”, whereas explosive obstruents produced from a blocked airway, such as “ch” or “f”, are suggestive of more rapid speeds. Nygaard presented her work at the Atlanta workshop.
Bringing all the evidence together, there seems to be a strong case for saying that sound symbolism does occur in human language. However, some big questions remain. How common are words that elicit cross-sensory connections in modern languages? “Maybe they represent just small pockets of vocabulary,” says Morten Christiansen at Cornell University, in Ithaca, New York.
Then there’s the question of why we link certain sounds to certain shapes, flavours and styles of movement. The inherently nasal quality of the “sn” sound might explain why we sneeze and snore, but most attempts to explain many other examples are just stabs in the dark. Investigations into the cross-sensory connections of full-blown synaesthesia may well shed light on this.
Finally, is sound symbolism universal, perhaps even innate? Tests showing that the patterns are recognised by young children, and by people across cultures, suggest that is a possibility, but more work needs to be done before it can be taken for granted.
Nevertheless, these questions have not stopped researchers exploring the potential implications of their findings. As Kita’s and Nygaard’s work suggests, sound symbolism could at the very least explain why some words stick in our mind better than others – a fact confirmed in a string of studies by Susan Parault, then at the University of Maryland in College Park, which showed that children across a range of ages are better able to learn unfamiliar words if they are sound-symbolic.
Advertisers and marketing executives may begin to see dollar signs in these insights. For example, Charles Spence at the University of Oxford, who has investigated the multi-sensory experience of chocolate, hopes to help confectioners alter their brand names to reflect the taste of the products. Others have looked at whether the names of cancer drugs might affect patients’ perceptions of them (Social Science and Medicine, vol 66, p 1863).
Most intriguingly, sound symbolism might shed light on the origins of language. It appears to revive a popular 18th-century idea called the “bow-wow” theory, which proposes that humankind’s first words were onomatopoeic, mimicking sounds in our ancestors’ environment. The idea seems plausible until you try to explain how humans ever came to describe silent concepts – the appearance of a cave, for example. This is why it fell out of favour following Saussure’s persuasive work. But later theories fail to explain how an initially dumb primate could have evolved a complex, arbitrary system of communication with no obvious stepping stones in between.
While there’s good reason to believe that humans first developed the neural toolkit for language through hand gestures, for example, how did we make the transition from gesture to the spoken word? Ramachandran and Hubbard propose that sound symbolism provided the stepping stone. If the angular sounds of “kiki” seem to fit a distinctively jagged rock, for example, the word might have emerged as obvious shorthand. Sound symbolism “helped to get the first words off the ground”, says Hubbard.
Bow-wow words
Not everyone is convinced. Christiansen, for instance, accepts this revised bow-wow theory is plausible. “But we can’t prove it either way,” he says. Others are more positive. “It’s very speculative, but it is a possibility,” Vigliocco says. “Manual gestures seem like an obvious way [to imitate], but vocal imitation is possible as well, from imitating the shape of an object with the shape of the mouth, to imitating the size of an object by adjusting the length of the vocal tract.”
The beauty of the idea, says Cuskley, is that it helps to solve one of the most exacting problems facing any evolutionary theory of language: how did the ancestral genius who invented the first words get others to understand their meanings so that language could spread? Sound symbolism would have made these first words stick in the mind, and from these simple symbolic sounds our ancestors could have started to build a larger vocabulary. Eventually, the need to describe a greater number of ideas pushed humans to develop more arbitrary terms until they finally developed the complex language systems we use today.
The implication, according to Kita, is that the sound-symbolic relations we see in today’s languages may be remnants of those very first words – a kind of Rosetta stone that helps bridge the gulf to our earliest languages. That is a profound claim, since most attempts to chronicle ancient languages fail at just a few thousand years BC. These cross-sensory connections, on the other hand, give us a glimpse of tens of thousands of years ago, at humanity’s dawn. “They are fossils from our ancestors’ language,” Kita says.
It is intriguing to think that if faced with the first humans ever to use language, we might have at least some common ground to share our thoughts. Now there’s an adventure worthy of Lewis Carroll’s Alice.
Why It’s Smart to Be Bilingual
By Casey Schwartz via Newsweek

On a sweltering August morning, in a classroom overlooking New York’s Hudson River, a group of 3-year-olds are rolling sticky rice balls in chocolate sprinkles, as a teacher guides them completely in Mandarin.
This is just one toddler learning game at the total-immersion language summer camp run by the primary school Bilingual Buds, which offers a year-round curriculum in Mandarin as well as Spanish (at a New Jersey campus) for kids as young as 2.
Bilingualism, of course, can be a leg up for college admission and a résumé burnisher. But a growing body of research now offers a further rationale: the regular, high-level use of more than one language may actually improve early brain development.
According to several different studies, command of two or more languages bolsters the ability to focus in the face of distraction, decide between competing alternatives, and disregard irrelevant information. These essential skills are grouped together, known in brain terms as “executive function.” The research suggests they develop ahead of time in bilingual children, and are already evident in kids as young as 3 or 4.
While no one has yet identified the exact mechanism by which bilingualism boosts brain development, the advantage likely stems from the bilingual’s need to continually select the right language for a given situation. According to Ellen Bialystok, a professor at York University in Toronto and a leading researcher in the field, this constant selecting process is strenuous exercise for the brain and involves processes beyond those required for monolingual speech, resulting in an extra stash of mental acuity, or, in Bialystok’s terms, a “cognitive reserve.”
Bilingual education, commonplace in many countries, is a growing trend across the United States, with 440 elementary schools (up from virtually none in 1970) offering immersion study in Spanish, Mandarin, and French, in that order of popularity.
For parents whose toddlers can’t read Tolstoy in the original Russian, the research does offer some comfort: Tamar Gollan, a professor at University of California, San Diego, has found a vocabulary gap between children who speak only one language and those who grow up with more. On average, the more languages spoken, the smaller the vocabulary in each one. Gollan’s research suggests that while that gap narrows as children grow, it does not close completely.
The rule of thumb for improving in any language is simple practice. “The more you use it, the better off you are,” Gollan says. “Vocabulary tests, SATs, GREs—those are tests that probe the absolute limits of your ability, and that’s where we find that bilinguals have the disadvantage, where you know the word but you just can’t get it out.”
Gollan believes this deficit can be compensated for with extra study. A more complicated question is how and whether bilingualism may interact with other cognitive issues that can appear in early childhood, specifically attention disorders, says Bialystok. Because attention-deficit/hyperactivity disorder (ADHD) is linked to compromised executive functioning, it is unclear what impact learning a second language—which calls upon exactly these executive skills—might have on children with this condition. Research on this question is underway.
Some of the most valuable mental perks of bilingualism can’t be measured at all, of course. To speak more than one language is to inherit a global consciousness that opens the mind to more than one culture or way of life.
Bilinguals also appear to be better at learning new languages than monolinguals. London-based writer Clarisse Lehmann spent her early childhood in Switzerland speaking French. At 6, she learned English. Later she learned Spanish, German, and, during three years spent living in Tokyo, Japanese.
“There’s a witty humor in English that has a different sensibility in French,” she says. “And in Japanese, there’s no sarcasm. When I tried, it would be ‘We don’t understand what you’re trying to say.’”
With five languages under her belt—and a working familiarity with Latin and Greek as well—Lehmann finally considers herself sufficiently multilingual. “Enough, enough!” she says. “I don’t want to learn any more languages.”
That ugly Americanism? It may well be British.
By Dennis Baron via The Web of Language
Matthew Engel is a British journalist who doesn’t like Americanisms. The Financial Times columnist told BBC listeners that American English is an unstoppable force whose vile, ugly, and pointless new usages are invading England “in battalions.” He warned readers of his regular FT column that American imports like truck, apartment, and movies are well on their way to ousting native lorries, flats, and films.
Engel’s tirade against the American “faze, hospitalise, heads-up, rookie, listen up” and “park up” got several million page views (or page impressions, as the Brits seem to call them), along with thousands of comments, when it appeared in the BBC News Magazine.
But like many critics of Americanisms, Engel got some of his facts wrong. True, the U.S. may be influencing the spread of English as a world language today, but it was British imperialism, not American, that set English on the path to world domination.
Plus, a few of the words Engel complains about aren’t even Americanisms. The first OED citations for hospitalize (so spelled), heads up, and rookie are British, not American, and if the OED and Google are any indication, “park-up,” unheard of stateside, seems to be solely a Briticism, an unnecessary alternative to simply saying park, as in “Park-Up.com finds the cheapest parking for you” in London and Brighton.
As is fitting for a Financial Times writer, Engel acknowledges that there’s a kind of linguistic marketplace where languages trade words the same way that their speakers trade goods and services, but he sees the balance of trade as seriously tipped in favor of the ugly Americanism.
Engel also praises the English for encouraging “the diversity offered by Welsh and Gaelic—even Cornish is making a comeback.” But the status of Welsh and Gaelic is still bitterly contested, not just in England but also in Wales and the Six Counties, and although some Brits grudgingly accept diversity when it’s home-grown rather than imported, Prime Minister David Cameron’s recent assertion that British multiculturalism has failed and the new government policy that requires immigrants to prove they can speak English before coming to England reveal, not a sense that English is a big tent with room for all, but a growing strain of linguistic nativism.
It should surprise no one that the Brits have been complaining about Americanisms since they first came to America. The word Americanism was actually coined in 1781 by John Witherspoon, a Scot who relocated to New Jersey and became the first president of Princeton.
Americanism, the word coined by John Witherspoon in 1781, was one of the first Americanisms. According to Witherspoon, even “persons of rank and education” used Americanisms.
Witherspoon intended his new word to be neutral: an Americanism was simply “an use of phrases or terms, or a construction of sentences” that differed from British usage. He coined it on the analogy of Scotticism, a term of insult that goes back to the 17th century. Witherspoon tried to treat Scotticism as a neutral term as well, though as he did so he acknowledged that “the Scottish manner of speaking came to be considered as provincial barbarism; which, therefore, all scholars are now at the utmost pains to avoid.”
Witherspoon notes that Scotticism shouldn’t be a negative term, though he also says that Scotticisms have come to be considered barbarous, even by the Scots.
Some of Witherspoon’s best friends were Americans, and he saw that in light of American independence, and in the course of time, American English could be expected to diverge from the language of England and develop its own standards. But while he waited for this to happen, Witherspoon found many American errors and improprieties to complain about.
Witherspoon published his essay on the errors and improprieties of Americanisms in the Pennsylvania Journal on May 9, 1781. Some 19th-century usage critics pointed out that the word journal literally refers to a “daily” publication. The Pennsylvania Journal in which Witherspoon attacked Americanisms was a weekly.
Witherspoon, like Engel, objected to a number of so-called Americanisms that turned out to be British:
- He disliked the American use of notify: “In English we do not notify the person of the thing, but notify the thing to the person.” But notify had been used in the transitive sense since the mid-1400s, long before the New World was a glimmer in Sir Francis Drake’s eye.
- Witherspoon found the American phrase “fellow countryman” to be a tautology, though the “tautology” was in use in England well before the Pilgrims landed at Plymouth Rock.
- Witherspoon objected to the way Americans used certain with a proper name, as in “A certain Thomas Benson,” because certain is indefinite while the name is very specific. But this use was certainly English before it was American (OED, s.v., II.7.f).
- Witherspoon seemed to think that Americans used clever only in a positive sense, while the Brits used it simply to indicate the ability to do something, whether that something was good, bad, or indifferent. But his contemporary, Alexander Pope, was one of many Old World writers to use clever positively to mean ‘nice, likeable, convenient, or agreeable.’
- And Witherspoon joins the general 18th-century chorus decrying those who use mad for ‘angry’ instead of what he thinks it should mean, ‘rabid, crazy, nutso, bonkers.’ It’s a classic American error that Witherspoon finds particularly maddening, although it’s not particularly American: according to the OED, mad was first used to mean ‘angry’ in the 1300s, and after the decline of Middle English wroth it became “the ordinary term for ‘feeling anger’ in many dialects in Great Britain (and later in North America).”
Witherspoon’s 1781 essay on Americanisms sparked a long tradition of attacks on Americanisms, mostly by Brits (and of course the French). Some of the expressions in question did turn out to be American. And certainly some of them were ugly, or illogical, or redundant. But many of them were British. Much language is ugly, illogical, and redundant; and not all objectionable phrases are American.
Just as Witherspoon’s essay on Americanisms struck a chord, Engel’s critique prompted other like-minded Brits to bombard the BBC with examples of unwanted and pernicious Americanisms, and the Beeb published a follow-up list of the 50 most maddening ones. But there were objections to Engel as well, not all of them from Americans. To his critics Engel replied that he had been treated more civilly and sensibly by members of the National Rifle Association after a column on gun control than by “this lot,” his dismissive characterization of the linguists and lexicographers who questioned his under-researched data and his off-the-wall conclusions.
As a parting shot, Engel warned that the English love of Americanisms, if unrestrained, will lead to “51st statehood,” and he counseled his fellow countrymen to maintain “the integrity of our own gloriously nuanced, subtle and supple version—the original version—of the English language.”
But as English grows as a world language, more and more speakers of English aren’t native speakers of English and don’t live in English-speaking countries. As a result, the importance of the “original version” of the language will continue to decline, and the question of who owns English will ultimately become irrelevant.
But until that happens, and speaking as Engel did of 51st statehood, it might be appropriate to note that the first OED citation for statehood back in 1868 shows that word to be—you guessed it—an Americanism.
First OED citation of the word statehood, from an article on the impeachment of Pres. Johnson and the nomination of Gen. Grant, in the New York Times, June 8, 1868.
Language Forensics
By Ben Zimmer via The New York Times
IMAGINE, if you will, a young Mark Zuckerberg circa 2003, tapping out e-mail messages from his Harvard dorm room. It’s a safe bet he never would have guessed that eight years later a multibillion-dollar lawsuit might hinge on whether he capitalized the word “Internet,” or whether he spelled “cannot” as one word or two.
But that is exactly the kind of stylistic minutiae being analyzed in a lawsuit filed by Paul Ceglia, owner of a wood-pellet fuel company in upstate New York. Mr. Ceglia says that a work-for-hire contract he arranged with Mr. Zuckerberg, then an 18-year-old Harvard freshman, entitles him to half of the Facebook fortune. He has backed up his claim with e-mails purported to be from Mr. Zuckerberg, but Facebook’s lawyers argue that the e-mail exchanges are fabrications.
When legal teams need to prove or disprove the authorship of key texts, they call in the forensic linguists. Scholars in the field have tackled the disputed origins of some prestigious works, from Shakespearean sonnets to the Federalist Papers. But how reliably can linguistic experts establish that Person A wrote Document X when Document X is an e-mail — or worse, a terse note sent by instant message or Twitter? After all, e-mails and their ilk give us a much more limited purchase on an author’s idiosyncrasies than an extended work of literature. Does digital writing leave fingerprints?
The law firm representing Mr. Zuckerberg called upon Gerald McMenamin, emeritus professor of linguistics at California State University, Fresno, to study the alleged Zuckerberg e-mails. (Normally, other data like message headers and server logs could be used to pin down the e-mails’ provenance, but Mr. Ceglia claims to have saved the messages in Microsoft Word files.) Mr. McMenamin determined, in a report filed with the court last month, that “it is probable that Mr. Zuckerberg is not the author of the questioned writings.” Using “forensic stylistics,” he reached his conclusion through a cross-textual comparison of 11 different “style markers,” including variant forms of punctuation, spelling and grammar.
But Mr. McMenamin’s report has raised eyebrows in the forensic linguistics community. Earlier this month, the outgoing president of the International Association of Forensic Linguists, Ronald R. Butters, publicly questioned whether Mr. McMenamin could actually establish that Mr. Zuckerberg likely did not write the e-mails based on such slender evidence. For example, the would-be Zuckerberg e-mails had one instance of uncapitalized “internet,” while a sample of e-mails known to be sent by Mr. Zuckerberg had two capitalized instances of “Internet.” “Are we really doing ‘scientific’ and ‘linguistic’ analysis at all when we simply note instances or absences of this or that superficial textual feature?” Mr. Butters asked.
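To make the idea of a style-marker comparison concrete, here is a minimal Python sketch that tallies the two variants mentioned in this article: capitalized versus uncapitalized “Internet,” and “cannot” versus “can not.” It is an illustration only, not Mr. McMenamin’s procedure; the marker list, the function name count_markers and both sample texts are invented for the example.

```python
import re

# Hypothetical style markers: label -> pattern for one variant form.
# "Internet"/"internet" are case-sensitive on purpose; the "cannot"
# variants ignore case so sentence-initial uses are still counted.
STYLE_MARKERS = {
    "Internet": re.compile(r"\bInternet\b"),
    "internet": re.compile(r"\binternet\b"),
    "cannot": re.compile(r"\bcannot\b", re.IGNORECASE),
    "can not": re.compile(r"\bcan\s+not\b", re.IGNORECASE),
}


def count_markers(text: str) -> dict:
    """Count how often each variant form appears in the text."""
    return {label: len(pattern.findall(text))
            for label, pattern in STYLE_MARKERS.items()}


# Invented sample texts, standing in for known and questioned writings.
known = "I cannot get on the Internet. The Internet is down again."
questioned = "I can not find it anywhere on the internet."

print(count_markers(known))
print(count_markers(questioned))
```

With samples this small, each count rests on one or two occurrences, which is the weakness Mr. Butters points to.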
Some experts are more optimistic. Carole E. Chaski, president of Alias Technology and executive director of the Institute for Linguistic Evidence, has taken on what she terms “the keyboard dilemma,” that is, “the problem of identifying the authorship of a document that was produced by a computer to which multiple users had access.” She has developed computer software that categorizes grammatical structures as “marked” and “unmarked”: an unmarked noun phrase, for instance, has its main noun at the end of a simple phrase (“our marriage,” “a divorce”), while a marked one has the noun in the beginning of a phrase (“anything you ask”) or in the middle (“the rest of our lives”). These aspects of a writer’s syntax are relatively stable across different styles of writing, Ms. Chaski argues. They are also less prone to technological intervention — compared to spelling and punctuation, which can be changed on the fly by spell-check and autocorrect features.
Recently, a team of computer scientists at Concordia University in Montreal took advantage of an unusual set of data to test another method of determining e-mail authorship. In 2003, the Federal Energy Regulatory Commission, as part of its investigation into Enron, released into the public domain hundreds of thousands of employee e-mails, which have become an important resource for forensic research. (Unlike novels, newspapers or blogs, e-mails are a private form of communication and aren’t usually available as a sizable corpus for analysis.)
Using this data, Benjamin C. M. Fung, who specializes in data mining, and Mourad Debbabi, a cyber-forensics expert, collaborated on a program that can look at an anonymous e-mail message and predict who wrote it out of a pool of known authors, with an accuracy of 80 to 90 percent. (Ms. Chaski claims 95 percent accuracy with her syntactic method.) The team identifies bundles of linguistic features, hundreds in all. They catalog everything from the position of greetings and farewells in e-mails to the preference of a writer for using symbols (say, “$” or “%”) or words (“dollars” or “percent”). Combining all of those features, they contend, allows them to determine what they call a person’s “write-print.”
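As a rough illustration of what cataloging such features might look like, the Python sketch below pulls four of them out of a single message: where the greeting and farewell fall, and how often the writer uses symbols as opposed to the spelled-out words. This is not the Concordia team’s program, which combines hundreds of features; the word lists, the function name extract_features and the sample message are all assumptions made for the example.

```python
import re

# Hypothetical word lists; a real system would use far richer ones.
GREETING = re.compile(r"(?:hi|hello|dear|hey)\b")
FAREWELL = re.compile(r"(?:thanks|regards|best|cheers)\b")


def first_matching_line(lines, pattern):
    """Return the index of the first line that starts with the pattern, or -1."""
    for index, line in enumerate(lines):
        if pattern.match(line):
            return index
    return -1


def extract_features(message: str) -> dict:
    """Collect a few simple stylistic features from one e-mail message."""
    # Work on non-empty, lowercased lines so matching ignores case.
    lines = [ln.strip().lower() for ln in message.splitlines() if ln.strip()]
    return {
        "greeting_line": first_matching_line(lines, GREETING),
        "farewell_line": first_matching_line(lines, FAREWELL),
        # Preference for symbols ("$", "%") versus words ("dollars", "percent").
        "symbol_count": len(re.findall(r"[$%]", message)),
        "word_count": len(
            re.findall(r"\b(?:dollars?|percent)\b", message, re.IGNORECASE)
        ),
    }


# Invented sample message.
sample = "Hi Sam,\nThe fee is $500, about 10% more than last time.\nThanks,\nAlex"
print(extract_features(sample))
```

A feature dictionary like this describes one message; attributing authorship would still require comparing many such descriptions across a pool of known writers, which is the harder step the researchers address.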
Many linguists, however, would challenge the notion that the “fingerprint,” a supposedly unique identifier, can be metaphorically applied to writing. Surely we all have our own written quirks and mannerisms — I tend to overuse em-dashes, for instance. But there is just too much internal variability in any person’s body of writing to imagine that we could take just a bit of it — a handful of e-mails — and recognize some sort of linguistic DNA. That is all the more true when it comes to digital genres like text messages, instant messages and tweets, full of unusual spellings and innovative abbreviations, and often sensitive to the type of device we’re using.
Still, these new quantitative approaches hold out the hope of at least differentiating one author from another with a reasonable degree of confidence. This can provide the kind of reliable foundation for research that forensic stylistics as traditionally practiced cannot. Hmm, or is that “can not”?


