THE TWITTER social network has fostered new ways of communicating, some of which I’m apparently never meant to understand. Fortunately, though, Prof. Jacob Eisenstein of the School of Interactive Computing, Georgia Institute of Technology, is an avid tweeter (@jacobeisenstein) and an accomplished researcher. He and his colleagues have published Diffusion of Lexical Change in Social Media, in which they analyzed Twitter messages from 2009 to 2012. Here are several tidbits about their methodology and findings.

Their approach uses high-level statistics, specifically a technique called vector autoregression (VAR). Loosely (make that very loosely), VAR looks at several sets of data at once, determines how each depends on the past values of the others and allows predictions of their future behavior. It’s especially useful for analyzing dynamic behavior of economic, financial and, in this case, social phenomena over time.
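To make the idea a bit more concrete, here is a minimal sketch of a first-order VAR in Python with NumPy. It is not the researchers' actual model, which is considerably more elaborate. The two series, the matrix A_true and the function names are all made up for illustration; think of the two columns as the weekly activity of some term in two hypothetical cities.

```python
import numpy as np

def simulate_var1(A, steps, noise_scale, rng):
    """Generate a synthetic series where x[t] = A @ x[t-1] + noise."""
    k = A.shape[0]
    x = np.zeros((steps, k))
    for t in range(1, steps):
        x[t] = A @ x[t - 1] + rng.normal(scale=noise_scale, size=k)
    return x

def fit_var1(x):
    """Estimate A by ordinary least squares on (previous, next) pairs."""
    past, future = x[:-1], x[1:]
    # Solve past @ B ~ future in the least-squares sense; A is B transposed.
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    return B.T

def forecast_next(A_hat, last):
    """One-step-ahead prediction from the most recent observation."""
    return A_hat @ last

# Hypothetical influence matrix: row i, column j is how strongly
# series j at time t-1 feeds into series i at time t.
# Here city 1 influences city 0 (0.2), but not the other way round (0.0).
A_true = np.array([[0.6, 0.2],
                   [0.0, 0.5]])

rng = np.random.default_rng(0)
series = simulate_var1(A_true, steps=5000, noise_scale=0.1, rng=rng)
A_hat = fit_var1(series)
prediction = forecast_next(A_hat, series[-1])

print(np.round(A_hat, 2))
```

The off-diagonal entries of the fitted matrix are the interesting part: a clearly nonzero entry suggests that one series helps predict the other, which is the kind of city-to-city relationship such an analysis looks for.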

Eisenstein and his colleagues had huge data sets: 107 million tweets from 2.7 million tweeters and, where determinable, the tweeters’ geographical and cultural environments. In overview, they note, “Rather than moving towards a single unified ‘netspeak’ dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.”

For example, the abbreviation ikr (I know, right?) occurs four times more frequently in the Detroit area than in the U.S. overall. The emoticon ^-^ is a Southern California thing. The phonetic spelling suttin (something) remains a unique New York City affectation.
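A claim like "four times more frequently" comes down to comparing rates rather than raw counts. The sketch below shows that arithmetic in Python; the counts are invented purely for illustration and are not taken from the paper, and the function name is my own.

```python
def relative_frequency_ratio(local_hits, local_total, national_hits, national_total):
    """Ratio of a term's local usage rate to its national usage rate."""
    local_rate = local_hits / local_total
    national_rate = national_hits / national_total
    return local_rate / national_rate

# Hypothetical counts: messages containing the term vs. all messages.
ratio = relative_frequency_ratio(
    local_hits=80, local_total=100_000,
    national_hits=200, national_total=1_000_000,
)
print(f"The term is about {ratio:.1f} times as frequent locally")
```

Dividing by the total number of messages in each area matters: a big city will produce more of every term simply because it produces more tweets.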


Some Twitter usages spread around the country. Others remain localized. Image from Diffusion of Lexical Change in Social Media.

Terms may begin locally, then spread uniformly. For example, the emoticon -_- appeared in the researchers’ 2009 data predominantly around a few cities. By 2012, its usage had spread considerably.

By contrast, the abbreviation ctfu (cracking the f**k up, expressing laughter) entered their data principally in Cleveland. By weeks 100-150, it was widely used in Pennsylvania and the mid-Atlantic, but rare in Detroit, Chicago or other large cities west of Cleveland.

Researchers also cite “Examples of linguistically linked city pairs that are geographically distant but demographically similar.” For example, Washington, D.C., and New Orleans share a high proportion of African-Americans, found to be “the single biggest predictor of similar usage online.” The high proportions of Hispanics in Los Angeles and Miami give this pair its own online flavor. By contrast, Boston and Seattle both have relatively small minority populations, yet they too share linguistic commonalities.

Some words are region-specific. The plural pronoun yinz (as in “I’ll see yinz later”) occurs in a tight clump around Pittsburgh. The adjective hella (as in “That movie was hella long”) is uniquely Northern California. It’s not cited in the paper, but I’d conjecture that wicked as an adverb (“It’s wicked cold today!”) would be popular in Maine tweets.


Geographical distribution of some Twitter terms. Image from ScienceShot, February 15, 2015.

The terms bogus and legit appear to be citified. By contrast, frfr (for real, for real) is widespread throughout the Deep South. And researchers observe, without explanation, the clustering of lls (laughing like sh*t) around Maryland.

By the way, Eisenstein and his colleagues also cite the # hashtag as having a neutralizing effect on tweets. Apparently reaching for a larger audience, those using # links tend to tweet less dialectally.

