39 Published by the Linguistic Society of America
REBECCA MAYBAUM
University of Haifa
One of the major concerns of sociolinguists is to better understand and explain the
mechanisms driving language change, in particular the process by which
innovative variants appear and subsequently spread throughout a population.
Questions regarding the diffusion of new variants over time have been explored
from a variety of perspectives (most prominently in socio- and historical
linguistics), and a consistent finding is that the diffusion of innovative variants
through the linguistic system forms an S-shaped curve with respect to time
(Labov 2001).
Similar observations are reported from the field of innovation diffusion
research, an interdisciplinary area of the social sciences concerned with how, why,
and at what rate innovative ideas and technologies spread through social systems.
Studies from innovation diffusion research have shown that the rate of diffusion
of (non-linguistic) innovations—including medical, agricultural, political, and
technological examples—also forms an S-shaped curve with respect to time
(Rogers 1995).
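An S-shaped curve of this kind is often formalized as a logistic function. The following is only a sketch under that assumption; the symbols K, r, and t_0 are illustrative and are not taken from the studies cited above.

```latex
F(t) = \frac{K}{1 + e^{-r\,(t - t_0)}}
```

Here F(t) stands for the cumulative proportion at time t, K for the ceiling that the curve approaches, r for the growth rate, and t_0 for the midpoint at which growth is fastest. Growth is slow when t is far below t_0, rapid near t_0, and slow again as F(t) approaches K.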
The similarities between findings from language change research and
innovation diffusion research suggest that language change may be explained by
the same mechanisms that govern the social diffusion of non-linguistic
innovations. In this paper I apply the theoretical framework of innovation
diffusion research to an instance of language change. By approaching the
diffusion of linguistic innovations as a social process, I hope to gain insights into
the mechanisms of language change from a new perspective.
Section 1 gives a background of the S-curve model of diffusion from both the
Social Diffusion of Lexical Innovations in Twitter
reach a critical mass of adoption and fail to diffuse widely. Furthermore, even in
the case of successful innovations, some diffusion patterns may not form an S-
shaped curve due to specific conditions related to the social system or to the
innovation itself. The exact shape of the diffusion curve must be empirically
determined for each individual innovation; deviations from the prototypical S-
shaped curve may be interpreted based on the idiosyncratic conditions of the
specific innovation and the specific social system.
(1) Diffusion through a linguistic system (left) and social system (right)
In both graphs the x-axis indicates time; the y-axes, however, represent
different measures. On the left, the y-axis represents all occurrences of a particular
linguistic context y—a context in which evidence of variation of some sort has
been identified, and which is the suspected locus of a change in progress. Each
point in the curve represents the percentage of all instances of context y that are
realized as variant z, at each point in time (for instance, the percentage of ne
deletion in negation contexts in Montreal French).
On the right, the y-axis represents the population of potential adopters—that is,
individuals within the community who could conceivably be exposed to the
innovation and might eventually adopt it themselves. Each point in the curve
represents the percentage of the total potential adopter population who have
adopted the innovation at each point in time.
The different y-axis measures in the two graphs mean that the S-curves of diffusion
discussed in sociolinguistics and in innovation diffusion research in fact measure two
entirely different things. Socio- and historical linguists too often fail to make this
distinction, referring to the increase of a linguistic variant relative to the text data
as social diffusion, when in fact social diffusion is measured by the proportion of
individual language users who adopt the variant.
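The contrast between the two measures can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function names, the toy observations, and the population size of five are all hypothetical.

```python
def token_proportion(realizations, variant):
    """Linguistic measure: share of all instances of a context realized as `variant`."""
    if not realizations:
        return 0.0
    return sum(1 for r in realizations if r == variant) / len(realizations)


def adopter_proportion(users_of_variant, population_size):
    """Social measure: share of potential adopters who have used the variant."""
    return len(set(users_of_variant)) / population_size


# Hypothetical (user, realization) pairs for one context at one point in time.
observations = [
    ("ana", "z"), ("ana", "z"), ("ana", "z"), ("ana", "z"),
    ("ben", "other"), ("cat", "other"), ("dov", "other"), ("eli", "other"),
]
realizations = [r for _, r in observations]
users_of_z = [u for u, r in observations if r == "z"]

linguistic_share = token_proportion(realizations, "z")             # 4 of 8 tokens
social_share = adopter_proportion(users_of_z, population_size=5)   # 1 of 5 users
```

In this toy case half of the tokens are realized as the variant, yet only one user in five has adopted it, which is why the two curves cannot be treated as interchangeable.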
(3) For all you working tweeps out there...apparently tweeting at work is a
good thing
(4) Time for bed - busy day tomorrow. Goodnight Tweethearts! Thanks for
the fun and tweet dreams:)
The paper addresses the following research questions: 1) Do the Twitter People
variants follow an S-curve pattern of social diffusion? If so, that would suggest
that language change shares characteristics with other kinds of social processes,
and may be governed by the same mechanisms that shape the social diffusion
patterns of non-linguistic innovations. 2) Are the diffusion patterns consistent
across all Twitter People variants? If not, are these differences correlated with
other factors, such as the relative success/failure of the variant (as measured by
overall frequency)? 3) How many times must a user post a Twitter People variant
to be considered an adopter? And, 4) is the shape of the diffusion curve affected
by the criteria used to define adoption? (See below for discussion of adopter
criteria.)
The Twitter People Corpus consists of all Twitter posts that contained any of the
Twitter People variants listed previously, posted from 2006 through 2011. The
data collected includes the full text of the tweets, the user name of the author, a
timestamp of the publication accurate to the second, and a variety of additional
metadata. Table (5) displays the word count, number of tweets, and number of
individual users for each Twitter People variant subcorpus.
(5) Twitter People variants—word count, tweet count, and user count
In total the corpus contains close to 19 million words. By far the most popular
and most widespread variant of the group is tweeps, appearing in more than 800
thousand tweets. At the other extreme is twittertwatters, appearing just 16 times
throughout the entire time period represented by the corpus.
One possible way to represent the rate of diffusion would be to calculate the raw
frequencies of each keyword based on the total number of occurrences in the
corpus. However, the diffusion rate of a socially-diffused innovation is more
often—and more usefully—measured based on the time at which individuals
adopt the innovative term, without regard to the times of subsequent productions.
For the individual Twitter People variant diffusion graphs, time of adoption is
defined by the Unix timestamp that corresponds to the earliest tweet of each
unique user in each variant subcorpus.
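The definition of adoption time given above can be sketched as follows. The record layout (pairs of user name and Unix timestamp), the function names, and the sample values are assumptions made for illustration; the corpus itself contains further metadata.

```python
def adoption_times(tweets):
    """Map each user to the Unix timestamp of their earliest tweet in a subcorpus."""
    first_seen = {}
    for user, timestamp in tweets:
        if user not in first_seen or timestamp < first_seen[user]:
            first_seen[user] = timestamp
    return first_seen


def cumulative_adopters(first_seen, cutoff):
    """Count the users whose adoption time falls at or before `cutoff`."""
    return sum(1 for t in first_seen.values() if t <= cutoff)


# Hypothetical records; later tweets by the same user do not change the result.
subcorpus = [
    ("u1", 1230768000),
    ("u2", 1233446400),
    ("u1", 1235865600),
    ("u3", 1238544000),
    ("u2", 1230768500),
]

first_seen = adoption_times(subcorpus)
adopters_so_far = cumulative_adopters(first_seen, cutoff=1233446400)
```

Because only the earliest timestamp per user is kept, a prolific user contributes exactly one adoption event, in contrast to a raw frequency count.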
The resulting value represents proportional adoption rates of each variant adjusted
for the simultaneous increase in the total population of potential adopters.
The question of how to define adoption in the context of Twitter People variants
in Twitter is one that must be carefully considered, as it may have significant
consequences for the analysis itself and for the interpretations of the results. The
simplest and most straightforward definition would be to consider any single use
of the variant in question as adoption. However, it is possible that some users
adopted a Twitter People variant on a trial basis (as part of the
innovation-decision process) before subsequently rejecting it. In this case, a single
post containing the variant would not constitute final adoption. Two adopter
criteria conditions, based on number of posts per user (all users vs. multiple-post
users only), are assessed in the diffusion analysis.
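The two adopter criteria conditions can be expressed as a single threshold on posts per user. The sketch below assumes hypothetical (user, timestamp) records and a hypothetical `min_posts` parameter; it is not the code used for the analysis.

```python
from collections import Counter


def adopters(tweets, min_posts=1):
    """Return the users who posted the variant at least `min_posts` times."""
    posts_per_user = Counter(user for user, _timestamp in tweets)
    return {user for user, n in posts_per_user.items() if n >= min_posts}


# Hypothetical (user, timestamp) records for one variant subcorpus.
subcorpus = [("u1", 100), ("u2", 150), ("u1", 200), ("u3", 250), ("u1", 300)]

all_users = adopters(subcorpus)                          # all-users condition
multiple_post_users = adopters(subcorpus, min_posts=2)   # multiple-post users only
```

Raising the threshold removes single-post users from the adopter set, which is the only difference between the two conditions in this sketch.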
3 Results
The results of the analysis are displayed using graphical representations of the
adoption/diffusion patterns of each Twitter People variant, as well as the entire
group of Twitter People terms combined, over the time period represented by the
data.
Although the data collection spanned the time range from March 2006
through January 1, 2012, none of the variants appeared prior to 2007. Because of
this, all of the diffusion graph results are presented with an x-axis time scale of
January 2007 until January 2012. The y-axis range varies according to the overall
frequency of each variant for best visual comparison of the overall trajectories of
the diffusion curves.
When looking at the diffusion patterns for the Twitter People variants based on all
users, two common patterns emerge. The first pattern resembles the prototypical
S-curve predicted by both the sociolinguistic and the innovation diffusion
literature. Examples of this pattern are shown in (7). The theoretical S-curve
model measures cumulative frequency of the innovation over time, which would
mean that the y-value can never decrease over time. However, because the
diffusion curves of this study are based on proportional frequency relative to a
simultaneously increasing Twitter population, it is possible for the number of
potential adopters to increase more rapidly over time than the number of
cumulative adopters, in which case the proportional value declines, as seen in (7).
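A small numerical sketch shows how a cumulative count that never decreases can still yield a falling proportional curve. All figures below are invented for illustration and do not come from the corpus; the function name is likewise hypothetical.

```python
def proportional_adoption(cumulative_adopters, population):
    """Cumulative adopters as a percentage of the population in each period."""
    return [100.0 * a / p for a, p in zip(cumulative_adopters, population)]


# Invented figures: the adopter count only grows, but the population grows faster
# in the final period.
adopters_by_period = [10, 40, 80, 90]
population_by_period = [10_000, 20_000, 40_000, 90_000]

rates = proportional_adoption(adopters_by_period, population_by_period)
for period, rate in enumerate(rates, start=1):
    print(f"period {period}: {rate:.3f}% of the population")
```

In the last period the number of adopters rises from 80 to 90 while the population more than doubles, so the percentage drops even though no adopter has been lost.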
(7) Tweeple (left) and Tweeties (right): adoption over time (all users), as % of total Twitter population, January 2007 to January 2012
Examples of non-S-curve diffusion patterns are shown in (8). These diffusion
curves are stepwise or near-linear in pattern, and are characterized by continuous
increase over time, in some cases interspersed with periods of stable proportional
frequency.
(8) Tweeps (left) and Tweethearts (right): adoption over time (all users), as % of total Twitter population, January 2007 to January 2012
Table (9) shows the distribution of diffusion patterns for all variants.
As discussed in Section 2, the criteria used to determine whether a Twitter user is,
in fact, an adopter of the Twitter People variants may be adjusted based on the
total number of posts per user. Over 70 percent of users posted only once, while
fewer than 14 percent posted three or more times.
Figure (10) shows the proportion of adopter type (based on total posts per
user) in each Twitter People variant subcorpus, arranged in order of overall
frequency from left to right. The graph shows a positive correlation between
overall frequency of the variant (indicating relative success of diffusion) and the
percentage of users with multiple posts (2+ and 3+ post users), and a negative
correlation between overall frequency of the variant and the percentage of users
who posted only a single time throughout the period represented in the corpus.
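The proportions of adopter types plotted for each subcorpus can be computed along the following lines. This is a sketch under stated assumptions: the input is a hypothetical list of author names, one entry per tweet, and the category labels are illustrative.

```python
from collections import Counter


def adopter_type_shares(tweet_authors):
    """Share of users with exactly one post, with 2+ posts, and with 3+ posts."""
    posts_per_user = Counter(tweet_authors)
    total_users = len(posts_per_user)
    counts = list(posts_per_user.values())
    return {
        "1 post": sum(1 for n in counts if n == 1) / total_users,
        "2+ posts": sum(1 for n in counts if n >= 2) / total_users,
        "3+ posts": sum(1 for n in counts if n >= 3) / total_users,
    }


# Hypothetical subcorpus: one author name per tweet.
authors = ["a", "b", "b", "c", "c", "c", "d"]
shares = adopter_type_shares(authors)
```

Note that the 3+ category is a subset of the 2+ category, so the three shares are not expected to sum to one.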
When the adopter criteria are limited to include only those users who posted
multiple times (2+ and 3+ users), the resulting diffusion curves are altered. More
of the Twitter People variants exhibited S-shaped diffusion curves under the
multiple-posts-per-user condition than in the all-users condition. Figure (11)
shows the diffusion curves for tweeps and tweethearts for users with three or
more posts each. The same variants that did not produce S-shaped diffusion
curves in the all-user filter (8) now follow “slow-quick-slow” S-curves under
the multiple-posts-per-user filter.
(11) Tweeps (left) and Tweethearts (right): adoption over time (3+ post users), as % of total Twitter population, January 2007 to January 2012
Figure (12) summarizes the diffusion patterns for Twitter People variants in
the multiple-posts-per-user filter. (Twittertwatters was excluded because there are
not enough data points for multiple-post users to produce a diffusion curve for
that variant.) The only variant that does not follow an S-curve is tweetheads.
4 Discussion
In this section I discuss the significance of the main findings for the diffusion
analysis, beginning with the S-shaped diffusion patterns for the all-user adopter
criteria exemplified in (7) and (8), and summarized in (9). Although Rogers
(1995) claims that the S-curve pattern occurs only in cases of successful diffusion,
the results show an even distribution of S-curve versus non–S-curve diffusion
patterns across the range of frequencies of the variants. I found no significant
difference in the likelihood that a popular slang term (e.g. tweeps) and an
unpopular slang term (e.g. twittertwatters) diffuse in an S-shaped pattern.
While it is not a trivial finding that five out of ten of the Twitter People
variants follow the S-curve pattern of diffusion—this at least partially confirms
the hypothesis that language change may diffuse socially via the same
mechanisms as non-linguistic innovations—neither is it overwhelmingly
conclusive. With the introduction of the varying adopter criteria, however, the
results become much more telling.
The clear majority of all Twitter users in the corpus only authored a single
post employing the Twitter People variant. In other words, most of the Twitter
users tried out the new slang term once, but never fully integrated it into their
permanent lexicon. This raises the question of the degree of perceived
trialability of Twitter People variants, that is, the ability to try out an
innovation on a trial basis without making a commitment. Trialability allows the
individual to judge the merits and/or consequences of the innovation under real
conditions. In this case, a
Twitter user can try out one of the innovative Twitter People variants one time
with a minimum of risk or inconvenience. The attribute of trialability is positively
correlated with rate of adoption (Rogers 1995), meaning the Twitter People
variants (and likely for the same reasons, other innovations within Twitter and on
the Internet as a whole) are predicted to diffuse rapidly.
The relationship found between the number of posts per user and overall
frequency of the variant (10) also supports the interpretation that single-post users
were engaging in a trial period before deciding whether or not to adopt the
innovation. The more popular variants retained more adopters after the trial period
than did the less popular variants; thus the more successful variant subcorpora
have a higher proportion of repeat posters than the less successful variants.
The innovation-decision process (Rogers 1995), briefly described in Section
1, conceptualizes the act of adoption as a five-stage process. The first stage is
knowledge, or exposure to the innovation, followed by persuasion, when the
individual forms an initial favorable or unfavorable attitude toward the innovation.
The third stage is the decision stage, and it is here that the trialability of an
innovation comes into play. The first time a Twitter user tries out a Twitter People
variant (or any other innovative element), he or she is engaged in a decision-stage
activity with the purpose of informing the decision to adopt or reject the
innovation. If at this point the individual decides to adopt the innovation, this
stage is followed by implementation (with possible re-invention) and
confirmation. Rejection can occur at any stage in the innovation-decision process.
The minimum requirement for an individual to be considered an adopter is the
implementation (post-trial) stage; in the Twitter People data, this can be defined
as a user’s second post using the same Twitter People variant. We can be even
more confident of the adopter classification, however, if the individual has
advanced through the confirmation stage—signaled by a user’s third post.
Following this model, single-post users should not be considered adopters.
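The mapping from post counts to stages described above can be written out as a short classification. The function names and stage labels are illustrative assumptions, and the sketch takes a hypothetical list of author names, one entry per tweet.

```python
from collections import Counter


def adoption_stage(n_posts):
    """Map a user's number of posts with a variant to the stage it signals."""
    if n_posts >= 3:
        return "confirmation"
    if n_posts == 2:
        return "implementation"
    if n_posts == 1:
        return "decision (trial)"
    return "no use"


def classify_users(tweet_authors):
    """Assign a stage to every user who appears in a variant subcorpus."""
    return {user: adoption_stage(n) for user, n in Counter(tweet_authors).items()}


def is_adopter(stage):
    """Only users who reached implementation or confirmation count as adopters."""
    return stage in {"implementation", "confirmation"}


# Hypothetical subcorpus: one author name per tweet.
stages = classify_users(["a", "b", "b", "c", "c", "c"])
adopter_set = {user for user, stage in stages.items() if is_adopter(stage)}
```

Under this classification user "a" remains a trial user, while "b" and "c" are counted as adopters.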
A comparison of (8) and (11) illustrates the effect of altering the adopter
criteria to exclude non-adopter single-post trial users from the diffusion data. The
most dramatic transformation occurred for the most successfully diffused variant
in the corpus, tweeps. The summary of diffusion patterns for Twitter People
variants using the multiple-posts-per-user adopter criteria (12) reveals that all but
5 Conclusion
The results of the Twitter People diffusion analysis lend support to the view that
language change is a socially driven process, and can be successfully analyzed
using methods and theoretical frameworks from social science disciplines beyond
linguistics. While some of the details of the Twitter People analysis varied from
specific assumptions of the innovation diffusion theoretical framework (for
instance, the failed Twitter People variants were as likely as successful ones to
follow an S-curve pattern of diffusion), the major tendencies found across
innovation diffusion studies held true for the Twitter People variants. The
established concepts of the innovation-decision process and innovation attributes
(in particular the notion of trialability) also provided a cohesive framework and
valuable explanations for interpreting the results.
Applying classic innovation diffusion research methods to the study of
language change gives sociolinguists a powerful tool for verifying and
interpreting the results of both theoretical simulations of large-scale linguistic
diffusion and in-depth empirical research investigating real-world language
change on a smaller scale. Although some quantitative methods were used, this
has remained essentially a qualitative study of diffusion over time. In the future, a
fully quantitative research design may be able to more precisely compare the
diffusion patterns than was possible here. The intersection between innovation
diffusion and language change is a relatively unexplored area, which will benefit
from the analysis of new data sources, as well as further theoretical development.
As a whole, this study represents the successful application of a new (to
linguistics) approach that can add another dimension to the study of the
mechanisms of language change as a social process.
References
Cameron, Deborah, and Don Kulick. 2003. Language and Sexuality. Cambridge:
Cambridge University Press.
Coleman, James S., Elihu Katz, and Herbert Menzel. 1966. Medical Innovation: A
Diffusion Study. Indianapolis: Bobbs-Merrill Co.
Cukor-Avila, Patricia, and Guy Bailey. 2001. The Effects of the Race of the
Interviewer on Sociolinguistic Fieldwork. Journal of Sociolinguistics,
5(2):252–270.
Denison, David. 2002. Log(ist)ic and Simplistic S-curves. In R. Hickey, ed.,
Motives for Language Change, 54–70, Cambridge: Cambridge University Press.
Eckert, Penelope. 2000. Linguistic Variation as Social Practice: The Linguistic
Construction of Identity in Belten High. Oxford: Wiley-Blackwell.
Granovetter, Mark, and Roland Soong. 1988. Threshold Models of Diversity:
Chinese Restaurants, Residential Segregation, and the Spiral of Silence. In C. C.
Clogg, ed., Sociological Methodology, 69–104, Washington, DC: American
Sociological Association.
Katz, Elihu, Martin L. Levin, and Herbert Hamilton. 1963. Traditions of Research
on the Diffusion of Innovation. American Sociological Review, 28(2):237–252.
Ke, Jinyun, Tao Gong, and William S.-Y. Wang. 2008. Language Change and
Social Networks. Communications in Computational Physics, 3(4):935–949.
Labov, William. 1964. Phonological Correlates of Social Stratification. American
Anthropologist, New Series, 66(6):164–176.
Labov, William. 1972. Sociolinguistic Patterns. Philadelphia: University of
Pennsylvania Press.
Labov, William. 1994. Principles of Linguistic Change, vol 1: Internal Factors.
Oxford: Wiley-Blackwell.
Labov, William. 2001. Principles of Linguistic Change, vol 2: Social Factors.
Oxford: Blackwell.
Landsbergen, Frank, and Robert F. Lachlan. 2004. A Cultural-Evolutionary
Perspective on Semantic Change: The Role of Non-Social Factors in the Spread
of Innovations. In Proceedings of the Sixth Annual High Desert Linguistics
Society Conference, 6:47–58. Presented at the High Desert Linguistics Society
Conference, Albuquerque, New Mexico.
Mahajan, Vijay, and Robert A. Peterson. 1978. Innovation Diffusion in a
Dynamic Potential Adopter Population. Management Science, 24(15):1589–
1597.
Milroy, James, and Leslie Milroy. 1985. Linguistic Change, Social Network and
Speaker Innovation. Journal of Linguistics, 21(2):339–384.
Nettle, Daniel. 1999. Using Social Impact Theory to Simulate Language Change.
Lingua, 108:95–117.
Ochs, Elinor. 1992. Indexing Gender. In A. Duranti and C. Goodwin, eds.,
Rethinking Context: Language as an Interactive Phenomenon, 335–358,
Cambridge: Cambridge University Press.
Rebecca Maybaum
Department of English Language & Literature
University of Haifa
Mount Carmel
199 Aba-Hushi Avenue
Haifa 31905