A Faceted Classification Scheme for Computer-Mediated Discourse
All content following this page was uploaded by Susan Herring on 07 August 2014.
Susan C. Herring
Indiana University, Bloomington
Abstract
This article describes a classification scheme for computer-mediated discourse that classifies samples in terms of
clusters of features, or “facets”. The goal of the scheme is to synthesize and articulate aspects of technical and social
context that influence discourse usage in CMC environments. The classification scheme is motivated, presented in
detail with support from existing literature, and illustrated through a comparison of two types of weblog (blog) data.
1. Introduction
It is by now a truism that computer-mediated communication (CMC) – defined here as
telephony – provides an abundance of data on human behavior and language use. Confronted with
such abundance, researchers and practitioners have naturally sought to group, label, or otherwise
organize CMC into categories that would facilitate its analysis and uses. However, there has been
neither systematic discussion of how this should be done nor consensus regarding individual
attempts to do so, many of which have been implicit and ad hoc. As a consequence, how to
This article is concerned with the classification of CMC for research purposes, with a focus
on online language and language use, hereafter referred to as computer-mediated discourse (CMD;
Herring 1996, 2001). Specifically, it proposes an approach to the classification of CMD based on
multiple categories or “facets”, a concept borrowed from classification theory in the field of
library and information science. In contrast to applications in that field, however, which are
primarily concerned with information storage and retrieval, the goal of the CMD scheme is to
articulate aspects of context – both technical and social – that potentially influence discourse
usage in CMC environments, and thereby to bring them to the conscious attention of the
researcher. In this, it is akin in spirit to Hymes’ (1974) etic grid, also known as the SPEAKING
context.
The organization of this article reflects its goal to motivate, articulate, and illustrate a
model. The next section identifies the basic problem that gave rise to the need for a CMD
overview of the proposed faceted classification scheme for CMD and describe its dimensions and
1990s. Accustomed to dealing with two basic modalities of language – speech and writing – these
linguists first asked: Is it a type of writing, because it is produced by typing on a keyboard and
read as text on a computer screen? Is it “written speech” (Maynor 1994), because it exhibits
features of orality, including rapid message exchange, informality, and representations of prosody?
Or is it a third type, intermediate between speech and writing, or in any event characterized by
unique production and reception constraints (Ferrara, Brunner & Whittemore 1991; Murray
1990)?
language, as if CMD were a single, homogeneous genre or communication type. Even in recent
years, “Netspeak” has been posited as an emergent, global variety of online language
However, as awareness of CMC spread with the popularization of the Internet, it soon
became apparent that computer-mediated discourse was sensitive to a variety of technical and
situational factors, making it complex and variable (Baym 1995; Cherny 1999; Herring 1996).
Simultaneously, the focus of much CMD research shifted to describing the linguistic features of
individual genres of CMD, e.g., email discussion lists, Usenet newsgroups, Internet Relay Chat
(IRC), and MUDs.1 Elsewhere, I have termed these “socio-technical modes” (Herring 2002) –
following Murray’s (1988) use of the term “mode” to refer to technologically-defined CMC
subtypes – to reflect the fact that labels such as “IRC”, “Usenet”, “email”, and so forth are
commonly understood to refer not just to CMC systems, but also to the social and cultural
The genre and mode approaches, however, while preferable to lumping all CMC into a
single type, are also limited as a basis for classification of CMD. First, the concept of genre can
is thus imprecise. For example, is the appropriate level of genre classification “email discussion
lists”, “academic discussion lists” (cf. Grüber 2000) or “academic discussion lists on
masculine/feminine topics” (cf. Herring 1996) – each of which is associated with characteristic
linguistic practices? The mode approach partially addresses this problem, in that it refers primarily
to technologically-defined CMD types,2 but it neglects social distinctions of the sort identified by
Another limitation of both the genre and mode approaches is that they are most easily
applied to classify discourse that takes place using established, named technologies (cf. Swales
1990), such as those that are popular on the Internet. It is less clear how either approach could be
1 See, for example, Werry (1996) for IRC, Baron (1998) for email, Cherny (1999) for social MUDs, and Grüber
(2000) for academic discussion lists.
2 In the case of the example of email-based discussion, “listservs” are a mode, as distinct from “newsgroups” and
“Bulletin Board Systems (BBS)”, based first and foremost on their different technical configurations (e.g., push vs.
pull delivery; subscription/registration requirements).
Language@Internet 1/2007 (http://www.languageatinternet.de, urn:nbn:de:0009-7-7611, ISSN 1860-2029)
used to classify new and emergent forms of CMD, or discourse that takes place via customized
systems that operate within restricted (e.g., educational, governmental, organizational) domains.
is based on multiple categories or “facets”. These categories cut across the boundaries of socio-
technical modes, and combine to allow for the identification of a more nuanced set of computer-
mediated discourse types, while avoiding the imprecision associated with the concept of genre.
Since the classification scheme does not rely on pre-existing modes, it can also be applied to
discourse mediated by emergent and experimental CMC systems. The scheme is intended
primarily as a faceted lens through which to view CMD data in order to facilitate linguistic
analyses, which can provide a convenient shorthand for categorizing CMD types, but are less
2. Background
(CMDA) approach developed by Herring (2001, 2004a);4 the scheme is presented here in detail
for the first time. CMDA adapts methods from the study of spoken and written discourse to
computer-mediated communication data. Similarly, the central role of classification in CMDA can
Discourse analysts have traditionally classified discourse into types according to various
criteria. These include modality, number of discourse participants, text type or discourse type, and
3 For a recent overview of research in the sociolinguistics tradition, see Androutsopoulos (2006).
4 The other core components of CMDA are levels of analysis and operationalization of concepts; see Herring (2004a).
genre or register (table 1). While the definitions and boundaries of these distinctions have been
much debated, they can be understood as being in a generally non-exclusive and hierarchical
relationship to one another (e.g., casual chat is a type of conversation, typically a dialogue and
typically produced via speech). As noted above, however, genre can be analyzed on multiple
levels of generality, and thus all of the types in table 1 have also been characterized as “genres”.5
Further, Biber (1988) has challenged the validity of the spoken/written language distinction,
facilitates analysis. This is because exemplars of the same type of discourse tend to share features
5 It is also possible to identify sub-genres of the genres in table 1, for example, a job interview as compared to an
interview on a radio or television talk show, a personal Christmas letter as compared to a personal letter breaking off
relations with one’s paramour (i.e., a Dear John letter).
6 In the sense of Biber (1988). “Register” has another usage in linguistics (as a shorthand for formal/informal style)
that is not intended here.
that distinguish them collectively from other discourse types; classification makes this explicit,
Classification may also serve to remind the analyst to attend to important properties of the
data under consideration, even when no overt comparison is involved. For example, spoken
discourse typically has shorter sentences and words, more sentence fragments, and more markers
of interpersonal relations than discourse produced in writing (Chafe & Danielewicz 1987). A
researcher interested in studying sentence complexity might analyze both spoken and written texts,
but to do so without taking modality into account could result in overlooking systematic,
conditioned patterns in the data. Moreover, certain linguistic and rhetorical phenomena occur
regularly only in certain discourse or text types. Examples include turn taking in spoken dialogue,
Virtanen 1992). A researcher interested in turn taking, for example, must identify text type as a
Setting/Scene      The setting refers to the time and place, while scene describes the “psychological setting” or “cultural definition” of a scene.
Participants       Speaker and audience.
Ends               Purposes, goals, and outcomes.
Act sequence       Form and order of events.
Key                The “tone, manner, or spirit” of the speech.
Instrumentalities  Channels, forms, and styles of speech.
Norms              Social rules governing the event and the participants’ actions and reaction.
Genres             The type of speech or event.
sequence, Key, Instrumentalities, Norms, and Genres, which together form the acronym
SPEAKING. This model has been widely applied to characterize novel or exotic speech
communities (e.g., Nevins 2004), serving as what Hymes calls an “etic grid”, or preliminary
descriptive framework, that draws the researcher’s attention to aspects of the speech situation that
Analysts of computer-mediated discourse have many of the same needs for classification as
traditional spoken and written discourse analysts: Properties of the medium that predict language
variation must be identified; CMD modes must be characterized, and novel CMD situations call
for etic description. These needs are compounded by the rapid pace with which new computer-
mediated communication technologies, such as SMS (text messaging through mobile phones),
instant messaging, and blogs, have emerged into popular use over the past decade (Herring
2004b). Other technologies will inevitably follow, placing a continuing demand on linguists to
discourse as a whole, often based on limited data.7 Ferrara et al. (1991), for example, described
CMD as an “emergent register” based on their study of one type of experimental, synchronous
CMD. Crystal’s (2001) characterization of the language of the Internet as “Netspeak” is a more
recent example of this globalizing approach. Relatedly, early attempts to classify CMD in relation
to speaking and writing tended to consider only one form of CMD (Werry 1996; Yates 1996),
although some researchers have suggested a continuum along which asynchronous CMD occupies
7 Notable exceptions are Murray (1988) and Severinson-Eklundh (1986).
a position closer to writing, and synchronous CMD occupies a position closer to speaking (e.g.,
Herring 2001).
linguistic perspective is Cherny’s (1999) extended ethnographic study of a social MUD. Cherny
(1999) emphasized that the norms for discourse in a social MUD are not the same as those for
Internet Relay Chat, despite the fact that both are synchronous chat environments. Linguistic
variation can be observed between one social MUD and another, based on the histories, norms,
and user demographics of each group, leading Cherny to characterize individual MUDs as “speech
communities”.
The third approach, which most closely resembles that taken in the present article, involves
classifying CMD data according to a pre-defined set of categories. As early as 1988, Murray
applied a Hymesian grid to characterize different forms of CMD in use among workers in a large
U.S. technology organization. Collot and Belmore (1996) also adopted Hymes’ taxonomy to
describe asynchronous BBS data, as a preliminary to quantitative analysis. Although their focus
was not on language, Rice and Gattiker (2000) developed an extensive classification grid in which
they situated CMC in relation to other forms of mediated communication. However, they did not
In her analysis of television soap opera fan newsgroups, Baym (1995: 141) drew on
previous research to identify five factors that condition variation in CMD: the external contexts –
physical, cultural, and subcultural – in which CMC use is situated; the temporal structure of the
group; the computer system infrastructure; the purpose of communication; and the characteristics
of the group and its members. Baym’s approach has a number of advantages: It is grounded in
empirical observations; it is tailored to CMD data and takes the contributions of the computer
8 For an overview of this research, see Herring (2002).
system into account; and its utility has been demonstrated through application to data. A
disadvantage is that it is limited to only five factors; it does not include, for instance, the
languages of the participants or the fonts available to express them (cf. Danet & Herring 2007).
In none of the studies mentioned above was classification the primary objective. Rather,
CMD researchers have characterized their data in pursuit of other goals, to distinguish them from
other kinds of data, and to invoke factors that explain their characteristics. The goal of the present
article is to systematize and extend these efforts in a classification scheme intended to highlight
those features of CMC that most directly affect users’ linguistic choices.
3. Faceted classification
Faceted classification is an approach to the organization of information with origins in the field of
library and information science. First systematized as a science by Ranganathan (1933) to classify
books in libraries, it was later developed by the U.K. Classification Research Group (Vickery
1960) for the organization of document collections in scientific fields, where it proved effective in
the storage and retrieval of compound and complex subjects. More recently, faceted classification
has been implemented to assist automated search and retrieval of information (Prieto-Diaz 1991),
including on the Web (Broughton & Lane 2000), and has been extended to other fields and
Facets are categories or concepts of the same inherent type. A faceted scheme has several
facets and each facet may have several terms, or possible values, e.g., a faceted classification
scheme for wine might include the facets (and terms) “grape varietal” (riesling, cabernet
sauvignon, etc.), “region” (Napa Valley, Rhine, Bordeaux, etc.), and “year” (2001, 2002, etc.).
domain is first analyzed into component facets, and relevant facets are then synthesized into
combinations to characterize items of interest. Thus many facets may be applied to the description
of wine, but only a subset of them – such as varietal and region – may be relevant to classifying
wines for the purpose of marketing them to casual consumers. The flexibility of faceted
classification lies in its ability to describe a large number of items within the subject domain,
including novel items, on the basis of a relatively economical, pre-defined set of facets and terms.
The facets need not be ordered, nor be of the same type, although they should be clearly defined
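The wine example can be made concrete with a minimal sketch. The facets and terms below are those named in the text (the "etc." in each list means the term sets are not exhaustive); the function name `classify` and the sample wine are hypothetical and serve only to illustrate how a domain is analyzed into facets and a subset of facets is then synthesized for a given purpose.

```python
# Facets and terms from the wine example; the term sets are deliberately incomplete.
WINE_FACETS = {
    "grape varietal": {"riesling", "cabernet sauvignon"},
    "region": {"Napa Valley", "Rhine", "Bordeaux"},
    "year": {2001, 2002},
}


def classify(item, facets, relevant=None):
    """Describe an item by its terms for the relevant facets.

    If `relevant` is None, every facet in the scheme is used.
    """
    chosen = list(facets) if relevant is None else relevant
    description = {}
    for facet in chosen:
        term = item[facet]
        if term not in facets[facet]:
            raise ValueError(f"{term!r} is not a term of facet {facet!r}")
        description[facet] = term
    return description


# A hypothetical wine, described on all facets and then on a marketing subset.
wine = {"grape varietal": "riesling", "region": "Rhine", "year": 2001}
full = classify(wine, WINE_FACETS)
marketing = classify(wine, WINE_FACETS, relevant=["grape varietal", "region"])
```

Here `full` records a term for every facet, while `marketing` keeps only varietal and region, the subset the text suggests might matter when marketing to casual consumers.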
The present model involves faceted classification in the general sense described above,
although it does not adhere to the specific criteria laid out by Ranganathan (1933) and others
regarding selection of facets for a given subject area. This is in part because the CMD scheme was
not designed from the top down as a faceted classification scheme, but rather evolved from the
bottom up, as in the case of Baym’s (1995) five factors that condition variation in CMC.
Moreover, as noted at the outset, its purpose is not to facilitate information storage and retrieval,
but rather to facilitate data selection and analysis in CMD research. These differences aside, the
CMD scheme functions in many ways like a traditional faceted classification scheme, and has
assumption that computer-mediated discourse is subject to two basic types of influence: medium
relationship, on the further assumption that one cannot be assigned theoretical precedence over the
other for CMD as a whole; rather, the relative strength of social and technical influences must be
Under each influence type, a number of categories (facets) are posited, along with several
possible realizations (terms) for each. The categories were arrived at in an inductive manner on
the basis of empirical evidence from the CMD research literature in answer to the question: What
communication systems. These are determined by messaging protocols, servers and clients, as
well as the associated hardware, software, and interfaces of users’ computers, inasmuch as it is
possible for the researcher to obtain such information. The inclusion of a set of technological
factors in the approach does not assume that the computer medium exercises a determining
Markus 1994), although each factor has been observed to affect communication in at least some
instances. One reason for including medium factors as a separate set is, precisely, to attempt to
discover under what circumstances specific system features affect communication, and in what
ways.
The second set consists of social factors associated with the situation or context of
communication. These include information about the participants, their relationships to one
another, their purposes for communicating, what they are communicating about, and the kind of
language they use to communicate (cf. Baym 1995; Hymes 1974). The inclusion of a set of
situation factors assumes that context can shape communication in significant ways, although it
does not assume that any given factor is always influential. The particular factors included in the
model described below have all been observed to condition variation in at least some CMD
contexts.
As in traditional faceted classification, these two sets of categories are open ended;
additional factors can be added as justified by evidence that they affect online discourse. Also,
within each set, the categories are unordered and not assumed a priori to be in any particular
relationship to one another. Categories may (or may not) interact, just as there may (or may not)
be patterned correspondences between medium and situation factors, in principle. In fact, modes
of CMD such as “listserv lists” and “Internet Relay Chat” exhibit characteristic combinations of
The categories themselves are each realized by more than one possible value. As in
traditional faceted classification, the categories may be heterogeneous, with values that are binary
presidential elections; marsupials; etc.); the latter type may be open ended.
The most straightforward procedure for applying the scheme is as follows. Once a sample
or corpus of CMD has been identified, the researcher goes through the categories for each set,
assigning the appropriate value for each category based on the information available to him or her
from the data, additional contextual knowledge he or she may possess, or general knowledge of
CMC. One or more categories may not be applicable to a particular CMD sample, in which case
This process should produce a list of all applicable values for the categories in each set.
The researcher may then select from the list of values those that are relevant to his or her
analytical purposes. In this sense, the scheme is analytico-synthetic (cf. Ranganathan 1933). As in
traditional faceted classification, it is also possible to apply the scheme selectively, by assigning
values only to those categories or facets that are relevant to the analysis.
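The two-step procedure described above (assign a value to each category, then select the values relevant to the analysis) might be sketched as follows. The category names are limited to those that appear in tables 2 and 3 as reproduced here; the function names, the use of `None` for a category with no assigned value, and the sample observations are all assumptions made for illustration.

```python
# Category names as they appear in tables 2 and 3; everything else is hypothetical.
MEDIUM_CATEGORIES = [
    "Synchronicity",
    "Persistence of transcript",
    "Anonymous messaging",
    "Private messaging",
    "Filtering",
    "Quoting",
]
SITUATION_CATEGORIES = ["Tone", "Norms"]


def apply_scheme(categories, observations):
    """Go through every category; None marks one with no assigned value."""
    return {category: observations.get(category) for category in categories}


def select_relevant(values, relevant):
    """Keep only the assigned values that matter for the analysis at hand."""
    return {c: v for c, v in values.items() if c in relevant and v is not None}


# Hypothetical observations for a sample drawn from an email discussion list.
observations = {
    "Synchronicity": "asynchronous",
    "Persistence of transcript": "archived",
    "Quoting": "automatic",
    "Tone": "formal",
}
medium = apply_scheme(MEDIUM_CATEGORIES, observations)
situation = apply_scheme(SITUATION_CATEGORIES, observations)
focus = select_relevant(medium, {"Synchronicity", "Quoting", "Filtering"})
```

Applying the scheme selectively, as the text also allows, would amount to calling `apply_scheme` with only the categories of interest.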
The scheme may be applied to data samples of almost any size, although not all categories
are relevant for very small samples. For example, a sample of a single message does not readily
allow for generalizations about the “group” of which it is a part. Conversely, very large samples
may contain so much internal variation that it is meaningless to assign a single value for each
feature. In such cases, multiple values may be assigned to a feature for purposes of overall
characterization. The researcher may also decide to apply the scheme at the level of contrasting
classification scheme and cite empirical studies to justify their inclusion. The citations are meant
to be indicative only; many other studies could be cited that contribute relevant evidence.
Table 2 lists some of the most important medium factors that have been observed to
condition computer-mediated discourse, and that are therefore posited as categories in the
classification scheme. Although they are not in any necessary order, they are numbered in table 2
M1 Synchronicity
M3 Persistence of transcript
M6 Anonymous messaging
M7 Private messaging
M8 Filtering
M9 Quoting
The first medium factor relates to synchronicity of participation (Kiesler, Siegel &
McGuire 1984). Asynchronous systems do not require that users be logged on at the same time in
order to send and receive messages; rather, messages are stored at the addressee’s site until they
can be read. Email is an example of this type. In synchronous systems, in contrast, sender and
addressee(s) must be logged on simultaneously; various modes of “real-time” chat are the most
common forms of synchronous CMC.9 Most traditional forms of writing are asynchronous, and
comparing different types of CMC with spoken and written discourse (Condon & Cech 1996; Ko
1996; Yates 1996). Synchronicity is also a robust predictor of structural complexity, as well as
1996).
A cross-cutting technological dimension has to do with the granularity of the units that are
transmitted by the CMC system, that is, whether the transmission is message-by-message, or
whether or not simultaneous feedback is available during message exchange. With message-by-
message transmission, the receiver does not typically have any indication that the sender is
composing a message until it is sent and received;10 thus, it is impossible for the receiver to
interrupt or otherwise engage simultaneously with the sender’s message. Cherny (1999) terms this
transmission “one-way”; most CMC systems in current use make use of one-way transmission.
the receiver are able to see the message as it is produced, making it possible for the receiver to
give simultaneous feedback. In two-way CMC systems, participants’ screens split into two
(sometimes more) parts, and the words of each participant appear keystroke-by-keystroke in their
respective parts as they are typed. Examples of two-way synchronous CMC include the VAX
“phone” protocol studied by Anderson, Beard, and Walther (forthcoming), UNIX “talk”, and the
split-screen mode of ICQ (Herring 2002). Anderson, Beard, and Walther (forthcoming) have
observed that two-way transmission can profoundly alter the structure of turn taking.
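Because synchronicity and transmission cut across one another, a system can be described as a pair of values, one per facet. The sketch below is illustrative only: the enum and function names are hypothetical, UNIX "talk" and VAX "phone" are placed as the text places them, and the treatment of email and IRC as one-way systems is an assumption based on the remark that most systems in current use transmit message-by-message.

```python
from enum import Enum


class Synchronicity(Enum):
    ASYNCHRONOUS = "asynchronous"
    SYNCHRONOUS = "synchronous"


class Transmission(Enum):
    ONE_WAY = "one-way"   # message-by-message
    TWO_WAY = "two-way"   # keystroke-by-keystroke


SYSTEMS = {
    "UNIX talk": (Synchronicity.SYNCHRONOUS, Transmission.TWO_WAY),
    "VAX phone": (Synchronicity.SYNCHRONOUS, Transmission.TWO_WAY),
    # Assumed placements, for illustration only:
    "IRC": (Synchronicity.SYNCHRONOUS, Transmission.ONE_WAY),
    "email": (Synchronicity.ASYNCHRONOUS, Transmission.ONE_WAY),
}


def systems_with(synchronicity, transmission):
    """List the systems that share a given combination of the two facets."""
    return sorted(
        name for name, values in SYSTEMS.items()
        if values == (synchronicity, transmission)
    )


def allows_simultaneous_feedback(system):
    """Only two-way transmission lets the receiver respond mid-message."""
    return SYSTEMS[system][1] is Transmission.TWO_WAY
```

Keeping the two facets as separate values, rather than a single label such as "chat", is what allows synchronous one-way and synchronous two-way systems to be distinguished.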
9 CMC systems of intermediate synchronicity also exist; for example, Babble (Erickson et al. 1999), an experimental
chat-like system with a scroll-back log that persists for days, allows users who missed real-time messages to read
them later. Instant messaging clients similarly blur the boundary by allowing users to read messages sent while they
were away from their computer upon their return, as long as their IM client remains open.
10 An exception is instant messaging systems that indicate that a participant is typing a message, without yet
displaying what is being typed.
“Persistence of transcript” refers to how long, relatively speaking, messages remain on the
system after they are received. Email is persistent by default, remaining in users’ mail queues or
files until deleted by the users. Moreover, many listservs archive email messages sent to
discussion lists, and messages posted to Usenet newsgroups have been archived since 1995 (first
by dejanews.com, and since 2000, by Google). In contrast, most chat systems retain only a few
screens of messages in their scrollback buffer, with old messages eventually disappearing as they
are replaced by new ones. Even the messages in the buffer disappear when the user ends a chat
session, unless he or she has chosen to log the interaction. Thus, chat is relatively ephemeral
compared to email, but it is more persistent than spoken conversation, in that one’s typed words
linger before they scroll out of sight. The overall greater persistence of CMD heightens meta-
linguistic awareness: It allows users to reflect on their communication – and play with language –
in ways that would be difficult in speech. It also allows them to keep track of, and participate in,
“Size of message buffer” refers to the number of characters the system allows in a single
message. In most email-based systems, the buffer is effectively limitless – or at least, it is larger
than practical limits on how long most people are willing to type and others are willing to read.
Many chat systems, however, impose limits on message size, and text messaging systems on
mobile telephones limit users to 160 characters per message. Condon and Cech (2001) found that
smaller buffers often mean shorter messages and different discourse organizational strategies (see
also Baron forthcoming); small buffers also increase the likelihood that language will be
into account how many and what kinds of “channels of communication” a CMC system makes
available. Visual channels in addition to text include graphics (static or animated) and video;
videoconferencing systems (such as CUseeMe and audiochat; Chou 1999) provide an audio
channel as well. Herring, Martinson, and Scheckler (2002) found that the presence and content of
video images affected the amount and gender distribution of discourse on an educational website.
makes use of audio (and sometimes video) channels and could be classified as CMD using the
proposed scheme.
“Anonymous messaging”, “private messaging”, “filtering”, and “quoting” all refer here to
technological affordances of CMC systems. It is possible for users to engage in these behaviors
without any special technical means, but when such means are available, they facilitate the
behaviors, presumably making them more likely to occur. Thus, many chat systems require a user
to select a nickname that is different from his or her email address, encouraging the use of
pseudonyms and anonymous interaction (Danet 1998). Some Web-based discussion forums have
registration procedures that do not verify users’ email addresses, encouraging users to make them
up. Anonymity has been found to have important effects in online discourse, including increased
self-disclosure (Kiesler et al. 1984), antisocial behavior (Donath 1999), and play with identity
(Danet 1998).
Similarly, some chat systems (such as IRC and MUDs) have commands that enable users
to carry on private as well as public conversations, while with other systems (such as some forms
of Web chat), it is necessary to open a separate program (such as an instant messaging client) to
converse privately. Along the same lines, a user can always choose to ignore messages from
another user, but a number of CMC systems make this easier by providing technical mechanisms
to filter out such messages (known variously as “kill files”, “gag” commands, etc.). CMC systems
also differ in the extent to which they provide mechanisms to facilitate the quoting of a portion of
a previous message in a response. Some email clients provide the text of the message being
replied to in the new message, as a default. In others, one must copy and paste in the quoted
Eklundh forthcoming) has observed that this can affect the extent and manner in which quoting is
used.
Finally, “message format” determines the order in which messages appear, what
information is appended automatically to each and how it is visually presented, and what happens
when the viewing window becomes filled with messages. Most CMC systems add new messages
to the bottom of a list in the order received by the system, although this is not true of blogs (which
add the newest message on the top), wikis (which allow users to choose where their content will
be inserted), or some experimental systems. Herring (1999) has observed that systems that post
messages in the order in which they are received – which is to say most chat and discussion
forums – result in disrupted turn adjacency and interleaved exchanges. The information provided
in message headers (as in email) and leaders (as in chat systems) has been found to affect online
self-reference and addressivity practices (Herring 1996; Werry 1996). Scrolling direction
determines which messages are on the “top of the deck” and hence more likely to receive a
response.
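The effect of posting order on turn adjacency can be illustrated with a small sketch. The log below is invented: it represents two exchanges (A and B) unfolding at the same time in one channel, with each message stored in the order the system received it. The function names are hypothetical, and the measure of disruption used here (a response that does not directly follow the message it responds to) is a simplification adopted for illustration, not the operationalization used in Herring (1999).

```python
# Hypothetical log in order of receipt: (message id, id of the message responded to).
LOG = [
    ("A1", None),
    ("B1", None),
    ("A2", "A1"),
    ("B2", "B1"),
    ("A3", "A2"),
]


def display(log, newest_first=False):
    """Order message ids as shown on screen: chat-style by default, blog-style if newest_first."""
    ids = [message_id for message_id, _ in log]
    return ids[::-1] if newest_first else ids


def disrupted_adjacency(log):
    """Return responses that do not directly follow the message they respond to."""
    disrupted = []
    for position, (message_id, reply_to) in enumerate(log):
        if reply_to is None:
            continue
        if position == 0 or log[position - 1][0] != reply_to:
            disrupted.append(message_id)
    return disrupted
```

In this invented log every response is separated from its initiating message by a message from the other exchange, which is the interleaving pattern the text describes for systems that post strictly in order of receipt.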
The list of medium factors in table 2 is open-ended. It is expected that some factors will be
added, others further sub-divided, and others perhaps omitted as new systems are developed and
mediated discourse (cf. Baym 1995) as in spoken discourse (cf. Hymes 1974). The set of features
summarized in table 3 incorporates elements from Hymes’ SPEAKING mnemonic (see figure 1)
and factors identified by Baym (1995), along with additional factors found in empirical CMD
research to affect online language use. As with the medium factors, this list is not presumed to be
exhaustive.
S5 Tone • Serious/playful
• Formal/casual
• Contentious/friendly
• Cooperative/sarcastic, etc.
S7 Norms • Of organization
• Of social appropriateness
• Of language
situation (both actual, i.e., actively participating, and potential); the amount and rate of
anonymously/pseudonymously as opposed to in their “real life” identities11 (Myers 1987); and the
structure has implications for, among other things, politeness: public CMD tends to be less polite
than private CMD (Herring 2002), and individuals who post anonymously tend to “flame” more
than individuals who post in their offline identities (cf. Donath 1999).
well as the real life knowledge, norms, and interactional patterns they bring to bear when they
engage with others online (Baym 1995). For example, participant gender has been found to affect
behavior related to politeness and contentiousness within a social MUD (Cherny 1994), in two
otherwise similar academic discussion lists (Herring 1996), and in a mostly-female Usenet
newsgroup devoted to television soap operas as compared with norms of interaction elsewhere on
Usenet (Baym 1996). Participants’ attitudes, beliefs, ideologies, and motivations relevant to their
11 This value should be assigned independently of how easy or difficult the system makes sending anonymous
messages or using pseudonyms. Assuming that the medium does not preclude such choices, this value encodes the
extent to which users in a particular discourse sample make use of them.
online communication may also affect what they choose to communicate and how. Participants
with ideological differences may be more likely to become involved in conflict discourse, as, for
racism.
“Purpose” is potentially relevant on two levels: “Group purpose” refers in general terms to
a computer-mediated group’s official raison d’être (professional, social, etc.), while “goals of
interaction” are what individual participants hope to accomplish through any given interaction;
these need not, of course, be the same for any two individuals in the same interaction. Even when
the same technologies are used, CMD can vary according to purpose; for example, Herring and
Nix (1997) found differences in topics discussed as well as strategies for topic development in
impressing others with one's intellectual acumen); each activity has associated conventional
linguistic practices that signal when that activity is taking place (cf. “contextualization cues”,
Gumperz 1982). Many studies have noted the existence of computer-mediated contextualization
cues, ranging from emoticons to user IDs (Bechar-Israeli 1995; Danet et al. 1997; Heisler &
Crabill 2006; Herring 2001), that help to signal “what is going on” in online interaction. Flaming,
or the exchange of hostile message content, also has characteristic syntactic and semantic
structures that distinguish it from other computer-mediated activity types (Spertus 1997).
“Topic” at the group level indicates, within broad parameters, what discussion content is
appropriate in that context, according to the group’s definition. Some CMC modes not conceived
as discussion forums but rather as role-playing environments, such as adventure MUDs, may have
contrast, topic at the exchange level is what participants are actually talking about in any given
interaction; this may or may not be on the “official” topic of the group. Distinctions of topic are
important in analyzing topical digression, which has been claimed to be a characteristic of multi-
“Tone” refers to the manner or spirit in which discursive acts are performed (cf. Hymes’
“key”); it can be described along a number of continuous scalar dimensions, including (but not
participant differently than do participants in friendly CMD. Emoticons similarly take on different
pragmatic meanings depending on the tone of an exchange, which they may also help to establish
(Huls 2006).
protocols having to do with how a group is formed (if applicable), how new members are
admitted, whether it has a leader, moderator, or other persons whose role it is to perform official
functions, how messages are distributed and stored (if this is determined by social convention
rather than by the system software), how participants who misbehave are punished, etc. “Norms
of social appropriateness” refer to the behavioral standards that normatively apply in the
computer-mediated context (cf. Hymes’ “norms of interaction”); they may be implicit or written
and publicly available, for example in the form of “netiquette” guidelines (Shea 1994) or lists of
newsgroup, but rudeness may be expected and approved of in the newsgroup alt.flame, which is
users; these may include abbreviations, acronyms, insider jokes, and special discourse genres
interactions are carried out. Although English is still the most common language on the Internet,
and most CMC research has been carried out on English data, this situation is changing rapidly as
more non-English-speaking countries gain Internet access (Danet & Herring 2007). “Language
variety” includes the dialect, and where applicable, the register of language used. The default
dialect is the standard, educated, written variety of the language, although regional, social class or
ethnic dialects may sometimes be used (Androutsopoulos & Ziegler 2004). Register refers here to
specialized sub-languages associated with conventional social roles and contexts (such as
academic discourse, psychotherapeutic discourse, teacher talk); one may also identify an
unmarked register, ordinary conversation, associated with the role of the “everyday” self. Choice
of linguistic code in multilingual computer-mediated groups has been observed to serve different
Relatedly, “writing system” refers to the font used and its relationship to the writing system
of the language: Does the communication make use of a font (such as ASCII text) based on the
Roman alphabet (e.g., for languages such as English, Spanish, and French); does it transliterate a
non-Roman writing system (such as those of Arabic and Greek) into Roman letters/ASCII
(Berjaoui 2001; Tseliga 2007); or are special non-ASCII fonts used (such as those available for
Japanese, Chinese, and Korean) to represent a non-Roman writing system? Since the introduction
of the Unicode character encoding standard (see Danet & Herring 2007), it has become easier to
transmit a variety of languages in their native scripts via the Internet, but transliteration into
Roman letters persists in some contexts, and script choice may serve different pragmatic functions
Although in principle the eight situation dimensions in Table 2 are independent of one
another, in practice, they tend to combine in predictable ways. This is easiest to see when the
classification scheme is applied to familiar CMC modes. For example, discourse in Internet Relay
Chat typically is many-to-many, has a high degree of anonymity (participants use pseudonyms), is
social in function and non-serious in tone, contains a high incidence of flirting and phatic (empty,
social) exchanges, and appears to be engaged in most often by young people between the ages of
18 and 25 (Danet et al. 1997; Reid 1991; Werry 1996). In contrast, discourse in an academic
discussion list is more likely to serve professional purposes, have a serious tone, contain debates
and job announcements, and be engaged in by older, professionally established users (Grüber
1998, 2000; Herring 1992, 1996; Hert 1997). Furthermore, medium factors may correlate with
situation factors; all other things being equal, for example, synchronous CMD is more likely to be
informal in register and playful in tone than is asynchronous CMD (Herring 2001).
However, it is important to note that there are also circumstances under which these
associations do not hold. The classification scheme presented above, because it does not presume
any necessary relationships among features of situational context or between medium and
5. Sample classification
While it is beyond the scope of this article to test the proposed classification scheme formally, a
brief illustration of its application to two samples of CMD may provide a glimpse of the utility of
the scheme. One sample is from a well-known, popular source, and the other from a closed-
access, privately-developed system; both have been analyzed by the author in separate studies,
Both samples are exemplars of the sociotechnical mode “weblogs” (blogs), broadly
construed. Blogs have been characterized as a genre of CMC (Herring et al. 2004; Miller &
Shepherd 2004), although subtypes such as diary and filter blog have also been identified that
manifest distinct patterns of linguistic usage (Herring & Paolillo 2006). In the comparison
12 The LiveJournal data were collected as part of the project reported in Herring et al. (2007), and a preliminary
analysis of the Quest Atlantis blog data is reported in Herring, de Siqueira, Stuckey & Kouper (in review).
Language@Internet 1/2007 (http://www.languageatinternet.de, urn:nbn:de:0009-7-7611, ISSN 1860-2029)
described below, however, it is not sufficient to distinguish subtypes, since one sample is
The first sample is from the popular blog-hosting service LiveJournal.com, which claims to
have hosted over 11.9 million blogs since its inception in 1999. The second sample is from Quest
Atlantis, a game-like online learning environment for children 9-12 years old that was developed
in 2002 by researchers at the author’s institution (Barab et al. 2005), and that has been used by
several thousand children to date, mostly in the United States, Australia, and Singapore, under the
supervision of their classroom teachers. Quest Atlantis (QA) includes blogs as one of several types
of CMC available to its young users. Specifically, our QA sample comes from a blog maintained
by a fictional Atlantian girl, Alim (in reality, an adult female QA researcher), who posts entries on
the theme of “personal agency” for children on Earth; the children post comments in response.
In order to make our samples as comparable as possible, let us consider the LiveJournal blog (LJ) of a young,
English-speaking woman. Moreover, although both sources make available data extending over a
period of more than two years, let us further delimit each sample to two months of continuous
activity in spring 2006. The exact time and size of the samples are not important for the purpose
of this illustration, but a multi-message sample is necessary in order to obtain a sense for how
Not surprisingly, since both are known by the genre label “blog”, these two samples share
many medium features. These include asynchronicity (M1); 1-way message transmission (M2);
persistence of messages in archives linked from the sidebar of the blog (M3); Web-based delivery
and a tendency for messages to be text only (M5); and the display of blog entries in reverse
chronological sequence with a “comment” option below each entry (M10). These might be
considered definitional characteristics of the blog genre (see also Herring et al. 2004).
However, the two samples have few situation variables in common, aside from a one-to-
many participation structure and imbalanced participation13 (S1), which are characteristic of blog
discourse in general (Herring et al. 2004). Holding blog author gender (S2) and use of the English
language (S8) constant does not result in any other associated similarities between the two
samples.
In contrast, differences can be observed along both the medium and the situation
dimensions. Whereas LJ allows anyone to create a blog from a made-up name (as our sample LJ
blogger has done), anonymity is impossible in the QA blogs, since all users must register through
their classroom teachers (M6). LJs are publicly available on the Web unless designated as “friends
only” (our sample is not so designated), whereas QA activity is closed to the public (M7). There
are also differences in message format (M10) – the LJ interface is more sophisticated, providing
users with more options (such as “friends” links and a “search” feature) and greater social
translucence (Erickson et al. 1999), such as an indication of the number of comments that have
The number of differences in situation between the two samples is also great. Group size,
construed as the potential audience of each blog, varies widely as a consequence of the
public/private nature of each blog; rate of participation is also slower on the QA blog, and posting
rights are asymmetrical (S1) – only “Alim” can post entries. In the LJ, only the blog owner can
post in her own blog, but commenters all have their own blogs, so everyone has a chance to both
post and comment. Age, roles, previous experience, and the relationships among participants also
differ between the two samples (S2), as do the purpose of each blog (S3), its topic/theme (S4),
the tone of messages and comments (S5), and the norms of interaction and norms of language use
in LJ versus QA (S7).
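The comparison above can also be expressed as a simple data structure. The following is a minimal sketch, not part of the original study: it assumes that each sample is coded as a mapping from facet codes (taken from the text above) to short value labels. The labels themselves and the function name `compare_facets` are illustrative choices made for this example, and only a handful of the facets discussed are included.

```python
# Hypothetical encoding of the two blog samples. The facet codes follow the
# article; the value labels are paraphrases chosen for this illustration and
# are not the author's own coding.
lj = {
    "M1": "asynchronous",
    "M2": "one-way transmission",
    "M3": "archived",
    "M6": "pseudonyms possible",
    "M7": "public",
    "S8": "English",
}
qa = {
    "M1": "asynchronous",
    "M2": "one-way transmission",
    "M3": "archived",
    "M6": "registration required",
    "M7": "closed",
    "S8": "English",
}


def compare_facets(a, b):
    """Split the facets coded in both samples into shared and differing."""
    common = sorted(a.keys() & b.keys())
    shared = [f for f in common if a[f] == b[f]]
    differing = [f for f in common if a[f] != b[f]]
    return shared, differing


shared, differing = compare_facets(lj, qa)
print("shared:", shared)
print("differing:", differing)
```

Because no relationship among facets is built into the representation, two samples that carry the same mode label can still be shown to diverge on any individual dimension, which is the point the comparison is meant to illustrate.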
13 Blog owners post more and longer messages than do visitors to the blog, who typically may only post comments on
the owner's entries.
The LJ blogger is an experienced, adult Internet user who posts messages about her day-to-
day life to friends and strangers in a tone that aims for cleverness and sophistication, and where
the norms of interaction include profanity and sexual references. In these respects, the LJ sample
is typical of many LJ blogs (cf. Kendall 2005). The considerable contrast between these two
samples reflects QA’s young, inexperienced target audience and its educational context, which is
closely moderated by adults, and which assigns asymmetrical posting rights to adults and children.
These are not prototypical blog features, although the QA blogs recall other uses of CMC in
Clearly, simply classifying these samples as being of the blog mode or genre, while it
would capture more-or-less predictable associations for LiveJournal, would miss much about the
QA data that is interesting and important. Moreover, the LJ data also exhibit characteristic
properties that differentiate them from the blog prototype (cf. Herring et al. 2004), such as the
“friends only” audience designation feature and “mood” indicators for entries. A faceted
classification approach is thus revealing for LJ blogs as well, and more generally, is essential (in
6. Conclusions
As the Internet expands, it continues to spawn new varieties of discourse that call out for analysis
and classification. This article has proposed, argued for, and briefly illustrated the utility of a
faceted classification scheme for computer-mediated discourse. This scheme classifies discourse
(including overlap across samples) and allowing for focused comparisons within and across
samples.
CMD. Mode classification is especially useful for identifying and invoking prototypical
associations of CMD data of a type that is generally known, such as email, discussion lists, and
IRC; it also captures cultural information that cannot be predicted solely from the component
dimensions of the scheme. However, mode classification is less useful for proprietary or novel
examples of online discourse, such as the Quest Atlantis blogs or the quasi-synchronous “Babble”
chat system developed by Erickson et al. (1999), which do not evoke prototypical associations
except in the minds of users who happen to know the systems. Faceted classification is more
At the same time, the classification scheme presented here has several limitations. First, it
can seem verbose (a “list” of terms) and difficult to condense due to its relatively non-hierarchical
Ranganathan (1933), in which only the most important features of a data set (as determined by the
goals of the research) are selected for characterization, is recommended to help address this
problem.
A second limitation is that the scheme is based primarily on research findings for textual
CMC. It is important, but ultimately not sufficient, to note that multimedia CMC makes use of
classificatory challenges. What are the criteria for identifying types of multiplayer online game
discourse, for example? What are the relevant dimensions that condition variation in video- and
audio-mediated communication? What about in CMD where participants can speak, text chat, and
manipulate a common interface (such as a whiteboard) at the same time? It will be essential to
A more general limitation is that the scheme is not in itself a contribution to a theory of
genre, but is rather a preliminary aggregation of factors that will have to find a place in a theory
of CMD genres. Theoretical questions remain to be addressed concerning the organization and
relationships among the features of the scheme. Conversely, it is conceivable that empirical
investigation of feature co-occurrence patterns based on this descriptive scheme could lead to the
identification of a smaller set of CMC prototypes. If so, these could be compared with genres
already posited for Internet communication (cf. Giltrow & Stein in preparation), lending them an
empirical underpinning. Investigation of this possibility and theoretical development of the scheme
Finally, Hymes cautions that “an ‘etic’ account, however useful as a preliminary grid and
input to an emic (structural) account, or as a framework for comparing different emic accounts,
lacks the emic account’s validity” (1974: 11). Simple descriptive classification should be
supplemented by ethnographic observation of online discourse communities over time, and should
ideally be validated by members of those communities, in order to provide the richest possible
References
Anderson, Jeffery F., Fred K. Beard, & Joseph B. Walther (forthcoming). The local management
Androutsopoulos, Jannis & Volker Hinnenkamp (2001). Code-switching in der bilingualen Chat-
Kommunikation: ein explorativer Blick auf #hellas und #turks. In Beisswenger, Michael (ed.).
367-401.
Androutsopoulos, Jannis & Evelyn Ziegler (2004). Exploring language variation on the Internet:
Eklund, Staffan Fridell, Lise H. Hansen, Angela Karstadt et al. (eds.) Language Variation in
Europe: Papers from the Second International Conference on Language Variation in Europe,
Anis, Jacques (2007). Neography: Unconventional spelling in French SMS text messages. In
Danet, Brenda & Susan C. Herring (eds.) The multilingual Internet: Language, culture, and
Barab, Sasha A., Michael Thomas, Tyler Dodge, Robert Carteaux, & Hakan Tuzun (2005).
Making learning fun: Quest Atlantis, a game without guns. Educational Technology Research
Baron, Naomi (forthcoming). Discourse structures in instant messaging: The case of utterance
Bechar-Israeli, Haya (1995). From (Bonehead) to (cLoNehEAd): Nicknames, play and identity on
http://jcmc.indiana.edu/vol1/issue2/bechar.html.
Berjaoui, Nasser (2001). Aspects of the Moroccan Arabic orthography with preliminary insights
Biber, Douglas (1988). Variation in speech and writing. Cambridge, UK: Cambridge University
Press.
Broughton, Vanda & Heather Lane (2000). Classification schemes revisited: Applications to Web
Chafe, Wallace L. & Jane Danielewicz (1987). Properties of spoken and written language. In
Horowitz, Rosalind & S. Jay Samuels (eds.) Comprehending oral and written language. New
Cherny, Lynn (1994). Gender differences in text-based virtual reality. In Bucholtz, Mary, Anita
Liang, & Laurel Sutton (eds.) Cultural Performances: Proceedings of the Third Berkeley
Women and Language Conference. Berkeley: Berkeley Women and Language Group.
Cherny, Lynn (1999). Conversation and community: Chat in a virtual world. Stanford, CA: Center
Chou, Candace C. (1999). From simple chat to virtual reality: Formative evaluation for
http://www2.hawaii.edu/~cchou/ppdla99/index.htm.
Condon, Sherri L. & Claude G. Cech (1996). Discourse management strategies in face-to-face and
http://www.cios.org/www/ejc/v6n396.htm.
Condon, Sherri L. & Claude G. Cech (2001). Profiling turns in interaction. Proceedings of the
34th Annual Conference of the Hawaii International Conference on System Sciences. Los
Crystal, David (2001). Language and the Internet. Cambridge, UK: Cambridge University Press.
Danet, Brenda (1998). Text as mask: Gender, play and performance on the Internet. In Jones,
Danet, Brenda & Susan C. Herring (2007). Multilingualism on the Internet. In Hellinger, Marlis
& Anne Pauwels (eds.) Language and communication: Diversity and change. Handbook of
Danet, Brenda, Lucia Ruedenberg & Yehudit Rosenbaum-Tamari (1997). “Hmmm … Where’s
that smoke coming from?” Writing, play and performance on Internet Relay Chat. In Rafaeli,
Sheizaf, Fay Sudweeks & Margaret McLaughlin (eds.) Network and netplay: Virtual groups on
Donath, Judith (1999). Identity and deception in the virtual community. In Smith, Marc A. &
Dooley, Robert A. & Stephen H. Levinsohn (2001). Analyzing discourse: A manual of basic
Erickson, Thomas, David N. Smith, Wendy A. Kellogg, Mark R. Laff, John T. Richards, & Erin
Bradner (1999). Socially translucent systems: Social proxies, persistent conversation, and the
design of ‘Babble’. In Human Factors in Computing Systems: Proceedings of CHI ’99. ACM
Press.
Ferrara, Kathleen, Hans Brunner & Greg Whittemore (1991). Interactive written discourse as an
Giltrow, Janet & Dieter Stein (eds.) (in preparation). Theories of genre and their application to
Internet communication.
Grüber, Helmut (2000). Scholarly email discussion postings: A single new genre of academic
communication? In Pemberton, Lyn & Simon Shurville (eds.) Words on the Web: Computer-
Heisler, Jennifer & Scott Crabill (2006). Who are “stinkybug” and “packerfan4”? Email
http://jcmc.indiana.edu/vol12/issue1/heisler.html.
ED345552.
Herring, Susan C. (1996). Two variants of an electronic message schema. In Herring, Susan C.
(ed.). 81-106.
& Heidi Hamilton (eds.) Handbook of discourse analysis. Oxford: Blackwell. 612-634.
online behavior. In Barab, Sasha A., Rob Kling & James H. Gray (eds.) Designing for virtual
communities in the service of learning. New York: Cambridge University Press. 338-376.
Herring, Susan C. (2004b). Slouching toward the ordinary: Current trends in computer-mediated
Press.
Herring, Susan C., Amaury de Siqueira, Bronwyn Stuckey & Inna Kouper (in review).
Herring, Susan C., Anna Martinson & Rebecca Scheckler (2002). Designing for community: The
effects of gender representation in videos on a Web site. Proceedings of the 35th Hawaii
Herring, Susan C. & Carole G. Nix (1997). Is ‘serious chat’ an oxymoron? Academic vs. social
uses of Internet Relay Chat. Paper presented at the American Association of Applied
Herring, Susan C. & John C. Paolillo (2006). Gender and genre variation in weblogs. Journal of
Herring, Susan C., John C. Paolillo, Irene Ramos-Vielba, Inna Kouper, Elijah Wright, Sharon
Stoerger, Lois Ann Scheidt & Benjamin Clark (2007). Language networks on LiveJournal.
Herring, Susan C., Lois Ann Scheidt, Sabrina Bonus & Elijah Wright (2004). Bridging the gap: A
genre analysis of weblogs. Proceedings of the 37th Hawai'i International Conference on System
Hert, Philippe (1997). Social dynamics of an on-line scholarly debate. The Information Society
13: 329-360.
Kendall, Lori (2005). Diary of a networked individual: System design’s effects on online
relationships. In Consalvo, Mia (ed.) Internet research annual. New York: Peter Lang. 41-50.
Kiesler, Sara, Jane Siegel & Timothy W. McGuire (1984). Social psychological aspects of
http://www.cios.org/www/ejc/v6n396.htm.
Longacre, Robert (1996). Typology and salience. The grammar of discourse, 2nd edition. New
Maingueneau, Dominique (2002). Analysis of an academic genre. Discourse Studies 4 (3): 319-
342.
Markus, M. Lynne (1994). Finding a happy medium: Explaining the negative effects of electronic
communication on social life at work. ACM Transactions on Information Systems 12(2): 119-
149.
Maynor, Natalie (1994). The language of electronic mail: Written speech? In Montgomery,
Michael & Greta D. Little (eds.) Centennial usage studies. Publications of the American Dialect
Society Series. Tuscaloosa: Published for the Society by the University of Alabama Press.
Miller, Carolyn R. & Dawn Shepherd (2004). Blogging as social action: A genre analysis of the
weblog. In Gurak, Laura J., Smiljana Antonijevic, Laurie Johnson, Clancy Ratliff & Jessica
Reyman (eds.) Into the Blogosphere: Rhetoric, Community, and Culture of Weblogs.
http://blog.lib.umn.edu/blogosphere/blogging_as_social_action_a_genre_analysis_of_the_weblog.html
Murray, Denise E. (1988). The context of oral and written language: A framework for mode and
Myers, David (1987). ‘Anonymity is part of the magic’: Individual manipulation of computer-
Nevins, M. Eleanor (2004). Learning to listen: Confronting two meanings of language loss in the
14(2): 269.
Paolillo, John C. (forthcoming). Conversational codeswitching on Usenet and Internet Relay Chat.
Association.
Reid, Elizabeth M. (1991). Electropolis: Communication and community on Internet Relay Chat.
http://www.aluluei.com/.
Rice, Ron & Urs E. Gattiker (2000). New media and organizational structuring. In Jablin, Fredric
& Linda L. Putnam (eds.) The new handbook of organizational communication. Thousand
Robertson, Judy, Judith Good & Helen Pain (1998). BetterBlether: The design and evaluation of a
219-236.
Rowe, Charley (forthcoming). Genesis and evolution of an e-mail-driven sibling code. In Herring,
Susan C. (ed.).
A Study of Letters in the COM System. Linköping Studies in Arts and Science 6. Department
Severinson Eklundh, Kerstin (forthcoming). To quote or not to quote: Setting the context for
Severinson Eklundh, Kerstin & Clare Macdonald (1994). The use of quoting to preserve context
202.
Swales, John (1990). Genre analysis: English in academic and research settings. Cambridge:
Tseliga, Theodora (2007). “It’s all Greeklish to me!”: Linguistic and sociocultural perspectives on
Tudhope, Douglas, Ceri Binding, Dorothee Blocks & Daniel Cunliffe (2002). Representation and
Vickery, Brian C. (1960). Faceted classification: A guide to construction and use of special
Virtanen, Tuija (1992). Issues of text typology: Narrative – a ‘basic’ type of text? Text 12(2): 293-
310.
Werry, Christopher C. (1996). Linguistic and interactional features of Internet Relay Chat. In
Yates, Simeon J. (1996). Oral and written linguistic aspects of computer conferencing. In Herring,