Teaching_language_variation_using_Italia
Abstract
This paper aims at evaluating reference corpora of spoken and written Italian as tools for
teaching variation to learners of Italian as a second language. Specifically, some of the main
Italian reference corpora will be compared, focusing on their classification of varieties and
their potential as sources of information on register, domains, genre, and text types.
1. Introduction
The aim of this paper is to evaluate reference corpora of spoken and written Italian as tools
for teaching variation to learners of Italian as a second language. Language variation is
traditionally one of the most crucial and complex abilities that learners are expected to
acquire: aspects such as genre, register, text type, domain, topic, activity type, sublanguage
and style are key in developing language awareness. In corpus design, there is a wide variety
of approaches to capturing variation, a variety reflected in the possibilities offered by different
query systems. What kind of information about linguistic varieties is encoded in Italian
reference corpora? And how can learners acquire knowledge about variation from the
available electronic resources, either autonomously or in guided classroom activities?
The fruitfulness of corpora in language teaching has been widely stressed in the literature, be
it in a general didactic environment (Sinclair 2004; Gavioli 2005; Scott & Tribble 2006), in
translation studies (Bernardini & Zanettin 2000; Botley et al. 2000; Wang 2000; Zanettin et
al. 2003), in classroom work (Johns 1991a; Partington 1998; Godwin-Jones 2001; Timmis
2003), and in learner corpus exploitation (Granger et al. 2000; Granger et al. 2002; Tono
2002). The advantages for the learner include easy access to authentic texts, autonomous
management of the learning process in a data-driven framework (Johns 1991b, c; Wible et al.
2002), and stimulation of metalinguistic reflection. For the teacher they include the possibility
to obtain real examples of language use, assistance in syllabus and materials design, and other
functions (Kahn 1985; Szendeffy 2005).
The interaction between language teaching using corpora and the need to develop
understanding of language variation has been addressed by a number of scholars. The general
framework proposed by Biber (1988, 1995), being both theoretical and methodological, has
led to considerable discussion of the relationship between language variation and corpora.
Biber’s attention has generally been focused on exploiting corpora for variation studies (Biber
& Finegan 1991; Biber 1992, 1993a, b; Biber et al. 1998), and he attempts to clarify the
terminological jungle that characterises this area by associating the terms genre and register with
“situationally defined text categories (such as fiction, sports, broadcasts, psychology
articles)”, and text type to “linguistically defined text categories” (1993a: 244-5).1 Biber
argues that precedence should be given to external criteria in corpus design, using genre and
register as the main principles in distinguishing subcorpora. But reference corpora of major
languages have extremely dissimilar designs, and exhibit different capabilities for analysing
language varieties. Not only are registers differently defined and represented (qualitatively
and quantitatively), but the possibility to retrieve data about variation varies in different
corpus querying systems.
Once the corpus has been designed and constructed, other questions arise as to the possibility
of investigating different aspects of variation. Since corpora cannot be said to be
representative of a single genre tout court – we can always think of features that are only
marginally present in the corpus because they are rare, because they refer to a level of analysis
which was not foreseen in the corpus design, or simply because they show marginal
tendencies – we must always take into account the limits of data retrieval from different
corpora, and from different subcorpora within a corpus. For the study of language variation,
especially directed towards language learning, the obvious sources are reference corpora. A
reference corpus has been defined as
Reference corpora are not the only source of such data, but constitute, by definition, the most
comprehensive portrait of a language’s main varieties.
Variation must play a role in the language learning process, since it is a key aspect of
communicative competence (Hymes 1971; Spolsky 1989). But a decision has to be made
whether to adopt a genre-driven or a text-type-driven approach. If we accept Biber and
Finegan’s claim that “linguistic variation is generally conditioned by some combination of
social, situational, discourse and processing characteristics” (1991: 216), then we will tend to
prefer a functional approach that will give precedence to genre/register in presenting variation
to language learners. This seems a sensible way to move if we wish to maintain a
communicative methodology in teaching.
1
While the term register is commonly used as a “general cover term associated with all aspects of variation in
use” (Biber 1995: 9), it is most closely connected to linguistic patterns and choices instantiated by specific
genres (Lee 2001).
possible to detect variation through comparison. Features which are to be considered specific
to a genre or a text type can only be detected through the observation of diversity or of
different degrees of similarity. When it comes to analysing variation using corpora, the first
condition is the possibility to keep different genre labels apart.
Reference corpora are becoming widely used for teaching purposes and a number of
theoretical investigations of their potential for teaching variation have been produced
(Hunston 2002; Conrad 2004). Maximum coverage of variation is considered a prerequisite of
reference corpora aiming at representativeness, register-diversity and balance among genres.
So if we want to look at variation for teaching purposes, reference corpora should be the ideal
tool. In this section I compare some of the main Italian reference corpora, focusing on their
classification of varieties and their potential as sources of information on register, domains,
genre, and text types. I will also look at examples of how variation can be observed, and how
students can examine the main characteristics of register diversification identified by Biber
(1993b: 221): “individual linguistic features are distributed differently across registers, and
[…] the same (or similar) linguistic features can have different functions in different
registers”.
Among the corpora of written Italian developed during the past decades (Chiari 2005), some
can be considered reference corpora in their design and scope. Excluding some which are no
longer available, such as LIF (Bortolini et al. 1971), I will focus on two of these: the large
100-million word COrpus di Riferimento dell'Italiano Scritto, 1998-2001 (CORIS/CODIS:
Rossini Favretti 2000; Rossini Favretti et al. 2002), and the smaller 4-million word Corpus e
Lessico di Frequenza dell'Italiano Scritto (COLFIS: Laudanna et al. 1995).
3.1.1. CORIS/CODIS
Developed by the Centre for theoretical and applied linguistics of the University of Bologna
(CILTA), this is currently the largest publicly available balanced corpus of written Italian. It
exists in two forms: a static one (COrpus di Riferimento dell'Italiano Scritto (CORIS)), and a
dynamic one (COrpus Dinamico dell'Italiano Scritto (CODIS)). The latter is a monitor corpus
updated every two years, modelled on the Bank of English. The corpus is divided into
subcorpora of traditional genres: press, fiction, administrative and legal prose, academic
prose, miscellanea and ephemera. These subcorpora are divided into sections and subsections
based on external parameters (see Table 1), including both national and local varieties of the
language. The hierarchical structuring of sections and subsections involves a mixture of
subgenres and domains. Thus the fiction subcorpus contains crime, adventure, and science-fiction,
along with women’s literature, literature for adults, and literature for children.
Academic prose includes books, reviews, popular history, philosophy, arts, literary criticism,
law, economy, biology, etc. Other subcorpora seem to be organized on the basis of
differences in readers and purposes.
The only tool available to access the corpus is concordancing over the web
(http://corpora.dslo.unibo.it). The dynamic version allows the user to choose which
subcorpora to include, and how large each should be (see Fig. 1), adapting to different needs
and working hypotheses. The query interface admits single word forms, specific sequences of
word forms (AND/OR logical operator), sequences at a distance (which retrieves two words
with up to a given number of words in between), and returns the total number of hits
satisfying the query, displaying a maximum of 300. Collocational values using mutual
information, t-score or raw frequency are optionally shown. No grammatical annotation is
provided to the end user, and at the moment the public online version of the corpus is not
lemmatized (accessed May 2008). The extremely synthetic online help seems to suggest the
presence of grammatical annotation, though no explanation of the tagset and query language
is given. A major limitation is the impossibility of exploring entire texts (probably due to
copyright issues), or of restricting queries to specific sections and subsections within the
subcorpora. Distributions over subcorpora, sections and subsections are not given.
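The collocational values mentioned above can be illustrated with a minimal sketch. The formulas actually used by the CORIS/CODIS interface are not documented here, so the textbook definitions of mutual information and t-score are assumed, and the function names and all counts are invented for illustration.

```python
import math

def expected(f_node: int, f_coll: int, n_tokens: int, span: int = 1) -> float:
    """Co-occurrences expected if node and collocate were independent.
    `span` is the number of positions searched around the node (assumption)."""
    return f_node * f_coll * span / n_tokens

def mutual_information(observed: int, f_node: int, f_coll: int,
                       n_tokens: int, span: int = 1) -> float:
    """MI = log2(observed / expected); favours rare but exclusive pairings."""
    return math.log2(observed / expected(f_node, f_coll, n_tokens, span))

def t_score(observed: int, f_node: int, f_coll: int,
            n_tokens: int, span: int = 1) -> float:
    """t = (observed - expected) / sqrt(observed); favours frequent pairings."""
    return (observed - expected(f_node, f_coll, n_tokens, span)) / math.sqrt(observed)

# Hypothetical counts: node occurs 1,000 times, collocate 500 times,
# they co-occur 50 times in a corpus of 1,000,000 tokens.
mi = mutual_information(50, 1000, 500, 1_000_000)
t = t_score(50, 1000, 500, 1_000_000)
```

Because the two measures reward different things, a learner comparing the ranked lists they produce is likely to see different collocates at the top, which is itself a useful point for classroom discussion.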
Fig. 1. CODIS query form
We can search for specific wordforms or sequences, such as praticamente or a lui gli piace,
and obtain the actual number of hits found (309 and 2 respectively), a concordance display of
up to 300 occurrences with a maximum of 160 characters of context, and an indication of
which subcorpus each line comes from (see Fig. 2). If the total number of hits is over 300 but
not too large, we can obtain all the concordance lines by doing separate queries for each
subcorpus. But when faced, say, with the 2,232 hits for the form diciamo (we say/let’s say),
we cannot analyse all the hits in order to distinguish uses as an ordinary verb from those as a
discourse marker. No hint on how to search for lemmas is given in the help file or in the
documentation. Information about register variation is only obtainable via separate queries in
different subcorpora.
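The kind of output just described can be approximated on any set of texts that a teacher is allowed to store locally. The sketch below is not the CODIS engine: the function name kwic, the sample sentences, and the decision to split the 160 characters of context evenly between left and right are all assumptions made for illustration.

```python
import re
from collections import Counter

def kwic(texts, word, context=160, limit=300):
    """Return (hits per subcorpus, concordance lines) for a word form.

    texts   : iterable of (subcorpus_label, text) pairs
    context : total characters of context, split evenly left and right
    limit   : maximum number of concordance lines displayed
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    half = context // 2
    per_subcorpus, lines = Counter(), []
    for label, text in texts:
        for m in pattern.finditer(text):
            per_subcorpus[label] += 1          # count every hit
            if len(lines) < limit:             # but display only up to `limit`
                left = text[max(0, m.start() - half):m.start()]
                right = text[m.end():m.end() + half]
                lines.append((label, left, m.group(), right))
    return per_subcorpus, lines

# Invented sample texts, labelled with subcorpus names used in the paper.
sample = [
    ("press", "Il governo ha praticamente deciso."),
    ("fiction", "Era praticamente buio. Praticamente nessuno parlava."),
]
totals, lines = kwic(sample, "praticamente")
```

Keeping the per-subcorpus totals separate from the displayed lines means that the distribution across subcorpora stays visible even when the display limit is reached.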
Fig. 2. CODIS concordance: praticamente
COLFIS (Laudanna et al. 1995) is a recently released frequency list of Italian words derived
from a 3,798,275 word corpus of written texts collected in the early ’90s. While the frequency
list is fully available (in a variety of different formats), the corpus at the moment is in
prototype form, and queries can only be performed on those texts authorized by the copyright
holders (http://www.ge.ilc.cnr.it/strumenti.php). COLFIS has an extremely interesting design,
being based on reading statistics for a sample of the Italian population from 11 years of age,
processed by the National Institute of Statistics (ISTAT). The relative proportions of texts
included in the corpus were based on 1992-94 data for newspapers, magazines and books. In
Biber’s terms (1993a:245) the corpus is designed on text reception principles, aiming to
represent language use from a demographic perspective.
2
A popular subgenre of comics in which the story is told not by drawings, but by photos.
For each of the three national newspapers sampled, there are 9 canonical categories of texts
(corresponding to sections in the newspapers). Magazines are divided into 12 categories using
a mixture of genre, domain, and text-type criteria. Books are distributed into 13 categories,
again with mixed genre and domain criteria.
The authorized portion of the corpus can be concordanced on the web, using either the raw
text or the lemmatized version. The three subcorpora can be searched individually, but not
subsections of these. The raw corpus has a very intuitive interface (see Fig. 3). As there is no
query syntax, only word types and word sequences can be searched for. The output is a
tabular concordance showing the hit paragraph with information on subcorpus, subsection,
newspaper or magazine source, author, title, publisher, year, and date. Context length cannot
be varied.
A similar query window is available for the lemmatized search, producing a similar output
with POS tagging (12 categories) included, but only a sentence of context.
Most queries performed with CODIS can be performed in COLFIS, except that the latter does
not support wildcards. Frequency information is not provided in the concordance tables, but
can be obtained through a specific search mask giving frequencies, relative frequencies and
dispersion values for all the wordforms and lemmas in the corpus. The general frequency lists
can be downloaded, while results of single queries are not exportable. As with CODIS, access
to the full text is not permitted.
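To make the notions of relative frequency and dispersion concrete, here is a minimal sketch. Which dispersion formula COLFIS uses is not stated here, so Juilland's D is adopted as a stand-in; the function names are hypothetical, and any counts passed to them would have to come from the corpus's own frequency lists.

```python
import math

def relative_frequency(count: int, size: int, per: int = 1_000_000) -> float:
    """Occurrences per `per` running words (per million by default)."""
    return count * per / size

def juilland_d(counts, sizes) -> float:
    """Juilland's D over subcorpora of possibly unequal size.

    Computed on per-part relative frequencies: 1.0 means perfectly even
    spread, 0.0 means all occurrences fall in a single part.
    """
    rel = [c / s for c, s in zip(counts, sizes)]
    n = len(rel)
    mean = sum(rel) / n
    if mean == 0:
        return 0.0
    sd = math.sqrt(sum((r - mean) ** 2 for r in rel) / n)  # population sd
    return 1 - (sd / mean) / math.sqrt(n - 1)
```

With three parts standing for newspapers, magazines and books, a word spread evenly across all three yields a value of 1, while a word confined to one of them yields 0. A value near 0 warns the learner that a seemingly frequent word may belong to a single variety.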
If looking for specific words or sequences, COLFIS can be used to determine distribution
across the main subcorpora, but morphological and syntactic patterns cannot be investigated.
The impossibility of querying specific subsections significantly reduces the suitability of the
corpus for studying variation.
Many researchers have favoured the use of spoken language corpora in classroom work.
Cresti (2007) notes how artificially-constructed materials differ radically from natural
conversations performed on the same tasks, and how the latter present far greater diaphasic
variability. While for formal written language, real-life uses may be relatively similar to those
presented to the learner in the classroom, the distance between the speech of teaching
materials and that of real life communication is enormous, making the presentation of spoken
language variation a key task for teaching.
If, as we have seen, written corpora can vary considerably in their coverage of language
variation due to an ambiguous distinction between external and internal factors, for speech
corpora the obstacles are multiplied by the absence of a shared classification of genres and
registers.
Over the last fifteen years a number of Italian speech corpora have been created, starting with
the pioneering Lessico di frequenza dell’italiano parlato (LIP: De Mauro et al. 1993). Some
multilingual corpora also include spoken Italian components, such as C-ORAL-ROM,
Integrated reference corpora for spoken romance languages (Cresti & Moneglia 2005). Other
corpora have been designed to explore phonetic and phonological properties, such as Archivio
delle Varietà di Italiano Parlato (AVIP: Pettorino & Giannini 2003), Archivio di Parlato
Italiano (API), Italiano Parlato (IPar: Albano Leoni & Giordano 2005) and Corpora e Lessici di
Italiano Parlato e Scritto (CLIPS: http://www.clips.unina.it/).3 Here we will focus attention
on two corpora that aim at representativeness of variation, namely LIP (500,000 words) and
the Italian component of C-ORAL-ROM (300,000 words).
3.2.1 The LIP corpus
LIP was designed for use in the production of the first frequency list of spoken Italian (De
Mauro et al. 1993). It was modelled on the corpus used for the first frequency list of written
Italian (LIF: Bortolini et al. 1971). By current standards the corpus is rather small (500,000
words), consisting of 57 hours of speech recorded in the early ’90s, transcribed and
lemmatized. It was designed along two dimensions of variation: diatopic (4 cities: Rome,
Florence, Milan and Naples) and interaction typology (5 classes: A: bidirectional exchange,
face to face, with free turn-taking; B: bidirectional exchange, not face to face, with free
turn-taking; C: bidirectional exchange, face to face, with regulated turn-taking; D: unidirectional
exchange, with the addressee being present; E: distanced unidirectional exchange). Criteria
are thus mainly external ones (see Table 3).
3
The wide interest in spoken language corpora is testified by the Parlaritaliano project, directed by Miriam
Voghera, aiming at the constitution of a web portal of electronic resources and bibliographies on Italian spoken
language (http://www.parlaritaliano.it/).
The strength of the LIP corpus lies in its coverage of different situational contexts. This
avoids the vicious circle caused by designs based on internal criteria (the main problem with
which is circularity in text selection and data retrieval). LIP’s external classification seems
particularly useful to investigate variation in language teaching contexts. Not only can
expressions, tags, and syntactic patterns be queried over different situation-types, but access is
available to the entire transcript, allowing the learner to explore this freely.
Among the disadvantages of LIP is its small size, which makes it unreliable for medium
frequency words and patterns. A small scale investigation of the most frequent loan words in
Italian (Chiari 2008) obtained sparse data, which was heavily influenced by single text
domains. Another drawback is the inaccessibility of the original audio files, and the absence
of phonetic and prosodic annotation. However it remains unrivalled for its balanced typology
based exclusively on situational parameters.
‐ oral exams at the university;
‐ interrogations in the courtroom;
‐ interviews on radio or television
D: unidirectional exchange, with the addressee being present
‐ lessons in the elementary school;
‐ lessons in the secondary school;
‐ university lectures;
‐ speeches held during party conventions or labor union meetings;
‐ presentations at scientific meetings;
‐ speeches held during electoral campaigns;
‐ sermons;
‐ presentations at non‐specialist meetings;
‐ court pleadings.
E: distanced unidirectional exchange
‐ television programs;
‐ radio programs.
3.2.2 C-ORAL-ROM
The corpus design is specifically aimed at capturing variation, “covering a wide range of
semantic and pragmatic domains of application” (Cresti & Moneglia 2005b, see Fig. 5). The
highest level distinction is one of register, between formal and informal speech. The
definition of these terms is rather peculiar, since informal is intended as an “un-scripted low
variety of language, used for everyday interactive purposes”, and the formal as a “partially-scripted
task-oriented high variety of language” (Cresti & Moneglia 2005b). The second
level takes into account the channel of communication for formal speech (face-to-face,
broadcast and telephonic), and public versus private for informal speech. At the third level
informal speech is divided into monologue, dialogue and (multi-party) conversation. For
some formal speech categories a further distinction is made on the basis of domain: natural
context (political speech; political debate; preaching; teaching; professional explanation;
conference; business; law) and media (news; sport; interviews; science; weather forecast;
scientific press; reportage; talk show). As Cresti notes, the criteria adopted for the formal and
informal sections are rather different:
The definition of a finite list of typical domains of use is the main criterion applied in documenting the formal
uses of the four romance languages, while variations in dialogue structure and social context of use is the
sampling criterion of the informal part. The choice of the specific semantic domain of use is left random in the
informal sampling.
(Cresti & Moneglia 2005b)
The architecture of the corpus is not consistent, including both internal and external criteria.
In the domain distinction for formal natural contexts we can find topic-related categories
(preaching “religion”, business, law), but also ones that represent situational aspects and text-
types (teaching, professional explanation and conference), as well as mixed topic- and
activity-centered categories, such as political speech and political debate.
[…] while it can be assumed that in western societies the formal use of language is applied in a closed series of
typical domains, the same does not hold for the informal use of language. The list of possible domains of use for
informal language is by definition open, and no domain can in principle be considered more typical than others.
(Cresti & Moneglia 2005b)
While the open nature of the domains of informal language use can be easily agreed with, the
idea that formal language use is set in a predefined (or at least closed) set of domains is
debatable. The formal-informal parameter seems a continuum rather than a clear-cut
distinction, and hence theoretically weak. The sampling reflects these differences in the
treatment of the formal and informal categories (see Fig. 6).
Fig. 6. Quantitative balance of C-ORAL-ROM (Italian component)
When it comes to the issue of language variation, it is extremely difficult to evaluate the
project. On the one hand, if we suppose that the formal/informal distinction holds and has
been applied consistently, it would be extremely practical to observe and compare the two
sections in classroom work. On the other hand the different quantitative balance of the
subsections makes it hard to go beyond this dichotomy.4
Turning to retrieval tools, C-ORAL-ROM is distributed with Véronis’ Contextes. The query
syntax accepts word types, lemmas, POS tagging, multiple word queries, and regular
expressions, and the interface permits the export of concordances and frequency lists of
selected words, as well as direct access to the full text. No general frequency and dispersion
data is given for sections of the corpus, but a non-exportable frequency list for the whole
corpus is provided (without dispersion and usage counts). Like the LIP corpus, C-ORAL-
ROM is rather small and hence unreliable for medium frequency words.
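Where raw counts can be extracted from both speech corpora, one conventional way to ask whether a difference in frequency is more than an effect of corpus size is the log-likelihood statistic. The sketch below is offered only as an illustration: the corpus sizes are those given above for LIP and the Italian component of C-ORAL-ROM, the function name is hypothetical, and the word counts used in any comparison would have to be real ones.

```python
import math

LIP_SIZE = 500_000          # words, as reported above
CORAL_ROM_IT_SIZE = 300_000 # words, as reported above

def log_likelihood(count_a: int, size_a: int, count_b: int, size_b: int) -> float:
    """Log-likelihood (G2) for one item observed in two corpora.

    Expected counts assume the item is equally frequent, relative to
    corpus size, in both corpora.
    """
    total = count_a + count_b
    exp_a = size_a * total / (size_a + size_b)
    exp_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, exp in ((count_a, exp_a), (count_b, exp_b)):
        if observed > 0:                 # 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / exp)
    return 2 * g2
```

Values above 3.84 are conventionally read as significant at p < 0.05 (one degree of freedom). With counts as low as those typical of medium frequency words in corpora of this size, however, the statistic should be treated with caution, and it says nothing about differences in corpus design.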
Italian reference corpora treat language variation in very different ways, posing a number of
problems for their exploitation in teaching. The speech corpora discussed have a number of
features which are highly interesting from this point of view: access to both concordances and
full text (the latter being essential to understand the speech acts involved), aligned audio,
tagged versions for the investigation of grammatical patterns. Their disadvantages concern
query limitations, their small size, and design inconsistencies. If the learner wants to
investigate similarities and differences between spoken and written Italian, she will encounter
a number of obstacles (common in some cases to the language researcher): differences in size,
design, and tools that do not permit statistical conclusions to be drawn from the different
sources available. Italian corpus linguistics has been mainly concerned with the construction
of balanced resources, and the debate over large vs. register-diversified balanced corpora
(Biber 1993b) has barely touched the Italian scene, mainly because we do not have any very
large corpora at the moment. But if we consider the problem from the language learner’s
perspective, it is obvious that a balanced diversified corpus will respond better to their needs.
4
The frame is of course different for research purposes, since it is always possible to evaluate representativeness
related to single parameters (Biber 1993a, b).
Although language variation in all its aspects is at the centre of debate in corpus linguistics,
there are still inconsistencies in the classification and labelling of variation, not only in
reference to Italian. These lead to difficulties in managing corpus material, since text-types
are grouped in different classes in unpredictable ways (a situation common to many English
resources as well, as Lee 2001 has shown). At the level of query capabilities, there are severe
limitations for Italian reference corpora (which have been largely overcome for English
corpora like the BNC, with Dodd’s Xaira (http://www.oucs.ox.ac.uk/rts/xaira/), Davies’ View
(http://corpus.byu.edu/bnc/), and Fletcher’s Pie (http://pie.usna.edu/)). The flexibility of
retrieval tools is indispensable for correct data extraction, and an interface offering
accessibility, user-friendliness, and a clean and intuitive layout is a vital requirement from a
teaching perspective, where aims potentially differ from those of linguistic research.
One of the main obstacles for teachers is the unpredictability of query results, which makes
them reluctant to use corpora extensively, since doing so requires a great deal of planning.
A clear description of corpus features and the variation dimensions represented
should be a priority in corpus distribution. Explicit evaluation of the main tasks that teachers
and learners need to perform with single corpora should include: looking for patterns of
variation from a lexical, syntactic, pragmatic, and sociolinguistic point of view; looking for
specific features that characterize different genres, different domains and different text types;
looking for textual specificity of sublanguages. For spoken corpora a requisite should be
access to aligned audio, so that the user can examine spoken realisations of different registers
(phonetic and prosodic features). C-ORAL-ROM and CLIPS (www.clips.unina.it) both go in
this direction, albeit at different levels.
In order to explore such issues, retrieval tools and documentation should provide:
• full classification of subcorpora and their subsections;
• querying in subsections;
• querying over POS tags, preferably using regular expressions or intuitive matching
systems;
• explicit frequency data (including relative frequency and dispersion), perhaps indicated
roughly with learner-friendly symbols like those in dictionaries;
• exact numbers of hits found and display of complete concordances with adjustable context
length;
• full text access for traditional textual analysis and pragmatic observation.
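Two of the requirements listed above, querying within subsections and querying over POS tags with regular expressions, can be sketched together. Nothing below reproduces an existing tool: the word/TAG encoding, the tag names, the subsection labels and the sample sentences are assumptions, while the pattern is modelled on the sequence a lui gli piace discussed earlier.

```python
import re

# Hypothetical tagged corpus: one entry per text, tokens encoded as word/TAG.
corpus = [
    {"subsection": "fiction/crime",
     "tagged": "ma/CCONJ a/ADP lui/PRON gli/PRON piace/VERB il/DET mare/NOUN"},
    {"subsection": "press/national",
     "tagged": "il/DET governo/NOUN ha/AUX deciso/VERB"},
    {"subsection": "fiction/children",
     "tagged": "a/ADP me/PRON mi/PRON piace/VERB"},
]

def query(corpus, pattern, subsection=None):
    """Return (subsection, matched sequence) pairs for a regex over word/TAG text.

    If `subsection` is given, only texts whose label starts with it are searched,
    so "fiction" covers every fiction subsection and "fiction/crime" only one.
    """
    rx = re.compile(pattern)
    hits = []
    for text in corpus:
        if subsection and not text["subsection"].startswith(subsection):
            continue
        for m in rx.finditer(text["tagged"]):
            hits.append((text["subsection"], m.group()))
    return hits

# Preposition + pronoun + pronoun + verb, whatever the word forms are.
DOUBLING = r"\S+/ADP \S+/PRON \S+/PRON \S+/VERB"

all_hits = query(corpus, DOUBLING)
crime_hits = query(corpus, DOUBLING, subsection="fiction/crime")
```

A pattern stated over tags rather than word forms retrieves a me mi piace as well as a lui gli piace, which is what allows a learner to compare how a construction is distributed across subsections.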
Language teaching may benefit from the corpus-based study of linguistic variation in many
ways: in the selection and grading of content for syllabus and materials design, in developing
activities to enhance learner awareness and to help learners exploit resources, and in
assessment procedures.
References
Albano Leoni F. & R. Giordano (eds.) (2005), Italiano parlato. Analisi di un dialogo. Napoli: Liguori.
Bernardini S. & F. Zanettin (eds) (2000) I corpora nella didattica della traduzione. Bologna: CLUEB.
Biber D. (1988) Variation across speech and writing. Cambridge: Cambridge University Press.
Biber D. (1992) The multi-dimensional approach to linguistic analyses of genre variation: an overview
of methodology and findings. Computers and the humanities, 26, 331-345.
Biber D. (1993a) Representativeness in corpus design. Literary and linguistic computing, 8, 243-257.
Biber D. (1993b) Using register-diversified corpora for general language studies. Computational
Linguistics, 19, 219-241.
Biber D. (1995) Dimensions of register variation: a cross-linguistic comparison. Cambridge:
Cambridge University Press.
Biber D., S. Conrad & R. Reppen (1998) Corpus linguistics: investigating language structure and use.
Cambridge: Cambridge University Press.
Biber D. & E. Finegan (1991) On the exploitation of computerized corpora in variation studies. In
Aijmer K. & B. Altenberg (eds), English corpus linguistics: studies in honour of Jan Svartvik.
London: Longman, 204-220.
Biber D. & E. Finegan (eds) (1994) Sociolinguistic perspectives on register. Oxford: Oxford
University Press.
Bortolini U., C. Tagliavini & A. Zampolli (1971) Lessico di frequenza della lingua italiana
contemporanea. Milano: IBM Italia.
Botley S., J. Glass, T. McEnery & A. Wilson (eds) (1996) Proceedings of teaching and language
corpora 1996. Lancaster: UCREL.
Botley S., A. McEnery & A. Wilson (eds) (2000) Multilingual corpora in teaching and research.
Amsterdam: Rodopi.
Burnard L. & T. McEnery (eds) (2000) Rethinking language pedagogy from a corpus perspective.
Frankfurt am Main: Peter Lang.
Chiari I. (2005) Linguistica e informatica: la linguistica dei corpora in Italia. Bollettino di Italianistica,
4, 101-118.
Chiari I. (2008) Ingresso, uso, integrazione e produttività delle parole nuove in italiano. Metodi e
problemi dell’indagine quantitativa sul lessico. In M. Pettorino, A. Giannini, M. Vallone & R.
Savy (eds.) Atti del Convegno internazionale sulla comunicazione parlata. Napoli: Liguori, 402-421.
Conrad S. (2004) Corpus linguistics, language variation and language teaching. In J.M.
Sinclair (ed), How to use corpora in language teaching. Amsterdam: John Benjamins, 67-85.
Cresti E. (2007) Some comparisons between UBLI and C-ORAL-ROM. In Zaima K.Y.S. & T.
Takagaki (eds), Spoken language corpus and linguistics informatics. Amsterdam: John Benjamins,
125-115.
Cresti E. & M. Moneglia (2005a) C-ORAL-ROM: integrated reference corpora for spoken Romance
languages. Amsterdam: John Benjamins.
Cresti E. & M. Moneglia (2005b) C-ORAL-ROM: integrated reference corpora for spoken Romance
languages. URL: http://lablita.dit.unifi.it/corpora/descriptions/coralrom/ (date accessed May 15
2008).
De Mauro T., F. Mancini, M. Vedovelli & M. Voghera (1993) Lessico di frequenza dell'italiano
parlato (LIP). Milano: Etaslibri.
Eagles (1996). Preliminary recommendations on Corpus Typology, EAG--TCWG--CTYP/P,
http://www.ilc.cnr.it/EAGLES96/corpustyp/corpustyp.html. May, 1996. [Accessed April 2007].
Ellis R. (1987) Second language acquisition in context. Englewood Cliffs, NJ: Prentice-Hall.
Gavioli L. (2005) Exploring corpora for ESP learning. Amsterdam: John Benjamins.
Godwin-Jones B. (2001) Emerging technologies: tools and trends in corpora use for teaching and
learning. Language Learning & Technology, 5(3), 7-12.
Granger S. & M. Wynne (2000) Optimising measures of lexical variation in EFL learner corpora. In
Kirk J.M. (ed) Corpora galore: analyses and techniques in describing English. Amsterdam:
Rodopi, 249-257.
Hunston S. (2002) Pattern grammar, language teaching, and linguistic variation: applications of a
corpus-driven grammar. In Reppen R., S.M. Fitzmaurice & D. Biber (eds) Using corpora to
explore linguistic variation. Amsterdam: John Benjamins, 167-183.
Hymes D. (1971) On communicative competence. Philadelphia: University of Pennsylvania Press.
Johns T. (1991a) Classroom concordancing: University of Birmingham, Centre for English Studies.
Johns T. (1991b) From printout to handout: grammar and vocabulary learning in the context of data-
driven learning. English Language Research Journal, 4, 27-45.
Johns T. (1991c) Should you be persuaded – two examples of data-driven learning materials. English
Language Research Journal, 4, 1-16.
Johns T. (2002) Data-driven learning: the perpetual challenge. In Kettemann B. & G. Marko (eds)
Teaching and learning by doing corpus analysis. Amsterdam: Rodopi, 107-117.
Kahn B. (1985) Computers in science: using computers for learning and teaching. Cambridge:
Cambridge University Press.
Laudanna A., A.M. Thornton, G. Brown, C. Burani & L. Marconi (1995) Un corpus dell'italiano
scritto contemporaneo dalla parte del ricevente. In Bolasco S., L. Lebart & A. Salem (eds), III
Giornate internazionali di analisi statistica dei dati testuali. Roma: CISU, 103-109.
Lee, D.Y. (2001) Genres, registers, text types, domains, and styles: clarifying the concepts and
navigating a path through the BNC jungle. Language Learning & Technology, 5(3), 37-72.
Lewandowska-Tomaszczyk B. (ed) (2003) PALC 2001: practical applications in language corpora.
Frankfurt am Main: Peter Lang.
Lewandowska-Tomaszczyk B. (ed) (2004) Practical applications in language and computers: PALC
2003. Frankfurt am Main: Peter Lang.
Lewandowska-Tomaszczyk B. & P.J. Melia (eds) (1997) International conference on practical
applications in language corpora. Lodz: Lodz University Press.
Lewandowska-Tomaszczyk B. & P.J. Melia (eds) (2000) PALC'99 – Practical applications in
language corpora. Frankfurt am Main: Peter Lang.
Meyer C.F. (2004) Can you really study language variation in linguistic corpora? American Speech,
79, 339-355.
Paltridge B. (1996) Genre, text type, and the language learning classroom. ELT Journal, 50, 237-243.
Partington A. (1998) Patterns and meanings: using corpora for English language research and
teaching. Amsterdam: John Benjamins.
Pettorino M. & A. Giannini (2003) Progetti AVIP e API-unità di ricerca dell’Università degli Studi di
Napoli l’Orientale. In Albano Leoni F., Cutugno F., Pettorino M., Savy R (eds.), Il parlato
italiano Atti del Convegno Nazionale, D'Auria, Napoli, N06.
Rossini Favretti R. (2000) Progettazione e costruzione di un corpus di italiano scritto: CORIS/CODIS.
In Rossini Favretti R. (ed.), Linguistica e informatica. Multimedialità, corpora e percorsi di
apprendimento. Roma: Bulzoni, 39-56.
Rossini Favretti R., F. Tamburini & C. De Santis (2002) A corpus of written Italian: a defined and a
dynamic model. In Wilson A., P. Rayson & T. McEnery (eds), A rainbow of corpora: corpus
linguistics and the languages of the world. Munich: Lincom-Europa, 27-38.
Scott M. & C. Tribble (eds) (2006) Textual patterns: key words and corpus analysis in language
education. Amsterdam: John Benjamins.
Sinclair J.M. (ed) (2004) How to use corpora in language teaching. Amsterdam: John Benjamins.
Spolsky B. (1989) Communicative competence, language proficiency, and beyond. Applied
Linguistics, 10, 138-156.
Szendeffy J. (2005) A practical guide to using computers in language teaching. Ann Arbor: University
of Michigan Press.
Timmis I. (2003) Corpora, classroom and context: the place of spoken grammar in English language
teaching. Ph.D. Thesis, University of Nottingham.
Tono Y. (2002) The role of learner corpora in SLA research and foreign language teaching: the
multiple comparison approach. Ph.D. Thesis, University of Lancaster.
Wang L. (2000) The use of parallel texts in language learning: computer software and teaching
materials for English and Chinese. Birmingham: University of Birmingham.
Wible D., F.-y. Chien, C.-H. Kuo & C.C. Wang (2002) Toward automating a personalized
concordancer for data-driven learning: a lexical difficulty filter for language learners. In Kettemann
B. & G. Marko (eds) Teaching and learning by doing corpus analysis. Amsterdam: Rodopi, 147-154.
Wichmann, A., S. Fligelstone, T. McEnery & G. Knowles (eds) (1997) Teaching and language
corpora. London: Longman.
Zanettin F., S. Bernardini & D. Stewart (eds) (2003) Corpora in translator education. Manchester: St.
Jerome.