Corpus Approaches to the
Language of Literature
Martin Wynne
Oxford Text Archive
University of Oxford
martin.wynne@oucs.ox.ac.uk
OTA
PALA
PALA 2008
Corpus Approaches to the Language of Literature 2008
6
What is corpus stylistics?
The use of the resources, tools and
methodologies of corpus linguistics to
carry out literary analysis on the basis
of the language of literature.
7
Corpus Stylistics - Methods
● Examining and analysing texts and corpora
● Comparing texts and corpora
● Building and annotating resources
8
“I'm just going out to
commit certain deeds.”
In an episode of The Simpsons, Homer has
planned with Moe to steal Moe's car and drive it
into the water, so that Moe can claim the
insurance money. Before Homer goes out to steal
the car, he is eating dinner with the family, and is
trying to act innocently, as if it is a normal
evening. He makes various mistakes, and when
he gets up to leave, he says, “I'm just going out to
commit certain deeds.”
9
Consult a corpus to see how a
word / phrase / construction /
collocation 'normally' occurs.
For example, we can look at 'commit' and 'deeds'
in the British National Corpus, and try to answer
questions like “Why is this funny?”, and “Why are
commit and deeds the wrong words to use here?”
Requirements: access to a general reference
corpus and analysis tools (preferably online) for
concordance, collocation, cluster, distribution,
word frequency lists
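A minimal sketch of what the concordance and collocation tools do, assuming a plain-text corpus held in memory. The function names (`tokenize`, `concordance`, `collocates`) and the sample sentences are illustrative only; the sentences are invented and are not BNC data.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, width=4):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

def collocates(tokens, node, span=4):
    """Count the words found within `span` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Toy stand-in for a reference corpus (invented sentences).
sample = (
    "He did not commit the crime. They commit a crime every day. "
    "She would never commit murder. Good deeds are rewarded."
)
tokens = tokenize(sample)
for line in concordance(tokens, "commit"):
    print(line)
print(collocates(tokens, "commit").most_common(3))
```

On a real reference corpus the same two views would show which nouns typically follow 'commit', and which verbs typically precede 'deeds', which is the evidence needed to explain why Homer's phrasing sounds wrong.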
10
Analyse an electronic version of a literary text, using text analysis tools.
● How does author X use expression Y?
● How often does she use Y?
● Does she prefer another expression in certain contexts?
● In what parts of the novel / play / poem does she tend to use Y?
Requirements: (reliable) electronic version of the
text (in an appropriate format), plus relevant tools
(preferably online)
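The "how often" and "in what parts" questions can be sketched as a relative frequency and a simple dispersion count. This is an illustrative sketch, not the method of any particular tool: the function names, the choice of four equal segments and the toy text are all assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequency(tokens, word, per=1000):
    """Occurrences of `word` per `per` tokens (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    return per * tokens.count(word) / len(tokens)

def dispersion(tokens, word, parts=4):
    """Count `word` in each of `parts` roughly equal segments of the text."""
    counts = [0] * parts
    for i, tok in enumerate(tokens):
        if tok == word:
            # Map the token position onto a segment index in 0..parts-1.
            counts[i * parts // len(tokens)] += 1
    return counts

# Invented toy text; a real study would load the full electronic edition.
text = "if one two three four five six seven if eight nine if"
tokens = tokenize(text)
print(relative_frequency(tokens, "if"))
print(dispersion(tokens, "if"))
```

Equal-sized segments are the simplest choice; with structural markup the segments could instead be chapters, acts or stanzas.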
11
Analysing a literary corpus
Ask questions like those above, but across the
oeuvre of an author, or across a literary genre or
time period.
Furthermore, analyse variation in an author's work
(e.g. compare one novel with the rest)
Requirements: a relevant corpus, plus tools that
allow for internal comparisons
12
Analysing an author's work
Clusters >4 words in Dickens -
among the top 25:
AS IF HE HAD BEEN
IN THE COURSE OF THE
A QUARTER OF AN HOUR
AT THE BOTTOM OF THE
WHAT DO YOU THINK OF
IN THE MIDDLE OF THE
AS IF IT HAD BEEN
AT THE TOP OF THE
ON THE OTHER SIDE OF
AT THE END OF THE
AS A MATTER OF COURSE
THE OTHER SIDE OF THE
UP AND DOWN THE ROOM
Categorisation of cluster types
(more than 5 words):
– Names and labels 21
– Speech 16
– “As if” 6
– Body parts 12
– other 22
Mahlberg, M. “Corpus stylistics: bridging the gap
between linguistic and literary studies” In M. Hoey,
M. Mahlberg, M. Stubbs, W. Teubert. Text,
Discourse, and Corpora. London: Continuum. 2007.
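Cluster lists like the one on this slide are counts of repeated word sequences (n-grams). The sketch below shows the counting step only, assuming five-word clusters and a plain-text input; `tokenize` and `clusters` are illustrative names, and the two sample sentences are invented rather than taken from Dickens.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def clusters(tokens, n=5):
    """Count every contiguous sequence of `n` tokens in the text."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Invented sample; a real study would run this over the author's collected works.
sample = "He looked as if he had been asleep. She spoke as if he had been away."
counts = clusters(tokenize(sample))
for cluster, freq in counts.most_common(3):
    print(freq, " ".join(cluster).upper())
```

Note that this simple version lets clusters run across sentence boundaries; whether that is acceptable is a design decision for the analyst.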
13
Making internal comparisons within a text
● Comparing the speech of one character with the rest, e.g. Romeo and Juliet.
● Comparing one act or scene with the rest.
● Comparing the style of one section of a novel with the rest.
Requirements: text processing tools to separate
text elements, or markup to tag text structure and
markup-aware tools, plus keywords software
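As a rough illustration of why markup matters here, the sketch below separates speeches by speaker and builds one word list for a chosen character and one for everyone else. The markup is an assumption: a simplified, TEI-like fragment with `<sp who="...">` elements, written for this example and not drawn from any particular edition. The function names are likewise hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Simplified, hypothetical markup; real encoded editions are richer than this.
SAMPLE = """
<scene>
  <sp who="ROMEO"><l>But soft, what light through yonder window breaks?</l></sp>
  <sp who="JULIET"><l>O Romeo, Romeo, wherefore art thou Romeo?</l></sp>
  <sp who="ROMEO"><l>It is the east, and Juliet is the sun.</l></sp>
</scene>
"""

def words_by_speaker(xml_text):
    """Return a word-frequency Counter for each value of the `who` attribute."""
    root = ET.fromstring(xml_text)
    by_speaker = defaultdict(Counter)
    for sp in root.iter("sp"):
        text = " ".join(sp.itertext())
        by_speaker[sp.get("who")].update(re.findall(r"[a-z']+", text.lower()))
    return by_speaker

def one_versus_rest(by_speaker, speaker):
    """Split the counts into (chosen speaker, all other speakers combined)."""
    rest = Counter()
    for name, counts in by_speaker.items():
        if name != speaker:
            rest.update(counts)
    return by_speaker[speaker], rest

counts = words_by_speaker(SAMPLE)
target, rest = one_versus_rest(counts, "JULIET")
print(target.most_common(3))
print(rest.most_common(3))
```

The two word lists can then be passed to keywords software to find what is distinctive about one character's speech.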
14
Comparing a text to a
reference corpus
Compare the frequency, distribution and usage of
words in the text with a reference corpus.
E.g. A Connecticut Yankee in King Arthur's Court
by Mark Twain, compared to the British National
Corpus (BNC)
Requirements: many reference corpora, literary
and non-literary, different languages, genres, time
periods, etc.
15
Comparing texts and
corpora
I, AND, SIR, KING, YE, IT, MY, LAUNCELOT, ME, WAS,
KNIGHTS, MERLIN, KNIGHT, ARMOR, CLARENCE, THING,
SANDY, HIM, MARHAUS, THAT, UPON, TOWARD, MORDRED,
GAWAINE, CAMELOT, SAGRAMOR, SO, DOWLEY, YES,
COULDN'T, MILRAYS, THEN, BUT, THEY, HUNDRED,
PRESENTLY, KING'S, ARTHUR'S, WOULD, MAN, HAD, WE,
ALL, YONDER, THOU, SLAVE, MIRACLE, OUT, ARTHUR,
GOOD, UNTO, COULD, AH, HATH, MYSELF, ERRANTRY, LET,
SMOTE, ALONG, WELL, MAGICIAN, NOBLE, HIS, GOT,
WHEREFORE, SWORD, HE, EVERYBODY, THEE, SPEAR,
YOU, ABBOT, PERADVENTURE, OFFENSE, HERMIT, THEM,
PROCESSION, STRAIGHTWAY, A, YET, MONKS, KAY, EVER,
GUENEVER
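A keyword list like the one above ranks words by how unusually frequent they are in the text compared with the reference corpus. The slides do not say which statistic was used; the sketch below assumes log-likelihood, which is one common choice in keywords software. All counts are invented toy numbers, and the function names are illustrative.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood score for a word seen `a` times in a text of `c` tokens
    and `b` times in a reference corpus of `d` tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the text
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(text_counts, ref_counts, top=10):
    """Rank words that are relatively more frequent in the text than in the reference.
    Assumes both frequency lists are non-empty."""
    c = sum(text_counts.values())
    d = sum(ref_counts.values())
    scored = []
    for word, a in text_counts.items():
        b = ref_counts.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [(word, round(score, 2)) for score, word in scored[:top]]

# Invented toy frequency lists, not real counts from Twain or the BNC.
text_counts = Counter({"the": 50, "of": 32, "king": 10, "sir": 8})
ref_counts = Counter({"the": 600, "of": 380, "king": 10, "sir": 10})
print(keywords(text_counts, ref_counts))
```

Comparing a nineteenth-century American novel with a late twentieth-century British corpus will surface spelling and period differences (ARMOR, TOWARD, YE) alongside content words, which is why the choice of reference corpus matters.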
16
Comparing a literary corpus
to a general reference corpus
Identifying and characterizing an author's style,
e.g. comparing all of Mark Twain's work with US
fiction in the period 1870-1910;
Identifying and characterizing literary style (of a
period, or genre, etc),
e.g. comparing a corpus of US fiction with a
corpus of non-fiction from the same period, or
comparing dramatic dialogue in plays with real
conversation in a spoken corpus.
Requirements: More literary corpora, more
reference corpora, more computing power!
17
Tracing historical change
Diachronic studies of the language of literature,
studying language change, changes in style,
genre, etc.
Requirements: sets of historical literary corpora
of various time periods, or a diachronic corpus
which allows internal comparisons, or a collection
of texts (with dates) which can be cross-searched
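For a collection of dated texts, a first diachronic view is the relative frequency of a word in each time period. The sketch below assumes texts supplied as (year, text) pairs and fixed 50-year periods; the data are invented sentences and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical dated texts, invented for illustration only.
TEXTS = [
    (1815, "Thou art welcome and thou art kind"),
    (1845, "You are welcome but thou art late"),
    (1895, "You are welcome and you are kind"),
]

def frequency_by_period(texts, word, period=50, per=1000):
    """Return {period start year: occurrences of `word` per `per` tokens}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        start = year - year % period  # e.g. 1845 falls in the period starting 1800
        hits[start] += tokens.count(word)
        totals[start] += len(tokens)
    # Skip periods with no tokens to avoid dividing by zero.
    return {s: per * hits[s] / totals[s] for s in sorted(totals) if totals[s]}

result = frequency_by_period(TEXTS, "thou")
for start, freq in result.items():
    print(start, round(freq, 1))
```

Normalising by the number of tokens in each period is what makes periods of different sizes comparable.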
18
Annotating and manually
analyzing texts and
corpora
Can be used to test, refine and develop theories
about the language of literature.
Theories are forced to supply textual evidence
and to account for all textual phenomena.
Frequencies and relative frequencies can be
calculated.
Requirements: lots of time, money and expertise!
19
Building and Annotating
The Speech, Thought and Writing Presentation Corpus
Elena Semino, Mick Short, Martin Wynne et al
Lancaster University
Identifying, categorising and analysing the functions of all
occurrences of reported speech, thought and writing (e.g.
direct speech, indirect speech, free indirect speech, direct
thought, etc.) in a small corpus of fictional and non-fictional
texts (and later also speech)
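Once stretches of text have been annotated by hand, counting the categories is straightforward. The sketch below assumes each annotation is stored as a (category, word count) pair; the category names echo the examples on this slide, but the numbers are invented and the data structure is not the project's actual format.

```python
from collections import Counter

# Hypothetical annotations: (category, number of words in the annotated stretch).
ANNOTATIONS = [
    ("direct speech", 12),
    ("indirect speech", 8),
    ("direct speech", 20),
    ("free indirect speech", 10),
]

def category_frequencies(annotations):
    """Return {category: (number of instances, share of all annotated words)}."""
    instances = Counter()
    words = Counter()
    for category, n_words in annotations:
        instances[category] += 1
        words[category] += n_words
    total = sum(words.values())
    return {cat: (instances[cat], words[cat] / total) for cat in instances}

freqs = category_frequencies(ANNOTATIONS)
for category, (count, share) in freqs.items():
    print(f"{category}: {count} instances, {share:.0%} of annotated words")
```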
20
Building and annotating (2)
VICI
Free University of Amsterdam
Gerard Steen et al
Identifying and categorising metaphorical
expressions in a subset of the BNC;
analysing usage and distributions across text
types and modes
21
Further types of analysis
● More levels of annotation: parsing, semantic tagging, etc.
● Stylometry
● Text mining
● Multilingual, parallel, comparable, translation corpora
● Socio-cultural and historical investigations in literary corpora
But note, please, that you don't need annotation for
many useful techniques!
Requirements: various!
22
A new type of Shakespeare
dictionary: Jonathan Culpeper
A proposal for a dictionary of the language of Shakespeare, involving
better integration of linguistic description, frequency information and
non-linguistic information.
− How often does X occur?
− How often do the particular meanings of X occur?
− What kind of words does X tend to co-occur with?
− How often do the particular ‘grammatical categories’ of X occur?
− What kinds of register does X co-occur with?
− What kinds of speaker/addressee does X co-occur with?
− Is X part of a particular lexical field (semantic category) and how does
that field distribute across the plays?
− How can the above help differentiate word X from word Y?
− Etc.
(1) a particular theoretical approach to meanings, (2) a particular
methodology ….. enter Corpus Linguistics
23
Using large-scale literary corpora
● For example, Matthew Jockers, Sarah Allison and others at Stanford University, using large collections of literary texts from commercial providers, applying corpus linguistic and data mining techniques to address literary research questions
e.g. Joe Shapiro comparing quantity of narrative v. descriptive passages in 19th-century US literature
● Perhaps particular potential for historical literary and linguistic studies
24
Basic methods: summary
1. Examine the norms in a general reference corpus
2. Perform text analysis on an electronic literary text
3. Make internal comparisons in a literary text
4. Analyse a literary corpus
5. Make internal comparisons in a literary corpus
6. Compare a text to a reference corpus
7. Compare a literary corpus to a non-literary corpus
8. Compare different literary corpora with each other
9. Build and annotate corpora
10. Others!
25
Methods: conclusion
It is becoming increasingly possible to test claims
about the language of literature empirically,
to search for and provide evidence from texts, and
to establish the norms of literary and non-literary
style.
Stylistics typically makes use of a toolkit of
linguistic techniques, methods and resources.
Corpus stylistics will become a powerful addition
to this toolkit in the future.
26
Resources for Corpus
Stylistics
What do we need?
● Reliable electronic editions of literary texts
● Relevant reference corpora
● Analysis tools
● Interoperability
● Shared access
● Sustainability
● Methodology
● Expertise
27
Research Infrastructure
The vision is for a set of relevant texts, corpora and tools,
hosted in various locations around the world, available
online from the user's desktop, via a single sign-on; all
the resources and tools working together using
high-speed connections and high-performance computing.
Plus tools for showing, sharing and collaborating in a
virtual workspace.
CLARIN is working to build this infrastructure for the use
of language resources and technologies across the
humanities and social sciences.
28
Links
Oxford Text Archive (OTA)
http://www.ota.ox.ac.uk/
PALA Corpus Stylistics Special Interest Group
http://www.pala.ac.uk/sigs/corpus-style/
Corpus-style mailing list
http://www.jiscmail.ac.uk/lists/corpus-style.html
Speech, Thought and Writing Presentation Project
http://bowland-files.lancs.ac.uk/stwp/
British National Corpus
http://www.natcorp.ox.ac.uk/
Brigham Young University Corpora from Mark Davies
http://corpus.byu.edu/
