Week 3 Corpora, Collocations and the Study of Patterns NEW

Uploaded by

Khang Ninh
Corpora, collocations and the study of patterns in language

Dr Amy Wang
Work in pairs and discuss the questions below:

1. Is the information useful for your students?


2. Can you use the information directly in your
class? Why? Why not?
3. If you can use it in the class, how can you
use it?
Information in the previous slide:
generated from a corpus
What is a corpus?
Corpus (pl. corpora) = a collection of texts
sampled to represent one or more
varieties of a language
Focus on the central or typical usage in
language – everyday speakers/writers
Systematic sampling – usually parts of
texts, rather than whole texts
Contain many words (typically thousands
or millions) – but not the whole Internet
Aspect | Details | Examples | Insights
Definition | Corpus focuses on natural, real-life usage by everyday speakers/writers. | Conversations | Reflects actual language patterns, not prescribed grammar.
Colloquial Expressions | Informal phrases and idioms. | "Gonna" instead of "going to". | Common in spoken contexts, rare in formal writing.
Frequency | High-frequency words, phrases, and grammatical structures. | "Don't" vs. "do not". | Contractions dominate casual communication but are less frequent in formal contexts.
Contextual Diversity | Data sourced from conversations, emails, social media, newspapers, etc. | - | Ensures the corpus reflects diverse real-world usage.
Natural Features | Hesitations, repetitions, contractions, and variations based on context. | "Where you at?" (informal grammar). | Ellipsis and simplifications are typical in casual spoken English.
Applications | Language learning, NLP, lexicography. | AI trained on real-life data. | Improves relevance and practicality in language teaching and computational tools.
Aspect | Details | Examples | Insights
Definition | Selecting specific parts of texts for inclusion in a corpus. | Sampling parts instead of full texts (e.g., first 200 words of news articles). | Ensures representativeness while maintaining manageability.
Goal | Capture typical language use efficiently. | Focus on relevant sections, avoid redundancy. | Provides a practical yet balanced dataset.
Why Systematic Sampling? | Efficiency, focus, and balance. | Include varied contexts without overwhelming data size. | Avoids overrepresentation of specific styles or topics.
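The sampling idea above (taking the same-sized slice from each text rather than whole texts) can be sketched in a few lines of Python. This is only an illustration: the function names and the two toy "articles" are invented here, and real corpus builders use more careful sampling schemes.

```python
def sample_opening_words(text, n_words=200):
    """Return the first n_words whitespace-separated tokens of a text."""
    return text.split()[:n_words]


def build_sampled_corpus(texts, n_words=200):
    """Take the same-sized opening slice from every text."""
    return [sample_opening_words(t, n_words) for t in texts]


# Hypothetical mini "articles" of very different lengths.
articles = [
    "word " * 1000,                          # a long text (1000 tokens)
    "short article with only a few words",   # a short text (7 tokens)
]

corpus = build_sampled_corpus(articles, n_words=200)
sizes = [len(sample) for sample in corpus]
print(sizes)  # the long text no longer dominates the corpus
```

Capping each sample keeps one very long text from swamping the others, which is the "balance" point made in the table.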
Areas in which corpora help the study of
meaning

A. Ordering of senses in a dictionary

B. Revealing of conceptual associations

C. Revealing of attitudinal meanings

And…

Language learning and teaching


A. Ordering of senses in a dictionary
• assist lexicographers in arranging word senses in
a dictionary in a more precise and accurate way,
e.g.
A corpus analysis can reveal the different senses of
"bat", such as a flying mammal, a piece of sports
equipment, or a quick movement
• determine the most common or salient sense,
which should appear first in a dictionary entry,
e.g.
if "bat" is most frequently used to refer to the sports
equipment, this sense will be listed first in the
dictionary
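A minimal sketch of frequency-based sense ordering, assuming someone has already labelled each concordance line with a sense. The labels and counts below are invented for illustration and are not taken from any real corpus.

```python
from collections import Counter

# Hypothetical sense labels, assigned by hand to concordance lines for "bat".
tagged_lines = [
    "sports equipment",
    "flying mammal",
    "sports equipment",
    "quick movement",
    "sports equipment",
    "flying mammal",
]


def order_senses(labels):
    """Return the senses sorted from most to least frequent."""
    counts = Counter(labels)
    return [sense for sense, _ in counts.most_common()]


sense_order = order_senses(tagged_lines)
print(sense_order)
```

The most frequent sense comes first, which mirrors the way a lexicographer might decide the order of senses in an entry.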
B. Revealing of conceptual associations
(denotation)
• identify the associations between words and
concepts, shedding light on the semantics of
language, e.g.
the word "apple" is frequently collocated with words
like "fruit," "red," "tree," and "juice," which reveals the
typical associations that speakers have with this word
• gain insights into how people perceive and
represent concepts, which can be especially useful
for fields like cognitive linguistics and
psycholinguistics
C. Revealing of attitudinal meanings (connotation)
• uncover attitudinal meanings associated with words
and phrases, e.g.
the word "freedom" can have both positive and negative
connotations, depending on its context. A corpus
analysis can reveal the contexts in which "freedom" is
used positively (e.g., "cherish freedom") and negatively
(e.g., "abuse of freedom")
• show how words carry evaluative or emotional content,
which can be useful in sentiment analysis or in understanding
the nuances of language in different social and
cultural contexts (semantic prosody) *
D. Language learning and teaching
• use corpora to explore how words and phrases
are used in authentic contexts, gaining a better
understanding of word meanings, collocations,
and typical usage, e.g.
a learner can use a corpus to see how the word
"run" is used in various phrases like "run a
business," "run a marathon," or "run an errand."
• help learners develop a more nuanced
understanding of the word.
• design more effective lessons and exercises
based on authentic language data to improve
their students' language skills.
Quiz (A)
What senses (i.e. meanings) does the
word LIABLE have?

Which of those senses is the most
common?
A. Corpus evidence used to
(re)order dictionary senses
LIABLE.
Has two main meanings
1. legally responsible for the cost of
something: [+for] Tenants are liable for any
damage they cause

2. be liable to do sth: to be likely to do
or say something or to behave in a
particular way, esp. because of a fault
or natural tendency: The car is liable to
overheat on long trips
Corpus evidence used to reorder
senses (cont’d)
E.g. LIABLE

First meaning listed in the Longman
Dictionary of Contemporary English (2nd
edition) was sense 1. (legally responsible)
After consulting the BNC, Longman
lexicographers swapped the order of the
senses for Longman Dictionary of
Contemporary English (3rd edition)
More helpful for learners of English!
Collocations
Collocation = The habitual co-occurrence of
words/linguistic items in close proximity to one
another (Hoffmann et al. 2008:264)
E.g. environment + look after, harm
E.g. fish + chips, catch, stocks
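As a rough sketch of what "co-occurrence in close proximity" means in practice, the code below counts every word that appears within a fixed window around a node word. The tokenizer, the window size and the toy text are assumptions made for this example. Note that this simple window also crosses sentence boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, span=3):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


# Invented example text, not corpus data.
text = ("We had fish and chips by the sea. They catch fish every day. "
        "Fish stocks are falling. He ate chips with his fish.")

fish_collocates = collocates(tokenize(text), "fish", span=3)
print(fish_collocates.most_common(5))
```

Tools such as BNCweb do the same kind of counting over millions of words and then rank the results with an association measure.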
Collocations: some caveats

Some people also insist on ignoring basic
grammatical words like this and the
Stylistic Analysis:
researchers might want to focus on content words,
such as nouns, verbs, and adjectives
Topic Modeling:
In natural language processing and topic modeling,
researchers identify topics within a corpus of text;
function words like "this" or "the" are often ignored
or treated as "stop words" because they carry little
information about the topic or content of a text,
which gives more coherent and meaningful topics
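Ignoring function words can be sketched as a simple filter. The stop list here is a tiny hand-picked set chosen for this example; real stop lists are much longer and vary from tool to tool.

```python
from collections import Counter

# A tiny, hand-picked stop list (an assumption for this sketch).
STOP_WORDS = {"the", "this", "a", "an", "of", "and", "to", "in"}


def content_word_counts(tokens, stop_words=STOP_WORDS):
    """Count tokens, skipping anything on the stop list."""
    return Counter(t for t in tokens if t not in stop_words)


tokens = "the fish and the chips in this shop".split()
counts = content_word_counts(tokens)
print(counts)
```

Whether to filter at all is an analytical choice: grammatical collocates such as prepositions are exactly what is needed when studying colligation.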
Aspect | Explanation | Example
Definition | A technique in NLP for discovering hidden topics within a large corpus of text. | Identifying topics like "sports," "politics," and "economy" from a collection of news articles.
How It Works | The algorithm finds groups of words that frequently appear together and assigns them to topics. | In medical research, the model might group words like "cancer," "treatment," and "genetics" into one topic.
Applications | Used in analyzing large amounts of unstructured text like documents, social media posts, or articles. | Analyzing social media posts for trending topics or identifying research trends in academic journals.
Benefits | Helps organize and summarize large text datasets, uncover hidden patterns, and improve search. | Helps researchers find recurring themes in texts and improve document retrieval.
Example Use Case | Discovering emerging topics in a large corpus of texts. | Identifying topics in research papers, such as "pharmaceutical research" or "healthcare policy."
Collocations: some caveats
When analyzing collocations, some
researchers or linguists also check for
statistical significance
• to ensure that the word combinations they
identify are not merely a random
occurrence, but rather have a high
likelihood of being meaningful or
purposeful
• To determine that the identified
collocations are likely to be a result of
language patterns or associations, not
chance (by accepting the results only if
they have a very low probability of being
accidental)
E.g. "heavy rain" is a statistically significant
collocation in weather discussions
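One common way to check that a word pair is not a chance co-occurrence is to compare how often it was observed with how often it would be expected if the two words were independent. The sketch below computes two widely used measures, mutual information (MI) and the log-likelihood statistic (G2), from a 2x2 contingency table. All frequencies are invented for illustration; they are not BNC counts, and the function name is hypothetical.

```python
import math


def association_scores(pair_freq, w1_freq, w2_freq, n_pairs):
    """Return (MI, G2) for a word pair; assumes pair_freq > 0.

    pair_freq: how often w1 is followed by w2
    w1_freq, w2_freq: how often each word occurs in its slot
    n_pairs: total number of word pairs in the corpus
    """
    observed = [
        [pair_freq, w1_freq - pair_freq],
        [w2_freq - pair_freq, n_pairs - w1_freq - w2_freq + pair_freq],
    ]
    rows = [sum(observed[0]), sum(observed[1])]
    cols = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n_pairs
            if observed[i][j] > 0:  # a zero cell contributes nothing
                g2 += observed[i][j] * math.log(observed[i][j] / expected)
    g2 *= 2

    expected_pair = rows[0] * cols[0] / n_pairs
    mi = math.log2(pair_freq / expected_pair)
    return mi, g2


# Invented counts for "heavy rain" in a one-million-pair corpus.
mi, g2 = association_scores(pair_freq=60, w1_freq=500, w2_freq=400,
                            n_pairs=1_000_000)
print(round(mi, 2), round(g2, 1))
```

A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom, so with these toy numbers the pair would count as significant. When observed and expected frequencies are equal, both scores are zero.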
Why are collocations important?

Collocations are an important window on
the meaning of words (see Sinclair 1995,
Stubbs 2001, Partington 2002)
They reveal phraseology in language –
i.e. units of meaning bigger than the
single word, but distinct from syntax
Analysis of collocations is connected with
the corpus-driven approach to language
study
Collocations are an important window on the
meaning of words (see Sinclair 1995, Stubbs
2001, Partington 2002)
Collocations provide insights into the semantic relationships
of words and the contexts in which they are typically used
Semantic Nuances:
Each of "warm smile," "polite smile," "forced smile," and
"broad smile" conveys a different shade of meaning,
pointing to a different type of smile and the emotion or
situation behind it
Word Associations:
Collocations reflect the associations between words.
E.g. they reveal how "fast" in "fast food" and "fast internet" is
conceptually linked to specific domains, emphasizing
Collocations are an important window on
the meaning of words (see Sinclair 1995,
Stubbs 2001, Partington 2002)
Cultural and Contextual Insights:
Collocations provide cultural and contextual insights.
In English, "tea" in "morning tea" or "afternoon
tea" reflects cultural practices related to tea
consumption
Collocational Patterns:
reveal grammatical and syntactic aspects of a
language, e.g. adjective + noun combinations
("red apple," "big house") help learners and
linguists grasp the language's word order rules.
They reveal phraseology in language – i.e. units of
meaning bigger than the single word, but distinct from
syntax
Collocation sheds light on phraseology in terms of meaning and grammar
• Non-motivated/idiomatic meaning (non-literal):
E.g. "In a nutshell" is a phraseological unit where the combination
of words "in" and "a nutshell" conveys a specific meaning of a
whole chunk that cannot be deduced from the individual words
alone.
• Beyond rules of syntax (clause/sentence structure)
"Take a shower" is a phraseological unit with a specific meaning; it is
not analysed by the usual syntactic pattern V + Od:
* What / did / you / take? (-) but What did you do? (+)
  Obj    Aux   Subj  V
* I / took / a shower (-) but I / took a shower (+)
  Subj V     Obj              S   V phrase
Likewise "make a mistake":
What's the matter with you? (Odd: What have you made?)
I've made a big mistake.
Analysis of collocations is connected with
the corpus-driven approach to language
study
Corpus-Driven Approach (CDA):
What It Is: involves collecting and analyzing
real-world language data from diverse sources, such
as books, articles, websites, transcripts, or spoken
conversations.
Why It's Used:
• allows researchers to study language as it is
naturally used by speakers and writers
• provides a rich source of linguistic patterns,
vocabulary, and usage that reflects the actual
language in context.
Analysis of collocations is connected with the corpus-
driven approach to language study
Collocations:
What They Are:
recurrent word combinations or phrases where certain
words tend to appear together more often than would be
expected by chance.
Why They Are Important:
vital for comprehending how words are used in real
language and how their meanings and associations are
shaped by the context in which they appear. E.g.
"make a decision," "make a mistake," "make an effort," and
"make money" reveal the word combinations that native
speakers naturally produce when using the word "make."
The corpus-driven approach helps uncover these recurring
Collocational patterns with naked eye

The phrase naked eye is quite common
in English

- What collocations are found with
naked eye?
- Can we group the collocations into
larger patterns of meaning?
Collocates of naked eye (BNC)

What pattern can you observe? Please discuss in a
group of three.
Collocations as evidence of semantic
preferences
Semantic preferences

generalisations you can make about the semantic
categories of the collocates (i.e. the words in the
neighbourhood of the node item, a specific word or
term that serves as the central point of focus in a
linguistic or text analysis).
• At N-3 (three words to the left of the node, or nearby)
you will almost always find a collocate like see(n),
observed, (in)visible, apparent, e.g.
see with the naked eye
(in)visible to the naked eye
seen/observed by the naked eye
apparent to the naked eye

• This suggests a semantic preference for the meaning of
VISIBILITY in the neighbourhood of naked eye.
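To make the positional notation concrete, the sketch below counts which words fill a given slot to the left of the node "naked eye". The four example lines are invented in the spirit of the patterns above; they are not taken from the BNC, and the function name is hypothetical.

```python
from collections import Counter


def positional_collocates(lines, node, offset):
    """Count the word at `offset` relative to the first word of `node`.

    offset=-1 is N-1 (the word just before the node), offset=-3 is N-3.
    """
    node_tokens = node.split()
    n = len(node_tokens)
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == node_tokens:
                j = i + offset
                if 0 <= j < len(tokens):
                    counts[tokens[j]] += 1
    return counts


# Invented concordance lines for illustration.
lines = [
    "it is barely visible to the naked eye",
    "stars you can see with the naked eye",
    "too faint to be seen by the naked eye",
    "the damage was apparent to the naked eye",
]

n1 = positional_collocates(lines, "naked eye", -1)
n2 = positional_collocates(lines, "naked eye", -2)
n3 = positional_collocates(lines, "naked eye", -3)
print(n1, n2, n3, sep="\n")
```

In this toy data N-1 is always "the", N-2 is a preposition, and N-3 holds the visibility words, which is the kind of regularity that supports a claim about semantic preference.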
Other collocation patterns with naked eye
• Adjectives such as visible take
preposition to
– Visible/invisible/obvious to the ~
• Verbs such as see take preposition with
– See/seen/viewed with the ~
• Grammatical collocations of the above
type are called colligations

(Source Tognini-Bonelli 2001: 104-6)


Other collocation patterns with naked eye
To the left we find collocates such as difficult, barely,
faint, invisible, just

These convey the notion of DIFFICULTY

In corpus linguistics, connotational meaning of this
type is called semantic prosody, or discourse prosody
(Source Tognini-Bonelli 2001: 104-6)

• Positive or Negative Connotations: *


E.g. the collocation "economic growth" has a positive
connotation, while "economic downturn" has a
negative connotation. When you group these
collocations, you can see a pattern of positive and
negative economic outcomes.
Semantic Prosody (/Discourse Prosody)
Tendency for a word or linguistic item to convey a positive or
negative attitude
Semantic prosody operates at the level of Pragmatics, or
Discourse, rather than Semantics
• connotations or associations a word may carry beyond its literal
meaning
• how words or phrases contribute to the overall meaning of a larger
discourse or conversation, rather than just their individual
dictionary definitions, e.g.
Semantics: "Water is a liquid" (no connotations/associations)
Pragmatics: "He offered her a glass of water, and her face lit up with a
smile." (associated with the idea of refreshment and delight)
Discourse: "He offered her a glass of water, and her face lit up with a
smile. As she took a refreshing sip, she felt a sense of relief wash over
her. She looked at John with gratitude, and in that moment, the cool
water became not just a simple drink, but a symbol of hope and care
in the scorching desert."
Aspect | Deep | Profound
General Meaning | Refers to something that extends far down or has great intensity, either physically, emotionally, or intellectually. | Describes sth that has great meaning, significance/depth, in intellectual, philosophical, or emotional contexts.
Common collocations | Deep thinking, deep emotion, deep understanding, deep feelings, deep knowledge. | Profound impact, profound wisdom, profound effect, profound insight, profound change.
Context of Use | Often used in both physical (depth of space) and abstract contexts (intensity of emotion or thought). | Primarily used in intellectual, philosophical, emotional, or reflective contexts.
Physical vs. Abstract | Can describe both physical depth (e.g., deep hole, deep sea) and abstract concepts (e.g., deep thoughts). | Used mainly in abstract or intellectual contexts, describing significance or intensity.
Examples | She has deep knowledge of the subject. The deep ocean is mysterious. | He made a profound statement about life. The profound effects of the pandemic are still being felt.
Interchangeability | Can overlap with profound in some intellectual or emotional contexts but focuses more on the intensity or depth. | Used in similar contexts as deep but emphasizes the significance/importance of sth, rather than just its depth.
Deep or profound?
1. She has a __________ knowledge of history, covering many periods and
cultures.
2. The river is too __________ to swim across safely.
3. There is a __________ variety of dishes to choose from in this restaurant.
4. The company is aiming for a __________ international presence.
5. The car has __________ tires, making it ideal for off-road driving.
6. He has a __________ understanding of the topic, but he lacks specific details.
7. The team made a __________ impact in the community with their charity
work.
8. They explored __________ areas of interest in their research.
"The new policy was met with universal praise."
o Collocation: "Universal praise" (Positive or Negative?)
"The manager was appalled by the team's lack of effort."
o Collocation: "Appalled by" (Positive or Negative?)
"The company’s efforts were rewarded with tremendous success."
o Collocation: "Tremendous success" (Positive or Negative?)
"The decision led to an unfortunate disaster."
o Collocation: "Unfortunate disaster" (Positive or Negative?)
Rewrite the following sentences to shift the semantic prosody from
positive to negative, or vice versa. (Modify to give a negative semantic
prosody)
"She achieved remarkable success in her career."
"The outcome of the meeting was a great victory."
"The customer was delighted with the service."
"It was a great opportunity to network with professionals."
Other semantic prosodies
Work in pairs.

What kind of semantic prosody do the following
semantically similar verbs have? Words with
positive or negative meanings?
CAUSE vs. BRING ABOUT?
COMMIT
END UP + -ing
GET-passive
PROVIDE
UNDERAGE
TEENAGER
Corpus Linguistics Week 4 Lecture
3
Please discuss in a group of three.

How can you use a corpus in your English
classes?

Please think of different stages, i.e. ‘before
your class’, ‘during your class’ and ‘after
your class’.
Before Your Class:
Preparation and Planning:
• Access a relevant corpus, such as BNC
• Select specific texts/examples that match the
topic/ language point you plan to teach
• Identify common language patterns or
collocations within the corpus to focus on in
your lesson
Creating Learning Materials:
• Develop worksheets, presentations, or exercises
that use real-life examples from the corpus, e.g.
extract sentences or phrases that demonstrate the
use of specific grammar structures or vocabulary
During Your Class:
Language Analysis:
• Introduce the corpus to your students, explaining its
purpose and how it's used for language analysis.
• Engage your students in hands-on activities, such as
analyzing concordance lines (lines showing how a word
is used in context) from the corpus.
• Encourage students to identify patterns and learn from
real-world language usage, e.g.
• analyze how certain phrasal verbs are used in different
contexts.
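For teachers who want to prepare concordance lines without a dedicated tool, a keyword-in-context (KWIC) display can be sketched as below. The sample text, the width setting and the function name are all assumptions for this example. Tools such as BNCweb produce the same kind of display from real corpus data.

```python
def concordance(text, node, width=3):
    """Return keyword-in-context lines for every occurrence of `node`."""
    tokens = text.split()
    lines = []
    for i, tok in enumerate(tokens):
        # Compare case-insensitively and ignore attached punctuation.
        if tok.lower().strip(".,!?;:\"'") == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>25} [{tok}] {right}")
    return lines


# Invented classroom text containing three phrasal verbs with "give".
text = ("She decided to give up smoking. They never give in easily. "
        "Please give back the book.")

kwic = concordance(text, "give", width=3)
for line in kwic:
    print(line)
```

Because the node word is aligned in one column, students can scan the words to its right and spot the particles (up, in, back) for themselves.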
Interactive Exercises:
• Conduct exercises that involve students searching the
corpus for specific language patterns or vocabulary
usage, e.g.
ask students to find examples of idiomatic expressions or
After Your Class:
Homework and Self-Study:
• Ask students to explore the corpus further, reinforcing what they
learned in class.
• Encourage them to use the corpus for self-study to research topics, explore
new vocabulary, or practice using specific language
structures.
Feedback and Reflection:
• Review the findings and insights from the corpus-based
exercises.
• Have students share their observations and discuss any
questions or challenges they encountered while using the
corpus.
Language Production:
• Incorporate the language patterns and vocabulary from
the corpus into speaking and writing exercises during
subsequent classes.
Semantic prosody (cont.)– is the missing
word perfectly or utterly?
Corpus evidence of phraseological patterns
– collocations and semantic prosody

• Most instances of utterly – in context – convey a
negative semantic prosody: an attitude of the
writer/speaker that the situation is ‘bad’ or unfortunate
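A rough way to quantify a prosody claim like this is to look at the word immediately following the adverb and count how often it is evaluatively negative. The sketch below does this with a small hand-made list of negative words and six invented example lines. Both are assumptions for illustration only, and a real study would use corpus concordances and human judgement.

```python
# Hypothetical hand-labelled list of negative evaluative words.
NEGATIVE = {"ridiculous", "useless", "miserable", "wrong", "destroyed"}


def negative_share(lines, node, negative_words=NEGATIVE):
    """Proportion of occurrences of `node` followed directly by a negative word."""
    hits = 0
    total = 0
    for line in lines:
        tokens = line.lower().split()
        for i, tok in enumerate(tokens[:-1]):
            if tok == node:
                total += 1
                if tokens[i + 1] in negative_words:
                    hits += 1
    return hits / total if total else 0.0


# Invented lines, not corpus data.
lines = [
    "the plan was utterly ridiculous",
    "he felt utterly miserable",
    "the machine is utterly useless",
    "she was utterly charming",
    "the fit is perfectly good",
    "that is perfectly normal",
]

utterly_share = negative_share(lines, "utterly")
perfectly_share = negative_share(lines, "perfectly")
print(utterly_share, perfectly_share)
```

A line such as "utterly charming" shows that a prosody is a tendency rather than a rule, so a proportion is a more honest summary than a yes/no label.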
Corpus evidence of phraseological patterns
(cont.)
“You shall know a word by the company it keeps”
(Firth 1957: 11)
distributional analysis:
• to truly grasp the meaning and usage of a word,
you need to consider its context and associations
• to examine the words that tend to occur in its
proximity or within the same context
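Firth's idea can be illustrated by representing each word as a count of the words around it and then comparing those counts. The sketch below uses cosine similarity, which is one common choice rather than the only one. The three phrases, the window size and the function names are invented for this example.

```python
import math
from collections import Counter


def context_vector(lines, node, span=2):
    """Count the words within `span` tokens of each occurrence of `node`."""
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()
        for i, tok in enumerate(tokens):
            if tok == node:
                counts.update(tokens[max(0, i - span):i])
                counts.update(tokens[i + 1:i + 1 + span])
    return counts


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Invented phrases, not corpus data.
lines = ["a little house", "a small house", "a profound insight"]

little = context_vector(lines, "little")
small = context_vector(lines, "small")
profound = context_vector(lines, "profound")

sim_little_small = cosine(little, small)
sim_little_profound = cosine(little, profound)
print(sim_little_small, sim_little_profound)
```

Words that keep the same company score close to 1. With real corpus data the interesting cases are near-synonyms whose scores are high but not identical, since the differing collocates show where their usage diverges.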
Corpus evidence of phraseological patterns
(cont.)
“So strong are the co-occurrence tendencies of words,
word classes, meanings and attitudes that we must
widen the horizons and expect the units of meaning to
be much more extensive and varied than is seen in a
single word” (Sinclair 1995)
• words, their grammatical categories (word classes),
meanings, and the attitudes they convey often show
strong patterns of co-occurrence
• tend to appear in predictable combinations or
contexts, often associated with specific meanings and
attitudes
• expand our perspective beyond analyzing single words
in isolation
• the "unit of meaning" is not just a single word but a
Collocation and teaching (1)
1. Demonstrating the usefulness of L2 collocation
knowledge. These are publications that show
strong associations between learners’
mastery of collocation and their general levels
of (speaking and/or writing) proficiency.
2. Assessing L2 learners’ collocation
knowledge. This theme includes comparisons
of natives’ and learners’ use of collocation, and
also the development and validation of test
instruments to measure collocation knowledge.

(Boers and Webb, 2017: 79)
Summary / conclusions
Collocations as a window on the meaning
of words and linguistic items – a useful way
to help distinguish near-synonymous words
Concept of node and collocate
Semantic preference
Semantic prosody

We will be looking at all of these in the
seminar.
BNCweb registration

http://bncweb.lancs.ac.uk/bncwebSignup/user/login.php

1. Please find the collocates of the word
‘jolly’ in the BNC.

2. What differences are there between ‘little’
and ‘small’?

https://www.english-corpora.org/
Find out more (* = most strongly
recommended)
* Hoffmann, S., S. Evert, N. Smith, D. Lee and Y. Berglund-
Prytz. 2008. Corpus Linguistics with BNCweb – a
Practical Guide. Frankfurt: Peter Lang, pp. 139-160.

* McEnery, T., R. Xiao and Y. Tono. 2005. Corpus-based
Language Studies: An Advanced Resource Book.
London: Routledge, pp. 80-85, 148-152.
Teubert, W. and A. Cermakova. 2004. Corpus Linguistics:
A Short Introduction. London: Continuum.
Tognini-Bonelli, E. 2001. Corpus Linguistics at Work.
Amsterdam: Benjamins, e.g. pp. 104-6.
