
Materials from Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.
PHOTOCOPIABLE

Chapter 2: Exercises – answers

1) Identify the no. of tokens, types, lemmas and lexemes.

Tokens (26): The; City; is; braced; for; far; worse; figures; to; come; in; the; coming; months; unless; the; Government; recovery; package; produces; a; startling; turn; round; in; optimism

Types (23)¹: the; city; is; braced; for; far; worse; figures; to; come; in; coming; months; unless; government; recovery; package; produces; a; startling; turn; round; optimism

Lemmas (23)²: the; City; be; brace; for; far; bad; figure; to; come; in; coming; month; unless; Government; recovery; package; produce; a; startling; turn; round; optimism

Lexemes (23): THE; CITY; BE; BRACE; FOR; FAR; BAD; FIGURE; TO; COME; IN; COMING; MONTH; UNLESS; GOVERNMENT; RECOVERY; PACKAGE; PRODUCE; A; STARTLING; TURN; ROUND; OPTIMISM

Tokens (29)³: Of; 354; fifth-; and; sixth-formers; who; left; Sharon's; school; in; the; summer; of; 1981; forty; had; found; real; jobs; by; 18; November; four; of; these; having; entered; military; service

Types (27)⁴: of; 354; fifth-; and; sixth-formers; who; left; sharon's; school; in; the; summer; 1981; forty; had; found; real; jobs; by; 18; november; four; these; having; entered; military; service

Lemmas (24)⁵: Of; <NUMBER>; fifth-; and; sixth-formers; who; leave; Sharon; school; in; the; summer; forty; have; find; real; job; by; November; four; these; enter; military; service

Lexemes (24): OF; <NUMBER>; FIFTH-; AND; SIXTH-FORMERS; WHO; LEAVE; SHARON; SCHOOL; IN; THE; SUMMER; FORTY; HAVE; FIND; REAL; JOB; BY; NOVEMBER; FOUR; THESE; ENTER; MILITARY; SERVICE

¹ An alternative solution: 24 if the case-sensitive option is selected – The and the would be counted as two types.
² Alternative solutions: a) 22 if turn round is understood as one lexical unit; b) 22 if coming is lumped under the headword come.
³ An alternative solution: 30 if the hyphen is considered a token separator; in that case sixth and formers would be considered two tokens.
⁴ An alternative solution: 28 if the case-sensitive option is selected – Of and of would be counted as two types.
⁵ An alternative solution: 25 if the possessive suffix ’s is counted as a separate lemma.
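
The token and type counts for the first example, including the case-sensitive alternative in footnote 1, can be checked with a short script. This is a minimal sketch that assumes whitespace tokenisation is adequate for this sentence (no punctuation is attached to any word); the sentence is reassembled from the token list above and the variable names are illustrative only. Lemma and lexeme counts depend on linguistic judgement and are not attempted here.

```python
# Sentence reassembled from the token list of the first example.
sentence = (
    "The City is braced for far worse figures to come in the coming months "
    "unless the Government recovery package produces a startling turn round "
    "in optimism"
)

# Whitespace tokenisation (an assumption that happens to fit this sentence).
tokens = sentence.split()

# Case-insensitive: 'The' and 'the' collapse into one type.
types_insensitive = {token.lower() for token in tokens}

# Case-sensitive: 'The' and 'the' are kept apart.
types_sensitive = set(tokens)

print(len(tokens), len(types_insensitive), len(types_sensitive))  # 26 23 24
```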


Tokens (14): Erm; erm; erm; but; yeah; and; people; er; have; great; areas; of; that; taken

Types (12)⁶: erm; but; yeah; and; people; er; have; great; areas; of; that; taken

Lemmas (12): erm; but; yeah; and; people; er; have; great; area; of; that; take

Lexemes (10)⁷: BUT; YEAH; AND; PEOPLE; HAVE; GREAT; AREA; OF; THAT; TAKE

d) This is a very specific example which includes meta-linguistic comments on the meanings/uses of the
form bow.

Tokens (26): Homonyms; are; headwords; to; different; entries; that; are; spelt; in; the; same; way; e.g.; bow; the; weapon; bow; the; action; bow; the; verb; expressing; the; action

Types (18): homonyms; are; headwords; to; different; entries; that; spelt; in; the; same; way; e.g.; bow; weapon; action; verb; expressing

Lemmas (19): Homonyms; be; headword; to; different; entry; that; spell; in; the; same; way; e.g.; bow; weapon; action; bow; verb; expressing

Lexemes (20): Homonyms; be; headword; to; different; entry; that; spell; in; the; same; way; e.g.; bow; weapon; bow; action; bow; verb; expressing

2) and 3) –

4) Calculate the relative frequencies.

a) muggle: 0.2 per 10k

b) intriguingly: 0.3 per million

c) worse: 49.6 per million
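
Relative frequency is the absolute frequency divided by the corpus size, multiplied by a basis of normalisation (here 10,000 or 1,000,000). The sketch below shows only the arithmetic; the function name is illustrative and the counts in the usage lines are hypothetical placeholders, not the figures behind the answers above.

```python
def relative_frequency(absolute_freq: int, corpus_tokens: int,
                       basis: int = 1_000_000) -> float:
    """Return the frequency per `basis` tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return absolute_freq / corpus_tokens * basis


# Hypothetical counts: 50 hits in a corpus of 2,000,000 tokens.
per_million = relative_frequency(50, 2_000_000)
per_10k = relative_frequency(50, 2_000_000, basis=10_000)
print(f"{per_million:.1f} per million, {per_10k:.2f} per 10k")
```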

⁶ An alternative solution: 13 if the case-sensitive option is selected – Erm and erm would be counted as two types.
⁷ The paralinguistic hesitation sounds (erm and er) in this utterance from a transcript of spoken conversation were excluded because they do not have a semantic meaning.


5) Use Zipf’s law to predict absolute frequencies.

rank word absolute frequency

1. the 6,041,234
2. of 3,020,617
3. and 2,013,745
4. to 1,510,309
5. a 1,208,247
10. was 604,123
50. so 120,825
100. way 60,412
1,000. limited 6,041
10,000. conveniently 604
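
The predictions in this table follow from Zipf's law in its simplest form: the expected frequency at rank r is the frequency of the top-ranked word divided by r. A minimal sketch, assuming results are rounded half up to whole numbers (which reproduces 1,510,309 at rank 4); the function name is illustrative.

```python
def zipf_prediction(top_frequency: int, rank: int) -> int:
    """Predicted absolute frequency at `rank`, rounded half up."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # Integer arithmetic equals floor(top_frequency / rank + 0.5) and
    # sidesteps the round-half-to-even behaviour of Python's round().
    return (2 * top_frequency + rank) // (2 * rank)


TOP = 6_041_234  # absolute frequency of 'the' (rank 1)
predictions = {rank: zipf_prediction(TOP, rank)
               for rank in (1, 2, 3, 4, 5, 10, 50, 100, 1_000, 10_000)}

for rank, freq in predictions.items():
    print(f"{rank:>6,}  {freq:>9,}")
```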

6) N.B. Zipf’s law is only an approximation and the actual absolute frequencies in the table below differ
to some extent from the predicted ones.

rank word absolute frequency

1. the 6,041,234
2. of 3,042,376
3. and 2,616,708
4. to 2,593,729
5. a 2,164,238
10. was 881,473
50. so 239,116
100. way 95,701
1,000. limited 10,312
10,000. conveniently 622
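
One way to see how far the observed frequencies depart from the Zipfian prediction is to express each as a ratio of observed to predicted frequency. The sketch below copies the figures from the tables in 5) and 6); the variable names are illustrative.

```python
# rank: (word, predicted frequency, observed frequency)
rows = {
    1: ("the", 6_041_234, 6_041_234),
    2: ("of", 3_020_617, 3_042_376),
    3: ("and", 2_013_745, 2_616_708),
    4: ("to", 1_510_309, 2_593_729),
    5: ("a", 1_208_247, 2_164_238),
    10: ("was", 604_123, 881_473),
    50: ("so", 120_825, 239_116),
    100: ("way", 60_412, 95_701),
    1_000: ("limited", 6_041, 10_312),
    10_000: ("conveniently", 604, 622),
}

# A ratio of 1.00 means the prediction was exact; above 1.00 means the
# word is more frequent than Zipf's law predicts.
ratios = {rank: observed / predicted
          for rank, (_, predicted, observed) in rows.items()}

for rank, (word, _, _) in rows.items():
    print(f"{rank:>6,}  {word:<12}  {ratios[rank]:.2f}")
```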

7) Calculate the Range, the Standard deviation, the Coefficient of variation and Juilland’s D.

Note that the first step is to convert all absolute frequencies to relative frequencies as seen in the
table below.


BNC section                        Total no. of tokens   some (RF)   smile (RF)   theory (RF)   chance (RF)
Fiction and verse                  16,143,913            1,525       341          21            164
Newspapers                         9,412,174             1,118       32           28            275
Non-academic prose and biography   24,178,674            1,785       16           164           91
Academic prose                     15,778,028            1,920       4            418           58
Other written material             22,390,782            1,691       22           57            148
Spoken                             10,409,858            1,978       11           35            109

a) Range

some: 6

smile: 6

theory: 6

chance: 6
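
Assuming range is taken as the number of corpus sections in which a word occurs at least once, every word scores 6 because no relative frequency in the table is zero. A small sketch with the table's values (the dictionary name is illustrative):

```python
# Relative frequencies per BNC section, in the row order of the table.
rf = {
    "some":   [1525, 1118, 1785, 1920, 1691, 1978],
    "smile":  [341, 32, 16, 4, 22, 11],
    "theory": [21, 28, 164, 418, 57, 35],
    "chance": [164, 275, 91, 58, 148, 109],
}

# Count the sections with a non-zero frequency for each word.
word_range = {word: sum(1 for value in values if value > 0)
              for word, values in rf.items()}
print(word_range)
```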

b) Standard deviation

some: 287.74

smile: 121.06

theory: 141.54

chance: 69.46
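
These values correspond to the population standard deviation of the six relative frequencies, that is, the sum of squared deviations divided by the number of sections rather than by that number minus one. A sketch using Python's standard library, with the values copied from the table:

```python
from statistics import pstdev

rf = {
    "some":   [1525, 1118, 1785, 1920, 1691, 1978],
    "smile":  [341, 32, 16, 4, 22, 11],
    "theory": [21, 28, 164, 418, 57, 35],
    "chance": [164, 275, 91, 58, 148, 109],
}

# pstdev divides by n (population SD); stdev would divide by n - 1.
sd = {word: pstdev(values) for word, values in rf.items()}

for word, value in sd.items():
    print(f"{word}: {value:.2f}")
```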

c) the Coefficient of variation

some: 0.17

smile: 1.71

theory: 1.17

chance: 0.49
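
The coefficient of variation is the standard deviation divided by the mean of the relative frequencies, which makes the dispersion of frequent and rare words comparable. A sketch that reproduces the four values above, again using the population standard deviation:

```python
from statistics import mean, pstdev

rf = {
    "some":   [1525, 1118, 1785, 1920, 1691, 1978],
    "smile":  [341, 32, 16, 4, 22, 11],
    "theory": [21, 28, 164, 418, 57, 35],
    "chance": [164, 275, 91, 58, 148, 109],
}

# CV = SD / mean; higher values indicate a less even distribution.
cv = {word: pstdev(values) / mean(values) for word, values in rf.items()}

for word, value in cv.items():
    print(f"{word}: {value:.2f}")
```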


d) Juilland’s D

some: 0.92

smile: 0.24

theory: 0.47

chance: 0.78
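
Juilland's D rescales the coefficient of variation to the number of corpus parts: D = 1 − CV / √(n − 1), where n is the number of sections (6 here). Values close to 1 indicate an even distribution. A sketch under that definition; the function name is illustrative.

```python
from math import sqrt
from statistics import mean, pstdev


def juilland_d(values: list[float]) -> float:
    """Juilland's D for relative frequencies across n corpus parts."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two corpus parts are needed")
    cv = pstdev(values) / mean(values)  # coefficient of variation
    return 1 - cv / sqrt(n - 1)


rf = {
    "some":   [1525, 1118, 1785, 1920, 1691, 1978],
    "smile":  [341, 32, 16, 4, 22, 11],
    "theory": [21, 28, 164, 418, 57, 35],
    "chance": [164, 275, 91, 58, 148, 109],
}

d = {word: juilland_d(values) for word, values in rf.items()}
for word, value in d.items():
    print(f"{word}: {value:.2f}")
```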

8) Use Juilland’s U usage coefficient to rank the words some, smile, theory and chance according to their
relative importance.

rank   word     Juilland's D   AF (whole corpus)   Juilland's U (Juilland's D × AF)
1.     some     0.92           167,050             153,686.00
2.     chance   0.78           12,809              9,991.02
3.     theory   0.47           12,809              6,020.23
4.     smile    0.24           6,848               1,643.52
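
Juilland's U weights a word's absolute frequency by its dispersion, so a frequent but unevenly distributed word is ranked lower. The sketch below multiplies the D and AF values exactly as they appear in the table and sorts the result; the variable names are illustrative.

```python
# word: (Juilland's D, absolute frequency in the whole corpus), as tabled
rows = {
    "some":   (0.92, 167_050),
    "chance": (0.78, 12_809),
    "theory": (0.47, 12_809),
    "smile":  (0.24, 6_848),
}

# U = D x AF
usage = {word: d * af for word, (d, af) in rows.items()}

# Highest usage coefficient first.
ranked = sorted(usage, key=usage.get, reverse=True)

for position, word in enumerate(ranked, start=1):
    print(f"{position}. {word:<7} {usage[word]:>12,.2f}")
```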

9) Calculate the ARF of the selected words in the BE06 corpus (985,628 tokens):

a) frigid: ARF = 1.02

b) chemistry: ARF = 3.17

c) porn: ARF = 4.6
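
The ARF values above depend on where each word occurs in the corpus, which this answer sheet does not list. For orientation only, the sketch below implements the usual definition of average reduced frequency: ARF = (1/v) × Σ min(d_i, v), where v is the corpus size divided by the word's frequency and d_i are the distances between consecutive occurrences, with the first distance wrapping around from the last occurrence. The function name and the toy positions are hypothetical and are not taken from BE06.

```python
def average_reduced_frequency(positions: list[int], corpus_size: int) -> float:
    """ARF from token positions (0-based) of a word in a corpus."""
    frequency = len(positions)
    if frequency == 0:
        return 0.0
    ordered = sorted(positions)
    v = corpus_size / frequency  # average distance if evenly spread
    # First distance wraps around the end of the corpus to the first hit.
    distances = [ordered[0] + corpus_size - ordered[-1]]
    distances += [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(min(d, v) for d in distances) / v


# Toy corpus of 10 tokens, word occurring twice (hypothetical positions).
evenly_spread = average_reduced_frequency([0, 5], corpus_size=10)
clustered = average_reduced_frequency([0, 1], corpus_size=10)
print(evenly_spread, clustered)
```

With evenly spread occurrences the ARF equals the absolute frequency; when occurrences cluster together it falls towards 1.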


Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Do you use language corpora in your research or study, but find
that you struggle with statistics? This practical introduction will
equip you to understand the key principles of statistical thinking
and apply these concepts to your own research, without the need
for prior statistical knowledge. The book gives step-by-step
guidance through the process of statistical analysis and provides
multiple examples of how statistical techniques can be used to
analyse and visualise linguistic data. It also includes a useful
selection of discussion questions and exercises which you can use
to check your understanding.

The book comes with a Companion website, which provides additional materials (answers to
exercises, datasets, advanced materials, teaching slides etc.) and Lancaster Stats Tools online, a free
click-and-analyse statistical tool for easy calculation of the statistical measures discussed in the book.
