Chapter 2 answers
a)
Tokens: The; City; is; braced; for; far; worse; figures; to; come; in; the; coming; months; unless; the; Government; recovery; package; produces; a; startling; turn; round; in; optimism
Types: the; city; is; braced; for; far; worse; figures; to; come; in; coming; months; unless; government; recovery; package; produces; a; startling; turn; round; optimism
Lemmas: the; City; be; brace; for; far; bad; figure; to; come; in; coming; month; unless; Government; recovery; package; produce; a; startling; turn; round; optimism
Lexemes: THE; CITY; BE; BRACE; FOR; FAR; BAD; FIGURE; TO; COME; IN; COMING; MONTH; UNLESS; GOVERNMENT; RECOVERY; PACKAGE; PRODUCE; A; STARTLING; TURN; ROUND; OPTIMISM
b)
Tokens: Of; 354; fifth-; and; sixth-formers; who; left; Sharon's; school; in; the; summer; of; 1981; forty; had; found; real; jobs; by; 18; November; four; of; these; having; entered; military; service
Types: of; 354; fifth-; and; sixth-formers; who; left; sharon's; school; in; the; summer; 1981; forty; had; found; real; jobs; by; 18; november; four; these; having; entered; military; service
Lemmas: Of; <NUMBER>; fifth-; and; sixth-formers; who; leave; Sharon; school; in; the; summer; forty; have; find; real; job; by; November; four; these; enter; military; service
Lexemes: OF; <NUMBER>; FIFTH-; AND; SIXTH-FORMERS; WHO; LEAVE; SHARON; SCHOOL; IN; THE; SUMMER; FORTY; HAVE; FIND; REAL; JOB; BY; NOVEMBER; FOUR; THESE; ENTER; MILITARY; SERVICE
1. An alternative solution: 24 if the case-sensitive option is selected, because The and the would then be counted as two types.
2. Alternative solutions: a) 22 if turn round is understood as one lexical unit; b) 22 if coming is lumped under the headword come.
3. An alternative solution: 30 if the hyphen is considered a token separator; in that case sixth and formers would be considered two tokens.
4. An alternative solution: 28 if the case-sensitive option is selected, because Of and of would then be counted as two types.
5. An alternative solution: 25 if the possessive suffix ’s is counted as a separate lemma.
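The token and type counts behind notes 1, 3 and 4 can be checked with a short script. This is only a minimal sketch: the helper names `tokenize` and `type_set` are invented for illustration, punctuation is assumed to have been stripped from the two sentences already, and lemmas and lexemes are left out because they require linguistic judgement rather than string matching.

```python
# Sentences from a) and b), rebuilt from the token lists above
# (punctuation already removed; this is an assumption of the sketch).
TEXT_A = ("The City is braced for far worse figures to come in the coming "
          "months unless the Government recovery package produces a "
          "startling turn round in optimism")
TEXT_B = ("Of 354 fifth- and sixth-formers who left Sharon's school in the "
          "summer of 1981 forty had found real jobs by 18 November four of "
          "these having entered military service")


def tokenize(text, split_hyphens=False):
    """Split on whitespace; optionally treat the hyphen as a separator too."""
    if split_hyphens:
        text = text.replace("-", " ")
    return text.split()


def type_set(tokens, case_sensitive=False):
    """Return the distinct word forms, folding case unless told otherwise."""
    return {t if case_sensitive else t.lower() for t in tokens}


for label, text in (("a", TEXT_A), ("b", TEXT_B)):
    toks = tokenize(text)
    print(label,
          "tokens:", len(toks),
          "types:", len(type_set(toks)),
          "types (case-sensitive):", len(type_set(toks, case_sensitive=True)),
          "tokens (hyphen splits):", len(tokenize(text, split_hyphens=True)))
```

The case-sensitive type counts and the hyphen-splitting token count for b) come out at the alternative figures quoted in notes 1, 4 and 3 respectively.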
Materials from Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge
University Press.
PHOTOCOPIABLE
c)
Tokens: Erm; erm; erm; but; yeah; and; people; er; have; great; areas; of; that; taken
Types: erm; but; yeah; and; people; er; have; great; areas; of; that; taken
Lemmas: erm; but; yeah; and; people; er; have; great; area; of; that; take
Lexemes: BUT; YEAH; AND; PEOPLE; HAVE; GREAT; AREA; OF; THAT; TAKE
d) This is a very specific example which includes meta-linguistic comments on the meanings/uses of the
form bow.
2) and 3) –
6. An alternative solution: 12 if the case-sensitive option is selected, because Erm and erm would then be counted as two types.
7. The paralinguistic hesitation sounds (erm and er) in this utterance, taken from a transcript of spoken conversation, were excluded because they do not have a semantic meaning.
6) N.B. Zipf’s law is only an approximation and the actual absolute frequencies in the table below differ
to some extent from the predicted ones.
7) Calculate the Range, the Standard deviation, the Coefficient of variation and Juilland’s D.
Note that the first step is to convert all absolute frequencies to relative frequencies as seen in the
table below.
BNC section                        Total no. of tokens   some (RF)   smile (RF)   theory (RF)   chance (RF)
Fiction and verse                  16,143,913            1,525       341          21            164
Newspapers                         9,412,174             1,118       32           28            275
Non-academic prose and biography   24,178,674            1,785       16           164           91
Academic prose                     15,778,028            1,920       4            418           58
Other written material             22,390,782            1,691       22           57            148
Spoken                             10,409,858            1,978       11           35            109
a) Range
some: 6
smile: 6
theory: 6
chance: 6
b) Standard deviation
some: 287.74
smile: 121.06
theory: 141.54
chance: 69.46
c) Coefficient of variation
some: 0.17
smile: 1.71
theory: 1.17
chance: 0.49
d) Juilland’s D
some: 0.92
smile: 0.24
theory: 0.47
chance: 0.78
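The figures in a) to d) can be reproduced from the relative frequencies in the table. The sketch below rests on a few assumptions that are not spelled out in the answers but do match the printed values: the range is the number of sections in which the word occurs, the mean is the unweighted mean of the six section RFs, the standard deviation is the population form (dividing by the number of sections), and Juilland's D is computed as 1 - CV / sqrt(n - 1). The function name `dispersion` is illustrative only.

```python
from math import sqrt

# Relative frequencies per BNC section, copied from the table in exercise 7
# (order: fiction and verse, newspapers, non-academic prose and biography,
# academic prose, other written material, spoken).
RF = {
    "some":   [1525, 1118, 1785, 1920, 1691, 1978],
    "smile":  [341, 32, 16, 4, 22, 11],
    "theory": [21, 28, 164, 418, 57, 35],
    "chance": [164, 275, 91, 58, 148, 109],
}


def dispersion(rfs):
    """Return (range, SD, CV, Juilland's D) for one word's section RFs."""
    n = len(rfs)
    mean = sum(rfs) / n                                   # unweighted mean RF
    word_range = sum(1 for rf in rfs if rf > 0)           # sections with the word
    sd = sqrt(sum((rf - mean) ** 2 for rf in rfs) / n)    # population SD (assumed)
    cv = sd / mean                                        # coefficient of variation
    d = 1 - cv / sqrt(n - 1)                              # Juilland's D
    return word_range, sd, cv, d


for word, rfs in RF.items():
    word_range, sd, cv, d = dispersion(rfs)
    print(f"{word}: range={word_range}, SD={sd:.2f}, CV={cv:.2f}, D={d:.2f}")
```

Using the sample standard deviation (dividing by n - 1) instead would give larger values than those listed in b), which is why the population form is assumed here.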
8) Use Juilland’s U usage coefficient to rank the words some, smile, theory and chance according to their
relative importance.
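One way to approach the ranking is sketched below. Juilland's U is taken here as D multiplied by the word's frequency. The D values are those given in 7d); the frequency is assumed, for illustration only, to be the unweighted mean of the six section RFs from the table in exercise 7. A relative frequency for the whole corpus may be the intended input and would give somewhat different U values.

```python
# Juilland's D, as given in the answers to exercise 7d.
D = {"some": 0.92, "smile": 0.24, "theory": 0.47, "chance": 0.78}

# Section RFs from the table in exercise 7.
RF = {
    "some":   [1525, 1118, 1785, 1920, 1691, 1978],
    "smile":  [341, 32, 16, 4, 22, 11],
    "theory": [21, 28, 164, 418, 57, 35],
    "chance": [164, 275, 91, 58, 148, 109],
}

# ASSUMPTION: frequency = unweighted mean of the section RFs.
usage = {word: D[word] * sum(rfs) / len(rfs) for word, rfs in RF.items()}

# Rank from highest to lowest usage coefficient.
ranking = sorted(usage, key=usage.get, reverse=True)
for word in ranking:
    print(f"{word}: U = {usage[word]:.1f}")
```

Under these assumptions the order comes out as some, chance, theory, smile.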
9) Calculate the ARF of the selected words in the BE06 corpus (985,628 tokens):
The book comes with a Companion website, which provides additional materials (answers to
exercises, datasets, advanced materials, teaching slides etc.) and Lancaster Stats Tools online, a free
click-and-analyse statistical tool for easy calculation of the statistical measures discussed in the book.