Using Corpora For Language and Linguistic Research: Exercise 1: Learn The Basics
1. Go to http://corpus.byu.edu/coca/
2. Register so that you can log in every time you need to use the corpus.
3. Go to ACCOUNT Usage Limits to see how many queries per day you are allowed, depending on your
user status.
4. Check out the corpus information by clicking on the corresponding tabs. This will tell you the size of the
corpus, the different genres it includes, and so on.
5. Go to SEARCH, type the word nice, then hit Find matching strings.
6. Check out the FREQ of the word, then tick the box next to the word to retrieve all the contexts where the
word has been used.
8. Also notice the source each context has been retrieved from, and the year it dates from.
9. Notice also that you can download a random sample from the corpus consisting of 100, 200, 500 or 1000
contexts. Choose your sample size, hit the button, then copy and paste the contexts into an Excel
spreadsheet.
10. If you’re interested in particular contexts, you can save them in a list that you can go back to later. You need to
provide a title for your list; read more about the Save list function under HELP.
11. You can find out more about the distribution of a word (or structure) in the different genres within the corpus
by clicking on Chart.
12. You can also search for strings of words. Type lose weight and see what results you get.
13. REMEMBER: always hit reset before you start a new search.
14. Hit the Collocates button, type nice in the search box, then hit Find collocates.
15. You can limit your search to particular words that collocate with your search word; type these words
in the second box. Again, if the corpus gives you error messages, hit the reset button to make sure
you are not combining conflicting search commands.
16. Notice that you can specify the part of speech of the collocates you’re interested in, as well as the
distance/position of the collocate relative to the KWIC (key word in context, i.e. your search word), e.g. before
or after the KWIC, or two/three/four words after it.
17. You may choose the part of speech of the collocates you’re interested in. What do you find when you type
the following?
18. What do you think the following search command is looking for?
19. If you need to check out the difference between synonymous words, you can do that by hitting the Compare
button, and typing both words in the given search boxes.
20. Notice how the results change when you change the conditions of the search (e.g. part of speech of the
collocates, distance of the collocate, etc.)
22. So far we’ve experimented with particular inflected forms. However, if you’re searching for a verb, e.g. ‘go’, and would
like to retrieve contexts containing all its inflected forms (go, went, gone, going, goes, etc.), you need to
type your search word in square brackets: [go].
23. The lexical form in brackets is the lemma (i.e. the uninflected form of the lexical item, as listed in the
dictionary), and typing it in the search box will yield all of its inflected forms. When we remove the
brackets, we are searching for one particular inflected form.
24. You can try this now with [nice]. What inflected forms can you find?
25. Type *more in the search box and examine the frequency tables. Now type more* and see what you get.
26. Type more * than and again check out the frequency table. This is a partial construction in which the
asterisk (*) marks a missing slot that the corpus fills in with existing words/morphemes.
27. What does the following string mean and what outcomes do you expect the corpus to provide?
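To make the search syntax in steps 20 to 26 more concrete (word strings, [lemma] brackets and * wildcards), here is a toy Python sketch of how such queries can be matched against a tokenised text. It is not how COCA works internally: the lemma table, the example sentence and the functions `token_matches` and `search` are all hypothetical. In this sketch a `*` inside a word stands for any run of characters (including none), and a `*` on its own fills exactly one word slot.

```python
import re

# Hypothetical mini lemma table mapping word forms to their lemma.
# A real corpus uses a tagger/lemmatizer; this hand-made table is for illustration only.
LEMMAS = {
    "go": "go", "goes": "go", "going": "go", "went": "go", "gone": "go",
    "nice": "nice", "nicer": "nice", "nicest": "nice",
}

def token_matches(pattern: str, token: str) -> bool:
    """Match one query element against one token (case-insensitive)."""
    pattern, token = pattern.lower(), token.lower()
    if pattern.startswith("[") and pattern.endswith("]"):
        # [lemma]: any word form whose lemma is the bracketed word
        return LEMMAS.get(token) == pattern[1:-1]
    # '*' stands for zero or more characters; everything else is matched literally
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, token) is not None

def search(query: str, tokens: list[str]) -> list[str]:
    """Return every run of consecutive tokens that matches the space-separated query."""
    parts = query.split()
    hits = []
    for i in range(len(tokens) - len(parts) + 1):
        window = tokens[i:i + len(parts)]
        if all(token_matches(p, t) for p, t in zip(parts, window)):
            hits.append(" ".join(window))
    return hits

# Hypothetical example text, split on spaces
text = "She went home because going out was more fun than staying in , and furthermore the food was nicer"
tokens = text.split()

print(search("[go]", tokens))         # ['went', 'going']
print(search("go", tokens))           # []
print(search("*more", tokens))        # ['more', 'furthermore']
print(search("more*", tokens))        # ['more']
print(search("more * than", tokens))  # ['more fun than']
print(search("[nice]", tokens))       # ['nicer']
```

Note how [go] finds went and going while the plain form go finds nothing in this sentence: this is the lemma versus word-form contrast described in steps 22 and 23.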
Now that you’ve learned the basics of corpus search, go ahead and have fun experimenting with other lexical
items/constructions!
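Steps 14 to 16 describe collocate searches as finding words that occur within a chosen distance before or after the KWIC. A minimal Python sketch of that idea follows, assuming whitespace-tokenised text and a hypothetical function `collocates`. It only counts raw co-occurrences and ignores part of speech, whereas corpus tools such as COCA may also rank collocates with an association measure.

```python
from collections import Counter

def collocates(tokens, node, left=4, right=4):
    """Count words within `left` tokens before and `right` tokens after each `node` (toy sketch)."""
    node = node.lower()
    words = [t.lower() for t in tokens]
    counts = Counter()
    for i, word in enumerate(words):
        if word != node:
            continue
        # Window boundaries, clipped to the start and end of the text
        start = max(0, i - left)
        end = min(len(words), i + right + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[words[j]] += 1
    return counts

# Hypothetical example text, split on spaces
tokens = "It was a nice day and a really nice person made it a nice evening".split()

# Only the word immediately after nice
print(collocates(tokens, "nice", left=0, right=1))
# Only the word immediately before nice
print(collocates(tokens, "nice", left=1, right=0).most_common(1))  # [('a', 2)]
```

Changing `left` and `right` plays the same role as changing the collocate distance in step 16, and it shows why the results shift when the search conditions change (step 20).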