Data Driven Learning
Prof. Charles Meyer
Apostolos Koutropoulos
5/12/2009
Data Driven Learning Overview
Data driven learning derives from the work of Tim Johns, who suggested
that instructors should use corpora in language learning classrooms (Braun, 2007).
from corpus linguistics in the service of language learning." (Payne, 2008) The
benefit of data driven learning is that the focus is on "the exploitation of authentic
materials even when dealing with tasks such as the acquisition of grammatical
structures and lexical items [...], on real, exploratory tasks and activities rather than
traditional «drill & kill» exercises, [...] on learner-centred activities," and on "the use
(Rüschoff)
Data driven learning differs from traditional grammar learning in a few ways.
For starters, the pedagogical approach to teaching grammar the traditional way is
students practice with this information, and then the students produce new content.
the language, they hypothesize as to how this grammatical phenomenon works, and
In traditional language and grammar learning, the teacher is the driver and
the students the passengers, while in data driven learning the teacher is more of a
co-pilot and navigator and the students are able to sit in the driver's seat and take
control of their learning. Because of this difference in the pedagogic approach, the
materials used are also different in a data driven learning classroom compared to
traditional classrooms.
grammar in an (sic.) system that ignores the nature of English and of authentic
communication using English." (Byrd, 1997) This of course poses a few issues that
are outlined by Byrd, such as the inconsistency in defining what is easy grammar
versus hard grammar, the ability to cover all the material in a given curriculum, and
Data driven learning solves some of these issues by not relying on textbooks,
but rather relying on corpora, "a body of text assembled according to explicit
design criteria for a specific purpose," (Payne, 2008) concordancing programs and
keyword-in-context (KWIC). By using corpora, you are using authentic text from the
target language both in your instruction of that language and the grammar of that
language. Thus you are exposing students to material that they are likely to come
Resource Evaluation
Running a quick Google search, one does not find a lot of resources pertaining
deeper I found a number of journals, such as ReCALL, where I was able to find more
information about data driven learning in practice. What I found interesting was that
data driven learning was the focus of experiments in classrooms in Europe, Asia and
the Middle East; however, I did not find any articles where the focus of the
experiment was an ESL classroom in the United States where corpora are available
and primary materials such as video and audio files are also easily available.
The resources that I found fall into two categories. The first category is the
Passapong Sripicharn's website is one resource that has materials that are
features by providing the students with concordance data and guiding them to
concordancing and the keyword-in-context (KWIC) format in lessons one and two.
The following lessons provide the students with what seem to be handpicked
sentences and KWIC examples that are used to illustrate some feature of the
language. Some examples of the types of exercises used are: deducing the
meaning of a word, putting sentences in order so that they make sense, fill-in-the-
because they require interpretation and the online format does not allow for
interpretations which may or may not be true. For example, in Unit 25 (footnote 1)
there is a concordance using KWIC with hectic as the keyword. In this example there
are five sentences using hectic, and the students are asked to pick the word that has
the closest meaning to hectic. If you click on the answer the author intended, you
get a pop-up window that says correct; if you guess wrong, you get a pop-up
window that says try again. The available synonym options are boring, busy, bad,
and lively. The answer marked as correct is busy; however, I could also see lively
and bad as viable options. From personal use, I know that I have used hectic to
mean bad quite a few times. If things are busy, I just say that it was a busy week.
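To make the KWIC format concrete, the following is a minimal sketch in Python of how a concordancer might line up a keyword with its surrounding context. The function name kwic, the width parameter, and the two sample sentences are assumptions made for illustration; they are not taken from Sripicharn's materials or from any particular concordancing program.

```python
import re


def kwic(texts, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword.

    Each line shows up to `width` characters of left context, right-aligned,
    so that the keyword always starts in the same column.
    """
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


# Hypothetical sample sentences, invented for illustration only.
sample = [
    "It has been a hectic week at the office.",
    "After a hectic morning, she finally sat down for lunch.",
]

for line in kwic(sample, "hectic"):
    print(line)
```

Stacking the keyword in a single column lets students scan the words immediately to its left and right, which is the comparison that the hectic exercise asks them to make.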
There are also good examples of using collocation to determine the meaning
us with KWIC text for the expressions on the road to and on the brink of. Based on
the collocating words that the students observe in these examples they then are
asked to pick the right expression to fill in the blank in an exercise. The relationship
that we see is that on the road to is followed by something positive and on the brink
must account for, such as someone using on the road to hell, or on the road to
Usage in my Classroom
The major question about data driven learning is whether or not I would be
information I found in my searches for both materials and ideas to use in the
Footnote 1: http://www.geocities.com/tonypgnews/ddl_25.htm
Footnote 2: http://www.geocities.com/tonypgnews/ddl_30.htm
classroom and research articles, I would say that I would use data driven learning,
but I would not replace a whole curriculum with it. Instead, I would use data
driven learning techniques either to teach specific topics or as a type of
exercise for the students to use in the process of learning the
target language.
Since the "DDL approach suggests that grammar learning should consist
(Mansour & Ali Akbar, 2006) I would have to see where I could best fit such an
approach. In addition, the other reason I wouldn't go all in with data driven
learning is that the literature seems to indicate that there is no clear-cut
proof that data driven learning is superior to other methods of teaching.
One factor that I would need to consider before implementing data driven
learning is class size, along with the resources required for data driven learning
activities. If the class size is too large, it may not be possible to conduct data driven
exercise. As homework, other factors need to be taken into account, such as whether
the students have computers at home, whether they have access to the corpora that
you want to use, and whether they have access to the tools to do a KWIC analysis.
The second factor that I would have to consider is the expectations of the
with learning a set of rules, and if such an element is lacking in the classroom, the
students might feel like they have not learned something. In Hadley (Hadley, 2002)
for instance we see that "Kerr (1993) found in his survey of 100 teacher trainees
that attitudes toward grammar ranged from viewing it as an abstract set of rules, to
expressing feelings of terror. Similar sentiments are found in Chalker (1994), who
notes that many classroom teachers equate grammar with the acquisition of some
set of rules -- rules that are at times contradictory and at other times confusing."
In Braun (Braun, 2007) we see that several students felt that they hadn't
learned any grammar because they did not write down any grammar rules. Braun
notes that "such statements reflect prevailing perceptions about learning: it is still
seen as something that happens only if, or as soon as, something is being written
down." (Braun, 2007) I think that in order for data driven learning to gain
acceptance from the students, the teacher needs to do two things. The first is
to explain to the students that experiential learning will not only
help them deduce the rules but will also help them learn how to analyze text
when they come across something they don't know and
don't have someone with them who can explain it. The second
is to provide a summary of what the students have
learned at the end of each exercise and tie that in with other established rules. This
way the teacher helps arrange the rules that the students have synthesized through
their analysis of corpora, and the students who feel
that they haven't learned anything because they didn't write down any rules can
rest easy because they can now write down a few rules.
One final factor to decide is whether to use a full corpus, such as the
Corpus of Contemporary American English (footnote 3), and go full speed ahead and
let the students do their own concordances, or whether to filter the material and
provide the students with printed-out concordances such as the ones provided on
Footnote 3: http://www.americancorpus.org/
Passapong Sripicharn's (2005) website. I think this would depend on the level
of the students in the classroom, the technology limitations, the classroom makeup,
and whether or not I would want my students to focus on a specific corpus. For
instance, I may want to focus on blog language for a few lessons to illustrate a few
points. I could develop a corpus on my own for that set of lessons and hand it out to
students to use with their concordancers. Some corpora, on the other hand, may
concordances.
I think that in the end it comes down to knowing your students. As Hadley
(2001) writes about problems with data driven learning, students can become
demotivated if they get too much on their plate in terms of data, and they might not
have sufficient material to analyze if they don't get enough data. The
comfortable with, so students aren't scaffolded, but rather are being asked to leap
and hope they can grab on to the level that the materials are on.
At this point Hadley points out that the teacher is stuck between a rock and a
hard place. The teacher can "simplify the concordance material and lessen its
authenticity," (Hadley, 2001) as Passapong Sripicharn (2005) did for his
website, "or maintain the authenticity and risk demotivating some students because
Personally I think that if curricula incorporate data driven learning from early
ages in language learning, it's perfectly OK to choose simpler and less authentic
material. As the students mature, you can scaffold them onto more challenging and
as an ESL classroom, where you might find mixed levels of background knowledge
and analytical approaches, data driven learning can still be used. In this instance
the teacher needs to do an assessment of the students' skills and prepare material
as needed for each student. This way each student will be performing at the level
that they feel comfortable with, while at the same time scaffolding to a more
advanced level.
Bibliography
Boulton, A. (2009). Testing the limits of data-driven learning: language proficiency
and training. (F. Blin, & J. Thompson, Eds.) ReCALL: the journal of EUROCALL,
21 (1), 37-54.
Braun, S. (2007, September 6). Beyond Data-Driven Learning: Learning activities for
a spoken multimedia corpus. Retrieved April 30, 2009, from European Youth
Language:
http://www.um.es/sacodeyl/data/conferences/eurocall2007/Beyond%20Data-Driven%20Learning_eurocall2007_sb.ppt
Braun, S. (2007). Integrating Corpus Work into Secondary Education: From Data-
Driven Learning to Needs-Driven Corpora. (F. Blin, & J. Thompson, Eds.)
ReCALL: the journal of EUROCALL, 19 (3), 307-328.
Cobb, T., Greaves, C., & Horst, M. (2001). Can the rate of lexical acquisition from
reading be increased? An experiment in reading French with a suite of on-line
resources. In P. Raymond, & C. Cornaire, Regards sur la didactique des
langues secondes (pp. 133-153). Montreal, QC, Canada: Éditions logique.
Infante, P. (2009, April). Explicit Grammar Instruction: Theory & Research. Retrieved
April 30, 2009, from Applied Linguistics Student Association:
http://alsaclub.ning.com/forum/attachment/download?id=2643024%3AUploadedFi38%3A1481
Johns, T. (2000, August 1). Retrieved April 30, 2009, from Tim Johns Data-Driven
Learning Page:
http://www.ecml.at/projects/voll/our_resources/graz_2002/ddrivenlrning/whatisddl/resources/tim_ddl_learning_page.htm
Lamy, M.-N., Klarskov Mortensen, H. J., & Davies, G. (2009). ICT4LT Module 2.4:
Using concordance programs in the Modern Foreign Languages classroom.
Retrieved April 30, 2009, from Information and Communication Technology
for Language Teachers: http://www.ict4lt.org/en/en_mod2-4.htm
Lee, D. (2007). Teaching & Misc. Links. Retrieved April 30, 2009, from David Lee's
Bookmarks for corpus-based linguistics: http://devoted.to/corpora
Mansour, K., & Ali Akbar, J. (2006). Data-driven Learning and Teaching collocation of
prepositions: The Case of Iranian EFL Adult Learners. (J. Jung, & P. Robertson,
Eds.) Asian EFL Journal, 8 (4), 192-209.
Mukherjee, J. (2005). Data Driven Learning. Retrieved April 30, 2009, from Anglistik
Language Centre Giessen:
http://www.uni-giessen.de/anglistik/ling/ALC/ddlintro.html
Passapong, S. (2005). My DDL Materials. Retrieved April 30, 2009, from Tony's DDL:
http://geocities.com/tonypgnews/units_index_pilot.htm
Payne, J. S. (2008, June 8). Data-Driven South Asian Language Learning. Retrieved
April 30, 2009, from The University of Chicago South Asian Language
Resource Center:
http://salrc.uchicago.edu/workshops/sponsored/061005/DDL.ppt
Rüschoff, B. (n.d.). Data-Driven Learning (DDL): the idea. Retrieved April 30, 2009,
from
http://www.ecml.at/projects/voll/rationale_and_help/booklets/resources/menu_booklet_ddl.htm