The flipped classroom in second language learning
Joseph P. Vitta
Rikkyo University, Japan
Ali H. Al-Hoorie
Royal Commission for Jubail and Yanbu, Saudi Arabia
Abstract
Flipped learning has become a popular approach in various educational fields, including second
language teaching. In this approach, the conventional educational process is reversed so that
learners do their homework and prepare the material before going to class. Class time is then
devoted to practice, discussion, and higher-order thinking tasks in order to consolidate learning.
In this article, we meta-analysed 56 language learning reports involving 61 unique samples
and 4,220 participants. Our results showed that flipped classrooms outperformed traditional
classrooms, g = 0.99, 95% CI (0.81, 1.17), z = 10.90, p < .001. However, this effect exhibited high heterogeneity (about 86%), and applying the Trim and Fill method for publication bias shrank it to g = 0.58, 95% CI (0.37, 0.78). Moderator analysis also showed that reports published in non-SSCI-indexed journals tended to find larger effects compared to indexed journals, conference proceedings, and university theses. The effect of flipped learning did not seem to vary by age, but it did vary by proficiency level in that the higher the proficiency, the larger the effect. Flipped
learning also had a clear and substantial effect on most language outcomes. In contrast, whether
the intervention used videos and whether the platform was interactive did not turn out to be
significant moderators. Meta-regression showed that longer interventions resulted in only a slight
reduction in the effectiveness of this approach. We discuss the implications of these findings and
recommend that future research moves beyond asking whether flipped learning is effective to
when and how its effectiveness is maximized.
Keywords
CALL, flipped learning, foreign language learning, research synthesis, second language learning
Corresponding author:
Joseph P. Vitta, Kyushu University, Fukuoka Prefecture, Fukuoka, 819-0395, Japan.
Email: [email protected]
I Introduction
Education has traditionally been viewed as the transfer of information from the teacher
to learners within the context of the classroom, though the desire to move away from
this paradigm has existed for some time (e.g., Freire, 1968/1970). Although flipped
learning was not the first paradigm to challenge this traditional model, it has recently
emerged as a popular and topical alternative to teacher-dominated instruction across
various educational domains (van Alten, Phielix, Janssen, & Kester, 2019) and espe-
cially in the second language (L2) field (Mehring & Leis, 2018). Flipped learning (or
flipped classrooms) is colloquially described as a process of ‘flipping’ what has tradi-
tionally been done inside the classroom to independent homework activities preced-
ing the lesson. Thus, the lesson involves problem-solving and higher-order thinking
tasks traditionally assigned to subsequent homework activities (Låg & Sæle, 2019;
Mehring, 2016, 2018). Over the past several decades, flipped learning has become
one of the most discussed trends in education for both practitioners and researchers.
Consider, for instance, that a non-profit organization, the Flipped Learning Network™ (www.flippedlearning.org; see Hamdan, McKnight, McKnight, & Arfstrom, 2013), has been established to help teachers flip their classrooms more effectively, while conferences regularly take place around the globe for teachers to share techniques and tips on this approach.
Another clear indication of the interest flipped learning has generated is the amount
of research conducted on it. Meta-analyses and systematic reviews subsequently appeared
across varied domains such as higher education (Lundin, Bergviken Rensfeldt, Hillman,
Lantz-Andersson, & Peterson, 2018), engineering education (Lo & Hew, 2019), health
professions education (Hew & Lo, 2018), nursing education (Xu et al., 2019), and L2
learning (Turan & Akdag-Cimen, 2019). In all of these review studies, the number of
flipped reports has increased dramatically over time. The same trend is observed in com-
prehensive reviews comparing the effectiveness of flipped learning across educational
domains (e.g. Cheng, Ritzhaupt, & Antonenko, 2019; Låg & Sæle, 2019; Shi, Ma,
MacLeod, & Yang, 2020). In sum, flipped learning has grown to be one of the most
influential phenomena within the broad educational arena.
While flipped classroom research has grown exponentially in recent years and has
been the focus of several meta-analyses and systematic reviews, the effectiveness of the
flipped classroom for L2 learning has admittedly been under-researched. Consider a
recent synthesis by Turan and Akdag-Cimen (2019), who conducted a systematic review
of 43 published L2 reports. Because their work was a systematic review of published
reports, their article does not present summary effect size estimates or moderator analy-
ses, nor does its scope cover unpublished reports, thus raising the risk of publication bias.
In their recent meta-analysis of flipped classroom interventions across educational
domains, Shi et al. (2020) included only six L2 reports that were subsumed under a gen-
eral ‘social sciences’ label. Strelan, Osborn, and Palmer (2020) in a similar vein sub-
sumed second language flipped reports under a broader ‘humanities’ label. L2 reports
were also subsumed under humanities in another comprehensive recent flipped meta-
analysis (Låg & Sæle, 2019) with the authors noting that about 70% (k = 23) of these
humanities reports were L2-focused.
II Flipped learning
1 Definition of flipped learning
Despite its popularity, flipped learning has been somewhat inconsistently defined by edu-
cational researchers and practitioners (Mehring & Leis, 2018; van Alten et al., 2019). In a
general sense, there is agreement on the ‘flipped’ or ‘inverted’ aspect of the approach,
where classroom teaching and independent learning are switched. The disagreement is on
what exactly flipping a classroom means. For some (e.g. Bergmann & Sams, 2012;
Mehring, 2018), the essence of the flipped approach is this pedagogical shift to presenting
new content before class, allowing the teacher and students to apply this new content in
meaningful ways during class time. The manner in which the content is presented to stu-
dents outside of the class (i.e. via technology or not) is assumed to be inconsequential.
From this perspective, flipped learning has been described as having its roots in the 1980s
when active learning emerged within educational circles emphasizing learning by doing
(Ryback & Sanders, 1980). For other flipped theorists (e.g. Adnan, 2017; Evseeva &
Solozhenko, 2015), flipped learning is heavily dependent on (digital) technology to allow
students to engage with new content. This latter definition appears to be especially popular
in recent L2 flipped learning scholarship, where both primary studies (e.g. Chen Hsieh,
Wu, & Marek, 2017; Hung, 2015, 2017) and research syntheses (e.g. Turan & Akdag-
Cimen, 2019) have included or emphasized technology in their definitions of flipped appli-
cations. In the context of the present study, we have followed the example of recent flipped
learning meta-analyses (e.g. Låg & Sæle, 2019) and adopted a general definition where: “A
flipped intervention first involves presentation of new content to learners to be indepen-
dently studied before class, and then class time is devoted to reinforcing and engaging
with the ‘flipped’ content”.
This body of research has also relied heavily on technology. Some research reflected
the argument that flipped learning went hand in hand with technology by emphasizing
the use of technology, such as videos and apps, to deliver the content outside of the class-
room (Alnuhayt, 2018; Chen Hsieh et al., 2017), though little space is usually devoted to
explaining how extra class time was used. Some of these applications adopted a Web 1.0
framework (Lomicka & Lord, 2016). Mori, Omori, and Sato (2016), for instance, used
PowerPoint and other one-way technology to flip their teaching of Japanese writing
characters, kanji. On the other hand, technology was also used to facilitate student inter-
action outside of the class via learning management systems and well-known Web 2.0
applications such as chat boards and blogs (Lin & Hwang, 2018; Lin, Hwang, Fu, &
Chen, 2018).
Given the flexibility of this approach, the wide range of ways in which flipped learning has been applied should not be surprising. Theorists such as Bergmann and Sams
(2012) and Mehring (2016, 2018) emphasized the need for flipped applications to maxi-
mize class time for higher-order thinking activities, while investigators such as Hung
(2015) and AlJaser (2017) detailed how the lesson was used to facilitate cognitively
engaging and student-centered tasks when describing their flipped interventions. On the
other hand, Chen Hsieh et al. (2017) and Alnuhayt (2018) focused more on how the fea-
tures of technology were used to ‘flip’ the content. L2 flipped learning applications have
thus varied in their contexts, learning outcomes, use and engagement with technology,
and focus on class time use.
1. To what extent does the flipped learning approach improve L2 learning compared
to traditional classroom teaching?
2. To what extent does the effectiveness of the flipped learning approach vary by L2
learning outcome?
V Method
1 Inclusion criteria
In order to qualify for inclusion in the present meta-analysis, the report had to satisfy the
following inclusion criteria:
2 Literature search
Following standard practice in meta-analyses, we conducted a keyword-driven database
search to build our report pool. However, given the particular features of L2 research
(discussed below), we commenced our search at the journal level and then moved to the
database level. In total, our literature search process had four stages.
Stage 1. As our meta-analysis was L2-specific, we expected the bulk of L2 flipped learn-
ing studies to be found in L2 journals. We therefore focused the initial stage of our search
on these journals (the following stages expanded this scope). We first created a list of 73
L2 and educational technology journals adapted from previous bibliometric work and
relevant flipped literature (Al-Hoorie & Vitta, 2019; Mehring, 2016; Vitta & Al-Hoorie,
2017; Zhang, 2020; for the complete list; see Appendix A). Considering the inconsist-
ency in author-supplied keywords in L2 journals (see Lei & Liu, 2019), which could
limit our ability to obtain a comprehensive list of flipped learning reports, we then uti-
lized the Scopus search engine to search articles in these journals. The Scopus search
engine permits searching the title, abstract, keyword list, and other meta-data of each
article (Burnham, 2006). We used the keywords flip*, invert*, and blend*. We included
blend* because L2 researchers tend to view flipped learning as a pedagogic approach to
blended learning (Chen Hsieh et al., 2017; Hung, 2015; Teng, 2017). Journals were
searched with an ‘all time’ parameter, that is, comprehensively and without time-range limitations.
Although this step helped us avoid relying on author-supplied keywords, we still
wanted to ensure that our Scopus search was indeed comprehensive. We manually
inspected all articles in all issues of eight relevant journals (CALL-EJ, Computer Assisted
Language Learning, ReCALL, Language Learning & Technology, CALICO Journal,
Teaching English with Technology, JALTCALL, International Journal of Computer-
Assisted Language Learning and Teaching). Each journal was inspected after its auto-
mated processing, and this manual search did not uncover additional reports not captured
by the automated Scopus search, thus raising confidence in our search protocols.
Stage 2. We then expanded the search to EBSCO and ProQuest. Within EBSCO, our search covered OpenDissertations, Academic Search Ultimate, ERIC, and Edu-
cation Research Complete. Within ProQuest, our search covered Educational Database,
Linguistics Database, Psychology Database, and Social Science Database, as well as
ProQuest Thesis and Dissertation Global. In addition to the search keywords above, we
further limited the search at this stage by adding L2-specific keywords (second language
or foreign language or L2 or ESL or EFL) to filter out research conducted on other par-
ticipants. As with Stage 1, there were no time constraints, and the search was performed
at the ‘full text’ level with subsequent relevance ordering to facilitate a quicker screening
of false negatives.
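The exact boolean syntax differs across Scopus, EBSCO, and ProQuest; purely as an illustration, the Stage 2 keyword combination described above could be assembled along the following lines. This is a sketch of the combination of terms, not the literal query string submitted to any database, and the quoting and field tags would need to follow each database's own conventions.

```python
# Illustrative assembly of the Stage 2 keyword combination described above.
# The exact field tags and boolean syntax vary by database; this is not the
# literal query submitted to EBSCO or ProQuest.
flipped_terms = ["flip*", "invert*", "blend*"]
l2_terms = ["second language", "foreign language", "L2", "ESL", "EFL"]

query = "({}) AND ({})".format(
    " OR ".join(flipped_terms),
    " OR ".join('"{}"'.format(t) if " " in t else t for t in l2_terms),
)
print(query)
# (flip* OR invert* OR blend*) AND ("second language" OR "foreign language" OR L2 OR ESL OR EFL)
```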
Stage 3. In an attempt to minimize publication bias, we issued a call for papers request-
ing reports meeting our inclusion criteria. This call for papers was announced in various
L2 outlets including Linguist List, BAALmail, Korea TESOL, and IATEFL Research
SIG, as well as social media.
Stage 4. We finally conducted a saturation search to ensure our search was comprehen-
sive. We performed an ancestry search in three recent L2 flipped learning syntheses
(Filiz & Benzet, 2018; Mahmud, 2018; Turan & Akdag-Cimen, 2019) to find out whether
they included reports not captured by our search. We also searched two generic data-
bases: Google Scholar and AskZad. These two databases contain reports from non-
indexed journals as well as theses and dissertations not found in ProQuest.
Our literature search concluded in August 2019, resulting in 56 unique reports satisfying
our inclusion criteria (for the complete list; see Appendix B). Comparing the number of
reports in our pool to the domain-specific meta-analyses in Table 1, we note that it was
larger than that by Lo and Hew (2019, k = 29), Låg and Sæle (2019, k = 23), and Xu et al. (2019, k = 22). It was also larger than the number of quantitative reports found in L2 flipped learning systematic reviews, including Turan and Akdag-Cimen (2019, k = 21) and Filiz and Benzet (2018, k = 25). Figure 1 presents a flow diagram of our search process.

[Figure 1. Flow diagram of the literature search, from identification across Scopus, EBSCO, ProQuest, ProQuest Theses and Dissertations, and the call for papers, through screening of titles and abstracts and full-text eligibility checks, supplemented by a saturation search (ancestry search, Google Scholar, AskZad; n = 14), to the final pool of 56 unique reports after removing duplicates.]
3 Moderators
To operationalize research questions 2 and 3, we coded the reports for three groups of
moderators related to learners, report source, and design characteristics, the latter sub-
suming flipped application and methodological design features.
Regarding learner characteristics, we coded for educational stage: elementary, inter-
mediate, secondary, and university. We coded adult learners as university learners (k = 2).
Table 2. Breakdown of the report types in the pool.

Report type                     k
Journal:
  SSCI and Scopus              14
  Scopus only                  12
  Neither Scopus nor SSCI      19
Other:
  Conference proceeding         4
  Thesis/dissertation           7
We eventually compared secondary and university learners only due to the small number
of reports on the other educational stages (k = 3 combined). In previous L2 meta-analyses
(e.g. Bryfonski & McKay, 2019), proficiency was omitted because of the inherent diffi-
culty of standardized proficiency judgments across reports. In light of this, we imple-
mented a three-category proficiency moderator: 1) below intermediate, 2) intermediate,
and 3) above intermediate. Intermediate was anchored to B1 according to the CEFR. As
an illustration, Ishikawa et al. (2015) was coded as ‘below intermediate’ as the reported
TOEIC scores were within the A2 range of 250 to 550; Karimi and Hamzavi (2017) was
coded as intermediate since reported Cambridge PET scores established a B1 level. The
remaining studies were coded in the same manner where either empirical evidence or an
argument anchoring the learners’ proficiency (e.g. to the CEFR) was presented as evi-
dence of the learners’ proficiency. Reports spanning multiple levels or omitting such pro-
ficiency evidence were not coded, and those reporting proficiency in a manner that makes
such anchoring not possible were likewise not coded.
As for report source, some reports did not undergo conventional editorial-driven peer
review (e.g. conference proceedings and university theses). Some methodologists rec-
ommend including such reports for comprehensiveness (e.g. Norris & Ortega, 2000), as
they may contain a higher proportion of statistically non-significant results (Dickersin,
2005). Similarly, it has been argued that reports published in journals have a higher like-
lihood of publication bias as significant results with noteworthy effects tend to be favored
(Fanelli, 2010). We therefore coded whether the report was published in a peer-reviewed
journal. Since there is also evidence suggesting that report quality can vary depending on
the indexing of the journal (Al-Hoorie & Vitta, 2019), we also compared these journals
in relation to their indexing in SSCI, Scopus, and other indices. Table 2 presents a break-
down of the report types in our pool.
We also examined the effect of certain design characteristics in two areas: flipped
application features and report methodological features. In relation to flipped applica-
tions, we examined the effect of whether the intervention utilized videos, and whether
the technology employed was interactive (Lo & Hew, 2019). An example of an ‘interac-
tive’ flipped intervention was Lin and Hwang (2018) where the content was presented
via Facebook, and students used the platform to discuss it with their peers and with the
instructor. In relation to methodological features, we examined whether the design
included an empirical pretest before the implementation of the treatment or relied on
pre-existing holistic judgements, whether the reliability of dependent variable scores was
reported (Al-Hoorie & Vitta, 2019; Brown, Plonsky, & Teimouri, 2018), and how long
the intervention lasted (Cheng et al., 2019).
Finally, we tested whether the effectiveness of the flipped approach was related to the
L2 outcome targeted in the report. We compared the effectiveness of flipped learning on
the four skills (listening, speaking, reading, and writing) and two competencies (vocabu-
lary and grammar). When scores were combined across two or more L2 outcomes, we
coded the report as ‘multi-outcome’. Four reports had outcomes targeting performance
on standardized tests combining reading and listening scores (e.g. TOEIC Listening and
Reading). We coded these as ‘standardized tests’.
4 Data analysis
a Software. We used Comprehensive Meta Analysis 3.3 (Borenstein, Hedges, Higgins,
& Rothstein, 2014) for all analyses. We applied a random-effects model as we had no
reason to assume one common effect size underlying all reports (see Borenstein, Hedges,
Higgins, & Rothstein, 2009). We also examined heterogeneity using the I²-statistic and
its significance value. Significant heterogeneity suggests that the effect highly varies
from report to report, and this variability could potentially be explained through modera-
tor analysis of certain report characteristics.
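Comprehensive Meta Analysis performs these computations internally; as a rough illustrative sketch of what a random-effects pooling with the Q and I² statistics involves, the following NumPy code implements the standard DerSimonian-Laird estimator. The effect sizes and variances in the usage example are invented toy values, not data from this study.

```python
import numpy as np

def random_effects_meta(g, v):
    """DerSimonian-Laird random-effects pooling of Hedges' g values.

    g: per-report effect sizes; v: their within-study variances.
    Returns the pooled g, its 95% CI, Cochran's Q, and I^2 (%).
    """
    g, v = np.asarray(g, float), np.asarray(v, float)
    k = len(g)
    w = 1.0 / v                                   # fixed-effect (inverse-variance) weights
    g_fixed = np.sum(w * g) / np.sum(w)
    Q = np.sum(w * (g - g_fixed) ** 2)            # Cochran's Q
    df = k - 1
    C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - df) / C)                 # between-study variance estimate
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    g_re = np.sum(w_re * g) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    ci = (g_re - 1.96 * se_re, g_re + 1.96 * se_re)
    i2 = max(0.0, (Q - df) / Q) * 100             # % of dispersion beyond sampling error
    return g_re, ci, Q, i2

# Hypothetical toy input, not the study's data:
g_pooled, ci95, Q, i2 = random_effects_meta(
    g=[0.4, 1.1, 0.8, 1.5, 0.2], v=[0.05, 0.10, 0.08, 0.20, 0.04])
print(round(g_pooled, 2), [round(x, 2) for x in ci95], round(Q, 1), round(i2, 1))
```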
b Publication bias. Publication bias can occur because of the tendency of journals to
favor significant results over non-significant ones. As a result, some non-significant
findings may not find their way to the research community, leading to what is commonly
known as the file-drawer problem (Rosenthal, 1979). We tested publication bias using
the Trim and Fill method (Duval & Tweedie, 2000a, 2000b). We also examined the
results of the classic fail-safe N test (Rosenthal, 1979), Orwin’s fail-safe N test (Orwin,
1983), and the p-curve (Simonsohn, Nelson, & Simmons, 2014) to further shed light on
potential bias.
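For readers unfamiliar with the fail-safe N indices, the sketch below shows the standard formulas behind Rosenthal's classic fail-safe N (how many averaged-null studies would be needed to make the combined result non-significant) and Orwin's fail-safe N (how many additional studies would pull the mean effect below a chosen criterion). The inputs are invented toy values, and the exact implementation in Comprehensive Meta Analysis may differ in minor details.

```python
import numpy as np
from scipy.stats import norm

def rosenthal_failsafe_n(z_values, alpha=0.05):
    """Classic (Rosenthal) fail-safe N: number of null-result studies needed
    to pull the Stouffer combined p above alpha (one-tailed)."""
    z = np.asarray(z_values, float)
    z_crit = norm.ppf(1 - alpha)                  # 1.645 for alpha = .05
    n_fs = (z.sum() / z_crit) ** 2 - len(z)
    return max(0, int(np.ceil(n_fs)))

def orwin_failsafe_n(mean_g, k, criterion=0.40, mean_g_missing=0.0):
    """Orwin's fail-safe N: number of additional reports with mean effect
    `mean_g_missing` needed to drag the average effect down to `criterion`."""
    n_fs = k * (mean_g - criterion) / (criterion - mean_g_missing)
    return max(0, int(np.ceil(n_fs)))

# Hypothetical toy inputs, not the study's data:
print(rosenthal_failsafe_n([2.1, 3.0, 1.8, 2.6]))
print(orwin_failsafe_n(mean_g=0.75, k=40, criterion=0.40))
```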
c Coding. Initially, 40 reports were coded independently by two coders against our
inclusion criteria. This procedure resulted in 85% agreement (Cohen’s ᴋ = .70, p <
.001). All discrepancies were subsequently resolved by discussion until 100% agreement
was reached. The two coders then independently coded the effects of 16 reports (approxi-
mately 30%), resulting in 88% inter-coder agreement (κ = .86, p < .001). All discrepan-
cies were also resolved by discussion until 100% agreement was reached. When a study
had multiple data collection points (e.g. several quizzes and a final exam), we used the
last test for the analysis (k = 5). If the report had multiple assessments for one dependent
variable (e.g. essay subdomains and an overall score), we used the most comprehensive
measure (k = 7). In one case, a report had two outcome variables; we selected the one
with the best construct validity corresponding to modern ‘complexity–accuracy–fluency’
theory (Pallotti, 2009).
d Effect size computation. Effect sizes were computed using Comprehensive Meta
Analysis software, with Hedges’ g being the effect size metric employed to correct for
smaller sample sizes. Each report was weighted by the inverse of its variance including
the estimated between-studies variance. Most effect sizes were directly estimated from
the means, standard deviations, and sample sizes. In cases where these data were una-
vailable, test statistics or other effect size metrics were used in tandem with sample size
to estimate g (for detailed formulae; see Borenstein et al., 2009). Thus, all selected
reports provided enough information to estimate effects. A small number of the reports
(k = 3) used within-participant designs. According to Lakens (2013), such effect sizes are best estimated with g_av when meta-analysing them alongside between-participant effects. Nevertheless, g_av values are nearly identical to the g_s values computed for between-participant effects (Lakens, 2013), and this was the case with our data. Therefore, g is used throughout, subsuming both g_s and g_av.
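As a worked illustration of this computation, the function below reproduces the standard formulas for Hedges' g and its variance from group means, standard deviations, and sample sizes (cf. Borenstein et al., 2009; Lakens, 2013). The numbers in the usage example are invented and do not come from any report in the pool.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g (small-sample corrected standardized mean difference) for a
    between-participant comparison, plus its estimated variance."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled                     # Cohen's d
    j = 1 - 3 / (4 * df - 1)                      # small-sample correction factor
    g = j * d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    var_g = j**2 * var_d                          # inverse of this variance is the weight
    return g, var_g

# Hypothetical toy summary statistics, not taken from any report in the pool:
print(hedges_g(m1=78.0, sd1=10.0, n1=30, m2=70.0, sd2=11.0, n2=28))
```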
VI Results
The reports included in our pool were interventions conducted in different parts of the
world, though the target language in almost all of these reports was English. Only a
minority of reports tested the effectiveness of the flipped approach on learning other
languages, such as Chinese (k = 2), Japanese (k = 2), and Korean (k = 1). Also, only a few studies reported the results for each gender separately (k_female = 5, k_male = 2), whereas the remainder reported the results for the two genders combined. Some of these
reports were unpublished university theses/dissertations (k = 7). As mentioned above,
most of these reports adopted a between-participant design, whereas a few were within-
participant (k = 3). These reports involved 61 unique samples and 4,220 learners.
Using a random-effects model, the results showed that groups receiving the flipped
intervention achieved significantly better than those receiving traditional face-to-face
teaching, g = 0.99, 95% CI (0.81, 1.16), z = 10.90, p < .001. This average effect size
exhibited substantial heterogeneity, Q(60) = 432.82, I² = 86.14, p < .001. These results
indicate that around 86% of the dispersion of the true effect is over and above sampling
error and is potentially explainable by certain moderator variables.
In relation to publication bias, the classic fail-safe N test showed that 694 missing reports would be required to render the overall effect non-significant, z = 26.02, p < .001. Orwin’s fail-safe N also showed that 58 additional reports would be needed to reduce the effect size to below 0.40, the generally recognized threshold for effective educational interventions (Hattie, 2009). These results provide strong evidence of a non-zero effect size.
Similarly, the p-curve did not indicate evidence of questionable research practices such
as p-hacking (Figure 2). The p-curve included 45 statistically significant results (p <
.05), of which 38 were significant at p < .025.
However, the Trim and Fill method did suggest the possibility of publication bias. As
Figure 3 shows, reports with smaller samples tended to find larger effect sizes. This
analysis showed that there could be at least 17 missing reports. Adjusting for these miss-
ing reports made the average effect size shrink, g = 0.58, 95% CI (0.37, 0.78). This sug-
gests that the 0.99 effect size originally obtained might be inflated.2
[Figure 3. Funnel plot showing publication bias based on the fixed-effects model. Imputed results are shown as filled dots.]

In relation to research question 2, the moderator analysis revealed some interesting results in relation to the target L2 outcomes investigated (Table 3). The findings showed that flipped learning had a non-significant effect on reading and standardized tests, as the 95% confidence intervals overlapped with zero. The confidence intervals for the reading outcome were also so wide that they were hardly informative, underscoring the need for more research on reading. Vocabulary did show a significant effect, though the lower confidence interval was barely above zero. In contrast, the effects were substantial for writing, listening, grammar, and speaking, as well as for assessments comprising multiple outcomes.
Table 4. Q-values in post hoc analyses showing whether differences in moderator levels are
significant.
Proficiency: 1 2 3
1. Below intermediate –
2. Intermediate 1.20 –
3. Above intermediate 7.11* 3.47† –
Report source: 1 2 3 4
1. Neither SSCI nor Scopus –
2. Scopus only 0.50 –
3. SSCI and Scopus 3.24† 6.09* –
4. Thesis/conference 4.46* 7.52** 0.21 –
L2 outcome: 1 2 3 4 5 6 7
1. Writing –
2. Listening 0.03 –
3. Multi-outcome 2.14 0.73 –
4. Grammar 1.47 0.64 0.005 –
5. Speaking 0.53 0.05 0.14 0.46 –
6. Standardized tests 13.16*** 5.78* 6.33* 3.25† 10.58** –
7. Vocabulary 20.50*** 7.65** 12.12*** 5.01* 18.79*** 0.10 –
8. Reading 0.11 0.04 0.09 0.10 0.0003 1.66 2.06
Notes. †p < .10, *p < .05, **p < .01, ***p < .001.
When it comes to research question 3, the results did not provide evidence that learner age, specifically whether learners were at the secondary or university level, was related to how effective the flipped intervention was. In contrast, the effectiveness of flipped learning varied significantly in relation to proficiency level. As the post hoc results in Table 4 show, learners with higher proficiency exhibited larger effect sizes.
Regarding the type of the report itself, the analysis showed that peer-reviewed journal
articles reported significantly larger effects than other types of reports such as conference
proceedings and unpublished theses. Furthermore, comparison by report source sug-
gested that the largest effect sizes came from journals not indexed in the SSCI (Table 4).
Analysis of whether the intervention used videos or not, and whether the technology
was interactive, did not result in a significant difference. Similarly, whether the research-
ers reported the reliability of their dependent variables did not seem to have an effect on
the results. The same applied to whether the researchers administered their own pretest
or relied on a pre-existing judgement or evaluation reported by learners.
Finally, we examined the relationship between the length of the intervention and its
effectiveness. Meta-regression analysis showed that there was a small negative effect of
duration of the study (see Figure 4 and Table 5), suggesting that the novelty of the
approach might slightly wane with time. One report lasted for 60 weeks, which was the longest duration in our pool. Excluding that report led to only a minor change in the coefficient, from −0.02 to −0.03 (see Figure 5).
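As an illustrative sketch only (the analysis reported here was run in Comprehensive Meta Analysis, whose meta-regression machinery is more elaborate), the slope of effect size on intervention duration can be obtained with inverse-variance weighted least squares. All input values below are invented toy numbers, not the study's data.

```python
import numpy as np

def meta_regression_slope(g, v, duration_weeks, tau2=0.0):
    """Minimal weighted meta-regression sketch: regress effect sizes on
    intervention duration with inverse-variance weights.
    Returns [intercept, slope per week]."""
    g = np.asarray(g, float)
    x = np.asarray(duration_weeks, float)
    w = 1.0 / (np.asarray(v, float) + tau2)       # random-effects weights if tau2 > 0
    X = np.column_stack([np.ones_like(x), x])     # design matrix: intercept + duration
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ g)

# Hypothetical toy input, not the study's data:
print(meta_regression_slope(
    g=[1.4, 1.1, 0.9, 0.7, 0.6], v=[0.08, 0.06, 0.07, 0.05, 0.09],
    duration_weeks=[2, 5, 8, 12, 16]))
```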
[Figure 4. Meta-regression of the relationship between effect size and duration of the intervention in weeks.]
VII Discussion
The purpose of the present meta-analysis was to extend existing research synthesis work
on the effectiveness of flipped learning in the context of L2 learning. We aggregated
effect sizes in reports located through a broad literature search process that included dif-
ferent report types. In this section, we discuss the following three notable findings, in
relation to both research and practice, emerging from this meta-analysis:
• There is clear evidence that flipped learning is effective for L2 learning overall (research question 1).
• Flipped learning seems more effective under certain conditions and for certain L2 outcomes (research questions 2 and 3).
• Publication bias and methodological issues seem to have impeded accurate estimation of the effect of flipped learning (research question 3).
[Figure 5. Meta-regression of the relationship between effect size and duration of the intervention in weeks after excluding one potential outlier report.]
Chen Hsieh et al. (2017) also had students draft ‘the final dialog collaboratively’ (p. 4) under the
conventional learning condition. Hung (2015) and Ishikawa et al. (2015) likewise had
their non-flipped learning groups engage in classroom discussions about the content
presented in class. While not all researchers intentionally used communicative activities
for their comparison groups, the fact that communicative features were observed in both
the flipped and non-flipped groups makes it unlikely that the results of the present meta-
analysis are attributable simply to communicative activities. Thus, flipped learning
applications in our report pool do not appear to be communicative language teaching by
another name, and are possibly superior in that additional, structured out-of-class activi-
ties are involved. Again, as this issue was not within the scope of the present meta-
analysis, direct comparative analysis between these two approaches seems an interesting
future direction.
the concern voiced by Milman (2012). Willis and Willis (2019) in a similar vein posited
that beginner students might have trouble engaging with student-centered task-based
language teaching (see also Vitta, Jost, & Pusina, 2019). Thus, there seems to be a theo-
retical basis for the positive association between proficiency and the effectiveness of
flipped learning. Should teachers seek to implement flipped learning with low-profi-
ciency learners, then extra care may need to be taken in preparing accessible and appeal-
ing content so that these learners can remain engaged with it outside of class time.
Alternatively, flipped learning applications to low-proficiency learners could require
greater first language (L1) and extra-linguistic support.
4 Future directions
As we mentioned above, the status of scholarship on flipped learning indicates that
researchers should move from the question of whether flipped learning is effective to
when and how it is so. To address these questions, we suggest two main future directions
for the field. First, research should target different underrepresented L2 learners. As is the case in various L2 subdisciplines (Dörnyei & Al-Hoorie, 2017), L2 flipped learning research has been English-biased in that learners of languages other than English have seldom been investigated. As with Lundin et al. (2018), our report pool was dominated by university-level learners, by a ratio approaching 5:1. Younger learners were espe-
cially underrepresented, making it unclear to what extent flipped learning is effective
with younger learners considering that this approach presupposes a level of commitment
and self-directedness without the teacher’s direct supervision. It is likely that the type of
content that can attract such learners will be very different, and possibly more demanding to prepare. In addition to young learners, older learners and those not sufficiently skilled in or familiar with technology, including those based outside the developed world (e.g. only one of 56 reports was situated within an African context; Hassan, 2018), might also require different applications of the flipped approach.
A second future direction we recommend for flipped learning research has to do with
intervention quality. Part of understanding when and how flipped learning is effective is
to understand what features maximize its effectiveness. Little comparative analysis has
been conducted to investigate the various online platforms available to L2 teachers and
how their features influence learning (e.g. for lower proficiency learners). Another aspect
of the quality of flipped learning interventions is the teacher’s skill in preparing and
handling online materials. We suspect that teachers who can create custom materials on demand to suit the emerging needs of their particular classes will be more effective. Investigation of these aspects requires a more fine-grained analysis of intervention qual-
ity. A further aspect of study quality is rigor in design and statistical analysis (Al-Hoorie,
2018; Hiver & Al-Hoorie, 2020a, 2020b). While Al-Hoorie and Vitta’s (2019) systematic
review found that the statistical quality varies based on the impact of the journal, the
present meta-analysis additionally showed that the actual results also vary. Further
research is needed to understand why the findings of high- and low-impact journals can
be discrepant (see Paiva et al., 2017). Tips and strategies for effective flipped learning
implementation can be found in Mehring and Leis (2018).
Next, research on flipped learning should move to what Zanna and Fazio
(1982) called second-generation and third-generation questions. According to Zanna and
Fazio’s (1982) classification, first-generation research simply asks ‘is’ questions (e.g. is
flipped learning effective?). Second-generation questions move beyond this yes–no ques-
tion to ‘when’ questions (e.g. under what conditions does flipped learning become more
effective?) (see also Al-Hoorie & Al Shlowiy, 2020). Third-generation research asks ‘how’
questions (e.g. how is flipped learning effective?). This last type of question inquires after the mechanisms, or mediators, making flipped learning effective. While this type of question is described as third-generation, thus implying a temporal lag, in reality second- and third-generation questions are ‘linked inextricably’ (Zanna & Fazio, 1982, p. 284).
Understanding under what conditions a treatment is effective might shed light on why it is
effective, and vice versa. It is at this point that practitioners-as-researchers can contribute
to the future directions of flipped research in L2 contexts as localized studies will be essen-
tial in addressing the second- and third-generation questions. To provide a specific exam-
ple, our findings highlight the need for frontline teachers to pilot and report flipped
approaches that focus on vocabulary outcomes and with lower proficiency learners.
VIII Conclusions
The present study meta-analysed the effects of L2 flipped learning interventions.
Extending past meta-analyses on flipped learning, our literature search was able to locate about double the number of L2 experimental reports analysed in past syntheses. However, future endeavors could add to our approach by considering additional gray literature.
results also clearly demonstrate the effectiveness of this approach over the traditional
face-to-face approach. Still, there was also wide heterogeneity in the results that could be
partially explained by certain moderators, including learner proficiency, study type, and
target L2 outcome. Future research should shift focus from whether flipped learning is
effective to when and how its effectiveness can be maximized.
Acknowledgements
We would like to thank Dr. Jeffrey G. Mehring for his comments on our literature search protocols.
We are also grateful to Alex Sutton and Daniël Lakens for comments on the analysis.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this
article.
ORCID iDs
Joseph P. Vitta https://orcid.org/0000-0002-5711-969X
Ali H. Al-Hoorie https://orcid.org/0000-0003-3810-5978
Supplemental material
Supplemental material for this article is available online.
Notes
1. The exclusion of true control (no learning intervention) comparisons was in line with the
methodologies of recent L2 (e.g. Bryfonski & McKay, 2019) and flipped learning (e.g.
Strelan et al., 2020) meta-analyses.
2. The Trim and Fill method was calculated using the fixed-effects model. The random-effects
model (not reported here) showed the opposite pattern, indicating that bias might be resulting
from reports with larger samples—which does not seem likely (see also Shi & Lin, 2019). The
funnel plot based on the random-effects model may be obtained from the authors.
References
Adnan, M. (2017). Perceptions of senior-year ELT students for flipped classroom: A materials
development course. Computer Assisted Language Learning, 30, 204–222.
Al-Hoorie, A.H. (2018). The L2 motivational self system: A meta-analysis. Studies in Second
Language Learning and Teaching, 8, 721–754.
Al-Hoorie, A.H., & Al Shlowiy, A.S. (2020). Vision theory vs. goal-setting theory: A critical
analysis. Porta Linguarum, 33, 217–229.
Al-Hoorie, A.H., & Vitta, J.P. (2019). The seven sins of L2 research: A review of 30 journals’
statistical quality and their CiteScore, SJR, SNIP, JCR Impact Factors. Language Teaching
Research, 23, 727–744.
AlJaser, A.M. (2017). Effectiveness of using flipped classroom strategy in academic achieve-
ment and self-efficacy among education students of Princess Nourah Bint Abdulrahman
University. English Language Teaching, 10, 67–77.
Alnuhayt, S.S. (2018). Investigating the use of the flipped classroom method in an EFL vocabulary
course. Journal of Language Teaching and Research, 9, 236–242.
Baş, G., & Kuzucu, O. (2009). Effects of CALL method and DynED language programme on stu-
dents’ achievement levels and attitudes towards the lesson in English classes. International
Journal of Instructional Technology and Distance Learning, 6, 31–44.
Bergmann, J., & Sams, A. (2012). Flip your classroom: Reach every student in every class every
day. Eugene, OR: International Society for Technology in Education.
Borenstein, M., Hedges, L.V., Higgins, J.P.T., & Rothstein, H.R. (2009). Introduction to meta-
analysis. Oxford: Wiley.
Borenstein, M., Hedges, L.V., Higgins, J.P., & Rothstein, H.R. (2014). Comprehensive meta anal-
ysis: Version 3.3. Englewood, NJ: Biostat.
Brown, A.V., Plonsky, L., & Teimouri, Y. (2018). The use of course grades as metrics in L2
research: A systematic review. Foreign Language Annals, 51, 763–778.
Bryfonski, L., & McKay, T.H. (2019). TBLT implementation and evaluation: A meta-analysis.
Language Teaching Research, 23, 603–632.
Burnham, J.F. (2006). Scopus database: A review. Biomedical Digital Libraries, 3(1).
Chen Hsieh, J.S., Wu, W.-C.V., & Marek, M.W. (2017). Using the flipped classroom to enhance
EFL learning. Computer Assisted Language Learning, 30, 1–21.
Cheng, L., Ritzhaupt, A.D., & Antonenko, P. (2019). Effects of the flipped classroom instructional
strategy on students’ learning outcomes: A meta-analysis. Educational Technology Research
and Development, 67, 793–824.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Council of Europe. (2001). Common European framework of reference for languages. Strasbourg:
Council of Europe.
Council of Europe. (2011). Common European framework of reference for languages: Learning,
teaching, assessment. Strasbourg: Council of Europe.
Davis, N.L. (2016). Anatomy of a flipped classroom. Journal of Teaching in Travel & Tourism,
16, 228–232.
Dickersin, K. (2005). Publication bias: Recognizing the problem, understanding its origins and
scope, and preventing harm. In Rothstein, H.R., Sutton, A.J., & M. Borenstein (Eds.),
Publication bias in meta-analysis: Prevention, assessment and adjustments (pp. 11–33).
Chichester: Wiley.
Dörnyei, Z., & Al-Hoorie, A.H. (2017). The motivational foundation of learning languages other
than Global English. The Modern Language Journal, 101, 455–468.
Duval, S., & Tweedie, R. (2000a). A nonparametric ‘trim and fill’ method of accounting for pub-
lication bias in meta-analysis. Journal of the American Statistical Association, 95, 89–98.
Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot–based method of testing and
adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463.
Ellis, R. (2009). Task-based language teaching: Sorting out the misunderstandings. International
Journal of Applied Linguistics, 19, 221–246.
Evseeva, A., & Solozhenko, A. (2015). Use of flipped classroom technology in language learning.
Procedia – Social and Behavioral Sciences, 206, 205–209.
Fanelli, D. (2010). Do pressures to publish increase scientists’ bias? An empirical support from US
States Data. PLoS One, 5(4), e10271.
Filiz, S., & Benzet, A. (2018). A content analysis of the studies on the use of flipped classrooms in
foreign language education. World Journal of Education, 8, 72–86.
Freire, P. (1968/1970). Pedagogy of the oppressed. New York: Herder and Herder.
Gignac, G.E., & Szodorai, E.T. (2016). Effect size guidelines for individual differences research-
ers. Personality and Individual Differences, 102, 74–78.
Green, A. (2012). Language functions revisited: Theoretical and empirical bases for language
construct definition across the ability range. Cambridge: Cambridge University Press.
Halliday, M.A.K., & Matthiessen, C. (2014). Halliday’s introduction to functional grammar. 4th
edition. New York: Routledge.
Hamdan, M., McKnight, P.E., McKnight, K., & Arfstrom, K.M. (2013). A review of flipped learn-
ing. Flipped Learning Network. Available at: https://www.flippedlearning.org/wp-content/
uploads/2016/07/LitReview_FlippedLearning.pdf (accessed December 2020).
Hassan, S.R.R. (2018). Using the flipped learning model to develop EFL argumentative writing
skills of STEM secondary school students. Majalat Kuliyat Altarbiah (Education College
Journal), 70, 24–74.
Hattie, J.A.C. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achieve-
ment. New York: Routledge.
Hew, K.F., & Lo, C.K. (2018). Flipped classroom improves student learning in health professions
education: A meta-analysis. BMC Medical Education, 18, 38.
Hiver, P., & Al-Hoorie, A.H. (2020a). Reexamining the role of vision in second language motiva-
tion: A preregistered conceptual replication of You, Dörnyei, & Csizér (2016). Language
Learning, 70, 48–102.
Hiver, P., & Al-Hoorie, A.H. (2020b). Research methods for complexity theory in applied linguis-
tics. Bristol: Multilingual Matters.
Hung, H.-T. (2015). Flipping the classroom for English language learners to foster active learning.
Computer Assisted Language Learning, 28, 81–96.
Hung, H.-T. (2017). Design-based research: Redesign of an English language course using a
flipped classroom approach. TESOL Quarterly, 51, 180–192.
Ishikawa, Y., Akahane-Yamada, R., Smith, C., et al. (2015). An EFL flipped learning course
design: Utilizing students’ mobile online devices. In Helm, F., Bradley, L., Guarda, M., & S.
Thouësny (Eds.), Critical CALL – Proceedings of the 2015 EUROCALL Conference, Padova,
Italy (pp. 261–267). Dublin: Research-publishing.net.
Karimi, M., & Hamzavi, R. (2017). The effect of flipped model of instruction on EFL learners’
reading comprehension: Learners’ attitudes in focus. Advances in Language and Literary
Studies, 8, 95–103.
Låg, T., & Sæle, R.G. (2019). Does the flipped classroom improve student learning and satisfac-
tion? A systematic review and meta-analysis. AERA Open, 5, 3.
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practi-
cal primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863.
Plonsky, L., & Oswald, F.L. (2014). How big is ‘big’? Interpreting effect sizes in L2 research.
Language Learning, 64, 878–912.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological
Bulletin, 86, 638–641.
Rounds, P.L. (1996). The classroom-based researcher as fieldworker: Strangers in a strange land.
In Schachter, J., & S. Gass (Eds.), Second language classroom research: Issues and opportu-
nities (pp. 45–59). Mahwah, NJ: Lawrence Erlbaum.
Ryback, D., & Sanders, J.J. (1980). Humanistic versus traditional teaching styles and student sat-
isfaction. Journal of Humanistic Psychology, 20, 87–90.
Shi, L., & Lin, L. (2019). The trim-and-fill method for publication bias. Medicine, 98(23), e15987.
Shi, Y., Ma, Y., MacLeod, J., & Yang, H.H. (2020). College students’ cognitive learning out-
comes in flipped classroom instruction: A meta-analysis of the empirical literature. Journal
of Computers in Education, 7, 79–103.
Simonsohn, U., Nelson, L.D., & Simmons, J.P. (2014). P-curve: A key to the file-drawer. Journal
of Experimental Psychology: General, 143, 534–547.
Strelan, P., Osborn, A., & Palmer, E. (2020). The flipped classroom: A meta-analysis of effects on
student performance across disciplines and education levels. Educational Research Review,
30, 100314.
Teng, M.F. (2017). Flipping the classroom and tertiary level EFL students’ academic performance
and satisfaction. Journal of Asia TEFL, 14, 605–620.
Turan, Z., & Akdag-Cimen, B. (2019). Flipped classroom in English language teaching: A system-
atic review. Computer Assisted Language Learning, 33, 590–606.
van Alten, D.C.D., Phielix, C., Janssen, J., & Kester, L. (2019). Effects of flipping the classroom
on learning outcomes and satisfaction: A meta-analysis. Educational Research Review, 28,
100281.
Vitta, J.P., & Al-Hoorie, A.H. (2017). Scopus- and SSCI-indexed L2 journals: A list for the Asia
TEFL community. The Journal of Asia TEFL, 14, 784–792.
Vitta, J.P., Jost, D., & Pusina, A. (2019). A case study inquiry into the efficacy of four East Asian
EAP writing programmes: Presenting the emergent themes. RELC Journal, 50, 71–85.
Voss, E., & Kostka, I. (2019). Flipping academic English language learning: Experiences from an
American university. Singapore: Springer Nature Singapore.
Webb, M., & Doman, E. (2016). Does the flipped classroom lead to increased gains on learning
outcomes in ESL/EFL contexts? CATESOL Journal, 28, 39–67.
Willis, D., & Willis, J. (2019). Doing task-based teaching. Oxford: Oxford University Press.
Xu, P., Chen, Y., Nie, W., et al. (2019). The effectiveness of a flipped classroom on the develop-
ment of Chinese nursing students’ skill competence: A systematic review and meta-analysis.
Nurse Education Today, 80, 67–77.
Zanna, M.P., & Fazio, R.H. (1982). The attitude-behavior relation: Moving toward a third gen-
eration of research. In Zanna, M.P., Higgins, E.T., & C.P. Herman (Eds.), Consistency in
social behavior: The Ontario symposium: Volume 2 (pp. 283–301). Hillsdale, NJ: Lawrence
Erlbaum.
Zarrinabadi, N., & Ebrahimi, A. (2019). Increasing peer collaborative dialogue using a flipped
classroom strategy. Innovation in Language Learning and Teaching, 13, 267–276.
Zhang, X. (2020). A bibliometric analysis of second language acquisition between 1997 and 2018.
Studies in Second Language Acquisition, 42, 199–222.
Zhang, S., & Zhang, X. (2020). The relationship between vocabulary knowledge and L2 read-
ing/listening comprehension: A meta-analysis. Language Teaching Research. Epub ahead of
print 31 March 2020. DOI: 10.1177/1362168820913998.