Using Digital Technology To Address Confirmability and Scalabilit
9-11-2020
J. Patrick Biddix
The University of Tennessee, Knoxville, [email protected]
Part of the Communication Technology and New Media Commons, Educational Methods Commons,
and the Higher Education Commons
This How To Article is brought to you for free and open access by The Qualitative Report at NSUWorks. It has
been accepted for inclusion in The Qualitative Report by an authorized administrator of NSUWorks. For more
information, please contact [email protected].
Using Digital Technology to Address Confirmability and Scalability in Thematic
Analysis of Participant-Provided Data
Abstract
This article presents a technique for analyzing large-scale qualitative data to address considerations for
scalability and confirmability in thematic analysis of participant-provided data. A network approach
provides a consistent means of coding that scales with the size of the dataset and is verifiable using
standardized methods. This form of data analysis can be used with smaller data sources including
interview transcripts as well as large data sources such as open-ended survey responses. A constructivist
(inductive) approach is maintained and needed, however, to aid in interpretation of latent constructs. In
this article, we provide both a conceptual overview of the co-word analysis method and a practical
example.
Keywords
Qualitative Research, Network Analysis, Co-Word Analysis, Thematic Analysis, College Students,
Technology
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International
License.
Acknowledgements
This research was supported by Kyungpook National University Bokhyeon Research Fund, 2017.
J. Patrick Biddix
The University of Tennessee, Knoxville, USA
Introduction
Researchers working with large scale qualitative data sources are challenged with
representing reality and perspective within and across data sources, which becomes
exponentially more difficult as data volume increases (Twining, Heller, Nussbaum, & Tsai,
2017). Although thematic coding and analysis software has made investigation of larger
datasets more manageable, we find confirmability (i.e., the degree to which the analysis process
is influenced by the researcher) and scalability (i.e., maintaining the core tenets of
constructivism as volume of data increases) to be persistent challenges. We have found this
especially problematic when working to separate researcher perspective from results that are
grounded in large-scale participant-provided data (Shenton, 2004, p. 72).
Although we prefer member checking as a technique when working with smaller scale
participant-provided data or co-constructed data such as interviews, we have found
contradictions in verifying data when asking a few members to authenticate or corroborate
findings for very large populations. Taking a pragmatic approach, our search for more precise
and repeatable results to address our concerns for confirmability and scalability led us to social
network analysis techniques. In this article, we present an example case for how to conduct a
social network-based analysis of data (referred to as co-word analysis) taken from participant-
provided qualitative (short answer) responses to a survey to demonstrate how it can be used to
enhance confirmability and increase scalability in qualitative data analyses.
Chung Joo Chung, J. Patrick Biddix, Han Woo Park, 3299
Analog | Manual and Digital | Manual approaches are human-reliant techniques for data
analysis. In both cases, the researcher reviews data sources using a line-by-line or similar approach to
segment excerpts of data (e.g., Chenail, 2012) to identify and highlight important words or
phrases (codes) and then compiles codes to create themes (Merriam & Tisdell, 2015). In recent
years, software has become more sophisticated, adding the ability to create and display thematic
models to show links between data (Paulus, Lester, & Dempster, 2014). Manual approaches
are labor intensive but allow for nuance and consideration of context. A major advantage is
that researchers remain “close” to the data, and become more intentional participants in the co-
creation of results (Neuendorf, 2016; Richards, 1998).
Digital | Automated and Analog | Automated approaches are computer-reliant
techniques for data analysis. Both techniques approximate an in vivo or grounded approach to
data analysis by seeking the most commonly occurring words in a dataset and providing
descriptive statistics in terms of frequencies of use (Vlieger & Leydesdorff, 2011). Analog |
Automated techniques are more conceptual than practical at this point, since automation or
digital coding requires a digital data source. Digital approaches are highly systematic and
efficient, but do not allow the researcher to connect context with meaning (Biddix, Park, &
Wang, 2009; Richards, 1998). However, a major advantage is the ability to examine large
amounts of data efficiently (Jung & Park, 2015).
Operational techniques for analyzing textual data provide advantages and challenges
related to both the process and conceptualization of data analysis. Primarily, there is concern
for an inability to confirm results with most traditional coding techniques. Further, while
3300 The Qualitative Report 2020
scalability can be substantially enhanced with the efficiency of digital and/or automated
methods, this must be balanced against the potential loss of meaning and context (Twining et
al., 2017). Additional discussion of these considerations follows.
Co-word analysis is a form of social network analysis (Danowski & Park, 2014;
Hanneman & Riddle, 2005) in which the researcher identifies and models co-occurrences
among words. Graphical representations aid in the interpretation of meaning (Leydesdorff &
Vlieger, 2005). Researchers employ specialized software such as FullText.exe for English texts
or KrKwic (Korean Key Words In Context) for Korean texts to search for words that appear,
or co-occur, together (Park & Leydesdorff, 2004). Individual and co-occurring words are
assigned descriptive statistics, which can be viewed in a variety of ways to identify patterns, or
“recurring regularities” (Merriam & Tisdell, 2015, p. 206) in the data. A social network
analysis feature is incorporated to visualize the connections in the data and more clearly
identify emergent content, factors, and overall structure (Park, 2018; Park & Leydesdorff,
2013). Co-word analysis is concerned with finding shared meanings and interpretations among
words with concepts in common (Doerfel, 1998) that can be mathematically valued.
Researchers sometimes describe this as the “measurement of meaning” (Vlieger &
Leydesdorff, 2011).
Leydesdorff and Welbers (2011) observed three non-exclusive capabilities of co-word
analysis: inductive data analysis, large data analysis, and validation of content analysis using
samples. Co-word analysis is regarded as a blend of content analysis and factor analysis.
As a form of content analysis, it is used to find meaning in documents from prominent words
or phrases. As a form of factor analysis, it is used to detect correlations between words; the
identification of latent concepts is also possible (Vlieger & Leydesdorff, 2011). Park and
colleagues (Biddix, Chung, & Park, 2015, 2016; Biddix, Park, & Wang, 2009; Park, 2012)
proposed and demonstrated co-word analysis as an alternative operational technique for
coding thematic data. Because the techniques can handle small or very large volumes of data,
this form of social network analysis has been useful in the study of “big data” (Lee & Park,
2019; Park & Leydesdorff, 2013).
Co-word analysis of thematic data begins by searching data using specialized data
mining software. Units of analysis are texts, which can vary in size from sentences and
paragraphs to sections and pages. While many thematic data analysis programs offer mining
capabilities, such as producing descriptive frequencies for words using specified (a priori) and
unspecified (in vivo) techniques, network analysis extends this step by identifying and then
tracking relations between words. In particular, social media network tools also provide open
data repository and sentiment analysis options for both qualitative and quantitative research
(Smith, 2015). These relations, or links, are considered to “co-occur” which gives this form of
inquiry its name, co-word or co-occurrence analysis (Chung & Park, 2010). Once the relevant
unit of analysis is selected, the researcher decides how the word occurrences will be recorded
(most/least frequent, weighting for moderate frequency, or using chi-square analysis). Co-word
analysis programs typically utilize a chi-square analysis, which enables the researcher to
calculate observed/expected values and assess the extent to which a word occurs above or
below expectation (for more information, see Leydesdorff & Welbers, 2011).
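The observed/expected comparison described above can be sketched in a few lines. This is a minimal illustration only, not the routine used by KrKwic or FullText.exe: the function name `observed_expected`, the two example words, and their frequencies are hypothetical, and the sketch assumes a simple word-by-text-unit frequency table in which every word and every unit has a nonzero total.

```python
def observed_expected(counts):
    """Return (observed, expected, ratio, chi-square term) per word and text unit.

    counts maps each word to a list of its frequencies, one entry per text unit.
    Assumes every row total and column total is greater than zero.
    """
    words = list(counts)
    n_units = len(counts[words[0]])
    row_totals = {w: sum(counts[w]) for w in words}
    col_totals = [sum(counts[w][j] for w in words) for j in range(n_units)]
    grand_total = sum(col_totals)

    table = {}
    for w in words:
        cells = []
        for j in range(n_units):
            observed = counts[w][j]
            # Expected frequency under independence of word and text unit.
            expected = row_totals[w] * col_totals[j] / grand_total
            ratio = observed / expected  # above 1: occurs above expectation
            chi_term = (observed - expected) ** 2 / expected
            cells.append((observed, expected, ratio, chi_term))
        table[w] = cells
    return table


# Hypothetical frequencies of two words across two text units.
example_counts = {"learning": [4, 1], "distraction": [1, 4]}
example_table = observed_expected(example_counts)
```

In this invented example, "learning" has an expected frequency of 2.5 in the first unit and an observed frequency of 4, so it occurs above expectation there, while "distraction" occurs below expectation in the same unit.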
Procedurally, the initial network analysis step fits a Digital | Automated categorization,
but the secondary meaning making step is Manual. When content analysis is completed using
a social network approach, the semantic or linguistic association between prominent words
becomes the fundamental feature (Leydesdorff, 2001). However, since words are rarely spoken
or written without context, meaningful analysis of text must also consider other associated
words, phrases, or concepts (Neuendorf, 2016). As a result, co-word analysis is a multi-step
process that first uncovers significant data points, identifies links between units, values those
links and units, evaluates their position in the dataset, and then relies on the researcher to
contextualize and interpret findings. This pairing of network analysis with thematic coding
combines the efficiency of automated large-scale data analysis with a manual
constructivist interpretation.
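The step of identifying and valuing links between words can be illustrated with a short sketch. It assumes that each response is one unit of analysis and that two words are linked when they appear in the same response; the function name `cooccurrence`, the sample responses, and the vocabulary are hypothetical and are not taken from the study data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(units, vocabulary):
    """Count, for each pair of vocabulary words, the units containing both."""
    vocab = set(vocabulary)
    pairs = Counter()
    for text in units:
        # Words from the vocabulary that are present in this unit.
        present = sorted(vocab.intersection(text.lower().split()))
        for first, second in combinations(present, 2):
            pairs[(first, second)] += 1
    return pairs


# Hypothetical responses and vocabulary, for illustration only.
sample_units = [
    "mobile learning is convenient",
    "learning is a distraction",
    "convenient mobile search",
]
sample_vocabulary = ["mobile", "learning", "convenient", "distraction"]
sample_pairs = cooccurrence(sample_units, sample_vocabulary)
```

Each key is an alphabetically ordered word pair and each value is the number of responses in which both words occur. These counts are the kind of link values that a network program can then read as a co-word matrix.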
First, we imported open-ended responses from the online survey platform as text files
into KrKwic. A member of the research team initially screened the data, and removed blank or
single word responses. During data cleaning and dataset preparation, researchers typically use
a stop word list or natural language processing dictionary. A stop word list is a set of
commonly used words that co-word analysis software ignores, such as articles (e.g., a, an, the)
and conjunctions (e.g., and, but). After dataset preparation, we identified a listing of the top 40
word frequencies (words that appeared at least 3 times) as a group. We also specified automated
data mining for stemming words. For example, learning also includes other versions of the
word such as “learned.” Table 2 displays the results.
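The preparation steps described above (screening, stop word removal, stemming, and frequency listing) can be approximated as follows. This is a simplified sketch rather than the KrKwic procedure: the stop word list is deliberately tiny, the suffix-stripping rule in `crude_stem` is a rough stand-in for a real stemmer, and the sample responses are invented.

```python
import re
from collections import Counter

# Illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "and", "but"}


def crude_stem(word):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def top_words(responses, min_count=3, limit=40):
    """Return up to `limit` stemmed words that appear at least `min_count` times."""
    counts = Counter()
    for response in responses:
        tokens = re.findall(r"[a-z]+", response.lower())
        if len(tokens) < 2:  # screen out blank or single-word responses
            continue
        counts.update(crude_stem(t) for t in tokens if t not in STOP_WORDS)
    return [(w, c) for w, c in counts.most_common(limit) if c >= min_count]


# Hypothetical responses, for illustration only.
sample_responses = [
    "Learning is easier with the phone",
    "I learned a lot and learning was fun",
    "Distracting",
    "",
    "The phone helped my learning",
]
frequent = top_words(sample_responses)
```

With these invented responses, "learning" and "learned" are both counted under the stem "learn," which is the only word that reaches the threshold of three occurrences.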
Next, a member of the team exported data from KrKwic into UCINET. For detailed
procedures, refer to https://www.leydesdorff.net/software/fulltext/. The software was used to
generate network metrics such as nDegree and nEigenvector (Borgatti et al., 2002), which are
essential for understanding the importance of individual words as well as the overall structure
of a network. Table 2 also displays these metrics. The degree centrality of each word is
calculated based on the number of words adjacent to a given word in a text. nDegree is the
normalized degree centrality, that is, the degree divided by the maximum possible degree.
In contrast to degree centrality, the eigenvector value considers the centrality of words to which
a given word is connected.
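To make the two metrics concrete, the following sketch computes them for a small binary, symmetric word network. It is an illustration under stated assumptions, not UCINET's implementation: the four-word adjacency matrix is hypothetical, normalized degree is taken as degree divided by n - 1, and the eigenvector scores come from a basic power iteration scaled so that the largest score equals 1.

```python
def degree_centrality(adjacency):
    """Return raw degree and normalized degree (degree / (n - 1)) per word."""
    n = len(adjacency)
    degree = [sum(1 for value in row if value) for row in adjacency]
    n_degree = [d / (n - 1) for d in degree]
    return degree, n_degree


def eigenvector_centrality(adjacency, iterations=200):
    """Approximate eigenvector centrality by power iteration.

    Each word's score is repeatedly updated from its own score plus the scores
    of its neighbours (iterating on A + I, which avoids oscillation).
    """
    n = len(adjacency)
    scores = [1.0] * n
    for _ in range(iterations):
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        largest = max(updated)
        scores = [value / largest for value in updated]
    return scores


# Hypothetical network: learning, mobile, distraction, convenient.
words = ["learning", "mobile", "distraction", "convenient"]
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
degree, n_degree = degree_centrality(adjacency)
eigen = eigenvector_centrality(adjacency)
```

In this invented network, "learning" is adjacent to all three other words, so its normalized degree is 1. "Distraction" is linked only to "learning," so it has the lowest degree, but its eigenvector score still reflects that its single neighbour is the most central word.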
Researchers must decide how to organize and interpret these results. For example,
should only Understanding and Content be interpreted as indicative of the responses, since they
are more highly correlated, or should Unknowingness be added to help make sense of the
results? This somewhat subjective interpretation of the statistical measures is aided by the use
of network visualization software, which helps to further identify optimal patterns in the data.
In other words, network diagrams can be useful in making decisions about which co-words are
most indicative of responses.
1 This form of analysis is referred to as CONCOR, which clusters network data by splitting blocks based upon the
CONvergence of iterated CORrelations (CONCOR) with user control of the splits. Given an adjacency matrix, or
a set of adjacency matrices for different relations, a correlation matrix can be formed by the following procedure.
Form a profile vector for a vertex i by concatenating the ith row in every adjacency matrix; the i,jth element of
the correlation matrix is the Pearson correlation coefficient of the profile vectors of i and j. This (square,
symmetric) matrix is called the first correlation matrix. The procedure can be performed iteratively on the
correlation matrix until convergence. Each entry is now 1 or -1. This matrix is used to split the data into two
blocks such that members of the same block are positively correlated; members of different blocks are negatively
correlated. CONCOR uses the above technique to split the initial data into two blocks. Successive splits are then
applied to the separate blocks and are controlled by the user.
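The first CONCOR split described in this note can be sketched as follows. This is a simplified illustration for a single relation, not UCINET's routine: the five-vertex adjacency matrix is hypothetical, diagonal entries are included in each profile vector, and vertices are assigned to blocks by the sign of their converged correlation with the first vertex.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlate(matrix):
    """Correlation matrix of the rows (profile vectors) of a square matrix."""
    n = len(matrix)
    return [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


def concor_split(adjacency, max_iter=100, tol=1e-6):
    """Iterate row correlations until entries are near +1 or -1, then split in two."""
    corr = correlate(adjacency)  # first correlation matrix
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlate(corr)
    # Vertices positively correlated with vertex 0 form one block.
    block_a = [i for i, v in enumerate(corr[0]) if v > 0]
    block_b = [i for i, v in enumerate(corr[0]) if v <= 0]
    return block_a, block_b


# Hypothetical symmetric network: vertices 0 and 1 tie to 2, 3, and 4;
# vertices 2 and 3 are also tied to each other.
network = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
]
blocks = concor_split(network)
```

Applying the same function to the rows and columns of one block would give the successive splits mentioned above. Some networks do not converge cleanly to values of 1 and -1, which is one reason the splits remain under user control.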
2 The size of the concentric circles indicates the degree centrality among words. Prominence in this case refers to
centrality, or how important certain words are to the overall structure of the network. The number of vertices
adjacent to a given vertex in a symmetric graph is the degree of that vertex.
college students. Using this same technique, we created interpretable, generalized, and perhaps
most importantly, contextualized responses as a team. We identified and included direct
quotations to further evidence the alignment of generalized data analysis with actual data, as
recommended by Braun and Clarke (2006). One member of the team completed the write-up
by composing narrative themes. We provide additional details for this step in the following
section, as this is critical in demonstrating how the technique can address scalability (dataset
size is less relevant using this quasi-automated method) and confirmability (the process allows
for checking and verifying that data themes and clusters match the text).
6a. Create and verify theme clusters. Although based on statistical measures and
verifiable by review of frequency and correlation metrics, selecting the most “representative”
clusters as themes is a constructivist-based “sensemaking” activity. We derived summary
statements by viewing and interpreting the network metrics (Table 2), along with the social
network diagram (Figure 1). This procedure is best completed separately by members of the
research team and then compared as a validation check. As with identifying themes in
traditional qualitative analysis, disagreements are discussed until consensus agreement on the
summaries is reached. Two members of our research team followed this procedure, and
generally agreed on the results after the initial round. To promote accuracy, we also asked
members of the population to review results (member-checking).
6c. Validate and contextualize findings. Graneheim and Lundman (2004) noted that
“a text always involves multiple meanings and there is always some degree of interpretation
when approaching a text” (p. 106). Although the initial list of co-words was statistically
identified, the correlations among words may not reflect sentiments in the actual data. A further
concern is that some important clarifying words, such as “not,” might be overlooked
depending on the algorithm and specification of co-occurrence. However, the default
specification in most software is set to identify words co-occurring more than three times. So,
a case in which “not” appears with “distraction” would be visible in the output. To address
this issue and further investigate the potential for misspecification, we returned to the initial
data and reviewed responses using the listed phrases. This process is best considered iterative,
meaning that there may be some trial-and-error in the checking procedures.
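Returning to the initial data to review a word in context can be partly automated. The sketch below is one possible approach and not a feature of the software named in this article: the function name `keyword_in_context`, the window size, the short list of negation words, and the sample responses are all hypothetical.

```python
import re


def keyword_in_context(responses, keyword, window=3, negations=("not", "no", "never")):
    """List each use of a keyword with nearby words and a negation flag.

    Returns tuples of (response index, context string, negated?).
    """
    hits = []
    for index, response in enumerate(responses):
        tokens = re.findall(r"[a-z']+", response.lower())
        for position, token in enumerate(tokens):
            if token.startswith(keyword):
                start = max(0, position - window)
                context = tokens[start : position + window + 1]
                negated = any(word in negations for word in context)
                hits.append((index, " ".join(context), negated))
    return hits


# Hypothetical responses, for illustration only.
sample = [
    "It is not a distraction for me",
    "Smartphones are a distraction in class",
]
matches = keyword_in_context(sample, "distract")
```

Flagged responses still need to be read by the researcher. The flag only narrows where to look, since a negation word within the window may modify a different word.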
Co-word analysis procedures yield several different types of output files that are used
in data transfer, analysis, and in interpretation. For the purposes of presenting analysis and
results, we typically provide the same three visuals we presented in this article: Network
metrics (Table 2), Co-word matrix (Table 3), and the Network diagram (Figure 1). We find that
a good organizational strategy for results is to use subsections for each open-ended response
or research question (depending on the unit of analysis). Then, the most frequently co-occurring
responses, as interpreted from both hierarchical and co-word analyses, should be displayed in
phrases and then reworded to create summaries. The final presentation may also use conceptual
themes derived by the researcher. Following is a brief example of a summary result section.
• Learning and achievement is both effective and ineffective with the Internet
(can be enhancing or distracting)
• Videos of teaching material for questions and improvement
• Content related to interests improves understanding
• Searching for references, solutions, and files is convenient with a smartphone
(in class)
• Using mobile devices makes various assignments and review possible
• Using smartphones in class is a distraction
While students described numerous advantages of using mobile devices for learning
related to convenience and the ability to enhance comprehension (even during class), they also
mentioned the problems of distraction in nearly every example. Two students used the image
of a double-edged sword to convey this dilemma. One noted, “I think it is a double-edged
sword. It's easy to study with mobile devices, but there is a possibility that it will fall into a
side path.” Following are examples of additional quotations related to enhanced, but distracted
learning.
“It can help me understand in depth what I want to know but there is a concern
that my attention may be distracted. I would recommend using them on more
lessons and books.”
“You can conveniently find the materials you want, but they are too easily
exposed to out-of-school materials and often interfere with your studies.”
Students appreciated the convenient ability to locate material and access information,
as needed. They described doing this both during study and in class. The primary motivator
was the ease of connection to information and the speed of finding an answer. A few students
discussed determining credibility when describing this convenience.
“It is easy to access information that you do not know before, which is a great
help in studying.”
“First, it is easy to find the data you want anywhere, so you can easily access
the information you need.”
“It makes you feel convenient in studying. Access to more information through
associative search.”
Enhances Learning
Beyond merely convenience, mobile devices were useful in helping students find
alternative explanations for concepts, supported homework by allowing rapid access to
information or videos online, and enhanced learning by providing the ability to explore
concepts more deeply.
“Useful because you can find out words or terms you do not understand during
class while searching the internet.”
“It is important. Internet, wikis, videos, etc. are used to understand the
concept and the programmes and it reduces the time for calculation.”
Too Distracting
Several students discussed only the distractions of mobile device use. They believed
that the distraction outweighed the benefits for themselves and for most students.
“Unlike the past, I cannot concentrate on my time because there are much more
apps and icons that distract me.”
“I do not think the use of mobile devices has a positive effect on my studies. It
would be fine if it had only the functions to be used, but usually it would do a
lot of personal things to do rather than lessons.”
Final Considerations
The purpose of this article was to demonstrate a technique for enhancing confirmability
and scalability of qualitative data, while maintaining the core values of constructivism. As it
becomes both easier to collect large volumes of qualitative data and more commonplace for
participants to provide it online, interpretive techniques for analyzing large-scale open-ended
or document-based data are needed. In this article, we demonstrated a solution using co-word
analysis paired with network visualization.
One possible concern for this analysis is the time cost for the analysis – both in terms
of learning the software and in performing and interpreting the automated analysis followed by
human analysis. The software tools used for this analysis are commonly employed in many
academic fields including communications, sociology, and increasingly, education. The basic
functions demonstrated in this article for data mining and network visualization can be
performed with little prior knowledge of network analysis. Our added challenge was in
translating the Korean text to English. This step was an advanced function enabled by the
software that would not be necessary for data that did not require translation. We also
performed additional validation checks in the full dataset to ensure the accuracy of the
translation.
We close by emphasizing the important role of the researcher for final interpretation of
data, consistent with the goals of constructivism (Merriam & Tisdell, 2015). Similar to early
users of qualitative and later mixed methods analysis techniques (Creswell, 2008), as the use
of co-word analysis for qualitative data continues to develop, researchers are advised to provide
readers with additional insight about the procedure. We acknowledge that for smaller datasets,
such as the one used for this demonstration, this technique can be more labor intensive in that both
the software and the human element are needed for analysis. In larger datasets, however, the
technique can considerably reduce the time cost of both the initial analysis (scalability) and the
verification process for ensuring accurate interpretation (confirmability).
References
Belotto, M. J. (2018). Data analysis methods for qualitative research: Managing the challenges
of coding, interrater reliability, and thematic analysis. The Qualitative Report, 23(11),
2622-2633. https://nsuworks.nova.edu/tqr/vol23/iss11/2
Biddix, J. P. (2018). Research methods and applications for student affairs. San Francisco,
CA: Jossey Bass.
Biddix, J. P., Chung, C., & Park, H. W. (2015). The hybrid shift: evidencing a student-driven
restructuring of the college classroom. Computers & Education, 80, 162-175.
Biddix, J. P., Chung, C., & Park, H. W. (2016). Faculty use and perception of mobile
information and communication technology (m-ICT) for teaching practices.
Innovations in Education and Teaching International, 53(4), 375-387.
Biddix, J. P., Park, H. W., & Wang, T. (2009). Co-word analysis of open-end answers from
Chinese Internet users: An alternative content analysis method for qualitative research.
The Society for Humanities Studies in East Asia, 16, 415-447.
Borgatti, S. P., Everett, M. G., & Freeman, L. C. (2002). UCINET 6 for Windows: Software
for Social Network Analysis. Harvard, MA: Analytic Technologies.
Boyatzis, R. E. (1998). Transforming qualitative information: Thematic analysis and code
development. Thousand Oaks, CA: Sage.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research
in Psychology, 3(2), 77-101.
Chenail, R. J. (2012). Conducting qualitative data analysis: Qualitative data analysis as a
metaphoric process. The Qualitative Report, 17(1), 248-253.
https://nsuworks.nova.edu/tqr/vol17/iss1/13
Chung, C., & Park, H. W. (2010). Textual analysis of political messages: The inaugural
addresses of two Korean presidents. Social Science Information, 49(2), 215-239.
Author Note
Chung Joo Chung is an Associate Professor in the Department of Journalism and Mass
Communication at Kyungpook National University. He conducts research on new media and
technology, social networks, data science, and AI from the perspective of social science. He has
published articles in prestigious journals, such as Journal of Computer-Mediated
Communication, Scientometrics, Computers and Education, Social Science Computer Review,
Technological Forecasting and Social Change, and Telecommunications Policy. He also
contributes to start-up activities and communities. Please direct correspondence to
[email protected].
J. Patrick Biddix is a Professor and Associate Director of the Postsecondary Education
Research Center (PERC) at the University of Tennessee. His research and teaching focus on
research design and assessment, student engagement and involvement, and postsecondary
outcomes. Dr. Biddix is the author of Research Methods and Applications for Student Affairs
(Jossey-Bass, 2018) and co-authored the 2nd editions of Assessment in Student Affairs (Jossey-
Bass, 2016) and Frameworks for Assessing Learning and Development Outcomes 2.0 (CAS,
2020). In 2015, he received a Fulbright Scholar Award to study college student communication
and technology use in Montreal, Canada. Please direct correspondence to [email protected].
Han Woo Park (Corresponding Author) is a Professor in the Dept. of Media &
Communication and the Interdisciplinary Graduate Programs of Digital Convergence Business and
East Asian Cultural Studies, and a founder of the Cyber Emotions Research Institute (at
YeungNam University) and WATEF (World Association for Triple Helix & Future Strategy
Studies), South Korea. He was a pioneer in the network science of open and big data in the early
2000s (often called Webometrics), when he worked for the Royal Netherlands Academy and
led the World Class University project. He has published more than 100 articles in SSCI
journals. He is currently Chief Editor of the Journal of Contemporary Eastern Asia and Quality
& Quantity. Several of his publications were included in top 10 lists of downloads and citations. He
was co-awarded the best paper in EPI-SCImago in 2016 and was included in the list of core
candidates for the Derek de Solla Price Memorial Medal in 2017 and 2019. Please direct
correspondence to [email protected].
Copyright 2020: Chung Joo Chung, J. Patrick Biddix, Han Woo Park, and Nova
Southeastern University.
Article Citation
Chung, C. J., Biddix, J. P., & Park, H. W. (2020). Using digital technology to address
confirmability and scalability in thematic analysis of participant-provided data. The
Qualitative Report, 25(9), 3298-3311. https://nsuworks.nova.edu/tqr/vol25/iss9/7