Scaling Up: How Data Curation Can Help Address Key Issues in Qualitative Data Reuse and Big Social Research, by Sara Mannheimer. ISBN 9783031492211, 3031492218
Synthesis Lectures on
Information Concepts, Retrieval, and Services
Sara Mannheimer
Series Editor
Gary Marchionini, School of Information and Library Science, The University of North
Carolina at Chapel Hill, Chapel Hill, NC, USA
This series publishes short books on topics pertaining to information science
and applications of technology to information discovery, production, distribution,
and management. Potential topics include: data models, indexing theory and
algorithms, classification, information architecture, information economics, privacy and
identity, scholarly communication, bibliometrics and webometrics, personal information
management, human information behavior, digital libraries, archives and preservation,
cultural informatics, information retrieval evaluation, data fusion, relevance feedback,
recommendation systems, question answering, natural language processing for retrieval,
text summarization, multimedia retrieval, multilingual retrieval, and exploratory search.
Sara Mannheimer
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give
a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that
may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Acknowledgments
This book would not have been possible without my own communities of practice.
Thank you to Scott Young, my partner and colleague; our discussions and mutual support
enhance my life in general, and this book in particular. This book is based on my Ph.D.
research, and I am deeply grateful for the guidance and support provided by my dissertation
advisors, Vivien Petras and Michael Zimmer. Thank you to Kalpana Shankar for key
advice on research methods; Eric Raile for reviewing my interview guides; Emily O’Brien
for cleaning interview transcripts; and David Mannheimer for providing suggestions on
writing style, clarity, and structure. Parts of Chap. 5 were previously published in the
Journal of eScience Librarianship, and I am grateful to the JeSLIB peer reviewers whose
feedback made those sections stronger. Thank you to Dessi Kirilova and the curators at
Qualitative Data Repository for their seamless curation services when archiving the associated
research data. Finally, I want to thank the thirty researchers and data curators who
participated in research interviews. The knowledge, experience, and insights they shared
are the heart of this book.
Contents
1 Introduction
1.1 Background
1.2 Issues Raised by Qualitative Data Reuse and Big Social Research
1.2.1 Context
1.2.2 Data Quality and Trustworthiness
1.2.3 Data Comparability
1.2.4 Informed Consent
1.2.5 Privacy and Confidentiality
1.2.6 Intellectual Property and Data Ownership
1.3 Data Curation to Address Issues in Qualitative Data Reuse and Big Social Research
1.4 Goal and Structure of the Book
References
2 Theoretical Approach, Methods, and Definitions
2.1 Theoretical Approach
2.2 Methods
2.3 Definitions
2.3.1 Defining Qualitative Data
2.3.2 Defining Qualitative Data Reuse
2.3.3 Defining Big Social Data
2.3.4 Defining Big Social Research
2.4 Chapter Summary
References
3 Qualitative Data Reuse in Practice
3.1 History of Qualitative Data Reuse
3.2 Benefits of Qualitative Data Reuse
3.3 Issues in Qualitative Data Reuse
3.3.1 Context
3.3.2 Data Quality and Trustworthiness
1 Introduction
“Before social scientists can begin using ideas and algorithms from computer science, they
need to learn how to work with large-scale unstructured organic data and understand the general
principles, tools, and methods used by computer scientists. Likewise, computer scientists
can reach inaccurate conclusions if they fail to understand key considerations and objectives
within social science research that may not traditionally apply in computer science.”
(Mneimneh et al. 2021).
1.1 Background
The research community has recently seen increased interest in qualitative data archiving
and reuse, in conjunction with shifts toward open science practices and engagement
with new technologies (Corti et al. 2005; Glenna et al. 2019). There are many potential
benefits of qualitative data reuse. For example, reusing qualitative data can increase
efficiency, deepen research conclusions, and reduce the burden on research subjects by
allowing new studies to be conducted without collecting new data. Qualitative data reuse
can also potentially support larger-scale, longitudinal research by facilitating the combining
of datasets to analyze more participants and to investigate human behavior over longer
periods of time. In 2002, Mason encouraged the social science community to invest in longitudinal
qualitative studies that were specifically designed for secondary use. She called
for “appropriately qualitative ways to ‘scale up’ research resources currently generated
through multiple small-scale studies, to fully exploit the massive potential that qualitative
research offers for making cross-contextual generalisations” (Mason 2002). In the two
decades since Mason issued this call, some researchers have aggregated qualitative data
to produce new conclusions (Halford and Savage 2017; Winskell et al. 2018; Davidson
et al. 2018), but it is still a rare practice.
At the same time, qualitative data can increasingly be collected from online sources.
Researchers can access and analyze personal narratives and social interactions through
social media such as blogs, online forums, and posts and interactions on platforms like
Facebook, Twitter, YouTube, and TikTok. These “big social data” (Manovich 2012) have
been celebrated as unprecedented sources of data analytics, able to produce social insights
by analyzing human behavior on a massive scale (Fan and Gordon 2014; Cappella 2017).
Big social data are a form of qualitative data that have been published online by users
themselves. When researchers analyze big social data, this could be seen as qualitative
data reuse—that is, researchers are repurposing and recontextualizing big social data to
answer research questions.
Using this similarity between qualitative data reuse and big social research as a starting
point, this book investigates three communities of practice (Wenger et al. 2002) who are
engaged with social research and social data:
• qualitative researchers who share or reuse data
• big social researchers
• data curators.
Qualitative researchers who share or reuse data and big social researchers have similar
goals: they aim to scale up and enhance social science research. But these two communities
of practice are under-connected. Big social research has not yet been widely
framed as a form of qualitative data reuse, and qualitative data reuse has only begun to
be discussed through a big social research lens. These two communities of practice also
have different backgrounds, training, and disciplinary values. Qualitative researchers tend
to come from social science disciplines, and they tend to focus on using in-depth research
methods to investigate social and behavioral phenomena. Big social researchers, on the
other hand, tend to have computer science and other types of engineering backgrounds,
and they tend to focus on using computational methods to analyze large amounts of data.
Data curators as a profession are concerned with organizing, managing, and curating
data, rather than building methodologies and drawing conclusions from those data.
Therefore, data curators are uniquely positioned to build connections between qualitative
researchers and big social researchers, based on the similarities of the data used by both
types of researchers. In this book, I suggest that data curation strategies can be used to
support and enhance responsible practice, and that data curators can act as facilitators and
intermediaries between communities of practice.
1.2 Issues Raised by Qualitative Data Reuse and Big Social Research
This book is centered around six key epistemological, ethical, and legal issues that apply
to both qualitative data reuse and big social research: context, data quality and trustworthiness,
data comparability, informed consent, privacy and confidentiality, and intellectual property
and data ownership. These six key issues are at the heart of this book, helping to structure
interviews with researchers and curators, and functioning as scaffolding for data curators
to build connections with researchers. Below, I provide brief summaries of each of the
issues. These issues are addressed in more detail in Chap. 3 (as related to qualitative data
reuse), Chap. 4 (as related to big social research), and Chap. 5 (comparing and contrasting
issues for each type of research and synthesizing relevant data curation strategies for each
issue).
1.2.1 Context
Both qualitative data reuse and big social research are context dependent. For qualitative
data reuse, there is some concern that reused data may not be able to be properly
understood outside of their original context, without the knowledge and expertise of the
researchers who conducted the original research project and originally analyzed the data.
For big social research, context is even more murky. Because automated data collection
happens on a large scale, generally without interaction with the people who created the
data, the context of big social data may be absent or difficult to understand.
1.2.2 Data Quality and Trustworthiness
Issues relating to data quality and trustworthiness are also common to both big social
research and qualitative data reuse. Qualitative researchers who reuse data need to know
that those data are high-quality and trustworthy—that the data have been collected using
valid methods, that transcriptions are accurate, and that the data are complete. Big social
researchers deal with the issue of data representativeness—social media users may not
be representative of society as a whole, and the data collected through web scraping or
calls to Application Programming Interfaces (APIs) may not be complete. Issues of data
quality and trustworthiness are further complicated by the possibility of fake social media
accounts and bots that may appear to be human, but that researchers may not want to
include in their analysis.
1.2.3 Data Comparability
The unstructured, complex, and varied nature of qualitative data can make it difficult
to analyze an archived qualitative dataset so as to yield a meaningful answer to a new
research question. For big social research, data may have different file types, different
metadata fields, and different metadata standards, all of which make combining data
more difficult, especially on a large scale. Data comparability is an important issue for
both qualitative data reuse and big social research because combining and comparing
datasets can enhance the context and quality of their research. Combining datasets can
also increase the scope of qualitative and big social research by allowing researchers to
build larger or longitudinal datasets.
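As a rough illustration of the metadata problem described above, the following minimal sketch maps records from two sources onto one shared set of fields before they are combined. The source names, field names, and the `harmonize` helper are all hypothetical and chosen only for illustration; real archives and platforms use their own schemas, and a real crosswalk would need to be designed with the data's context in mind.

```python
from typing import Any

# Hypothetical crosswalk: each source's own field names mapped to a shared schema.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "archive_a": {"transcript_text": "text", "depositor": "creator", "collection_date": "date"},
    "platform_b": {"body": "text", "username": "creator", "created_at": "date"},
}


def harmonize(record: dict[str, Any], source: str) -> dict[str, Any]:
    """Rename a record's fields to the shared schema and record where it came from.

    Fields that are absent in the source record are kept as None so that
    gaps stay visible instead of being silently dropped.
    """
    mapping = FIELD_MAPS[source]
    unified = {target: record.get(original) for original, target in mapping.items()}
    unified["source"] = source
    return unified


if __name__ == "__main__":
    post = {"body": "Example post", "username": "user1", "created_at": "2020-01-01"}
    print(harmonize(post, "platform_b"))
```

Keeping the `source` value on every harmonized record preserves a small piece of provenance, which matters once records from different collections sit in the same dataset.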
1.2.4 Informed Consent
Informed consent is an issue for both qualitative data reuse and big social research. For
qualitative data, while research participants provide consent for the initial study, they
may not have provided consent for the data to be archived for future use. In recent years,
broad consent (that is, consent to data reuse) has begun to be included in consent forms,
and Institutional Review Boards (IRBs) can provide guidelines for consent procedures
that allow the use of qualitative data beyond its original purpose. On the other hand, big
social researchers often consider big social data to be content that is simply found online,
and therefore may not consider it necessary to obtain informed consent from the users
who generate big social data. Big social researchers may also consider it sufficient that
users have agreed to their social media platforms’ terms of service; these terms generally
include consent for different types of data use, including research use. However, most
users do not read the terms of service closely enough to constitute informed consent.
1.2.5 Privacy and Confidentiality
Qualitative researchers who share or reuse data and big social researchers both contend
with the issue of privacy and confidentiality. While some big social researchers have
argued that big social data are public by nature, and therefore that deidentification of
such data is unnecessary, negative public responses to projects such as the Taste, Ties,
and Time dataset (Zimmer 2010) and an openly shared OKCupid dataset (Resnick 2016)
have shown the perils of sharing big social data without proper deidentification. For both
qualitative and big social data, protecting participant privacy and confidentiality is all the
more vital when participants are part of vulnerable populations such as prisoners, children,
people involved in illegal activities, and marginalized and minoritized communities.
1.2.6 Intellectual Property and Data Ownership
Intellectual property and data ownership is a key issue for both qualitative researchers
who share or reuse data and big social researchers. Both communities of practice may
encounter challenges when collecting existing data from sources where intellectual property
rights, licenses, or permissions may be varied. For qualitative data, the data may be
owned by institutions, or intellectual property rights may be held by research participants.
In either case, consent from intellectual property rights holders is necessary to redistribute
the data for reuse. For big social data, the intellectual property rights are often controlled
by private, for-profit companies. Even if social media posts are the intellectual property of
the users who posted them, the rights to these posts are licensed to the social media companies
through the companies’ terms of service. Additionally, intellectual property rights
and data ownership may vary according to how and where the data were collected. For
example, when collecting data from Indigenous communities, additional considerations
come into play, such as the CARE Principles (Carroll et al. 2021) and the First Nations
principles of ownership, control, access, and possession (OCAP® ) (FNIGC 2010).
1.3 Data Curation to Address Issues in Qualitative Data Reuse and Big Social Research
The rapidly evolving data landscape presents interesting possibilities for social and behavioral
research. And as more researchers share data and conduct big social research, there
is an increased need for assistance in responsible big social research, data sharing, and
data reuse practices. The field of data curation has grown exponentially in response to
this need. However, data sharing practices and guidelines that are specific to qualitative
data reuse and big social research are still in the early stages of development. When
confronting issues involving responsible data sharing and reuse, data curators often refer
to the FAIR Guiding Principles (Wilkinson et al. 2016), which suggest that shared data
should be findable, accessible, interoperable, and reusable. However, the FAIR Principles
were designed to support technical issues relating to data reuse. They do not directly
address the epistemological, ethical, and legal issues that arise when using data originally
created through interaction with human subjects.
A growing body of literature suggests that data curation strategies can alleviate some
of the epistemological, ethical, and legal issues described above. These practices include
data management planning, designing research to facilitate later data sharing, and producing
metadata and other documentation to capture contextual information. Data curation
strategies can also help protect participants from harm, through data deidentification,
aggregating data, or restricting access to data. Data curation for qualitative data reuse
is a more established practice, and literature going back to the 1990s examines how data
curation strategies can support epistemologically sound, ethical, and legal data sharing.
Data curation for big social data is less well-developed, and there is little consensus about
how to maintain a balance between conducting research, encouraging transparency, and
protecting research subjects.
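One of the strategies mentioned above, data deidentification, can be sketched in a few lines. The example below replaces usernames with salted pseudonyms and masks @-mentions in post text. It is only a sketch under stated assumptions: the `SALT` value and the `pseudonymize` and `mask_mentions` helpers are hypothetical, and pseudonymizing direct identifiers alone is generally not enough to deidentify qualitative or social media data, since quoted text and contextual details can still point to a person.

```python
import hashlib
import re

# Hypothetical secret value; in practice it would be stored apart from any shared data.
SALT = "replace-with-a-secret-value"


def pseudonymize(username: str) -> str:
    """Return a stable pseudonym for a username using a salted SHA-256 hash."""
    digest = hashlib.sha256((SALT + username).encode("utf-8")).hexdigest()
    return "user_" + digest[:10]


def mask_mentions(text: str) -> str:
    """Replace each @username mention in the text with its pseudonym."""
    return re.sub(r"@(\w+)", lambda match: "@" + pseudonymize(match.group(1)), text)


if __name__ == "__main__":
    print(mask_mentions("Thanks @alice for sharing this with @bob"))
```

Because the same username always maps to the same pseudonym, interactions between accounts can still be analyzed after masking, while the original handles are no longer visible in the shared text.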
1.4 Goal and Structure of the Book
This book suggests that comparing data curation practices for qualitative data reuse and
big social research can help researchers responsibly scale up their research practices. By
exploring the similarities and differences between the epistemological, ethical, and legal
issues in qualitative data reuse and big social research, this book identifies data curation
strategies that can encourage responsible use and reuse of qualitative data, both big and
small. These strategies reduce the potential for harm to the human subjects whose thoughts
and activities are represented in archived qualitative data and big social data, while at the
same time promoting the use and reuse of these data.
The book is divided into eight chapters, including this introduction. Chapter 2 outlines
my general theoretical approach to the research, provides a brief summary of my
research methods, and defines common terms that are used throughout the book. Chapters
3 and 4 review existing literature in qualitative data reuse and big social research;
through these literature reviews, I identify the six key issues outlined above: context, data
quality and trustworthiness, data comparability, informed consent, privacy and confidentiality,
and intellectual property and data ownership. Chapter 5 explores the similarities
and differences between these key issues in qualitative data reuse and big social research,
especially focusing on the data curation implications of these issues. Chapter 6 provides a
detailed description of interviews with qualitative researchers, big social researchers, and
data curators. Chapter 7 synthesizes these findings, proposes recommendations, and suggests
areas of focus for data curators, based on the literature and insights presented in previous chapters.
Chapter 8 suggests future work that can continue to enhance responsible practices when
scaling up social and behavioral research, and presents concluding thoughts about the role
of data curation in facilitating epistemologically sound, ethical, and legal qualitative data
reuse and big social research.
References
Cappella JN (2017) Vectors into the future of mass and interpersonal communication research: big data, social media, and computational social science. Hum Commun Res 43:545–558. https://doi.org/10.1111/hcre.12114
Carroll SR, Herczog E, Hudson M, Russell K, Stall S (2021) Operationalizing the CARE and FAIR principles for indigenous data futures. Sci Data 8:108. https://doi.org/10.1038/s41597-021-00892-0
Corti L, Witzel A, Bishop L (2005) On the potentials and problems of secondary analysis: an introduction to the FQS special issue on secondary analysis of qualitative data. Forum Qualitative Sozialforschung/Forum Qual Soc Res 6. https://doi.org/10.17169/fqs-6.1.498
Davidson E, Edwards R, Jamieson L, Weller S (2018) Big data, qualitative style: a breadth-and-depth method for working with large amounts of secondary qualitative data. Qual Quant 1–14. https://doi.org/10.1007/s11135-018-0757-y
Fan W, Gordon MD (2014) The power of social media analytics. Commun ACM 57:74–81. https://doi.org/10.1145/2602574
FNIGC (2010) The first nations principles of OCAP®, a registered trademark of the First Nations Information Governance Centre (FNIGC). First Nations Information Governance Centre, Akwesasne, ON
Glenna L, Hesse A, Hinrichs C, Chiles R, Sachs C (2019) Qualitative research ethics in the big data era. Am Behav Sci 63:555–559. https://doi.org/10.1177/0002764219826282
Halford S, Savage M (2017) Speaking sociologically with big data: symphonic social science and the future for big data research. Sociology 51:1132–1148. https://doi.org/10.1177/0038038517698639
Manovich L (2012) Trending: the promises and the challenges of big social data. In: Gold MK (ed) Debates in the digital humanities. University of Minnesota Press, Minneapolis, MN, pp 460–475
Mason J (2002) Qualitative research resources: a discussion paper. Prepared for the ESRC Research Resources Board (unpublished, obtained from author)
Mneimneh Z, Pasek J, Singh L, Best R, Bode L, Bruch E, Budak C, Davis-Kean P, Donato K, Ellison N, Gelman A, Groshen E, Hemphill L, Hobbs W, Jensen JB, Karypis G, Ladd JM, O’Hara A, Raghunathan T, Resnik P, Ryan R, Soroka S, Traugott M, West B, Wojcik S (2021) Data acquisition, sampling, and data preparation considerations for quantitative social science research using social media data. PsyArXiv
Resnick B (2016) Researchers just released profile data on 70,000 OkCupid users without permission. Vox
Wenger E, McDermott RA, Snyder W (2002) Cultivating communities of practice: a guide to managing knowledge. Harvard Business School Press, Boston, MA
Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, ’t Hoen PAC, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone S-A, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3:160018. https://doi.org/10.1038/sdata.2016.18
Winskell K, Singleton R, Sabben G (2018) Enabling analysis of big, thick, long, and wide data: data management for the analysis of a large longitudinal and cross-national narrative data set. Qual Health Res. https://doi.org/10.1177/1049732318759658
Zimmer M (2010) “But the data is already public”: on the ethics of research in Facebook. Ethics Inf Technol 12:313–325. https://doi.org/10.1007/s10676-010-9227-5
2 Theoretical Approach, Methods, and Definitions
To build the foundation for the rest of the book, this chapter provides an overview of
my general theoretical approach to this research, provides a summary of my research
methods, and then defines key terms that I use throughout the book: qualitative data,
qualitative data reuse, big social data, and big social research.
2.1 Theoretical Approach
knowledge around data use and reuse, then synthesizes their insights and approaches to
support ethical, legal, and epistemologically sound research and data sharing practices.
To further the goal of understanding the communities investigated in this research
(qualitative and big social research communities, and the data curation community), this
book also incorporates the idea of communities of practice (Lave and Wenger 1991;
Wenger 1998). Communities of practice theory helps social science researchers group
and analyze scientific communities, with a goal of explaining how groups of people disseminate
knowledge. Wenger et al. (2002) define communities of practice as “groups
of people who share a concern, set of problems, or a passion about a topic, and who
deepen their knowledge and expertise in this area by interacting on an ongoing basis.”
This book examines three distinct communities of practice: qualitative researchers who
reuse or share data, big social researchers, and data curators.
Each community of practice has three key characteristics: their domain, their community,
and their practice (Wenger et al. 2002). Domain describes a set of shared interests
and disciplines; community forms when those in the domain work together, discuss, and
share the interests and disciplines that characterize their domain; practice includes the
shared research practices, shared jargon, and shared values of each community.
Using qualitative researchers as an example, the domain is (1) the interests of qualitative
researchers (for example, interest in human behavior, human phenomena, and
qualitative research methods), and (2) the disciplines that these researchers come from
(for example, anthropology, sociology, or health sciences). The community forms when
qualitative researchers meet at conferences, cite each other’s research, or have community
calls. The practice may include qualitative content analysis, grounded theory, ideas
such as “researcher as instrument,” and the shared commitment to in-depth research into
human behavior.
Communities of practice theory has been widely used in library and information
science, including to study science collaboratories (Bos et al. 2007), to build data management
and digital scholarship services in academic libraries (Smith et al. 2020), and as
a framework for supporting undergraduate student researchers (Pirmann et al. 2023).
2.2 Methods
The research described in this book uses two methods. First, in Chaps. 3, 4 and 5, I
review the existing literature in qualitative data reuse and big social research, synthesize
key ideas, and identify six issues in common between qualitative data reuse and big social
research, with a focus on data curation. Second, in Chaps. 6 and 7, I use the six issues
identified in the earlier chapters to inform semi-structured interviews with three different
types of participants, referred to throughout this book as communities of practice:
• big social researchers who have conducted research with big social data
• qualitative researchers who have shared or reused qualitative data
• data curators who have worked with one or both types of data.
Through a qualitative content analysis of these interviews, I confirm and build upon the
six key issues identified in Chaps. 3, 4 and 5. I additionally suggest three new lenses for
considering the practices of these communities: domain differences, strategies for respon-
sible practice, and perspectives on data curation and data sharing. The interview methods
are discussed further in Chap. 6. For full details about research methods and sampling, as
well as interview guides, transcripts, codebook, analysis, and other documentation, please
see the associated dataset:
Mannheimer (2023) Interviews regarding data curation for qualitative data reuse and big
social research. Qualitative Data Repository. https://doi.org/10.5064/F6GWMU4O.
2.3 Definitions
Defining key terms will help readers build a foundation for understanding the research
in this book. Therefore, in this section, I provide in-depth definitions of qualitative data,
qualitative data reuse, big social data, and big social research.
In the broadest sense, qualitative data (in contrast to quantitative data) are data that are
not numeric (Kitchin 2014). To clarify further, while qualitative data may be analyzed to
produce numeric results such as code counts and statistics, the foundational qualitative
data themselves are non-numeric (Greener 2011; DuBois et al. 2018).
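As a minimal sketch of this distinction, the following Python example starts from non-numeric data (interview excerpts tagged with qualitative codes) and derives numeric code counts from them. The excerpts, the code labels, and the function name count_codes are hypothetical illustrations and are not drawn from the dataset associated with this book.

```python
from collections import Counter

# Hypothetical coded segments: each pairs a text excerpt (the qualitative
# data) with the codes a researcher assigned to it during analysis.
coded_segments = [
    {"excerpt": "I worried about who would read my transcript.",
     "codes": ["privacy", "trust"]},
    {"excerpt": "Sharing felt like giving something back.",
     "codes": ["benefit"]},
    {"excerpt": "I was never told the data might be reused.",
     "codes": ["consent", "privacy"]},
]


def count_codes(segments):
    """Return the number of segments to which each code was applied."""
    counts = Counter()
    for segment in segments:
        # set() ensures a code listed twice on one segment is counted once.
        counts.update(set(segment["codes"]))
    return counts


code_counts = count_codes(coded_segments)
print(code_counts.most_common())
```

The resulting counts are numeric, but the excerpts they summarize remain text.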
Bernard et al. (1986) define the construction of qualitative data in anthropology as “an
interactive process between a researcher, a theory, and the research materials under study,
whether they be people in the field or documents to be examined;” Bernard et al. suggest
four main types of data construction: “(1) relatively open-ended, unstructured interviews
with key informants, (2) structured interviews of respondents who, in the case of surveys,
may number in the hundreds or thousands, (3) direct observation of behavior and envi-
ronmental features, and (4) extraction of information from existing records such as native
texts, court proceedings, marriage records, and so on.” As these passages suggest, qual-
itative data are produced by qualitative research, and therefore the term qualitative data
can be defined by the process that was used to create or collect that data. The National
Endowment for the Humanities Office of Digital Humanities (2019) corroborates this
idea, defining data as “materials generated or collected during the course of conducting
research.”
Corti describes qualitative research as “defined by openness and inclusiveness, aiming
to capture participants’ lived experiences of the world and the meanings they attach to
these experiences from their own perspectives” (Corti 1999). To meet the aims described
by Corti, qualitative researchers collect and examine various types of data. Bernard,
Wutich, and Ryan (2017) suggest that qualitative data exist in five formats: (1) physi-
cal objects, (2) still images, (3) sounds, (4) moving images, and (5) texts. In Table 2.1, I
provide examples of public data and private data for each of these categories.
The types of data identified in Table 2.1 are far-reaching and include many types of
data that a qualitative researcher could analyze. I include a variety of examples, including
both analog and digital data, and both small-scale and large-scale data.
Heaton suggests a classification structure for qualitative data that divides these differ-
ent types of data into “non-naturalistic” data (i.e., data that are solicited by researchers
through interviews, questionnaires, etc.), and “naturalistic” data (i.e., data that are
found or collected by researchers with minimal interaction with the research subjects)
(Heaton 2004). In Table 2.2, I suggest some examples of non-naturalistic and naturalistic
qualitative data.
As with the examples of qualitative data listed in Table 2.1, non-naturalistic and nat-
uralistic data can be either analog or digital in format. For example, fieldnotes could
take the form of paper notebooks or word processing documents; diaries could be writ-
ten using pen and paper, kept using a notetaking app, or openly posted online in blog
form; and social interactions could take the form of a face-to-face conversation or a
technology-mediated interaction such as a Twitter exchange or a Reddit thread.
For purposes of this book, taking into account the kinds of data listed in Tables 2.1
and 2.2, I define qualitative data as analog or digital objects, images, sounds, moving
images, and texts that are collected and/or analyzed by researchers during the course of
qualitative research.
The term secondary analysis emerged in the 1950s to describe a research methodology
that uses pre-existing data. Lipset and Bendix (1959) provide a simple definition of this
concept: “the study of specific problems through analysis of existing data which were
originally collected for another purpose.”
It should be noted that secondary analysis is distinct from meta-analysis and literature
review. Meta-analysis and literature review synthesize research findings, whereas sec-
ondary analysis uses primary data to generate new insights (Heaton 1998; Thorne 1998).
The definitions of secondary analysis developed over the decades clarify this distinction.
For instance, Glass (1976) suggests that secondary analysis is conducted for the purpose
of “answering the original research question with better statistical techniques or answering
new questions with old data,” and Hakim (1982) defines secondary analysis as “further
analysis of an existing data set which presents interpretations, conclusions, or knowl-
edge additional to, or different from, those presented in the first report on the enquiry
as a whole and its main results.” In her definition of qualitative secondary analy-
sis, Heaton (2004) additionally brings in the idea of verification, writing that “secondary
analysis is a research strategy which makes use of … preexisting qualitative research
data for the purposes of investigating new questions or verifying previous studies [empha-
sis added].” In order to explain this definition, it is necessary to discuss the concept of
verification in qualitative research.
In the 1970s and 1980s, verification was considered a way to legitimize qualitative
research—to prove its dependability, confirmability, and trustworthiness (Guba 1981;
Scheff 1986; Guba and Lincoln 1989). However, as discussion of qualitative data sharing
increased in the 1990s and 2000s, some began to argue that verification might not be
“reuse provides an opportunity to study the raw materials of past research projects to
gain methodological and substantive insights.” van de Sandt et al. (2019) take an even
broader view of data reuse, concluding that reuse can be seen as equal to use. They define
reuse as “the use of any research resource regardless of when it is used, the purpose, the
characteristics of the data and its user.”
One final note: the various definitions reviewed here do not differentiate between data
collected oneself and data collected by another researcher. While some suggest that reusing
one’s own data could reduce challenges and increase benefits (Hinds et al. 1997; Thorne
1998; Heaton 2004; Sherif 2018), Mauthner et al. (1998) write about the challenges they
faced when revisiting their own data for analysis, suggesting that the passage of time
caused reuse of even their own data to be difficult. Irwin (2013) argues that reusing one’s
own data provides a critical distance from which researchers can evaluate the quality
and efficiency of the data from the perspective of new research questions, and they can
identify and provide any missing information. Thus, I consider all data reuse to have
similar benefits and challenges, regardless of who originally collected it. Whatever method
is used while reusing existing data, the epistemological, ethical, and legal issues remain
the same from a data curation perspective.
Taking all of these existing definitions and conversations into account, and limiting my
definition to the scholarly use of data, this book uses the term qualitative data reuse, with
the following definition:
Qualitative data reuse is when researchers use existing qualitative data to refine ideas, gain
new insights, and produce new scholarship.
Big data are often defined in terms of three “Vs”: volume, velocity, and variety (Laney
2001; Diebold 2012; Zikopoulos 2012; Kitchin 2014). That is, big data have large vol-
ume—comprising terabytes or petabytes of data; they have high velocity—the data are
being created continually in real-time; and they exist in a variety of formats and types—
big data may be structured metadata or unstructured text, audio, or video. Boyd and
Crawford (2012) offer additional defining characteristics for big data, writing:
We define Big Data as a cultural, technological, and scholarly phenomenon that rests
on the interplay of
• Mythology: the widespread belief that large data sets offer a higher form of intelligence
and knowledge that can generate insights that were previously impossible, with the aura
of truth, objectivity, and accuracy. (Boyd and Crawford 2012)
Boyd and Crawford’s definition helps to explain the cultural phenomenon that big data
have become in our society. As big data and big data analytics have grown during
the twenty-first century, they have captured the imagination of private and public realms,
leading to an era of widespread data-driven decision-making in nearly every industry,
including business (e.g., Chen et al. 2012; Liebowitz 2013; Schroeder 2016; Raguseo
2018), healthcare (e.g., Chawla and Davis 2013; Raghupathi and Raghupathi 2014;
Viceconti et al. 2015; Wang et al. 2018), education (e.g., Picciano 2014; Williamson
2017; Nazarenko and Khronusova 2017), and journalism (e.g., Gray et al. 2012; Lewis
2015; Borges-Rey 2016).
The term big social data (or sometimes big behavioral data) is used to describe big data
that inform social research. The definition of big social data specifically includes the
human traces that are inherent in big data. Amer-Yahia et al. (2010) differentiate between
direct and indirect human participation in big data. Big data resulting from direct human
participation usually take the form of unstructured or semi-structured data such as text,
videos, and audio that are created and shared online (Olshannikova et al. 2017). Big data
resulting from indirect human participation usually take the form of structured metadata
that reflects user behavior such as interactions with interfaces, or the spatial or temporal
aspects of user behavior (Gandomi and Haider 2015). In Table 2.3, I provide examples of
different kinds of big social data, informed by Amer-Yahia et al. (2010), Olshannikova
et al. (2017), Yanai (2012), Ramasamy et al. (2013), and Drakonakis et al. (2019).
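The distinction between direct and indirect human participation can be sketched in code. The following Python example splits a single social media record into content that a user deliberately created and data generated as a by-product of that activity. The record, its field names, and the split_record function are hypothetical illustrations; they do not correspond to any particular platform's data format.

```python
# Hypothetical field names; real platforms structure their data differently.
DIRECT_FIELDS = {"username", "text", "mentions", "photo_url"}
INDIRECT_FIELDS = {"timestamp", "geo", "device", "client_app", "like_count"}


def split_record(record):
    """Separate a record into direct and indirect human interaction data."""
    direct = {k: v for k, v in record.items() if k in DIRECT_FIELDS}
    indirect = {k: v for k, v in record.items() if k in INDIRECT_FIELDS}
    # Unrecognized fields are set aside rather than silently classified.
    known = DIRECT_FIELDS | INDIRECT_FIELDS
    other = {k: v for k, v in record.items() if k not in known}
    return direct, indirect, other


post = {
    "username": "example_user",
    "text": "Great talk today! @colleague",
    "mentions": ["colleague"],
    "timestamp": "2023-05-01T14:32:00Z",
    "geo": {"lat": 45.68, "lon": -111.05},
    "client_app": "Tweetdeck",
    "like_count": 12,
}

direct, indirect, other = split_record(post)
print(sorted(direct))
print(sorted(indirect))
```

Keeping the two groups separate makes it easier to see that a single post carries both the content a user chose to share and traces that the user may not know are being recorded.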
Table 2.3 Examples of direct human interaction data and indirect human interaction data

|  | Subcategories | Examples |
|---|---|---|
| Direct human interaction data | Data related to individual users | Usernames, passwords, tweets, Instagram photos, TikTok videos, tagged photos, @-mentions |
|  | Data related to user communication and dialogue | Direct messages, comments on a news story, Wikipedia editing data, Slack chats, videoconferences |
| Indirect human interaction data | Data related to user relationships | Followers, likes, views, network analysis data |
|  | Automatically created metadata | Timestamps, geospatial data, type of operating system, type of device, application used to post (e.g., a third-party app such as Tweetdeck or Hootsuite) |

In addition to the table above, I also present Table 2.4, below. Table 2.4 uses a similar
structure to Table 2.1, in Sect. 2.3.1, so as to demonstrate the relationship between big
social data and qualitative data. Contrasting Table 2.1 (Examples of qualitative data based
on form and access) with Table 2.4 (Examples of big social data based on form and
access) highlights two notable differences between qualitative data and big social data.
First, Table 2.4 does not include “physical objects,” because big social data are by nature
digital. Second, while Table 2.1 categorizes qualitative data into “public” and “private,”
Table 2.4 adds a third category, “ambiguous.” As Nissenbaum suggests in her theory of
contextual integrity (2009), and as is discussed further in Chap. 4, Sect. 4.3.5, big social
data exist in an ambiguous space between private and public; there are some contexts in
which social media users expect privacy, and other contexts in which users consider their
activities to be more public. Therefore, in Table 2.4, the column labeled “ambiguous”
includes examples such as open Instagram posts from non-public figures that may be
accessible publicly, but are designed for a limited, private audience.

Table 2.4 Examples of big social data based on form and access

|  | Public | Private | Ambiguous |
|---|---|---|---|
| Text | Online obituaries, Twitter posts using hashtags, blogs, news stories | Emails, notes taken on notetaking apps, short responses to survey questions | Comments on other people’s Twitter posts, online forum posts |
| Images | Instagram posts from public figures, Flickr images | Personal photos, digital patient scans, Instagram posts from private profiles | Openly accessible Instagram profile posts from non-public figures |
| Audio | Podcast ads, songs, online news audio clips | Voice memos, voicemail messages, interview recordings | Digital oral histories |
| Video | Online news footage, TikTok videos, digital films and TV shows | Personal iPhone videos, Snapchat video messages | Videos posted to social media by non-public figures |
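The three access categories can be illustrated with a deliberately simplified rule based on the Instagram examples in Table 2.4. In the following Python sketch, the function access_category and its two parameters are hypothetical; privacy expectations depend on context, so two boolean flags can only approximate the distinctions drawn here.

```python
def access_category(profile_is_open, author_is_public_figure):
    """Classify a post as 'public', 'private', or 'ambiguous'.

    Simplified rule based on the Instagram examples in Table 2.4. It is an
    illustration only, not a substitute for contextual judgment.
    """
    if not profile_is_open:
        return "private"
    if author_is_public_figure:
        return "public"
    # Technically accessible, but likely intended for a limited audience.
    return "ambiguous"


print(access_category(profile_is_open=True, author_is_public_figure=False))
```

A rule like this could be used to flag the data that most need a curator's attention, since the ambiguous category is where technical accessibility and user expectations are most likely to diverge.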
Social media is a common source for big social data. Here, I use the term social
media to describe emerging digital technologies associated with Web 2.0 (Wilson et al.
2011), that allow users to post content and interact with other people. Social media is a
broader term than social network site, which is defined by Boyd and Ellison (2007) as a
networked communication platform in which participants “(1) construct a public or semi-
public profile within a bounded system, (2) articulate a list of other users with whom
they share a connection, and (3) view and traverse their list of connections and those
made by others within the system.” The broader term social media includes a wide range
of digital platforms, including not only social network sites but also blogs, microblogs,
photo-sharing sites, video-sharing platforms, social news and gaming, review sites, online
forums, social search and crowd sourcing services, collaboration services, and virtual
worlds (Ishikawa 2015; Olshannikova et al. 2017). The uniting thread among social media
platforms is that they allow users to interact within communities and to create and share
digital content in a networked environment (Ip and Wagner 2008; Lüders 2008; Kim et al.
2010; Wilson et al. 2011; Bechmann and Lomborg 2012). Bechmann and Lomborg outline
three characteristics that are commonly emphasized when considering social media as a
social phenomenon:
1. Social media platforms facilitate direct communication between users—that is, communi-
cation is “de-institutionalized”;
2. Users create and share their own content such as text, photos, and videos, in addition to
sharing traditional published content;
3. Social media platforms are interactive and networked. (Bechmann and Lomborg 2012)
A fourth consideration is that social media platforms are often controlled by private,
for-profit companies (Driscoll and Walker 2014). Blog platforms like SquareSpace and
WordPress, microblogs like Twitter, photo-sharing sites like Flickr (owned by Yahoo),
video-sharing sites like YouTube (owned by Google) and TikTok, online forums like Red-
dit (owned by Conde Nast) and Quora, virtual worlds like Facebook’s Metaverse, or the
communities that form among videogame users—these platforms all act as intermediaries
between the human communities that are formed online (Oboler et al. 2012; Fuchs 2017).
All of these characteristics of social media are therefore key considerations for
researchers who collect and analyze big social data. Big social data come from an online
space with specific characteristics, and access to these data is often controlled by private
companies.
To define big social research, I will begin by outlining two key types of internet-mediated
research: obtrusive and unobtrusive, as defined by Hewson et al. (2016). In Table 2.5,
below, I give examples of obtrusive and unobtrusive internet-mediated research.
These types of internet-mediated research are reminiscent of the two types of qualita-
tive data outlined in Table 2.2, in Sect. 2.3.1: non-naturalistic data, which are solicited
for research studies, and naturalistic data, which are found or collected with minimal
interference by researchers. Applying Hewson et al.’s framework, Heaton’s examples of
non-naturalistic data—e.g., field notes, observational records, interviews, focus groups,
and solicited diaries—would be characterized as resulting from obtrusive research, while
Heaton’s examples of naturalistic data—autobiographies, found diaries, letters, official
documents, photographs, film, and social interaction—would be characterized as resulting
from unobtrusive research.
Big social research is a sub-field of internet-mediated research, and it is almost always
conducted using unobtrusive methods (Bright 2017). Additionally, while researchers can
use subsets of data from online sources to conduct traditional, human-coded content anal-
ysis (e.g., Ruthven et al. 2018), conversation analysis (e.g., Paulus et al. 2016), and
online ethnographies (e.g., Caliandro 2018), big social research is by definition large-
scale. Big social research is therefore commonly conducted using computational social
science methods. Computational social science is a “research area at the intersection of
computer science, statistics, and the social sciences, in which novel computational meth-
ods are used to answer questions about society” (Mason et al. 2014). Computational social
science began in the 2000s, and it uses methods such as natural language processing, sen-
timent analysis, network analysis, artificial intelligence, and deep learning techniques to
draw conclusions from big social data (Bankes et al. 2002; Mason et al. 2014; Berkout
et al. 2019).
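As one small illustration of the methods listed above, the following Python sketch performs a very simple form of network analysis: it counts how many followers each account has in a list of follower relationships. The account names and the follower_counts function are hypothetical, and the sketch uses only the standard library; research at the scale of big social data would typically rely on dedicated tools.

```python
from collections import Counter

# Hypothetical (follower, followed) pairs.
follows = [
    ("ana", "ben"),
    ("cam", "ben"),
    ("ben", "ana"),
    ("dee", "ben"),
    ("dee", "ana"),
]


def follower_counts(edges):
    """Count followers per account (in-degree in a directed graph)."""
    # set() removes duplicate pairs so repeated records are not double-counted.
    return Counter(followed for _, followed in set(edges))


counts = follower_counts(follows)
print(counts.most_common(2))
```

Even this small example shows how relational data such as follower lists can be turned into measures of attention or influence without any interaction with the people described.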
Taking into account the literature and conversations reviewed above, I define big social
research as follows:
Big social research is when researchers use large-scale data from social media or other online
social spaces to gain insights and produce scholarship.
The theoretical approach, definitions, and methods presented here provide a foundation
for the rest of the book. The definitions of qualitative data reuse and big social research
especially begin to demonstrate the shared characteristics and unique qualities of these two
types of research. The next two chapters review existing literature to further explore these
similarities and differences, identifying key issues that are shared between qualitative
data reuse (Chap. 3) and big social research (Chap. 4). The rest of the book continues to
compare and contrast qualitative data reuse and big social research, aiming to inform data
curation strategies to support epistemologically sound, ethical, and legal data sharing and
use.
References
Amer-Yahia S, Doan A, Kleinberg J, Koudas N, Franklin M (2010) Crowds, clouds, and algorithms:
exploring the human side of “big data” applications. In: Proceedings of the 2010 ACM SIG-
MOD international conference on management of data. ACM, Indianapolis Indiana USA, pp
1259–1260
Bankes S, Lempert R, Popper S (2002) Making computational social science effective: epistemology,
methodology, and technology. Soc Sci Comput Rev 20:377–388.
https://doi.org/10.1177/089443902237317
Bechmann A, Lomborg S (2012) Mapping actor roles in social media: different perspectives on value
creation in theories of user participation. New Media Soc 15:765–781.
https://doi.org/10.1177/1461444812462853
Berkout OV, Cathey AJ, Kellum KK (2019) Scaling-up assessment from a contextual behavioral
science perspective: potential uses of technology for analysis of unstructured text data. J Con-
textual Behav Sci 12:216–224. https://doi.org/10.1016/j.jcbs.2018.06.007
Bernard HR, Pelto PJ, Werner O, Boster J, Romney AK, Johnson A, Ember CR, Kasakoff A (1986)
The construction of primary data in cultural anthropology. Curr Anthropol 27:382–396.
https://doi.org/10.1086/203456
Bernard HR, Wutich A, Ryan GW (2017) Analyzing qualitative data: systematic approaches, 2nd
edn. Sage Publications, Los Angeles, CA
Bishop L, Kuula-Luumi A (2017) Revisiting qualitative data reuse: a decade on. Sage Open 7.
https://doi.org/10.1177/2158244016685136
Borges-Rey E (2016) Unravelling data journalism. J Pract 10:833–843.
https://doi.org/10.1080/17512786.2016.1159921
Bos N, Zimmerman A, Olson J, Yew J, Yerkie J, Dahl E, Olson G (2007) From shared databases to
communities of practice: a taxonomy of collaboratories. J Comput-Mediat Commun 12:652–672.
https://doi.org/10.1111/j.1083-6101.2007.00343.x
Bourdieu P (1986) The forms of capital. In: Richardson J (ed) Handbook of theory and research for
the sociology of education. Greenwood, Westport, CT, pp 241–258
Boyd D, Crawford K (2012) Critical questions for big data: provocations for a cultural, techno-
logical, and scholarly phenomenon. Inf, Commun Soc 15:662–679.
https://doi.org/10.1080/1369118X.2012.678878
Boyd D, Ellison N (2007) Social network sites: definition, history, and scholarship. J Comput Mediat
Commun 13:210–230. https://doi.org/10.1111/j.1083-6101.2007.00393.x
Bright J (2017) ‘Big social science’: doing big data in the social sciences. In: Fielding NG, Lee RM,
Blank G (eds) The sage handbook of online research methods. Sage Publications, London, UK,
pp 125–139
Caliandro A (2018) Digital methods for ethnography: analytical concepts for ethnographers explor-
ing social media environments. J Contemp Ethnogr 47:551–578.
https://doi.org/10.1177/0891241617702960
Castells M (2000) Materials for an exploratory theory of the network society. Br J Sociol 51:5–24.
https://doi.org/10.1111/j.1468-4446.2000.00005.x
Chawla NV, Davis DA (2013) Bringing big data to personalized healthcare: a patient-centered
framework. J Gen Intern Med 28:660–665. https://doi.org/10.1007/s11606-013-2455-8
Chen H, Chiang RHL, Storey VC (2012) Business intelligence and analytics: from big data to big
impact. MIS Q 36:1165–1188. https://doi.org/10.2307/41703503
Corti L (1999) Text, sound and videotape: the future of qualitative data in the global network.
IASSIST Q 23:15. https://doi.org/10.29173/iq726
Corti L (2000) Progress and problems of preserving and providing access to qualitative data for
social research—the international picture of an emerging culture. Forum Qualitative Sozial-
forschung/Forum: Qual Soc Res 1. https://doi.org/10.17169/fqs-1.3.1019
Cronin B (2008) The sociological turn in information science. J Inf Sci 34:465–475.
https://doi.org/10.1177/0165551508088944
Diebold FX (2012) A personal perspective on the origin(s) and development of “big data”: the phe-
nomenon, the term, and the discipline, second version. PIER Working Paper No 13–003.
https://doi.org/10.2139/ssrn.2202843
Drakonakis K, Ilia P, Ioannidis S, Polakis J (2019) Please forget where I was last summer: the privacy
risks of public location (meta) data. In: Proceedings 2019 network and distributed system security
symposium. Internet Society, San Diego, CA
Driscoll K, Walker S (2014) Working within a black box: transparency in the collection and produc-
tion of big Twitter data. Int J Commun 8:20
DuBois JM, Strait M, Walsh H (2018) Is it time to share qualitative research data? Qual Psychol
5:380–393. https://doi.org/10.1037/qup0000076
Fuchs C (2017) Social media: a critical introduction. Sage Publications
Gandomi A, Haider M (2015) Beyond the hype: big data concepts, methods, and analytics. Int J Inf
Manage 35:137–144. https://doi.org/10.1016/j.ijinfomgt.2014.10.007
Garfinkel H (1967) Studies in ethnomethodology. Polity Press, Cambridge, UK
Glass GV (1976) Primary, secondary, and meta-analysis of research. Educ Res 5:3–8.
https://doi.org/10.2307/1174772
Gray J, Bounegru L, Chambers L (eds) (2012) The data journalism handbook. European Journalism
Centre, Brussels, Belgium
Greener I (2011) Designing social research: a guide for the bewildered. Sage Publications, London,
UK
Guba EG (1981) Criteria for assessing the trustworthiness of naturalistic inquiries. Educ Commun
Technol 29:75–91. https://www.jstor.org/stable/30219811
Guba EG, Lincoln YS (1989) Fourth generation evaluation. Sage Publications, Thousand Oaks, CA
Hakim C (1982) Secondary analysis in social research: a guide to data sources and methods with
examples. Allen and Unwin, London, UK
Hammersley M (1997) Qualitative data archiving: some reflections on its prospects and problems.
Sociology 31:131–142. https://doi.org/10.1177/0038038597031001010
Heaton J (2004) Reworking qualitative data. Sage Publications, London, UK
Heaton J (1998) Secondary analysis of qualitative data. Social Research Update 6
Hewson C, Vogel C, Laurent D (2016) Internet-mediated research: state of the art. In: Internet
research methods. Sage Publications, London, UK
Hinds PS, Vogel RJ, Clarke-Steffen L (1997) The possibilities and pitfalls of doing a secondary anal-
ysis of a qualitative data set. Qual Health Res 7:408–424.
https://doi.org/10.1177/104973239700700306
Ip RKF, Wagner C (2008) Weblogging: a study of social computing and its impact on organizations.
Decis Support Syst 45:242–250. https://doi.org/10.1016/j.dss.2007.02.004
Irwin S (2013) Qualitative secondary data analysis: ethics, epistemology and context. Prog Dev Stud
13:295–306. https://doi.org/10.1177/1464993413490479
Ishikawa H (2015) Social big data mining. CRC Press, Boca Raton, FL
Kim W, Jeong O-R, Lee S-W (2010) On social web sites. Inf Syst 35:215–236.
https://doi.org/10.1016/j.is.2009.08.003
Kitchin R (2014) The data revolution: big data, open data, data infrastructures & their consequences.
Sage Publications, Los Angeles, CA
Laney D (2001) 3D data management: controlling data volume, velocity and variety. Meta Group
Rogers EM (2003) Diffusion of innovations, 5th edn. Free Press, New York, NY
Ruthven I, Buchanan S, Jardine C (2018) Relationships, environment, health and development: the
information needs expressed online by young first-time mothers. J Am Soc Inf Sci 69:985–995.
https://doi.org/10.1002/asi.24024
Scheff TJ (1986) Toward resolving the controversy over “thick description.” Curr Anthropol
27:408–409. https://doi.org/10.1086/203460
Schroeder R (2016) Big data business models: challenges and opportunities. Cogent Soc Sci 2.
https://doi.org/10.1080/23311886.2016.1166924
Sherif V (2018) Evaluating preexisting qualitative research data for secondary analysis. Forum
Qualitative Sozialforschung/Forum: Qual Soc Res 19. https://doi.org/10.17169/fqs-19.2.2821
Smith PL, Felima C, Durant F, Van Kleeck D, Huet H, Taylor LN (2020) Building socio-technical
systems to support data management and digital scholarship in the social sciences. In: Crowder
JW, Fortun M, Besara R, Poirier L (eds) Anthropological data in the digital age: new possibili-
ties—new challenges. Springer International Publishing, Cham, Switzerland, pp 31–57
Stenbacka C (2001) Qualitative research requires quality concepts of its own. Manag Decis
39:551–556. https://doi.org/10.1108/EUM0000000005801
Talja S, Tuominen K, Savolainen R (2005) “Isms” in information science: constructivism, collec-
tivism and constructionism. J Doc 61:79–101. https://doi.org/10.1108/00220410510578023
Thorne S (1998) Ethical and representational issues in qualitative secondary analysis. Qual Health
Res 8:547–555. https://doi.org/10.1177/104973239800800408
Thorne S (2004) Secondary analysis of qualitative data. The Sage encyclopedia of social science
research methods
Tsai AC, Kohrt BA, Matthews LT, Betancourt TS, Lee JK, Papachristos AV, Weiser SD, Dworkin SL
(2016) Promises and pitfalls of data sharing in qualitative research. Soc Sci Med 169:191–198.
https://doi.org/10.1016/j.socscimed.2016.08.004
van de Sandt S, Dallmeier-Tiessen S, Lavasa A, Petras V (2019) The definition of reuse. Data Sci J
18:22. https://doi.org/10.5334/dsj-2019-022
Viceconti M, Hunter P, Hose R (2015) Big data, big knowledge: big data for personalized healthcare.
IEEE J Biomed Health Inf 19:1209–1215. https://doi.org/10.1109/JBHI.2015.2406883
Wang Y, Kung L, Byrd TA (2018) Big data analytics: understanding its capabilities and potential
benefits for healthcare organizations. Technol Forecast Soc Chang 126:3–13.
https://doi.org/10.1016/j.techfore.2015.12.019
Wenger E (1998) Communities of practice: learning, meaning, and identity. Cambridge University
Press, Cambridge, UK
Wenger E, McDermott RA, Snyder W (2002) Cultivating communities of practice: a guide to man-
aging knowledge. Harvard Business School Press, Boston, MA
Williamson B (2017) Big data in education: the digital future of learning, policy and practice. Sage
Publications, London, UK
Wilson DW, Lin X, Longstreet P, Sarker S (2011) Web 2.0: a definition, literature review, and
directions for future research
Yanai K (2012) World seer: a realtime geo-tweet photo mapping system. In: Proceedings of the 2nd
ACM international conference on multimedia retrieval. Association for Computing Machinery,
Hong Kong, China, pp 1–2
Zikopoulos P (2012) Understanding big data: analytics for enterprise class Hadoop and streaming
data. McGraw-Hill, New York, NY
3 Qualitative Data Reuse in Practice
The practice of data reuse goes back to the first part of the twentieth century, when
researchers began reusing survey data in an effort to “save time, money, careers, degrees,
research interest, vitality, and talent, self-images and myriads of data from untimely,
unnecessary, and unfortunate loss” (Glaser 1963). The earliest book describing secondary
analysis in detail was published in 1972 (Hyman 1972), and a major symposium, Sec-
ondary Analysis of Existing Data Sets: For What Purpose and Under What Condition, was
held at the Annual Meeting of the American Educational Research Association in New
York in 1977. Since then, quantitative data reuse has generated an expansive body of
literature, including educational texts on finding and analyzing statistical datasets (e.g.,
Hakim 1982; Kiecolt and Nathan 1985; Smith 2008), and literature examining the episte-
mological, ethical, and legal implications of reusing existing quantitative data in the social
sciences (e.g., de Lusignan et al. 2007; Goodwin 2012; Duke and Porter 2013; Hartter
et al. 2013).
As early as 1962, Glaser wrote that “secondary analysis is not limited to quantita-
tive data. Observation notes, unstructured interviews, and documents can also be usefully
reanalyzed” (Glaser 1962). However, despite this early mention, qualitative data reuse did
not become a common practice until the 1990s (e.g., Thorne 1994; Hammersley 1997;
Hinds et al. 1997; Szabo and Strang 1997; Heaton 1998; Mauthner et al. 1998; Corti
1999; Thompson 2000).
The practice of qualitative data reuse continued to grow through the 1990s and 2000s.
Some still questioned whether reusing qualitative data was “tenable, given that it is
often thought to involve an intersubjective relationship between the researcher and the
researched” (Heaton 1998), but a growing faction of researchers, funding agencies, and
3.2 Benefits of Qualitative Data Reuse
Qualitative data reuse has increased in the twenty-first century as the scholarly community
becomes more attuned to its potential benefits. As Mauthner writes, “the case for sharing
data rests on three central pillars: a scientific, a moral, and an economic one” (Mauthner
2012).
• Avoiding duplication of effort and allowing the conservation of time and resources,
therefore supporting a higher return on investment. A 2013 study conducted on the
UK’s Economic and Social Data Service, Archaeology Data Service, and British
Atmospheric Data Centre emphasized the economic benefit of data sharing, finding
that researchers who used these data archives saw increases in research, teaching and
studying efficiency, and these research gains outweighed the costs of establishing and
maintaining the data archives (Beagrie and Houghton 2014).
3.3 Issues in Qualitative Data Reuse
In addition to the potential benefits discussed above, qualitative data reuse raises episte-
mological, ethical, and legal issues. The epistemological issues of context, data quality
and trustworthiness, and data comparability are concerns about the scholarly legitimacy
and usefulness of the data—how well can future researchers truly understand the data, and
can we ensure that research that reuses qualitative data will be credible and conclusive?
The ethical and legal issues of informed consent, privacy and confidentiality, and
intellectual property and data ownership are concerns about the rights of research sub-
jects—ensuring that research participants are informed and protected. Researchers are
guided by laws, regulations, and ethical frameworks designed specifically for research.
These guidelines are built upon the values of academic disciplines and the guidelines
of professional organizations and learned societies, as well as ethics regulatory guidance
like the Nuremberg Code (BMJ 1996), the Declaration of Helsinki (World Medical Asso-
ciation 2013), the Belmont Report (National Commission for the Protection of Human
Subjects of Biomedical and Behavioral Research 1979), and the Federal Policy for the
Protection of Human Subjects, or “Common Rule” (U.S. Department of Health and
Human Services 1991). Most recently, the General Data Protection Regulation in the
European Union has brought increased awareness of ethical data use (Voigt and von
dem Bussche 2017). Professional working groups such as the Force11/COPE Research Data
Publishing Ethics working group (Puebla and Lowenberg 2021), and organizations such
as the International Data Spaces Association (IDS Association 2022) also point toward
an emerging infrastructure to support ethical and legal data practices in qualitative data
reuse.
I discuss six key epistemological, ethical, and legal issues below: context, data quality
and trustworthiness, data comparability, informed consent, privacy and confidentiality, and
intellectual property and data ownership.
3.3.1 Context
Qualitative research is a process that may include deep and prolonged contact and con-
nection with research subjects with the goal of understanding the subjects within their
own context (Miles et al. 2020). Qualitative data are therefore highly context depen-
dent. Insights are created through not only reviewing the data, but also through a deep
knowledge of the research context and research subjects. That is, in qualitative research,
“meaning is made rather than found” (Mauthner et al. 1998). This meaning is made
through the data collection process itself—which can be deeply affected by researchers’
own cultural experiences, biases, and decision-making processes. Meaning is additionally
made through the process of data analysis, which is likewise affected by the unique per-
spective of the data analyst (Thorne 1994; Tsai et al. 2016). As Hinds et al. (1992) write,
“context is a source of data, meaning, and understanding… Ignoring context, underusing
it, or not recognizing one’s own context-driven perspective will result in incomplete or
missed meaning and a misunderstanding of human phenomena.” The literature reflects
the importance of considering whether data can be properly understood outside of their
original context, without the nuanced knowledge and expertise of the researchers who
conducted the original research project and originally analyzed the data. As Broom et al.
(2009) suggest, “the idea that data can be neutralized and deposited into an archive, ready
to be ‘picked up’ by others, sits uncomfortably for many.” Dale et al. (1988) voice this
discomfort, writing, “it seems unlikely that the re-analysis of either interview transcripts
or field notes by an outsider could give more than a partial understanding of the research
issues.” Pasquetto et al. (2019) write that “removing data from their original context nec-
essarily involves information loss” stemming from small adjustments that may be made to
the data during research and the loss of other deep knowledge of the research that data cre-
ators hold but may not be able to communicate in a dataset description. Responding to the
idea that some contextual information is either undocumented or undocumentable, some
go so far as to say that data reusers should contact or collaborate with the researchers
who originally collected the data (Hinds et al. 1997; Szabo and Strang 1997; Heaton
2008). However, this strategy is impractical for long-term use of data beyond the lifetime
of the original researchers, and furthermore, the original researchers themselves may not
remember the full context. Mauthner and Parry discuss in several articles the difficulty of
maintaining the context of data, even when attempting to reuse data that they themselves
had previously collected (Mauthner et al. 1998; Parry and Mauthner 2004; Mauthner and
Parry 2009). As Thorne (1994) describes, researchers may “make mental notes” about
participants, settings, and other details that may never be documented in field notes or
memos, and may be forgotten later.
Hinds et al. (1997) frame distance from the original context of the data as a possible
benefit, arguing that distance can free a researcher from developing fixed ideas about the
phenomena reflected in the dataset, so long as the secondary researcher has enough knowl-
edge of the original context to prevent misinterpretation. Data curation strategies can also
support communication of context. A number of scholars argue that contextual knowledge
can be provided through proper metadata and documentation (Corti 1999, 2000; Field-
ing 2004; van den Berg 2005; Goodwin and O’Connor 2006; Elman and Kapiszewski
2014; Bernard et al. 2017). Metadata and documentation are discussed in more detail in
Sect. 3.4.1.
3.3.2 Data Quality and Trustworthiness
Any reuse of qualitative data relies on the data’s quality and trustworthiness, espe-
cially when the data were collected by other researchers. Before the data can be reused,
researchers need to spend time reviewing the dataset in order to assess the quality of the
data (McCall and Appelbaum 1991; Yoon 2017). Sherif (2018) advises that “the original
data must allow the researcher conducting secondary analysis to understand examined
processes, relationships, and subjective meanings.” Hinds et al. (1997) suggest reviewing
three randomly selected interviews to determine whether the larger dataset can be used
to achieve the research goals of any contemplated new study. Stenbacka (2001) suggests
looking at four different dimensions when evaluating a dataset for reuse: “validity, relia-
bility, generalizability and carefulness.” I further examine these dimensions of data quality
below.
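The review step that Hinds et al. (1997) suggest, drawing three interviews at random before committing to a dataset, could be sketched as follows. This is a minimal illustration, not a procedure from the cited work: the identifiers, the `sample_interviews` helper, and the fixed seed (used so that the draw can be documented and repeated) are all assumptions.

```python
import random


def sample_interviews(interview_ids, k=3, seed=None):
    """Return k interview identifiers drawn at random without replacement."""
    ids = list(interview_ids)
    if k > len(ids):
        raise ValueError("cannot sample more interviews than the dataset contains")
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return sorted(rng.sample(ids, k))


# Hypothetical identifiers for a deposited dataset of twelve transcripts.
interview_ids = [f"interview_{n:02d}" for n in range(1, 13)]
to_review = sample_interviews(interview_ids, k=3, seed=2024)
print(to_review)
```

Recording the seed alongside the reviewed interviews would let a later reader confirm which transcripts informed the quality assessment.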
Validity may be affected by errors made during the research process—by research
subjects, by reporters or recorders of field data, by researchers, or by data coders. Simple
mistakes or inaccuracies can occur throughout the process. And systematic errors can be
introduced into datasets as a result of bias related to personal identity, political ideology,
general personality, or assumptions. Bernard et al. (1986) suggest that “researchers using
archival material need actively to consider potential biases and then, whenever possible,
test for them.” Reliability can be measured by examining the credentials of the data cre-
ators and understanding other factors that affect the data collection such as training and
time spent collecting data (Hinds et al. 1997). Reliability can also be determined by eval-
uating the completeness and accuracy of the dataset. Generalizability can be measured
partly by examining the breadth and depth of the dataset, to determine whether the data
are appropriate for reuse. The idea of generalizability also overlaps with data compara-
bility, which I examine further in Sect. 3.3.3. Carefulness can be demonstrated through
thoughtful and thorough documentation. Data curators can contribute to data trustworthi-
ness by co-producing data with data producers—providing data management, curation,
and metadata support to increase data quality (Giarlo 2013; Frank et al. 2017; Yoon
2017; ICPSR 2019; Yoon and Lee 2019). Data repositories and academic libraries also
support trust through certifications such as the CoreTrustSeal Trustworthy Data Reposito-
ries Requirements (CoreTrustSeal 2023) and the TRUST principles for digital repositories
(Lin et al. 2020). I further discuss metadata and data archiving in Sect. 3.4.
3.3.3 Data Comparability
When reusing data, researchers must determine whether the primary data can be under-
stood or analyzed in a way that is applicable to the study reusing the data. Because
qualitative data tend to be relatively unstructured, complex, and varied (Heaton 2004), it
can be difficult to fit a primary dataset into a secondary research question. When attempt-
ing to compare and combine qualitative datasets, the literature suggests that researchers
use three strategies: (1) identify the extent of missing data; (2) identify how well the
research questions converge in the primary research and secondary research; and (3) assess
the methods used to produce the primary data (Thorne 1994; Hinds et al. 1997; Heaton
2004).
Another challenge for data comparability is that qualitative researchers often use pro-
prietary qualitative data analysis software such as NVivo and Atlas.ti. These proprietary
software programs may not be interoperable and could cause challenges for data reuse.
Some research has begun to support standardized formats and interoperability (Corti and
Gregory 2011; Evers et al. 2020), but more advocacy for this approach is needed. Data
curators can support comparability of qualitative datasets by encouraging researchers who
publish qualitative data to include clear documentation addressing missing data, research
questions, and methods, by using standardized metadata, and by advocating for open
source software and interoperable formats (Karcher et al. 2021). Data curation strategies
are further discussed in Sect. 3.4.
3.3.4 Informed Consent
Qualitative researchers have long debated whether participants’ consent can ever be truly
informed, due to the developmental, reflexive nature of research (Parry and Mauthner
2004). In fact, some go so far as to suggest implementing “process consent”—a structure
in which research subjects continually consent to their participation as the researchers’
ideas and inquiries evolve (Lawton 2001). However, other researchers advocate for strik-
ing a balance that protects participants without overly obstructing the research process
(Wiles et al. 2007; Alexander et al. 2020).
Consent for qualitative data reuse is even thornier. When reusing data from previous
studies, some argue that consent should be re-obtained from the original participants.
This strategy is also called the selective, repeated, or reconsent model, in which partici-
pants consent anew to each future use of their data (Master and Resnik 2013; Joly et al.
2015). As Thorne (1994) writes, “there may be especially sensitive instances in which
the implied consent of original subjects cannot be presumed.” However, Heaton (1998)
suggests that reconsent may often prove too difficult: “given that it is usually not fea-
sible to seek additional consent, a professional judgement may have to be made about
whether reuse of the data violates the contract made between subjects and the primary
researchers.” In a later paper, Heaton suggests that “it may be inappropriate to generalise
about the need to obtain informed consent for secondary analyses, as this is likely to vary
according to the characteristics of the secondary study.”
A common strategy to support informed consent for data reuse is to include a clause
in the consent form detailing any potential future data sharing, also referred to as broad
consent (U.S. Department of Health and Human Services 2017). As Hinds et al. (1997)
write, “a researcher planning a secondary analysis will doubtlessly feel more ethically
correct if permission from the participants in the primary study has been solicited at the
time of the primary study.” Tiered consent (also called flexible consent, line-item consent,
or multilayered consent) can be useful for research in which participants consent to data
reuse. The tiered consent model provides participants with a wider variety of options for
data sharing—for example, opting out of data sharing completely, consenting to restricted
data sharing only, or allowing participants the opportunity to review the data prior to
sharing (Tiffin 2018; VandeVusse et al. 2022). Regardless of consent strategy, questions
remain about how well research participants understand the full implications of data shar-
ing. In a recent study on abortion reporting, VandeVusse et al. (2022) found that many
participants who agreed to “data sharing” misunderstood the term to mean dissemination
of research results, even though the consent form contained a detailed description of how
the research data would be shared.
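The tiered consent options described above could be recorded in a machine-readable form so that a repository can check them before releasing data. The sketch below is only one possible encoding. The tier names, the `ConsentRecord` structure, the "public" and "restricted" access labels, and the additional unrestricted tier are assumptions made for illustration and do not come from the cited studies.

```python
from dataclasses import dataclass
from enum import Enum


class SharingTier(Enum):
    """Sharing options a participant can choose on a tiered consent form."""
    NO_SHARING = "no_sharing"        # opted out of data sharing completely
    RESTRICTED = "restricted"        # consented to restricted sharing only
    REVIEW_FIRST = "review_first"    # wants to review the data before sharing
    UNRESTRICTED = "unrestricted"    # assumed extra tier: any sharing allowed


@dataclass
class ConsentRecord:
    participant_id: str
    tier: SharingTier
    participant_reviewed: bool = False  # only meaningful for REVIEW_FIRST


def may_share(record: ConsentRecord, access: str) -> bool:
    """Return True if the record permits sharing under the given access level.

    `access` is either "public" or "restricted" (hypothetical labels).
    """
    if access not in ("public", "restricted"):
        raise ValueError(f"unknown access level: {access!r}")
    if record.tier is SharingTier.NO_SHARING:
        return False
    if record.tier is SharingTier.RESTRICTED:
        return access == "restricted"
    if record.tier is SharingTier.REVIEW_FIRST:
        return record.participant_reviewed
    return True  # UNRESTRICTED


records = [
    ConsentRecord("P01", SharingTier.NO_SHARING),
    ConsentRecord("P02", SharingTier.RESTRICTED),
    ConsentRecord("P03", SharingTier.REVIEW_FIRST),
    ConsentRecord("P04", SharingTier.REVIEW_FIRST, participant_reviewed=True),
    ConsentRecord("P05", SharingTier.UNRESTRICTED),
]
shareable_restricted = [r.participant_id for r in records if may_share(r, "restricted")]
shareable_public = [r.participant_id for r in records if may_share(r, "public")]
```

A structure like this records what participants selected. It cannot show whether they understood the term "data sharing", which is the concern VandeVusse et al. (2022) raise.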
The General Data Protection Regulation (GDPR) in the European Union regulates and
defines the obligation to communicate clearly about data sharing. GDPR requires that if
a data controller (i.e., a person or organization that controls data processing) “intends
to process personal data for a purpose other than that for which it was collected, it
should provide the data subject prior to that further processing with information on that
other purpose and other necessary information” (Voigt and von dem Bussche 2017). A
comparable set of guidelines does not exist in the United States.1 However, the revised
Common Rule, which went into effect in 2019, adds more explicit guidelines for sec-
ondary research, including the idea of broad consent (U.S. Department of Health and
Human Services 2017). While secondary data use is still viewed as exempt from ethics
review, Exemption 7 and Exemption 8 in the revised Common Rule now explicitly state
that broad consent must be obtained from primary research participants in order for sec-
ondary research with identifiable human subjects data to be considered exempt (Office
for Human Research Protections 2018). Institutional Review Boards (IRBs) that oversee
ethical practice in human subjects research in accordance with the Common Rule are
increasingly beginning to provide template language that researchers can use to obtain
broad consent and thus support data reuse (Lavori et al. 1999; Siminoff 2003; Elman
et al. 2018; Cornell Research Services 2022), and in May 2022, NIH released guidance
on consent language for data reuse, indicating that such language may increasingly be
standardized (NIH 2022).
1 The California Consumer Privacy Act, which went into effect in the state of California in January
2020, dictates that “a business that sells the personal information of consumers shall provide the
notice of right to opt-out” (State of California 2020). Vermont also enacted Act “No. 171. An act
relating to data brokers and consumer protection” in May 2018 (State of Vermont 2018). However,
these acts do not extend to non-commercial reuse of data.
However, broad consent is not a perfect solution, especially when viewed through
the lens of feminist and post-colonial theories, which consider power structures between
researchers and research subjects. There is concern that broad consent exposes
respondents to uncertain future risks and “marginalizes respondents’ moral and political rights to
retain on-going involvement and decision-making powers in how their data will be used
in the future” (Mauthner and Parry 2013).
3.3.5 Privacy and Confidentiality
When sharing qualitative data for future reuse, researchers use various strategies to protect
the confidentiality of participants in adherence to ethical and legal standards. Data dei-
dentification procedures attempt to disguise the identity of participants by deleting their
real names or using pseudonyms, by removing any potentially identifying specifics about
their lives and experiences, or amalgamating or aggregating data (Clark 2006; Garfinkel
2015). However, some qualitative researchers describe challenges that may arise during
the deidentification process. I review these challenges below.
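The simplest of the procedures named above, replacing real names with pseudonyms, can be illustrated with a short sketch. It assumes that the list of names is already known and that pseudonyms of the form "Participant N" are acceptable; both the `pseudonymize` helper and the sample transcript are hypothetical. It does nothing about indirect identifiers such as places, occupations, or events.

```python
import re


def pseudonymize(text, names):
    """Replace each real name in `names` with a stable pseudonym.

    Returns the rewritten text and the name-to-pseudonym mapping, which
    would need to be stored separately and securely.
    """
    mapping = {name: f"Participant {i}" for i, name in enumerate(names, start=1)}
    # Replace longer names first so that "Ann Lee" is handled before "Ann".
    for name in sorted(mapping, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        text = pattern.sub(mapping[name], text)
    return text, mapping


transcript = "Maria said that Maria's neighbour Tom will move away tomorrow."
clean_text, key = pseudonymize(transcript, ["Maria", "Tom"])
print(clean_text)
```

Whole-word matching keeps "tomorrow" intact while still catching the possessive "Maria's". Even so, a transcript cleaned this way may remain open to deductive disclosure if enough surrounding context is left in place.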
A commonly cited issue is that “removal of key identifying characteristics of
research participants may…compromise the integrity and quality of the data, or even
change their meaning” (Parry and Mauthner 2004). On the other hand, if too much con-
textual information is present in a dataset—exactly the kind of contextual information that
is necessary to understand and reuse the data in the first place—the deidentification may
be compromised, thus risking deductive disclosure (Tsai et al. 2016; Myers et al. 2020).
Other issues that may affect privacy and confidentiality are the time and financial
resources that deidentification requires (Dorr et al. 2006) and the potential technical
challenges of deidentifying audiovisual data (Marschik et al. 2023). Additionally, deidentification should be
conducted especially thoroughly when participants come from vulnerable populations—
e.g., children, people involved in illegal activities, or respondents from marginalized and
minoritized communities such as Black, Indigenous, LGBTQIA+, or disabled communi-
ties. Participants from these communities may face high risk if the deidentified data are
able to be reidentified (Rothstein 2010). Smaller, more tight-knit communities may also
need more careful deidentification practices to avoid potential identification of research
participants (Ellard-Gray et al. 2015).
In addition to these limitations, some argue that there are instances in which deidentifi-
cation may not in fact be desirable (Turnbull 2000; Moore 2012). Moore (2012) considers
the feminist ethics of care and giving credit, showing that many studies point to “the need
for, and benefits of, a careful situated and negotiated ethical practice around naming or
anonymization.”
Data curators can support deidentification practices by providing resources and ser-
vices. If deidentification is not possible or desirable, data repositories can also protect
privacy and confidentiality by facilitating restrictions to data access and use (Antes et al.
2018). Access controls are discussed further in Sect. 3.4.2.
3.3.6 Intellectual Property and Data Ownership
Intellectual property is a key consideration for qualitative data reuse (Fienberg et al. 1985;
Mauthner et al. 1998; Heaton 2004). As the United States statute states, “copyright protec-
tion subsists... in original works of authorship fixed in any tangible medium of expression”
(17 U.S. Code § 102 1990). This means that research participants hold copyright over their
own qualitative responses, and copyright holders have exclusive rights to distribute and
use their works. As my coauthors and I wrote in 2019, “per this form of intellectual
property protection, when someone else holds the copyright in some of a scholar’s data and she
was not legally assigned that right, her ability to grant others access to those data may be
limited” (Mannheimer et al. 2019). In order for researchers to publish the text of research
participant responses, participants may need to either waive their rights or license their
responses for use in the research study (Parry and Mauthner 2004). To further complicate
matters, universities often claim ownership of research data from affiliated researchers
(Steneck 2007).
A data use agreement or licensing agreement outlines the rights, responsibilities, and
obligations of the original and secondary researchers, and may include “a description
of the data that were accessed (e.g., interviews, demographic data), method of access
(i.e., via computer software), and provisions for reference citations in publications and
presentations” (Szabo and Strang 1997). While such licensing could be organized as part
of a research study, if no license or other permission exists, the “fair use” exemption
offers a potential avenue for future researchers to reuse qualitative data. According to
Hirtle, Hudson, and Kenyon,
Fair use… ensures that the balance between the interests of copyright owners and users can
be maintained and that copyright law does not stifle the very creativity it is intended to foster.
On a very practical level, it provides important protections to libraries, archives, and nonprofit
educational institutions. When those organizations have a reasonable belief that their use of a
copyrighted work is a fair use, many of the most stringent remedies in copyright law cannot
be applied. (Hirtle et al. 2009)
The fair use exemption is an important one for researchers reusing qualitative data,
whose purpose in using the data is likely to be scholarly or educational and
non-commercial.
How researchers address intellectual property and data ownership may vary according
to how and where the data were collected. For example, when collecting data from Indige-
nous communities, additional considerations come into play, such as the CARE principles
(Carroll et al. 2021) and the First Nations Principles of Ownership, Control, Access, and
Possession (OCAP®) (FNIGC 2010). Such principles provide guidelines for qualitative
researchers and communities who contribute to research to engage with “concerns about
fairness, trust, and accountability” and enable contributing communities, “as collectives,
to have a say in how their data actually gets used” (Carroll et al. 2021).
In a 2021 survey that asked researchers about their data sharing practices, more than
half of respondents reported needing help with copyright and licensing (Simons et al.
2021). Data curators can advise researchers on data licensing for shared data; they can also
help researchers with rights clearance, rights management, and data citation to support
qualitative data reuse (Cox et al. 2017). Data curation strategies are further discussed in
Sect. 3.4.
3.4 Data Curation to Support Qualitative Data Reuse
The literature published by the qualitative research community and the data curation
community discusses a variety of data curation and archiving practices that respond to the issues
described above. These practices can be grouped into two main categories: (1) metadata
and documentation; (2) data repositories and professional data curation. While the data
curation structures and practices described below cannot address every issue, they do
demonstrate that qualitative researchers and data curators are developing a set of strategies
to facilitate ethical, legal, and epistemologically sound qualitative data reuse.
3.4.1 Metadata and Documentation
Metadata and contextual information can serve to prevent “serious misinterpretations and
biases in analysis” (White 1991), or secondary researchers making “bolder claims than
they otherwise might” (Fienberg et al. 1985). Contextual documentation could include
field notes, research diaries, correspondence, and methodological information (Corti and
Thompson 1998; Fink 2000; Karcher et al. 2021). According to Corti, “for archives,
documentation of the research process provides some degree of the context, and whilst it
cannot compete with being there, field notes, letters and memos documenting the research
can serve to help aid the original fieldwork experience” (Corti 2000). White suggests
that researchers should prepare highly explicit codebooks to help future users replicate
the coding process. These codebooks should contain “information on everything known
about the reliability, validity, and coding problems of specific variables, extensive coding
notes on problematic individual cases, page references to and quotes from the original
ethnographic sources from which the coding inferences were made, plus multiple codings
wherever they were done and multiple measures of the same variables wherever possible”
(White 1991). Hinds et al. especially emphasize documentation as a mechanism for help-
ing future researchers “feel close to a condition of ‘having been there’ and to imagine the
emotions and cognitions experienced by the participants and the researchers during data
collection and analysis” (Hinds et al. 1997).
Faniel et al. (2019) interviewed and observed researchers to understand data reuse
from the reuser’s perspective. Faniel et al.’s findings emphasize three types of informa-
tion to facilitate data reuse: (1) data production information, including information about
data collection, specimen and artifact details, data producer information, data analysis
methods, any missing data, and research objectives; (2) repository information, including
provenance, reputation and history of the repository, and curation and digitization activi-
ties; and (3) data reuse information, including prior reuse, terms of use, and guidance on
reuse.
Initiatives such as Open Context (Kansa and Kansa 2018), and the Data Curation
Network (Johnston et al. 2018) help researchers and data repositories create documen-
tation for qualitative research that enhances contextual integrity for data reuse. Data
repositories can also encourage researchers to augment their data deposits with any
additional materials or information that could provide context to research data. This
could include documentation about research methods and practices, consent form(s), IRB
approval number, information about the selection of interview subjects and interview set-
ting, instructions given to interviewers, data collection instruments, steps taken to remove
direct identifiers in the data, problems that arose during the selection and/or interview
process and how they were handled, and interview roster (ICPSR 2012). The Annota-
tions for Transparent Inquiry initiative supports contextual information and cross-linking.
Possible annotations include: excerpt from a textual source (e.g., an excerpt from the
transcription for handwritten material, audiovisual material, or material generated through
interviews or focus groups); source excerpt translation; analytic note (i.e., discussions that
illustrate how the data were generated and/or analyzed and how they support the empir-
ical claim or conclusion being annotated in the text); a link to the data source; and the
full citation for an excerpted source (Karcher and Weber 2019). Qualitative Data Reposi-
tory’s data curation handbook provides guidelines for contacting and interacting with the
data depositor, file processing procedures, data-level and project-level metadata, terms
of use, access conditions and restrictions, publication procedures, and post-publication
procedures (Demgenski et al. 2021).
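The contextual materials listed above from ICPSR (2012) lend themselves to a simple deposit checklist. The sketch below shows one way a repository might flag gaps at deposit time. The field names and the `missing_documentation` helper are hypothetical and do not reflect any repository's actual schema.

```python
# Checklist items paraphrased from the ICPSR (2012) list discussed above;
# the key names themselves are illustrative assumptions.
RECOMMENDED_DOCUMENTATION = (
    "research_methods",
    "consent_forms",
    "irb_approval_number",
    "subject_selection_and_setting",
    "interviewer_instructions",
    "data_collection_instruments",
    "deidentification_steps",
    "problems_and_resolutions",
    "interview_roster",
)


def missing_documentation(deposit):
    """Return the recommended items that are absent or empty in a deposit."""
    return [item for item in RECOMMENDED_DOCUMENTATION if not deposit.get(item)]


# A hypothetical deposit record supplied by a researcher.
deposit = {
    "research_methods": "methods.pdf",
    "consent_forms": ["consent_v2.pdf"],
    "irb_approval_number": "",
    "data_collection_instruments": "interview_guide.docx",
}
gaps = missing_documentation(deposit)
print(gaps)
```

A curator could use the resulting list to prompt the depositor for the missing context rather than to reject the deposit outright.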
In 2000, Corti raised several open questions regarding metadata standards for qual-
itative data: “Are the existing standards for study description for numerical datasets
adequate? How do the emerging document type definition standards for data suit quali-
tative data? Do they need to be extended or reworked? At the same time, how relevant
are standards adopted by the “traditional” and library communities for more complex
qualitative material?” (Corti 2000). In the years since Corti asked these questions, sev-
eral initiatives have been developed to support metadata for qualitative data. The Data
Documentation Initiative (DDI) (DDI Alliance 2022) was initially designed to create
standardized metadata for quantitative social science data, but DDI metadata can be applied
at the study level to describe qualitative research. Issues that may complicate the applica-
tion of DDI metadata to qualitative data include “complex study designs and relationships
between files, the need to preserve the hierarchical structure of codes, and the attachment
3.4.2 Data Repositories and Professional Data Curation
Generally, data are shared in three ways: as appendices to papers and books, upon request,
or more formally via a data repository (Fienberg et al. 1985). However, it is becom-
ing more common for data repositories to be the preferred sharing method. Notably, the
data sharing and data management plans required by funders like NSF and NIH gener-
ally ask researchers to formally state how the data will be publicly shared, which has
driven an increased demand for data curation and data repository services. Beyond fun-
der requirements, data repositories are a growing infrastructure to support data sharing
and preservation as part of the broader context of scholarly communication. Data repos-
itory staff can encourage researchers from early stages of their projects to consider how
to support findable, accessible, interoperable, and reusable (FAIR) data (Wilkinson et al.
2016). This includes providing guidance on data documentation, facilitating data licens-
ing, implementing machine-readable metadata, optimizing data records for search and
discovery, and ensuring long-term preservation for published datasets (Demgenski et al.
2021). Data repositories can also provide restricted access to datasets that may not be
appropriate for public sharing—for example, video data that cannot be deidentified or
sensitive data that should not be widely distributed. Access to datasets can be embargoed
The Project Gutenberg eBook of Practical
Methods of Sewage Disposal for Residences,
Hotels and Institutions
Language: English
Sewage. Frontispiece.
Practical Methods of Sewage
Disposal
FOR RESIDENCES, HOTELS AND
INSTITUTIONS
BY
HENRY N. OGDEN
M. AM. SOC. C.E.
Professor of Sanitary Engineering, Cornell University
AND
H. BURDETT CLEVELAND
ASSOC. M. AM. SOC. C.E.
Principal Assistant Engineer, New York State Department of Health
FIRST EDITION
FIRST THOUSAND
NEW YORK
JOHN WILEY & SONS
London: CHAPMAN & HALL, Limited
1912
Copyright, 1912, by
Henry N. Ogden
and
H. Burdett Cleveland
CHAPTER I
INTRODUCTORY
The problem of sewage disposal for a single house differs from the
corresponding problem for a city chiefly in two ways. In the city it
is becoming, if it has not, indeed, already become, a necessity, and
city authorities, though somewhat reluctantly, are willing to grant
the necessary appropriation to secure engineering advice which will
solve the problem in a scientific as well as economic fashion. In the
case of a single house, whether a farm-house or a villa, the
necessity of employing competent engineering advice has not been
generally recognized, and no attempt has been made to solve the
problem of sewage disposal in a scientific manner.
Cesspools have been considered the only way of caring for sewage
in places where a running stream was not available, or where
attempts were made to protect such a stream from pollution, and
while, in these last few years, crude attempts have been made to
utilize the so-called septic tank, such attempts have generally been
so unintelligent that the results have been anything but satisfactory.
Since it has been understood that insects, such as flies and
mosquitoes, play an important part in the transmission of disease,
the danger of overflowing cesspools and of open ditches in which
stagnant sewage is present, has been appreciated; also the higher
standards of living which have made themselves felt throughout the
rural community have demanded in farm-houses and country homes
sanitary conveniences which have hitherto been wanting.
Gradually every house is using more and more water for various
purposes, and living conditions, which in the past tolerated a scanty
supply drawn from a pump, are no longer endured. The increased
water supply and the demands of extended plumbing mean a
greater amount of sewage—so great an amount that, in many cases,
soils which could receive and digest the waste waters from houses
supplied by wells are clogged and made impervious by this greater
amount.
Further, the danger to wells from the infiltration of cesspools is more
feared, and it is understood as never before that in order to maintain
the highest degree of health in a family the drinking-water used
must be above suspicion and not subject to contaminating influences
in the vicinity.
Again, communities are being aroused to the intrinsic value of
maintaining streams in a pure condition—partly because of the value
of fish and ice coming from the streams themselves, and partly on
the broad ground that watercourses belong to the country as a
whole, and must be kept pure for the sake of succeeding
generations, not spoiled for them on account of the selfishness of a
few at the present time.
Thus it is that to-day the problem of sewage disposal, while arousing
general interest, is recognized as one which requires more than the
common sense of an average person, that the forces and principles
involved are understood to be not those in common use, and that,
for successful disposal of sewage, special knowledge and judgment
are required.
Whatever the character of the sewage and whatever the kind of soil
available for treatment, the method of dealing with sewage most
obvious to most people has been to discharge the sewage directly
into the nearest watercourse. This has been the practice of cities as
well as of individual houses in the past, and the practice is very
difficult to check because of the economy of this method of disposal.
In many cases there is no objection to this method, and where a
large stream is available, where no use is made further downstream
of the waters for drinking purposes, and where the volume of water
in the stream is sufficient to dilute the sewage to a point where no
odors or objectionable appearances result, it would seem most
uneconomical to adopt any more complicated method of disposal
than by simply carrying the outfall pipe into the main bed of the
stream.
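The dilution argument above can be made concrete with a simple flow-weighted mixing calculation. The sketch below assumes complete mixing and no decay in the stream. The function names and every number in it are hypothetical illustrations, not figures from the text.

```python
# Flow-weighted mixing of a sewage discharge with a receiving stream.
# Assumptions (not from the text): complete mixing, no decay, and a
# stream that is clean upstream unless stream_conc is given.


def mixed_concentration(sewage_flow, sewage_conc, stream_flow, stream_conc=0.0):
    """Return the concentration below the outfall after complete mixing."""
    total_flow = sewage_flow + stream_flow
    return (sewage_flow * sewage_conc + stream_flow * stream_conc) / total_flow


def dilution_factor(sewage_flow, stream_flow):
    """Return how many volumes of mixed water each volume of sewage becomes."""
    return (sewage_flow + stream_flow) / sewage_flow


# Hypothetical case: 1 unit of sewage flow into 99 units of stream flow.
factor = dilution_factor(1.0, 99.0)
downstream = mixed_concentration(1.0, 124.0, 99.0)
print(f"dilution factor: {factor:.0f}")
print(f"downstream concentration: {downstream:.2f} (same units as the sewage)")
```

Any consistent units of flow and concentration may be used. Whether a given dilution is enough to avoid odors or objectionable appearances is a matter of judgment that this arithmetic does not settle.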
In New York State, however, and in a steadily growing number of other
States, such direct discharge is not permitted by law except under
certain conditions. In New York State
it is required that any house, butter or cheese factory, manufacturing
establishment, or village shall obtain the permission of the State
Commissioner of Health before such a method of discharge be
adopted, and in order to obtain this permission it must be definitely
shown that the conditions of the stream are such that no reasonable
objection to this method could be urged. The policy of the various
Departments of Health in the United States is gradually becoming
more and more rigorous in the matter of prohibiting the discharge of
crude sewage into watercourses, and it is wise to make very sure
that the discharge of sewage into streams is above the suspicion of
a nuisance before adopting this as a suitable method. Rather would
it seem better to provide for some method of treatment and allow
only purified sewage to go into the stream than to run the risk of
being forced in a few years to reconstruct the entire line of outfall
pipe, with perhaps an entire reconstruction of the plumbing within
the house.
The problem of treatment is the question of so modifying the
character of a large volume of dirty water that it shall neither injure
the quality of any drinking-water into which it may be discharged,
nor cause objectionable odors, nor present disagreeable
appearances in any body of water into which it may be emptied.
In order to properly understand a reasonable method of treatment
some consideration must be given to the composition of sewage.
This is chiefly water with which is mixed a small amount of animal,
vegetable, and mineral matter. Roughly speaking, the amount of
mineral dirt is about one tablespoonful to a barrelful of water, and
the combined amount of animal and vegetable matter amounts to
another tablespoonful. It seems almost impossible that so small a
quantity of organic matter as one tablespoonful in a barrel of water
could cause offense in any way, and yet engineers, city officials, and
householders know by bitter experience that, when spread out on
the surface of the ground or when allowed to stand in pools, water
so polluted will undergo putrefaction resulting in most disagreeable
odors and in complete stagnation. The problem of sewage
treatment, then, consists in removing from the barrelful of water, the
tablespoonful of organic dirt, whether animal or vegetable, in such a
way that no odors shall be occasioned by the process and at the
same time so that the cost of the process may be a reasonable one.
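To get a feel for how dilute this is, the tablespoonful-per-barrel figure can be turned into a rough concentration. The sketch below assumes a US liquid barrel of 31.5 US gallons and 256 US tablespoons to the gallon. The text does not say which barrel or spoon is meant, so the result is only an order-of-magnitude illustration, and the function name is hypothetical.

```python
# Rough conversion of "one tablespoonful to a barrelful" into parts per million.
# Assumptions (not from the text): US liquid barrel = 31.5 US gallons,
# US gallon = 256 US tablespoons.

TABLESPOONS_PER_GALLON = 256
GALLONS_PER_BARREL = 31.5


def ppm_by_volume(tablespoons: float, barrels: float = 1.0) -> float:
    """Return tablespoons of matter in the given barrels of water, as ppm by volume."""
    barrel_tablespoons = barrels * GALLONS_PER_BARREL * TABLESPOONS_PER_GALLON
    return tablespoons / barrel_tablespoons * 1_000_000


organic_ppm = ppm_by_volume(1)  # animal and vegetable matter
total_ppm = ppm_by_volume(2)    # organic plus mineral matter

print(f"organic matter: about {organic_ppm:.0f} ppm")
print(f"all solids:     about {total_ppm:.0f} ppm")
```

On these assumptions the organic matter comes to roughly 124 parts per million by volume, and the organic and mineral matter together to about twice that.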
Unfortunately, the greater part of this organic matter is in solution,
dissolved, like salt in water, so that, though undeniably present, it
must be removed by some process more complicated and less
obvious than that of simple straining. It would be comparatively
simple if the polluting substances remained floating or suspended in
the water. Then they could be strained out through a fine sieve or
settled out in a tank, either with or without the aid of chemicals. But
for particles in solution, straining, by itself, is useless and, while in
large plants frequent use is made of sieves as a complement to the
main process of purification, in small plants it is of so little value as
hardly to deserve consideration.
Another factor enters to lessen the value of the use of screens or
sieves in an installation for a single house. A great deal of the
organic matter found in sewers requires both agitation and time for
its subdivision into particles small enough to be acted upon in any
process of purification adopted. If a screen is used, large particles of
putrescible matter are held on the screen since not enough time has
existed to break down their mass, and thus the screen itself
becomes a most emphatic disturbance and a most objectionable
feature of the purification plant.
For efficient purification, therefore, some method of reducing and
modifying the character of organic solids, particularly those in
solution, must be selected. In seeking a method by which this may
be accomplished, scientific men found years ago that this very
process was being carried on continually by natural forces, although
at a very slow rate of purification. All organic matter, however
formed and wherever present, is subject to the natural forces of
decay. Fruits, vegetables, and meats of all kinds, exposed to the air,
rapidly lose their original character and form and in the course of
time disappear entirely. Except for this provision of nature, the
accumulation of organic wastes since the beginning of the earth’s
occupation by human beings would be so great that the earth would
be uninhabitable on account of the deposits of waste matter which
would have formed by this time. Nature, then, recognizes the need
of disposing of organic wastes, and her method is the one which
apparently must be followed by human beings if successful
treatment is to be secured.
Only a few decades ago, it was found that this process of decay was
due to the activity of very small organisms known as bacteria, and
their agency was proved by experiments which showed that if
vegetables or meat were kept free from bacteria, no decay,
fermentation, or putrefaction took place. It was proved that the air
itself was not responsible because in certain experiments air was
allowed to enter through a filtering medium fine enough to strain out
the bacteria and no decay took place, although oxygen and air were
both freely admitted. It is well understood by the housewife that
fruits can be kept indefinitely if they are cooked sufficiently to kill
any bacteria present and then sealed in bacteria-free, air-tight jars.
When such preserves spoil, it is because some bacteria were left in
the jar or have since been admitted through an imperfect top. When
decay is allowed to proceed, the obvious result is, first of all, a
softening of the material, as in the case of a rotten apple, a
liquefaction, as it is more technically known. Following that part of
the process is a gradual breaking down of the material, the residue
being of an earthy character which is assimilated by the soil into
which it falls.
The bacteria required for the putrefaction of organic matter are
among the most widely distributed of all the micro-organisms. They