Mechanisms for Data Quality and Validation in Citizen Science
A. Wiggins, G. Newman, R. Stevenson & K. Crowston
Presented by Nathan Prestopnik
Motivation

• Data quality and validation are a primary concern
  for most citizen science projects
   ◦ More contributors = more opportunities for error

• There has been no review of appropriate data
  quality and validation mechanisms
   ◦ Diverse projects face similar challenges

• Contributors’ skills and scale of participation are
  important considerations in ensuring quality
Methods

• Survey
   ◦ Questionnaire with 70 items, all optional
   ◦ 63 completed questionnaires representing 62 projects
   ◦ Mostly small-to-medium sized projects in US, Canada,
     UK; most focus on monitoring and observation

• Inductive development of framework
   ◦ Based on survey results and authors’ direct experience
     with citizen science projects
Survey: Resources

• FTEs: 0 – 50+
   ◦ Average: 2.4; Median: 1
   ◦ Often small fractions of several individuals’ time

• Annual budgets: $125 - $1,000,000
   ◦ Average: $105,000; Median: $35,000; Mode: $20,000
   ◦ Up to 5 different funding sources, usually grants,
     in-kind contributions (staff time), & private donations

• Age/duration: -1 to 100 years
   ◦ Average age: 13 years; Median: 9 years; Mode: 2 years
Survey: Methods Used
Method                                                n    Percentage
Expert review                                         46      77%
Photo submissions                                     24      40%
Paper data sheets submitted along with online entry   20      33%
Replication/rating by multiple participants           14      23%
QA/QC training program                                13      22%
Automatic filtering of unusual reports                11      18%
Uniform equipment                                     9       15%
Validation planned but not yet implemented            5       8%
Replication/rating, by the same participant           2       3%
Rating of established control items                   2       3%
None                                                  2       3%
Not sure/don’t know                                   2       3%
Survey: Combining Methods
Methods                                      n    Percentage
Single method                                10      17%
Multiple methods, up to 5 (average 2.5)      45      75%
Expert review + Automatic filtering          11      18%
Expert review + Paper data sheets            10      17%
Expert review + Photos                       14      23%
Expert review + Photos + Paper data sheets   6       10%
Expert review + Replication, multiple        10      17%
Survey: Resources & Methods
• Number of validation methods and staff are
  positively correlated (r2 = 0.11)
   ◦ More staffing = more supervisory capacity

• Number of validation methods and budget are
  negatively correlated (r2 = -0.15)
   ◦ If larger budgets mean more contributors, this
     constrains scalability of multiple methods
   ◦ Larger projects may use fewer but more sophisticated
     mechanisms
   ◦ Suggests that human-supervised methods don’t scale
Survey: Other Validation Options
• “Please describe any additional validation methods
  used in your project”
   ◦ Several projects rely on personal knowledge of
     contributing individuals for data quality
      ▪ Not scientifically robust, but understandably relevant
   ◦ Most comments referred to details of expert review
      ▪ Reinforces the perceived value of expertise
   ◦ Reporting interface and associated error-checking is
     often overlooked, but provides important initial data
     verification
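The kind of initial verification a reporting interface can provide might look like the sketch below. It is an illustration only: the field names (species, latitude, longitude, observed_on, count), the validate_report function, and the specific rules are hypothetical and are not drawn from any surveyed project.

```python
from datetime import date

# Hypothetical required fields for a single observation report.
REQUIRED_FIELDS = ("species", "latitude", "longitude", "observed_on", "count")


def validate_report(report, today=None):
    """Return a list of problems found at entry time; an empty list means the report passes."""
    today = today or date.today()
    problems = []

    # Completeness check: stop early if anything required is missing.
    for field in REQUIRED_FIELDS:
        if report.get(field) in (None, ""):
            problems.append(f"missing {field}")
    if problems:
        return problems

    # Spatial checks: coordinates must be valid decimal degrees.
    if not -90 <= report["latitude"] <= 90:
        problems.append("latitude out of range")
    if not -180 <= report["longitude"] <= 180:
        problems.append("longitude out of range")

    # Temporal check: an observation cannot be dated in the future.
    if report["observed_on"] > today:
        problems.append("observation date is in the future")

    # Data entry check: counts must be positive whole numbers.
    if not isinstance(report["count"], int) or report["count"] < 1:
        problems.append("count must be a positive integer")

    return problems
```

Checks like these catch data entry errors before a record reaches expert review, which matters more as the number of contributors grows.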
Choosing Mechanisms

• Data characteristics to consider when choosing
  mechanisms to ensure quality
   ◦ Accuracy and precision: taxonomic, spatial, temporal,
     etc.
   ◦ Error prevention: malfeasance (gaming the system),
     inexperience, data entry errors, etc.

• Evaluate assumptions about error and accuracy
   ◦ Where does error originate? How do mechanisms
     address this? At what step in the research process?
     How transparent is data review and outcomes? How
     much data will be reviewed? In how much detail?
Mechanisms: Protocols
Mechanism                                    Process   Type/Detail
QA project plans                             Before    SOP in some areas
Repeated samples/tasks                       During    By multiple participants, single participant, or experts (calibration)
Tasks involving control items                During    Contributions compared to known states
Uniform/calibrated equipment                 During    Used for measurements; cost/scale tradeoff; who pays?
Paper data sheets + online entry*            During    Extended details, verifying data entry accuracy
Digital vouchers*                            During    Photos, audio, specimens/archives
Data triangulation, normalization, mining*   After     Corroboration from other data sources; statistical & computer science methods
Data documentation*                          After     Provide metadata about processes
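Two of the protocol mechanisms above, control items and repeated tasks by multiple participants, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the dictionary layout, and the 0.6 agreement threshold are all hypothetical choices, not something reported by the surveyed projects.

```python
from collections import Counter


def control_item_accuracy(answers, known_states):
    """Fraction of control items a participant labeled correctly.

    answers: {item_id: label} submitted by one participant.
    known_states: {item_id: label} for the control items only.
    Returns None if the participant saw no control items.
    """
    scored = [item for item in known_states if item in answers]
    if not scored:
        return None
    correct = sum(answers[item] == known_states[item] for item in scored)
    return correct / len(scored)


def consensus(labels, min_agreement=0.6):
    """Majority label across participants for one item.

    Returns None when the leading label's share of votes is below
    min_agreement (an assumed threshold), so the item can be routed
    to another form of review.
    """
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) >= min_agreement else None
```

For example, consensus(["robin", "robin", "wren"]) accepts "robin" at two votes out of three, while consensus(["robin", "wren"]) returns None because neither label reaches the threshold.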
Mechanisms: Participants

Mechanism                                       Process          Type/Detail
Participant training                            Before, During   Initial; Ongoing; Formal QA/QC
Participant testing                             Before, During   Following training; Pre/test-retest
Rating participant performance                  During, After    Unknown to participant; Known to participant
Filtering of unusual reports                    During, After    Automatically; Manually
Contacting participants about unusual reports   After            May alienate/educate contributors
Automatic recognition                           After            Techniques for image/text processing
Expert review                                   After            By professionals, experienced contributors, or multiple parties
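Automatic filtering of unusual reports could take many forms. The sketch below shows one common statistical option, the modified z-score based on the median and the median absolute deviation (MAD). This technique is chosen here purely for illustration; the slides do not say which filters projects actually use, and the function name and 3.5 cutoff are assumptions.

```python
from statistics import median


def flag_unusual(values, threshold=3.5):
    """Return the indices of values that look unusual relative to the rest.

    Uses the modified z-score, 0.6745 * |x - median| / MAD, which is less
    sensitive to the outliers themselves than a mean-based z-score.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]
```

Flagged reports are candidates for manual or expert review rather than automatic rejection, since an unusual report may still be a correct one.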
Discussion

• Need to pay more attention to the way that data are
  created: not just protocols, but also qualities of data
  like accuracy and precision

• Clear need for quality/validation mechanisms for
  analysis, not only for data collection/processing
   ◦ Data mining techniques
   ◦ Spatio-temporal modeling

• Scalability of validation may be limited
   ◦ May need to plan different quality management
     techniques based on expected/actual project growth
Future Work

• Most projects worry more about contributor
  expertise than appropriate analysis methods
   ◦ Resources are needed to support suitable analysis
     approaches and tools

• Comparative evaluation of the efficacy of the data
  quality and validation mechanisms identified
   ◦ Develop a QA/QC planning and evaluation tool

• Develop examples of appropriate data
  documentation for citizen science projects
   ◦ Necessary for peer review, data re-use
Thanks!

• Nate Prestopnik

• DataONE working group on Public Participation in
  Scientific Research

• US NSF grants 09-43049 & 11-11107
Is Your QA Team Still Working in Silos? Here's What to Do.
marketing943205
 
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Gary Arora
 
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptxIn-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
aptyai
 
TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc Webinar: Cross-Border Data Transfers in 2025TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc Webinar: Cross-Border Data Transfers in 2025
TrustArc
 
Building a research repository that works by Clare Cady
Building a research repository that works by Clare CadyBuilding a research repository that works by Clare Cady
Building a research repository that works by Clare Cady
UXPA Boston
 
RFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdfRFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdf
EnCStore Private Limited
 
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
UXPA Boston
 
React Native for Business Solutions: Building Scalable Apps for Success
React Native for Business Solutions: Building Scalable Apps for SuccessReact Native for Business Solutions: Building Scalable Apps for Success
React Native for Business Solutions: Building Scalable Apps for Success
Amelia Swank
 
Google DeepMind’s New AI Coding Agent AlphaEvolve.pdf
Google DeepMind’s New AI Coding Agent AlphaEvolve.pdfGoogle DeepMind’s New AI Coding Agent AlphaEvolve.pdf
Google DeepMind’s New AI Coding Agent AlphaEvolve.pdf
derrickjswork
 
SQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptxSQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptx
Scott Keck-Warren
 
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More MachinesRefactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Leon Anavi
 
Risk Analysis 101: Using a Risk Analyst to Fortify Your IT Strategy
Risk Analysis 101: Using a Risk Analyst to Fortify Your IT StrategyRisk Analysis 101: Using a Risk Analyst to Fortify Your IT Strategy
Risk Analysis 101: Using a Risk Analyst to Fortify Your IT Strategy
john823664
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Scientific Large Language Models in Multi-Modal Domains
Scientific Large Language Models in Multi-Modal DomainsScientific Large Language Models in Multi-Modal Domains
Scientific Large Language Models in Multi-Modal Domains
syedanidakhader1
 
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Vasileios Komianos
 
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
UXPA Boston
 
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UXPA Boston
 
Accommodating Neurodiverse Users Online (Global Accessibility Awareness Day 2...
Accommodating Neurodiverse Users Online (Global Accessibility Awareness Day 2...Accommodating Neurodiverse Users Online (Global Accessibility Awareness Day 2...
Accommodating Neurodiverse Users Online (Global Accessibility Awareness Day 2...
User Vision
 
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
Toru Tamaki
 
Is Your QA Team Still Working in Silos? Here's What to Do.
Is Your QA Team Still Working in Silos? Here's What to Do.Is Your QA Team Still Working in Silos? Here's What to Do.
Is Your QA Team Still Working in Silos? Here's What to Do.
marketing943205
 

Mechanisms for Data Quality and Validation in Citizen Science

  • 1. Mechanisms for Data Quality and Validation in Citizen Science
      A. Wiggins, G. Newman, R. Stevenson & K. Crowston
      Presented by Nathan Prestopnik
  • 2. Motivation
      • Data quality and validation are a primary concern for most citizen science projects
          • More contributors = more opportunities for error
      • There has been no review of appropriate data quality and validation mechanisms
          • Diverse projects face similar challenges
      • Contributors’ skills and scale of participation are important considerations in ensuring quality
  • 3. Methods
      • Survey
          • Questionnaire with 70 items, all optional
          • 63 completed questionnaires representing 62 projects
          • Mostly small-to-medium sized projects in US, Canada, UK; most focus on monitoring and observation
      • Inductive development of framework
          • Based on survey results and authors’ direct experience with citizen science projects
  • 4. Survey: Resources
      • FTEs: 0 – 50+
          • Average: 2.4; Median: 1
          • Often small fractions of several individuals’ time
      • Annual budgets: $125 - $1,000,000
          • Average: $105,000; Median: $35,000; Mode: $20,000
          • Up to 5 different funding sources, usually grants, in-kind contributions (staff time), & private donations
      • Age/duration: -1 to 100 years
          • Average age: 13 years; Median: 9 years; Mode: 2 years
  • 5. Survey: Methods Used
      Method                                                n    Percentage
      Expert review                                         46   77%
      Photo submissions                                     24   40%
      Paper data sheets submitted along with online entry   20   33%
      Replication/rating by multiple participants           14   23%
      QA/QC training program                                13   22%
      Automatic filtering of unusual reports                11   18%
      Uniform equipment                                     9    15%
      Validation planned but not yet implemented            5    8%
      Replication/rating, by the same participant           2    3%
      Rating of established control items                   2    3%
      None                                                  2    3%
      Not sure/don’t know                                   2    3%
  • 6. Survey: Combining Methods
      Methods                                       n    Percentage
      Single method                                 10   17%
      Multiple methods, up to 5 (average 2.5)       45   75%
      Expert review + Automatic filtering           11   18%
      Expert review + Paper data sheets             10   17%
      Expert review + Photos                        14   23%
      Expert review + Photos + Paper data sheets    6    10%
      Expert review + Replication, multiple         10   17%
  • 7. Survey: Resources & Methods
      • Number of validation methods and staff are positively correlated (r2 = 0.11)
          • More staffing = more supervisory capacity
      • Number of validation methods and budget are negatively correlated (r2 = -0.15)
          • If larger budgets mean more contributors, this constrains scalability of multiple methods
          • Larger projects may use fewer but more sophisticated mechanisms
          • Suggests that human-supervised methods don’t scale
  • 8. Survey: Other Validation Options
      • “Please describe any additional validation methods used in your project”
      • Several projects rely on personal knowledge of contributing individuals for data quality
          • Not scientifically robust, but understandably relevant
      • Most comments referred to details of expert review
          • Reinforces the perceived value of expertise
      • Reporting interface and associated error-checking is often overlooked, but provides important initial data verification
  • 9. Choosing Mechanisms
      • Data characteristics to consider when choosing mechanisms to ensure quality
          • Accuracy and precision: taxonomic, spatial, temporal, etc.
          • Error prevention: malfeasance (gaming the system), inexperience, data entry errors, etc.
      • Evaluate assumptions about error and accuracy
          • Where does error originate? How do mechanisms address this? At what step in the research process? How transparent are data review and outcomes? How much data will be reviewed? In how much detail?
  • 10. Mechanisms: Protocols
      Mechanism                                    Process   Type/Detail
      QA project plans                             Before    SOP in some areas
      Repeated samples/tasks                       During    By multiple participants, single participant, or experts (calibration)
      Tasks involving control items                During    Contributions compared to known states
      Uniform/calibrated equipment                 During    Used for measurements; cost/scale tradeoff; who pays?
      Paper data sheets + online entry*            During    Extended details, verifying data entry accuracy
      Digital vouchers*                            During    Photos, audio, specimens/archives
      Data triangulation, normalization, mining*   After     Corroboration from other data sources; statistical & computer science methods
      Data documentation*                          After     Provide metadata about processes
  • 11. Mechanisms: Participants
      Mechanism                                       Process          Types/Details
      Participant training                            Before, During   Initial; Ongoing; Formal QA/QC
      Participant testing                             Before, During   Following training; Pre/test-retest
      Rating participant performance                  During, After    Unknown to participant; Known to participant
      Filtering of unusual reports                    During, After    Automatically; Manually
      Contacting participants about unusual reports   After            May alienate/educate contributors
      Automatic recognition                           After            Techniques for image/text processing
      Expert review                                   After            By professionals, experienced contributors, or multiple parties
  • 12. Discussion
      • Need to pay more attention to the way that data are created: not just protocols, but also qualities of data such as accuracy and precision
      • Clear need for quality/validation mechanisms for analysis, not only for data collection/processing
          • Data mining techniques
          • Spatio-temporal modeling
      • Scalability of validation may be limited
          • May need to plan different quality management techniques based on expected/actual project growth
  • 13. Future Work
      • Most projects worry more about contributor expertise than appropriate analysis methods
          • Resources are needed to support suitable analysis approaches and tools
      • Comparative evaluation of the efficacy of the data quality and validation mechanisms identified
      • Develop a QA/QC planning and evaluation tool
      • Develop examples of appropriate data documentation for citizen science projects
          • Necessary for peer review, data re-use
  • 14. Thanks!
      • Nate Prestopnik
      • DataONE working group on Public Participation in Scientific Research
      • US NSF grants 09-43049 & 11-11107
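
Two of the mechanisms named on slides 5, 10 and 11, replication by multiple participants and automatic filtering of unusual reports, can be illustrated with a short sketch. The slides do not say how any surveyed project implements them, so everything here is an assumption: the function names `consensus` and `flag_unusual`, the 0.6 agreement threshold, and the choice of a modified z-score (median absolute deviation) with a 3.5 cutoff as the filtering rule.

```python
from collections import Counter
from statistics import median


def consensus(labels, min_agreement=0.6):
    """Majority label from several participants, or None if agreement is too low.

    min_agreement is a hypothetical threshold, not a value from the slides.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


def flag_unusual(values, threshold=3.5):
    """Indices of reports whose modified z-score exceeds the threshold.

    Uses the median absolute deviation (MAD), one common outlier rule;
    swapped in here for illustration only.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be ranked as unusual
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Three participants classify the same observation.
species = consensus(["robin", "robin", "sparrow"])

# Six counts reported for the same site; the last one stands out.
flagged = flag_unusual([10, 12, 11, 13, 12, 95])
```

Flagged reports would then go to manual filtering or expert review rather than being discarded, which matches the "Automatically; Manually" detail on slide 11.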

Editor's Notes

  • #6: Rating = classification or judgment tasks, admittedly not the clearest wording, but no one corrected this in text responses. Percentage = percentage of responding projects that use each method.
  • #7: Percentage = percentage of responding projects that use this combination of methods. There were a few other combinations that a handful of projects used; these were the dominant ones. Surprised to see so many with photos, as they are hard to use and store, and surprised by the frequency of paper data sheets.
  • #8: Note that we did ask about numbers of contributions, but the units of contribution for each project (and even the way they count volunteers) were so different that they couldn’t be used for analysis.
  • #11: Split framework of mechanisms in two for ease of viewing; these are methods that address the protocol as the presumed source of error. Starred items address errors arising from both protocols and participants.
  • #12: These methods all address expected errors from participants, focusing primarily on skill evaluation and filtering or review for unusual reports.
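
Note #6 defines Percentage as the share of responding projects using each method. The n and Percentage pairs on slide 5 are consistent with a denominator of 60 responding projects. That figure is an inference from the reported values, not something the slides state, so the sketch below treats it as an assumption and only checks that the arithmetic holds.

```python
# Assumption: 60 responding projects, inferred from the n/percentage pairs.
RESPONDING_PROJECTS = 60

# (n, reported percentage) as shown on slide 5.
methods_used = {
    "Expert review": (46, 77),
    "Photo submissions": (24, 40),
    "Paper data sheets submitted along with online entry": (20, 33),
    "Replication/rating by multiple participants": (14, 23),
    "QA/QC training program": (13, 22),
    "Automatic filtering of unusual reports": (11, 18),
    "Uniform equipment": (9, 15),
    "Validation planned but not yet implemented": (5, 8),
    "Replication/rating, by the same participant": (2, 3),
    "Rating of established control items": (2, 3),
    "None": (2, 3),
    "Not sure/don't know": (2, 3),
}


def percentage(n, total=RESPONDING_PROJECTS):
    """Share of responding projects, rounded to a whole percent."""
    return round(100 * n / total)


# Methods whose reported percentage differs from n / 60.
mismatches = [name for name, (n, reported) in methods_used.items()
              if percentage(n) != reported]
```

The same denominator also reproduces the percentages on slide 6, for example 45 projects using multiple methods gives 75%.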