Open data:
Benefits for the researcher,
Benefits for Society
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY
The DCC is supported by Jisc
A summary
• Why data reuse?
• What stops us?
• Related issues – software & methods
• The case for reuse - again
2015-05-28 Kevin Ashley –ORD2015 - CC-BY 2
An alternative summary
Being Selfish … and still benefiting others
What’s possible now
Being Just Good Enough
Thanks to:
Neil Chue Hong (@npch), Software Sustainability Institute, ORCID: 0000-0002-8876-7606
David Flanders (@dfflanders) and Dr Steven Manos (DrStevenManos), University of Melbourne
All my colleagues at the DCC
Cameron Neylon (@CameronNeylon)
My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
DATA REUSE HAPPENS – AND NOT
ALWAYS IN THE WAY YOU EXPECT
The Old Weather project
Data for research, not from research
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships’ logs that help us model climate change
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
Should all data be open?
• NO
• Many reasons – most to do with human
subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication
Some conundrums
• Releasing genome data is OK when it’s:
– An identified human subject
– An anonymous human subject
– Your pet dog
– Another mammal
– An insect
– A plant
– A virus
Data reuse - messages
Often your data tells stories that your publications do not
Not all data comes from other researchers
One person’s noise is another person’s signal
Discipline-bounded data discovery doesn’t give us all we need or want
Why care?
• Data is expensive – an investment
• Reuse:
– More research
– Teaching & Learning
– Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements
Why does this matter?
• Research quality
– How close can we get to the truth?
• Research speed
– How quickly can we get to the truth?
• Research finance
– How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society
G8 UK endorses OA
Open Data Charter: policy paper, 18 June 2013
164 universities in UK*
*2011 HESA data
71 (43%) > 5% research income
115 (70%) > £1m income from research
£4.4 billion total research grants (≈ PLN 26.6 billion)
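As a quick sanity check on the figures above, the percentages and the currency conversion can be reproduced in a few lines of Python. This is only a sketch: the constant and function names are illustrative, and the exchange rate is simply the one implied by the two totals quoted on the slide, not an official rate.

```python
# Figures as quoted on the slides (2011 HESA data); all names are illustrative.
TOTAL_UNIVERSITIES = 164
OVER_5_PERCENT_RESEARCH_INCOME = 71
OVER_1M_RESEARCH_INCOME = 115
TOTAL_GRANTS_GBP_BN = 4.4
TOTAL_GRANTS_PLN_BN = 26.6


def share_percent(count: int, total: int) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


def implied_rate(pln: float, gbp: float) -> float:
    """Return the PLN-per-GBP rate implied by two quoted totals."""
    return pln / gbp


if __name__ == "__main__":
    print(share_percent(OVER_5_PERCENT_RESEARCH_INCOME, TOTAL_UNIVERSITIES))
    print(share_percent(OVER_1M_RESEARCH_INCOME, TOTAL_UNIVERSITIES))
    print(round(implied_rate(TOTAL_GRANTS_PLN_BN, TOTAL_GRANTS_GBP_BN), 2))
```

The two shares round to the 43% and 70% shown on the slide, and the implied rate comes out at roughly 6 PLN to the pound.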
Business case for UK investment in
data reuse
• National infrastructure costs £1.5m/year
• 5 years before data reuse is fully active
• 10,000 datasets per year captured
• 1 in 100 datasets reused each year
• £30,000 saved each time data is reused
• Saving: £3m/year – twice the running cost
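The arithmetic behind this business case can be written out as a short Python sketch. The figures are the ones on the slide; the names are illustrative, and the calculation assumes that the "1 in 100" reuse rate applies to a single year's intake of 10,000 datasets, which is the reading that reproduces the slide's £3m figure.

```python
# Back-of-envelope version of the business case; figures are from the slide.
RUNNING_COST_GBP = 1_500_000   # national infrastructure cost per year
DATASETS_PER_YEAR = 10_000     # datasets captured each year
REUSED_ONE_IN = 100            # 1 in 100 datasets reused each year
SAVING_PER_REUSE_GBP = 30_000  # saved each time data is reused


def annual_saving(datasets: int, one_in: int, saving_per_reuse: int) -> int:
    """Yearly saving, assuming the reuse rate applies to one year's intake."""
    reused = datasets // one_in
    return reused * saving_per_reuse


def benefit_cost_ratio(saving: int, running_cost: int) -> float:
    """How many times over the saving covers the running cost."""
    return saving / running_cost


if __name__ == "__main__":
    saving = annual_saving(DATASETS_PER_YEAR, REUSED_ONE_IN, SAVING_PER_REUSE_GBP)
    print(saving)
    print(benefit_cost_ratio(saving, RUNNING_COST_GBP))
```

Under these assumptions 100 datasets are reused per year, giving £3m saved against £1.5m spent. The sketch ignores the five-year ramp-up mentioned on the slide, so it describes the steady state only.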
http://www.flickr.com/photos/sethw/113073189/
95% of research results are never published
Slide: Cameron Neylon
http://flickr.com/photos/heymans/480396810/
If a million postdocs repeat a million experiments…
Slide: Cameron Neylon
http://flickr.com/photos/cliche/120070310/
And 25% of those don’t work…
Slide: Cameron Neylon
…how much taxpayers’ money is that?
http://flickr.com/photos/luismimunoznajar/2093185804/
Slide: Cameron Neylon
More benefits: patient safety
… and institutional reputation
BUT WHAT ABOUT ME BEING SELFISH?
Funders are making demands
Findable, citable data has value
• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication, if data is:
– Archived
– Citable
– Discoverable
• All benefit – researcher; institution; publisher
Citability
• Making data available increases citations
• Everyone – academic, funder, institution –
loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data)†
– Henneken, Accomazzi – 20% (astronomy) #
* Amy Pienta, George Alter, Jared Lyle (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Piwowar H, Vision TJ (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Edwin Henneken, Alberto Accomazzi (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
Traditional skills can win
• Google Flu gets it wrong:
• Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The Parable of Google Flu: Traps in Big Data Analysis. Science, 343, Forthcoming.
• The data tells us why:
• Lazer, David; Kennedy, Ryan; King, Gary; Vespignani, Alessandro, 2014, "Replication data for: The Parable of Google Flu: Traps in Big Data Analysis", http://dx.doi.org/10.7910/DVN/24823 UNF:5:BJh9WzZQNEeSEpV3EWs+xg== IQSS Dataverse Network [Distributor] V1 [Version]
• Personalisation; suggested searches; other UI changes
41 datasets – none bigger than 1 Mbyte
Data made available before paper was published – result was immediate impact
What stops data reuse
• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential
Excuses – and responses
• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It’s not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don’t have permission”
– A real problem. But solvable at senior level
• “It’s too bad/complicated” – see above
• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
See e.g. Carly Strasser’s blog:
http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
Why open software?
• Quicker start
• Better flexibility
• Improved robustness
• Increases collaborators
• Greater research impact
• Easier to work with industry
• No added cost
– Caveat: over what you should already be doing
What’s software got to do with my research?
Slide: Neil Chue Hong
The research community relies on software
Do you use research software?
What would happen to your research without software?
Survey of researchers from 15 UK Russell Group universities conducted by SSI between August and October 2014. DOI: 10.5281/zenodo.14809
56% develop their own software
71% have no formal software training
Slide: Neil Chue Hong
The modern researcher…
• … worries about:
– Data management and analysis
– Reproducible research
– Scalable simulations
– Integration of models and workflows
– Collaboration
Picture of Otto Stern from Emilio Segre Visual Archives. Copyright American Institute of Physics. Used with permission
Slide: Neil Chue Hong
Open software is good for science and
good for you
• Benefits
– More collaborators
– More citations
– More benefit to others
– Increased robustness
– Increased reuse
– Reduced replication of effort
• Far more than the drawbacks
– More structured collaboration
Slide: Neil Chue Hong
Improve your research impact
Vandewalle (2012) DOI: 10.1109/MCSE.2012.63
Slide: Neil Chue Hong
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/
Special Issue: Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production", CSCW 2013.
Software engineering (vs) software/data carpentry
Software carpenters craft their research atop the digital infrastructure to produce novel science.
Software engineers maintain, own and operate digital infrastructure.
Teaching researchers to code
Community exemplar: #SWCarpentry
Publishing data & software papers is easy
http://openresearchsoftware.metajnl.com
http://bit.ly/softwarejournals
http://dx.doi.org/10.6084/m9.figshare.942289
Slide: Neil Chue Hong
THERE’S HELP FOR DATA SHARING AS WELL
Roles and Responsibilities
What data to keep
How to cite data
What data to keep
Acquire research data skills
Finally…
• Sharing data is good for you
• It’s good for all of us
• It isn’t as hard as you think – start today!
It’s amazing what people will share…
Data reuse from Hubble



Editor's Notes

  • #8: There are many such stories of unexpected data reuse; these are a few examples. The last, exemplified in the Old Weather project, is seeing the original data being reused for at least the third time and in doing so is helping both climatologists and family historians through a single piece of transcription work. An impressive result.
  • #9: Medicine does, however, provide some clear reasons why we can’t just stick all research data on the internet for anyone to trawl through. When human subjects are involved there are real concerns about confidentiality. Yet what alltrials.net and other initiatives make clear is that the *existence* of the data should never be hidden. That allows it to be discovered and for negotiations to take place about its use. It avoids costly replication, which can delay scientific discovery and involve human suffering when the replication takes the form of a clinical trial.
  • #12: Kevin Ashley, DCC, UKSG Glasgow. CC-BY
  • #13: For an audience such as this, I shouldn’t have to explain why data reuse is important. But just in case, and to explain why some things have happened the way they have, I’ll describe some of the drivers. Ensuring that all research data is discoverable and reusable increases the quality of the research that we do. It can add to the data we collect ourselves and can improve the statistical rigour of our results. Exposing data to scrutiny makes it more straightforward to validate or challenge the findings of others. Making data available also improves the speed with which we can do research. If someone else has already gathered the data we need (perhaps for a different end use), we can move directly to the analysis stage of our work, saving both time and money. And saving money increases the efficiency of research. We hope that the money saved lets us do more research, but even if it doesn’t, society as a whole will gain. There’s evidence behind this that I’ll come to later, but it is an effective counter to those in some universities who feel that increasing funder requirements for data management simply lead to additional costs with no gain. There is a gain in all these areas, and hence every one of the actors (researchers, their employers, their funders and society) should be motivated to make this happen.
  • #27: Did I mention that making data available increases citations? This is a win all round. If you don’t believe me, here are three studies from three different areas that all show robust, positive correlations. The effect size varies with discipline, but we have enough evidence now that anyone who says that their area is different needs to come up with evidence to show why.
  • #30: Yet some researchers still aren’t convinced by the rhetoric. Carly Strasser at CDL has listed some of the reasons for not sharing data that she’s encountered – and here are some of my one-line responses. I’m not saying that the concerns aren’t sincere or reasonable but they can all be dealt with and some are positively misguided. The purpose of data centres, for instance, is to make data independently reusable (as stated in the OAIS standard) which relieves researchers of the burden of dealing with questions about it, at the same time as increasing the likelihood that their data will be cited.
  • #33: In the last four years, we have investigated and understood the challenges of the UK research community. Anecdotally, we had a lot of evidence from people working in this area that researchers relied on software, but no studies had been conducted. So we did this ourselves. There were two areas of interest: do you use software and, possibly more important, what would happen to your research without software? This is 170,000 researchers in the UK who could not conduct their research without software. This is more than just a reliance on Word or web browsers: specialist software is written into the research workflows of people from psychology to physics, from the life sciences to literature. The reliance isn’t confined to the “traditionally” computationally intensive subjects; it’s a feature of all disciplines. This means that 140,000 researchers are relying on their own coding skills.
  • #34: http://www.flickr.com/photos/esva/2364906768
  • #37: All of these “excuses” can be seen as reasons to do the various things: it will benefit you as well as your readers.
  • #38: FIONA method of training researchers how to code, taught by researchers for researchers using real research data and real problems. Researchers are desperate for more skills which allow them to craft their data in research-specific ways. They don’t need to build highly scalable services (i.e. they don’t need, or want, to be professional software “engineers”), but they do need to be able to craft specific parts of their data so they can get to a solution for their research (i.e. they do need to do a bit of software “carpentry”). By way of analogy, researchers need to know about as much about computing as they do about statistics. Most researchers don't have a degree in statistics, but at some point they've learned what correlation means, what statistical significance is, and so on.
  • #39: A metajournal which encourages the publication of information that encourages the reuse of software. A way of using the current tools and practices to make software better recognised.
  • #48: Many of you may be familiar with this graph from the Hubble Space telescope data archive. It tells the same story in a different way, and also tells a story about the transformation of astronomy as a discipline. In the days of photographic plates, sharing (analogue) astronomical data was difficult. Digital instruments transformed this, and some time around 2000, more research was being done with old data than with new data. I could be more specific about this if the data behind this graph was made available, incidentally!