Data Sharing Toolkit
His Highness Sheikh Mohammed bin Rashid Al Maktoum, Vice President and Prime Minister of the UAE, and Ruler of Dubai, launched the Smart Dubai Initiative in 2013 with a vision of making Dubai the happiest and smartest city on Earth.

The Smart Dubai office was formed in 2015 to oversee Dubai’s smart transformation and accomplish the leadership’s vision. Collaborating with government and private sector partners, Smart Dubai is consistently adopting the latest technological innovations to provide efficient, seamless, safe and personalised city experiences for residents and visitors.

www.smartdubai.ae

Nesta is an innovation foundation. For us, innovation means turning bold ideas into reality and changing lives for the better.

We use our expertise, skills and funding in areas where there are big challenges facing society.

Nesta is based in the UK and supported by a financial endowment. We work with partners around the globe to bring bold ideas to life to change the world for good.

www.nesta.org.uk

This toolkit was created by Tom Symons and Camilla Bertoncin from Nesta in partnership with the Smart Dubai team.

As part of the research process for this toolkit, we held two workshops and presented our work to external experts in Dubai and London.

We’d like to thank all those who have attended the workshops and contributed along the way. In particular, we would like to thank Thea Snow and Eddie Copeland for their insight and support.

This toolkit is a copyrighted work of Smart Dubai.
When should it be used?
How should it be used?
Labelling data sharing initiatives
Data sharing decision tool
What is the appropriate data infrastructure?
How is data accessed?
Checklist
Requirements
ABOUT THIS TOOLKIT
Globally, innovation in data governance is fairly embryonic. Recent years have seen a flurry of new activity, but much of this remains poorly defined or nascent. Few, if any, countries have ‘cracked the code’ of responsible and effective data sharing initiatives and governance.

Data’s value can be unlocked by creating trusted and ethical mechanisms for individuals, private and public sector organisations to share data.

Our research has focused on how trusted data sharing arrangements can be formed that ensure data is in the right hands and value can be extracted from it.

We approached this work by analysing a range of models for data sharing and boiling them down to the essential components that are common to all of them.
— Innovators around the world interested in exploring data sharing initiatives and looking for a tool to spark the right discussions, anticipate issues and find new collaborative approaches to data sharing.

We designed a flexible decision tool that:

— Provides useful guidance and resources for private and public organisations to prepare for and design data sharing initiatives.
You are new to data sharing and would like to know more about what is theoretically out there.
— The first part of this toolkit provides the context, an overview of a range of different data sharing initiatives and plenty of case studies.

You are considering proposing a data sharing partnership to your team, but want to have an idea of what it would take to put one in place first.
— Start with the decision matrix.
— You can go through the material of the canvasses and dive deeper using the additional resources in the ‘Useful tools’ sections.
— Move to the ‘Project foundations’ section. Do you tick all the boxes of the checklist?
— If not, is there something you can do to facilitate the process of change that leads to all boxes being ticked?

You already have a partnership in place and want to kickstart aligning expectations and incentives, and need a tool to spark conversations across different stakeholders.
— Start with the ‘Project foundations’ section.
— Discuss the checklist items and then move to the more practical exercise of the decision matrix.
— Print each canvas on A3 (or a large piece of paper), divide into multiple groups and repeat the process if necessary with other stakeholders to guarantee maximum representation.
DATA SHARING
It is now widely recognised that the value of data held by individuals and organisations alike can increase exponentially if it is shared and combined with other sources of data.

Bringing together data sources breaks down traditional silos and unleashes the potential for data to generate important and meaningful insights. Public services in many parts of the world are exploring the potential of data analytics to address public problems, using their own data to become more efficient and effective.

Yet many datasets that could help solve public problems remain closed, proprietary or difficult to find or share.

For this reason, governments, public and private sector organisations around the world are experimenting with different ways of accelerating data sharing and collaboration between those who hold valuable data and those able to deliver solutions to unlock the value from data.
Labour markets
— Future of work
The Open Skills Project is a public–private partnership focused on providing a dynamic, locally relevant, up-to-date and normalised taxonomy of skills and jobs. Its aim is to improve our understanding of the labour market and reduce frictions in the workforce data ecosystem by enabling a more granular common language of skills among industry, academia, government, and non-profit organisations.

Consumer data and retail
— Consumer sentiment and experience
Linking consumer confidence index and social media sentiment analysis is an analysis of the correlation between the official consumer confidence index obtained from the MIER (Malaysian Institute of Economic Research) and social media big data (via sentiment analysis, from Twitter) on consumer purchasing behaviour for two types of products over the course of two years.

Smart cities and city data
— Transport, mobility and urban planning
City Brain, a partnership of the Chinese government with the commercial platform Alibaba, provides real-time data from 750 sources to tackle problems of traffic congestion, analyse energy and water consumption patterns and identify vulnerable residents in need of additional support.
▶ Public Problems
— Tailoring learning materials based on a student’s needs
— Diagnosing strengths, weaknesses or gaps in a student’s learning experience
The Baltimore Early Childhood Data Collaborative is a partnership of Baltimore City agencies serving young children and their families, sharing data to understand the experiences of young children in Baltimore and how those experiences relate to later educational outcomes.

— Rare diseases
— Genomic data mapping
— Population growth forecasts
Personal health datasets shared by voluntary patients and research centres are increasingly used to support research and preventive health care services, such as in the cases of MIDATA, the NCI Genomic Data Commons or Healx.

— Air and water pollution reduction
— Flood risk modelling
— Forest change monitoring
Fluxnet is a repository of eddy covariance measurements of carbon dioxide and water vapour exchange from more than 800 active and historic flux measurement sites, dispersed across most of the world’s climate space and representative biomes.
Data breaches and privacy missteps regularly make headlines and have recently caused a profound and widening lack of trust among individuals, institutions and governments in the notion of safe data sharing.

In addition, a culture of risk aversion in public sector agencies can mean that the privacy risks are seen to outweigh the potential benefits.

While data sharing offers many opportunities, there are also significant challenges to be addressed and barriers to be overcome. Other barriers to data sharing include:
— Risks associated with sharing commercially sensitive information
— The complexity of facilitating cross-border data flows
— Reputational concerns
— Regulatory or legal uncertainty
— A lack of dedicated personnel to drive and steward such initiatives
— Mixed levels of data maturity across organisations
— Unclear incentives (especially when engaging private companies)

Given the challenges, but in light of the significant opportunities outlined, there is a growing demand for trusted mechanisms for sharing data and meaningful privacy and data protection regulations.

Institutional frameworks needed to support the safe and trusted sharing and use of data between multiple different organisations are not yet well-established – there are not yet clearly codified processes that facilitate responsible data sharing.

The challenge facing public and private sector entities globally is how to strengthen trust and implement effective public–private data collaboration.

This toolkit aims to offer some insights and answers to tackle this challenge.
Data commons
A spectrum of initiatives in which data is shared as a common resource among individuals or organisations, who collectively decide on the rules that govern access to it.
— Dataverse
— DECODE
— Global Forest Watch
— California Data Collaborative

Data exchanges and markets
Usually this is a data platform where data is treated as an economic good, and access is regulated through price mechanisms.
— Copenhagen–Hitachi City Data Exchange
— MIDATA
— Analytics Vidhya hackathon contests

Data trusts
Legal structures that provide independent stewardship of data.
— Open Data Institute (ODI) data trusts pilots
— Essex Centre for Data Analytics

Open data platforms and open APIs
Curated sets of open datasets and APIs (application programming interfaces).
— data.gov.uk
— Transport for London Unified APIs
— Consumer Data Research Centre
Because the field is so nascent, organisations have emerged that specialise in researching and promoting specific data sharing models.

To get a sense of the great variation in names and approaches, below we show the ODI Data Access Map. This is a clear example of how many terms and definitions can be attached to data sharing initiatives, and it is an attempt to navigate them by crowdsourcing definitions and case studies. These labels can be used interchangeably depending on the organisation in question, leading to confusion and a lack of rigour in their application.

… context, predefined models and labels, such as those described above, are not useful as tools for translating theory into practice.

Rather than considering whether a ‘data trust’ or a ‘data collaborative’ or a ‘data commons’ is the right approach, it is much more important to think about the specific problem that sharing data is set to solve. From this problem and the specific context in which the partnership is being developed, other questions around governance, power and access will flow.
… process of designing a data sharing initiative will involve some degree of iteration.
A: THE DECISION MATRIX
[Diagram: overview of the decision matrix. Labels: public sector, private sector, individuals; no access, restricted, open data; who to involve (private sector organisations, third sector organisations, public sector organisations, individuals, international entities).]
Canvasses
This step encourages you to define the purpose of sharing data. Defining this upfront, and having shared agreement of the purpose and what success looks like is a precondition to a project’s success.

This step of the decision matrix supports consideration of datasets needed for the partnership and prompts stakeholders to discuss the types of data that need to be shared, their form and the ethical considerations that need to follow.

This step supports consideration of which parties need to be involved in the arrangement to make it successful.

This step of the matrix prompts parties to consider where power lies, and how the structure and roles of all parties interact in the data sharing arrangement.

This step sets out the options for data storage infrastructure, which enable the data exchange across stakeholders.

This step helps you choose which form of data access best fits your initiative.
Discover new insights

AMdEX (Amsterdam Data Exchange)
A data exchange initiative by the Amsterdam Economic Board, backed by Amsterdam Science Park and Amsterdam Data Science, and supported by the City of Amsterdam. The project is still at concept phase, and aims to collect city data held by government agencies, companies and others to provide broad access to data for researchers, businesses, governments and individuals in a secure marketplace. AMdEX explores possible use cases where data exchanges might be useful, including:
— Data Logistics for Logistics Data (DL4LD): An innovation project of the Dutch national technical institute TNO and the University of Amsterdam on sharing logistic data at a large scale.
— Chief E-Mobility: A data-driven optimisation project aimed at the creation of an electric car charging infrastructure in Amsterdam.
— Knowledge Mile: One of Amsterdam’s long streets, made into the smartest city street by the Amsterdam Creative Industries Network.

Unlocking innovation

Open Banking
Since 2018, a regulation from the Competition and Markets Authority mandates that UK-regulated banks allow authorised providers (such as licensed startups offering budgeting apps, or other banks) direct access to customer account information and data at transaction level through APIs. The idea behind this initiative is that it will bring more innovation to financial services thanks to third-party developers, who will create new tools that will positively impact on vulnerable communities’ financial inclusion.

Faster decision-making

Google Waze
A platform that provides real-time anonymised crowdsourced traffic data collected from participating drivers. In its Connected Cities programme, it shares its large amount of traffic data with government agencies, which can use this data to better inform policy or quickly deploy traffic assistance if needed. Some use cases of Waze data include:
— Reducing traffic
— Reducing incident response times
— Fixing potholes and identifying high-priority streets for maintenance services
— Sharing garbage trucks’ real-time location

Increased prediction capability and forecasting

Flowminder
During the 2010 Haiti earthquake response, Flowminder researchers pioneered the use of de-identified data from mobile operators to follow population displacement. As a result, mobile phone data is increasingly used both in emergency contexts, providing operationally useful insight to humanitarian staff, and in discrete pieces of research looking, for instance, at the intersection of migration and the climate crisis. To support partners, Flowminder has created FlowKit, a suite of software tools designed to enable access and analysis of mobile data for humanitarian and development use cases.

Optimised process efficiency and co-ordination

Seoul Owl Bus
In Seoul, South Korea, where the metro system shuts down from midnight to 5 am, the municipal government used its citizens’ late-night calls and texts to plan routes for a new night bus service. A telecom company (KT) provided the government with anonymous phone data, which officials used to colour-code regions of the city by call volume. They then analysed the number of passengers who were getting on and off at each bus stop in the heavy-call volume regions and, based on this information, implemented the Seoul Owl Bus service along the nine most heavily trafficked late night routes. This partnership was able to create a service that not only saved late night commuters $1.2 million in taxi fares from 2012 to 2014, reducing car trips by 2.3 million annually (city buses emit 80 per cent less carbon monoxide than private cars); it was also beneficial to low-income communities, providing them with a viable solution to commute home to the outer boroughs after working night shifts in the city.
Questions to ask
— What data would you need, and how much of it is already available?
— How are you going to incorporate data if it becomes available in the future?
— What are the data gaps and can you mitigate the effect of inequality in data availability?

Useful tools
— Nesta Dataset Catalogue
— ODI Data Spectrum
— ODI Data Access Map
— ODI Data About Us
— Wellcome Trust, Understanding patient data
— Local Government Association Data Maturity self-assessment tool
[Canvas: potential datasets, grouped by public sector, private sector and individuals, with open data marked.]
Who to involve?

How to use this canvas
Review the questions and, using the stakeholder map below, identify the people you will need to involve (place the most essential ones at the centre).

Questions to ask
— Who is initiating the partnership?
— What are the incentives of stakeholders to take part in the partnership?
— Assess what the value distribution is in the partnership (i.e. is there an equity of value among all stakeholders? Who will not benefit?)
— Where is funding coming from?
— Who holds data that relates to this use case?
— Is there anyone else besides data providers who are needed to make this project work? For example, do you have the expertise in terms of data science?
— When data subjects’ involvement is required, how are they represented in the decisions?
— Is there an option for opting in or out?

Stakeholder map
— Third sector organisations, e.g. universities and research centres, think tanks, civil society organisations, voluntary organisations (NGOs, community networks, etc.)
— International organisations and companies, e.g. UN, World Bank, IMF, international charities, multinational organisations
— Public sector organisations, e.g. national, regional and local entities (municipalities, etc.)
— Individuals, e.g. patients, consumers, citizen scientists
— Commercial entities, e.g. national entities (mobile networks, financial services, healthcare providers, etc.), local businesses (SMEs, etc.)

Useful tools
There are various options for governance of data sharing initiatives, which here are presented as a set of options on a spectrum of top-down to bottom-up approaches, depending on the power dynamic among those who make the decisions on the structure of the partnership, those who will run the data sharing initiative, those who will provide the data and the outputs derived from it.

In reality, many options will take the form of a hybrid, with some top-down involvement (either from the public or private sector), combined with an element of self-governance by other stakeholders (e.g. a city, citizen group, commercial entity).

— Examples at the top-down end of the scale include the Sidewalk Labs’ smart city project in Toronto and the partnership between DeepMind and NHS in the UK, both beset by controversy and criticism, mainly around the lack of meaningful public engagement, the choice of providers and issues of data governance. There are, however, very successful top-down initiatives that involve data sharing such as:
   — The Cancer Genome Atlas (note that this, like many other health research initiatives, involves highly sensitive data).
   — The AirNow partnership on air quality data (therefore using less sensitive data).

— Top-down approaches where an intermediary is introduced include initiatives such as:
   — The Ontario Smart Metering Entity in Canada
   — The ODI data trust pilots in the UK

— At the opposite end of the spectrum, it is difficult to imagine a purely bottom-up model, as infrastructure and technologies often include decisions made by those outside of the stakeholders’ group, and can end up unrepresentative of the population. Examples of initiatives that get closer to bottom-up approaches are:
   — Membership models, often defined as ‘data co-ops’, which give people shared ownership and decision-making power over a platform that gathers and shares data. These are early-stage, but examples like MIDATA or Saluus.coop have tried to show how this could work in practice.
   — DECODE project pilots, which demonstrated how bottom-up approaches supported by enabling city authorities can operate as an effective hybrid model.

The best-fit governance model will depend upon the answers to a set of questions around themes such as who has decision-making power, where accountability lies and how risk is managed.
Questions to ask
— What is the structure that best fits the purpose and why?
— Will the data need curation? If so, who is responsible for it?
— Future proofing: what measures have been considered if circumstances change?

Stakeholder data is consolidated and housed in the same physical location.
Suitable when
— Interoperability across stakeholders’ systems is not required.
— Use of legacy data systems and existing structures is preferred to creating a new one.
— Projects require lower implementation and maintenance costs.

Predefined datasets reside within the infrastructure of data holders and metadata is searched through a central system engine.
Suitable when
— There’s a need for predefined control over what is shared and with whom.
— Both local security and regulatory compliance measures are mandatory but the need for global scale is also present.

Parts of the system exist in separate locations.
Suitable when
— There is a need for higher fault tolerance and scalability potential.
— Less central control and higher security (through encrypted communication protocols) are required.
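The 'Suitable when' conditions for the three infrastructure options can be turned into a rough comparison aid for a workshop discussion. The sketch below is illustrative only: the option labels (`consolidated`, `held_by_data_holders`, `separate_locations`), the `Needs` fields and the count-the-matches scoring are assumptions made for this example and are not part of the toolkit.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical project requirements, one flag per 'Suitable when' bullet."""
    interoperability_required: bool = False
    prefers_legacy_systems: bool = False
    needs_low_cost: bool = False
    needs_predefined_sharing_control: bool = False
    needs_local_compliance_at_global_scale: bool = False
    needs_fault_tolerance_and_scalability: bool = False
    needs_less_central_control: bool = False


def matching_conditions(needs: Needs) -> dict:
    """Count how many 'Suitable when' conditions each option satisfies."""
    return {
        # Data consolidated and housed in the same physical location.
        "consolidated": sum([
            not needs.interoperability_required,
            needs.prefers_legacy_systems,
            needs.needs_low_cost,
        ]),
        # Datasets stay with data holders; metadata searched centrally.
        "held_by_data_holders": sum([
            needs.needs_predefined_sharing_control,
            needs.needs_local_compliance_at_global_scale,
        ]),
        # Parts of the system exist in separate locations.
        "separate_locations": sum([
            needs.needs_fault_tolerance_and_scalability,
            needs.needs_less_central_control,
        ]),
    }


example = Needs(
    interoperability_required=True,
    needs_fault_tolerance_and_scalability=True,
    needs_less_central_control=True,
)
scores = matching_conditions(example)
print(scores)
print(max(scores, key=scores.get))
```

A simple count treats every condition as equally important, which is unlikely to hold in practice. The result is best used as a prompt for the 'Questions to ask' rather than as a decision.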
Questions to ask
— Will data be accessible?
— If so, what is the access model to the data?
— Is there a need to have multiple access models?
— If restricted, what kind of restriction does it require and why?

Nationally and internationally, there is increasing commitment to the principle that data which are publicly funded should be publicly available. Open data is becoming increasingly available throughout the world, released by governments as part of the transparency agenda, in the forms of open APIs or open data hubs.

Below is a selection of links to open data hubs. This is not intended to be a complete list, but an indicator of what is openly available.

— User registration, e.g. the SeaDataNet portal and all metadata services are public domain. However, user registration is required for submitting requests for datasets and for downloading datasets from the distributed data centres, which is arranged via the Common Data Index (CDI) service. Registration is required to ensure that users agree with the SeaDataNet data policy and its associated User Licence, which governs all dataset deliveries via SeaDataNet. Moreover, it gives SeaDataNet partners insight into its users and their data requirements.
[Canvas: write-in space under three headings: Open access, Restricted access, No access.]
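The three access levels on this canvas, together with the user registration example, can be expressed as a small access check. This is a minimal sketch under stated assumptions: the `Access` enum, the `User` fields and the rule that restricted data needs a registered user who has accepted the licence are hypothetical simplifications loosely modelled on the SeaDataNet description. They do not represent SeaDataNet's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    """The three access levels named on the canvas."""
    OPEN = "open access"
    RESTRICTED = "restricted access"
    NONE = "no access"


@dataclass
class User:
    """Hypothetical user record for a registration-based access model."""
    registered: bool = False
    accepted_licence: bool = False


def can_download(access: Access, user: Optional[User] = None) -> bool:
    """Decide whether a dataset at the given access level may be downloaded."""
    if access is Access.OPEN:
        # Open data needs no registration.
        return True
    if access is Access.RESTRICTED:
        # Assumed rule: registration plus agreement to the user licence.
        return user is not None and user.registered and user.accepted_licence
    # Access.NONE: never downloadable.
    return False


anonymous = None
member = User(registered=True, accepted_licence=True)

print(can_download(Access.OPEN, anonymous))        # True
print(can_download(Access.RESTRICTED, anonymous))  # False
print(can_download(Access.RESTRICTED, member))     # True
print(can_download(Access.NONE, member))           # False
```

An initiative with multiple access models would apply a check like this per dataset rather than per platform, which is one way to answer the question about needing more than one access model.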
B: PROJECT FOUNDATIONS
It is critical to recognise the important role that people play in supporting (or hindering) the success of data sharing initiatives.

At the end of the day, something as sophisticated as a data sharing initiative will only succeed if the people involved are working constructively in partnership with each other.

Investing in open and clear communication and effective working relationships is as critical as any other aspect of this work to ensuring success.
Project foundations
Once the key decisions are made, a data sharing model, or perhaps a number of possibilities, will emerge. The important question then becomes how to turn that vision into a reality. There are two sets of considerations in this phase.

Checklist
What to do
To establish a good collaboration environment there are three elements that need to be addressed.

If you can’t tick all three boxes, you might want to consider how to identify blockers and solve them before starting the process. This will avoid more issues down the line.

Tools
Nesta’s Partnership Toolkit is a very useful resource to support this element of the process, which identifies the practical steps that help create a successful partnership, write an effective partnership agreement and get stakeholders’ collaboration off to a good start.

The EAST framework, developed by the Behavioural Insights Team from its experience of applying behavioural insights over the past few years, sets out four simple principles for influencing behaviour – make it easy, attractive, social and timely (EAST).

Senior buy-in to work together
Are you engaging with people senior enough to make decisions and unlock issues when they arise? What signals have you been given that there is senior buy-in, on both sides? Do you think this can stand the test of time?

A clear incentive for all parties to be involved
For a data sharing partnership to work, all parties must benefit. You might have to help your potential partner to understand how they will benefit from the partnership. Remember that they will need to sell the idea internally, no matter how senior they are. Think outside the box and seek input from others who bring a fresh perspective. To help them understand the business case for partnership, consider whether they might benefit from these incentives:

Legal incentives
Regulatory measures can be taken by the government to compel data sharing. An example is Open Banking, whereby the UK government has compelled banks to make their data open through APIs that authorised third-party organisations can use to develop personalised financial services.

Economic incentives
Value that directly or indirectly affects the bottom line by increasing revenue or reducing costs (such as efficiency gains, enlarging customer base or creating a competitive advantage) or from the direct commercialisation of data (e.g. any data exchange platform or market).

Sometimes an economic incentive to solve a particular problem can be created by announcing a challenge prize, such as in the case of the Taiwan Presidential Hackathon.

Non-economic incentives
If the benefits that arise from sharing data are not strictly economic and do not result from regulations, they fall into this category. These include considerations on the common good (i.e. if the value generated by sharing data benefits society at large). An interesting case is MIDATA.coop, a Swiss co-operative that gives people control over their medical data. One of the things that makes MIDATA different from other data storage platforms is that it does not use monetary rewards, but wider societal benefit to encourage data sharing, as it considers financial incentives the wrong incentive for people to share their health data.

Another important set of incentives in this group responds to the principle of reciprocity (i.e. people or organisations participate with the aim of helping and receiving advantages at the same time, of which reputation is a good example).

Equitable contributions
Each party involved in the data-sharing arrangement must be able to offer something to the partnership; however, making an equitable contribution does not mean making an equal contribution. Examples of contributions include money, time, resources, expertise, connections or data. What is required for a partnership to succeed is for all stakeholders to be clear and happy that the contributions brought into the partnership are fair and valuable to all parties.
Requirements
What to do
Each of these will vary considerably from initiative to initiative and will require ad hoc advice from legal and technical experts.

Make sure that everything decided in this sphere is aligned with the design decisions.

Regulation
Navigating this will require an ad hoc analysis of the project at multiple levels and will require:
— Investigating what the relevant legal/regulatory requirements are.
— Ensuring that the project complies with these requirements and justifying this to relevant authorities.
— Being aware of any evolving legal requirements and communicating these appropriately to relevant stakeholders and, when appropriate, to the public.

For example, the data protection regime in the local context where the project is held will influence the structure of the partnership. For global technology companies, though, the problem isn’t the imposition of a single regulatory regime, but rather the need to consider many, potentially conflicting ones to maintain their global businesses.

Technical
Together with individual evaluations of data maturity and technological infrastructure audits, other considerations should include:
— Data quality, standards and sharing frequency: Poor data quality and systems that are not interoperable pose challenges where ongoing data sharing is required. It is important to understand whether the project will require a one-off sharing, or whether it will require ongoing, routine data exchange. This will have a direct impact on the technical architecture that is needed for supporting data sharing.
— Is the data sharing architecture going to be outsourced or developed ad hoc? Commercial pre-built solutions are available on the market, but might carry constraints in the way data is handled. Tailor-made data sharing technical architectures can also be developed in-house or contracted from external consultants and suppliers.

Funding
Different types of data sharing partnerships will need funding structures and mechanisms designed to adequately support the type of partnership.
— Investment of one lead organisation: Such as a government/private company or third-sector organisation.
— Equal or tiered funding: One, some or all partners form the oversight of the initiative, each contributing a predetermined amount (as per their tier).
— Tiers may be organised based on the level of input and/or incorporate a ‘cost-free’ tier, where partners can document their interest and support through data provision or consultancy where necessary.
— External funding: e.g. a grant, sum of money from a government or other organisation for running a particular data sharing project or pilot.
— Commercialisation as part of business model, with income generated through selling data created (e.g. 23andMe).
Copyright © Smart Dubai. 2020.
All rights reserved.
Designed by soapbox.co.uk