
From privacy to partnership
The role of privacy enhancing technologies in data governance and collaborative analysis
From privacy to partnership
Issued: January 2023 DES7924
ISBN: 978-1-78252-627-8
© The Royal Society

The text of this work is licensed under the terms of the Creative Commons Attribution License which permits unrestricted use, provided the original author and source are credited.

The license is available at: creativecommons.org/licenses/by/4.0

Images are not covered by this license.

This report can be viewed online at: royalsociety.org/privacy-enhancing-technologies

Cover image: Visualisation of the Internet 1997 – 2021, by Barrett Lyon as part of the Opte Project.
© Barrett Lyon / The Opte Project.
Contents

Foreword 4
Executive summary 5
Scope 5
Methodology 5
Key findings 6
Recommendations 8
Introduction 18
Background 18
Key terms and definitions 19
Chapter one: The role of technology in privacy-preserving data flows 22
Data privacy, data protection and information security 23
What are privacy enhancing technologies (PETs)? 23
A downstream harms-based approach: Taxonomy of harms 24
Recent international developments in PETs 25
Interest in PETs for international data transfer and use 28
Accelerating PETs development: Sprints, challenges and international collaboration 28
Chapter two: Building the PETs marketplace 32
PETs for compliance and privacy 32
PETs in collaborative analysis 33
Barriers to PETs adoption: User awareness and understanding in the UK public sector 35
Barriers to PETs adoption: Vendors and expertise 36
Chapter three: Standards, assessments and assurance in PETs 42
PETs and assurance: The role of standards 42
Chapter four: Use cases for PETs 56
Considerations and approach 56
Privacy in biometric data for health research and diagnostics 57
Preserving privacy in audio data for health research and diagnostics 65
PETs and the internet of things: enabling digital twins for net zero 67
Social media data: PETs for researcher access and transparency 74
Synthetic data for population-scale insights 81
Collaborative analysis for collective intelligence 86
Online safety: Harmful content detection on encrypted platforms 90
Privacy and verifiability in online voting and electronic public consultation 95
PETs and the mosaic effect: Sharing humanitarian data in emergencies and fragile contexts 97
Conclusions 106
Appendices 108
Appendix 1: Definitions 108
Appendix 2: Acknowledgements 109

Foreword
The widespread collection and use of data is transforming all facets of society, from scientific research to communication and commerce. The benefits of using data in decision making are increasingly evident in tackling societal problems and understanding the world around us. At the same time, there are inherent vulnerabilities when sensitive data is stored, used or shared.

From privacy to partnership sets out how an emerging set of privacy enhancing technologies (PETs) might help to balance the risks and rewards of data use, leading to wider social benefit. It follows the Royal Society's Protecting privacy in practice: The current use, development and limits of Privacy Enhancing Technologies in data analysis, which gave a snapshot of this rapidly developing field in 2019. This new publication offers a refreshed perspective on PETs, not only as security tools, but as novel means to establish collaborative analysis and data partnerships that are ethical, legal and responsible.

We have three objectives for this report. Our first objective is that the use cases inspire those collecting and using data to consider the potential benefits of PETs for their own work, or in new collaborations with others. Second, for the evidence we present on barriers to adoption and standardisation to help inform policy decisions to encourage a marketplace for PETs. Finally, through our recommendations, we hope the UK will maximise the opportunity to be a global leader in PETs – both for data security and collaborative analysis – alongside emerging, coordinated efforts to implement PETs in other countries.

Our report arrives at a time of rapid innovation in PETs, as well as data protection legislation reform in the United Kingdom. The intention is not to provide a comprehensive view of all technologies under the broad umbrella of PETs; rather, we have chosen to focus on a subset of promising and emerging tools with demonstrable potential in data governance. In demonstrating this value, we cite examples from the UK and international contexts. Realising the full potential of PETs across national borders will require further harmonisation, including consideration of data protection laws in various jurisdictions.

Artificial intelligence and machine learning are transforming our capacity to assess and confront our greatest challenges, but these tools require data to 'fuel' them. As a biomedical engineer using AI-assistive technologies to detect disease, I recognise that the greatest research problems of our time – from cancer diagnostics to the climate crisis – are, in a sense, data problems.

The value of data is most fully realised through aggregation and collaboration, whether between individuals or institutions. I hope this report will inspire new approaches to data protection and collaboration, encouraging further research in – and testing of – PETs in various scenarios. PETs are not a silver bullet, but they could play a key role in unlocking the value of data without compromising privacy. By enabling new data partnerships, PETs could spark a research transformation: a new paradigm for information sharing and data analysis with real promise for tackling future challenges.

Professor Alison Noble OBE FREng FRS,
Chair of the Royal Society Privacy Enhancing Technologies Working Group

Executive summary
Privacy Enhancing Technologies (PETs) are a suite of tools that can help maximise the use of data by reducing risks inherent to data use. Some PETs provide new techniques for anonymisation, while others enable collaborative analysis on privately-held datasets, allowing data to be used without disclosing copies of data. PETs are multi-purpose: they can reinforce data governance choices, serve as tools for data collaboration or enable greater accountability through audit. For these reasons, PETs have also been described as 'Partnership Enhancing Technologies'1 or 'Trust Technologies'2.

This report builds on the Royal Society's 2019 publication Protecting privacy in practice: The current use, development and limits of Privacy Enhancing Technologies in data analysis3, which presented a high-level overview of PETs and identified how these technologies could play a role in addressing privacy in applied data science research, digital strategies and data-driven business.

This new report, developed in close collaboration with the Alan Turing Institute, considers how PETs could play a significant role in responsible data use by enhancing data protection and collaborative data analysis. It is divided into three chapters covering the emerging marketplace for PETs, the state of standards and assurance, and use cases for PETs.

Scope
From privacy to partnership outlines the current PETs landscape and considers the role of these technologies in addressing data governance issues beyond data security. The aim of this report is to address the following questions:
• How can PETs support data governance and enable new, innovative uses of data for public benefit?
• What are the primary barriers and enabling factors around the adoption of PETs in data governance, and how might these be addressed or amplified?
• How might PETs be factored into frameworks for assessing and balancing risks, harms and benefits when working with personal data?

Methodology
This work was steered by an expert Working Group as well as two closed contact group sessions with senior civil servants and regulators in April and October 2021 (on the scope and remit of the report, and on the use case topics and emerging themes, respectively).

1 Trask A. in Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of
collaborative computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-
99e8-12a92d5d88a0 (accessed 20 September 2022).
2 Infocomm Media Development Authority (Singapore grows trust in the digital environment). See https://www.imda.gov.
sg/news-and-events/Media-Room/Media-Releases/2022/Singapore-grows-trust-in-the-digital-environment (accessed
5 June 2022).
3 The Royal Society. 2019 Protecting privacy in practice: The current use, development and limits of Privacy Enhancing
Technologies in data analysis. See https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/
privacy-enhancing-technologies-report.pdf (accessed 30 June 2022).

The findings in this report are the result of consultations with a wide range of data and privacy stakeholders from academia, government, third sector, and industry, as well as three commissioned research projects on the role of assurance in enabling the uptake of PETs4, PETs market readiness in the public sector5, and a survey of synthetic data: data that is artificially generated based on real-world data, but which produces new data points6. The use cases were drafted with input from domain specialists, and the report was reviewed by expert readers as well as invited reviewers. The details of contributors, Working Group members, expert readers and reviewers are provided in the Appendix.

Key findings
General knowledge and awareness of PETs remains low amongst many potential PETs users7, 8, with the inherent risk of using new and poorly understood technologies acting as a disincentive to adoption. Few organisations, particularly in the public sector, are prepared to experiment with data protection9. Without in-house expertise, external assurance mechanisms or standards, organisations are unable to assess privacy trade-offs for a given PET or application. As a result, the PETs value proposition remains abstract and the business case for adopting PETs is unclear for potential users.

Standardisation for PETs, including data standards, is lacking and is cited as a hindrance to adoption by potential users in the UK public sector10. Technical standards are required to ensure the underpinning technologies work as intended, while process standards are needed to ensure users know how and when to deploy them. While few PETs-specific standards exist to date, standards in adjacent fields (such as cybersecurity and AI) will be relevant. In the future, PETs-specific standards could provide the basis for assurance schemes to bolster user confidence.

4 Hattusia 2022 The current state of assurance in establishing trust in PETs. The Royal Society.
See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/
5 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/. This project was partly funded by a grant from CDEI.
6 Jordon J et al. 2022 Synthetic data: What, why and how? See https://arxiv.org/pdf/2205.03257.pdf (accessed 2
September 2022).
7 London Economics and the Open Data Institute. 2022 Privacy Enhancing
Technologies: Market readiness, enabling and limiting factors. The Royal Society.
See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/.
8 Lunar Ventures, Lundy-Bryan L. 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
9 London Economics and the Open Data Institute. 2022 Privacy Enhancing
Technologies: Market readiness, enabling and limiting factors. The Royal Society.
See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/.
10 Ibid.

A significant barrier to the widespread use of PETs is a lack of clear use cases for wider public benefit. To address this, Chapter 4 illustrates the potential benefit of PETs in the contexts of:
• Using biometric data for health research and diagnostics;
• Enhancing privacy in the Internet of Things and in digital twins;
• Increasing safe access to social media data and accountability on social media platforms;
• Generating population-level insights using synthesised national data;
• Collective intelligence, crime detection and voting in digital governance; and
• PETs in crisis situations and in analysis of humanitarian data.

The use cases demonstrate how PETs might maximise the value of data without compromising privacy.

A core question for potential PETs users is: What will PETs enable an analyst to do with data that could not be accomplished otherwise? Alternatively: What will PETs prevent an adversary from achieving? As the use cases illustrate, PETs are not a 'silver bullet' solution to data protection problems. However, they may be able to provide novel building blocks for constructing responsible data governance systems. For example, in some cases, PETs could be the best tools for meeting legal obligations, such as anonymity.

Data protection is only one aspect of the right to privacy. In most cases, PETs address this one aspect but do not address how data or the output of data analysis is used, although this could change as PETs mature. Some recent applications utilise PETs as tools for accountability and transparency, or to distribute decision-making power over a dataset across multiple collaborators11, suggesting their potential in addressing elements of privacy beyond data security.

The field of PETs continues to develop rapidly. This report aims to consolidate and direct these efforts toward using data for public good. Through novel modes of data protection, PETs are already enhancing the responsible use of personal data in tackling significant contemporary challenges. The emerging role of PETs as tools for partnership, enhancing transparency and accountability, may entail greater benefits still.

11 For example, Meta recently conducted a survey collecting personal data, which was encrypted and split into shares
between third-party facilitators, namely universities. Analyses can be run using secure multi-party computation;
requests for analysis must be approved by all third-party shareholders. See https://ai.facebook.com/blog/assessing-
fairness-of-our-products-while-protecting-peoples-privacy/ (accessed 10 October 2022).

Recommendations
AREA FOR ACTION: COORDINATED INTERNATIONAL ACTION TO ENSURE THE
RESPONSIBLE DEVELOPMENT OF PETS FOR PUBLIC BENEFIT

RECOMMENDATION 1

National and supranational organisations, including standards development organisations (SDOs), should establish protocols and standards for PETs, and their technical components, as a priority.

PETs have been developed by experts in different fields and with little coordination between them to date. The greatest potential for PETs – whether used in isolation or combination – is as components of data governance systems. Open standards (available for use by anyone) are likely to help drive the development, accessibility and uptake of PETs for data governance. Furthermore, standards will be necessary for audit and assurance, encouraging a marketplace of confident PETs users with effective regulation and quality assurance marks where appropriate.

SDOs such as the British Standards Institute (BSI) (UK), National Physical Laboratory (UK), Institute of Electrical and Electronics Engineers (IEEE) (US), the National Cyber Security Centre (UK) and National Institute of Standards and Technology (NIST) (US) should identify and convene international expert groups to address gaps in PETs technical standards. These should build on existing standards in cryptography and information security (Chapter 3). Open standards will be especially important in PETs that enable information networks, such as secure multi-party computation or federated learning (similar to how HTTP12 provided a common set of rules that enabled communication over the Internet).

Alongside technical standards, process standards should guide best practice in the application of PETs in data governance. Privacy best practice guides, codes of conduct and process standards (such as the draft Institute of Privacy Design Process Standard13) could be used to integrate PETs into a privacy-by-design approach to data governance systems. Whereas technical standards will be essential for technical interoperability, codes of conduct for PETs in data management and use will be critical for 'social interoperability' and acceptance in partnerships and digital collaborations on new scales (such as international or cross-sector partnerships).

12 Hypertext Transfer Protocol.


13 Institute of Privacy Design (The DRAFT Design Process Standard). See https://instituteofprivacydesign.org/2022/02/11/
the-draft-design-process-standard/ (accessed 2 September 2022).

RECOMMENDATION 2

Science funders, including governments and intergovernmental bodies, should accelerate and incentivise the development and maturation of PETs by funding prize challenges, pathfinder projects (such as topic guides or resource lists) and cross-border, collaborative test environments (such as an international PETs sandbox).

Science funders should foster a network of independent researchers and universities working on PETs challenges that address PETs in security, partnerships and transparency applications. They could involve the private sector (for example cloud providers and social media platforms) in designing challenges and through international cooperation on standards, guidance and regulation. To date, exemplary programmes include the UK-US PETs Prize Challenge led by the UK's Centre for Data Ethics and Innovation (CDEI) and the US White House Office of Science and Technology Policy; the Digital Security by Design Challenge14 funded through UK Research and Innovation; the Data.org Epiverse Challenge funding call; and the French data protection authority sandbox on digital health and GDPR15.

Intergovernmental bodies such as the United Nations and the Global Partnership for Artificial Intelligence should lead by creating test environments and providing data for demonstrations to test the security, privacy, and utility potentials of specific PETs, as well as test configurations of PETs. An international PETs sandbox would allow national regulators to collaborate and evaluate PETs solutions for cross-border data use according to common data governance principles.

14 UK Research and Innovation (Digital security by design challenge). See https://www.ukri.org/what-we-offer/our-main-


funds/industrial-strategy-challenge-fund/artificial-intelligence-and-data-economy/digital-security-by-design-challenge/
(accessed 20 September 2022).
15 Commission Nationale de l’Informatique et des Libertés (Un «bac à sable» RGPD pour accompagner des projets
innovants dans le domaine de la santé numérique). See https://www.cnil.fr/fr/un-bac-sable-rgpd-pour-accompagner-
des-projets-innovants-dans-le-domaine-de-la-sante-numerique (accessed 15 September 2022).

RECOMMENDATION 3

Researchers, regulators and enforcement authorities should investigate the wider social and economic implications of PETs, for example, how PETs might be used in novel harms (such as fraud or linking datasets for increased surveillance) or how PETs might affect competition in digitised markets (such as monopolies through new network effects).

The potential follow-on effects of PETs adoption are not well understood, particularly whether and how they might amplify data monopolies, or what oversight mechanisms are required to prevent the type of collaborative analysis that might be considered state surveillance16. For example, the Arts and Humanities Research Council could consider the ethical, social and economic implications of PETs within their programme on AI (particularly where PETs could be dual use or surveillance technologies)17, 18.

Regulators, such as the Information Commissioner's Office (ICO) and the Competition and Markets Authority (CMA), could investigate the wider economic implications of PETs, particularly where they could enable competition through greater interoperability (as with open banking, for example). It is not understood how the adoption of PETs aligns with FAIR19 principles, particularly where PETs (such as privacy-preserving synthetic data) are used as an alternative to open data. In collaborative analysis, the ability to audit data that is not shared should be better understood by those who might use PETs (to identify potential for biased outcomes, for example). The relationship between PETs and data trusts also remains ambiguous.

16 Liberty Human Rights (Challenge hostile environment data-sharing). See https://www.libertyhumanrights.org.uk/


campaign/challenge-hostile-environment-data-sharing/ (accessed 20 September 2022).
17 Ongoing research highlights the negative consequences of data sharing in dual-use or otherwise
sensitive contexts. For example: Papageogiou V, Wharton-Smith A, Campos-Matos I, Ward H. 2020 Patient
data-sharing for immigration enforcement: a qualitative study of healthcare providers in England. BMJ
Open. (https://doi.org/10.1136/bmjopen-2019-033202)
18 Liberty Human Rights (Liberty and Southall Black Sisters’ Super-complain on data-sharing between the police
and home office regarding victims and witnesses to crime). See https://www.libertyhumanrights.org.uk/issue/
liberty-and-southall-black-sisters-super-complaint-on-data-sharing-between-the-police-and-home-office-regarding-
victims-and-witnesses-to-crime/ (accessed 20 September 2022).
19 Go FAIR (FAIR principles). See https://www.go-fair.org/fair-principles/ (accessed 20 September 2022).

AREA FOR ACTION: A STRATEGIC AND PRAGMATIC APPROACH TO PETS ADOPTION IN THE UK, LED BY THE PUBLIC SECTOR THROUGH PUBLIC-PRIVATE PARTNERSHIPS, DEMONSTRATION OF USE CASES AND COMMUNICATION OF BENEFITS

RECOMMENDATION 4

The UK Government should develop a national PETs strategy to promote the responsible use of PETs in data governance: as tools for data protection and security, for collaboration and partnership (both domestically and cross-border) and for advancing scientific research.

PETs could reform the way data is used domestically and across borders, offering potential solutions to longstanding problems of siloed and underutilised data across sectors. To ensure the use of PETs for public good, PETs-driven information networks should be stewarded by public sector and civil society organisations using data infrastructure for public good. A coordinated national strategy for the development and adoption of PETs for public good will ensure the timely and responsible deployment of these technologies, with the public sector leading by example.

PETs have a role to play in achieving the objectives outlined in Mission 2 of the National Data Strategy, securing a 'pro-growth and trusted data regime,' positioning the UK internationally as a trusted data partner, with wider implications for national security. This recommendation reflects emerging, coordinated PETs work in foreign governments (such as that led in the US by the White House Office for Science and Technology Policy)20.

The PETs strategy should offer a vision that complements the Government's National Data Strategy21 and National AI Strategy22. The PETs strategy should prioritise a roadmap for public sector PETs adoption, addressing public awareness and the PETs marketplace (Chapter 2), technological maturity, appropriate regulatory mechanisms and responsibilities, alongside standards and codes of conduct for PETs users (Chapter 3).

20 US Office for Science and Technology Policy (Request for Information on Advancing Privacy-Enhancing Technologies).
https://public-inspection.federalregister.gov/2022-12432.pdf (accessed 17 July 2022).
21 HM Government (National Data Strategy). See https://www.gov.uk/government/publications/uk-national-data-strategy/
national-data-strategy (accessed 9 September 2022).
22 HM Government (National AI Strategy). See https://www.gov.uk/government/publications/national-ai-strategy
(accessed 9 September 2022).

RECOMMENDATION 5

Local, devolved and national governments across the UK should lead by example in the adoption of PETs for data sharing and use across government and in public-private partnerships, improving awareness by communicating PETs-enabled projects and their results.

Public sector organisations could partner with small and medium-sized enterprises (SMEs) developing PETs to identify use cases, which could then be tested through low-cost, low-risk pilot projects. Legal experts and interdisciplinary policy professionals should be involved from project inception, ensuring PETs meet data protection requirements and that outcomes and implications are properly communicated to non-technical decision-makers.

Use cases illustrated in Chapter 4 highlight areas of significant potential public benefit in healthcare and medical research, for reaching net zero through national digital twins and for population-level data collaboration.

Communication of PETs and their appropriate use in various contexts will be key to building trust with potential users23, encouraging the PETs marketplace (Chapter 2). The ICO should continue its work on using PETs for wider good and communicating the implications – including barriers and potential benefits. The CDEI should continue to provide practical examples that will help organisations understand and build a business case for PETs' adoption. Proof of concept and pilot studies should be communicated to the wider public to demonstrate the value of PETs, foster trust in public sector data use and demonstrate value-for-money24.

23 The Royal Society. Creating trusted and resilient data systems: The public perspective. (to be published online
in 2023)
24 This is in line with the Digital Economy Act 2017. See: The Information Commissioner’s Office (Data sharing across
the public sector: the Digital Economy Act codes). See https://ico.org.uk/for-organisations/guide-to-data-protection/
ico-codes-of-practice/data-sharing-a-code-of-practice/data-sharing-across-the-public-sector-the-digital-economy-act-
codes/ (accessed 2 September 2022).

RECOMMENDATION 6

The UK Government should ensure that new data protection reforms account for the new systems of data governance enabled by emerging technologies such as PETs and ensure any new regulations are supported by clear, scenario-specific guidance and assessment tools.

While data protection legislation should remain technology neutral so as to be adaptable, current plans to review UK data protection laws provide an opportunity to consider the novel and multipurpose nature of these emerging technologies, particularly as they provide the technical means for new types of collaborative analysis. The ICO should continue its work to provide clarity around PETs and data protection law, encouraging the use of PETs for wider public good25 and drawing from parallel work on AI guidance where relevant (such as privacy-preserving machine learning).

Further interpretation may be required to help users understand how PETs might serve as tools for meeting data protection requirements. For example, it may be necessary to clarify data protection obligations where machine learning models are trained on personal data in federated learning scenarios26, or the degree to which differentially private or homomorphically encrypted data meets anonymisation requirements27. Where PETs enable information networks and international data collaborations, the ICO might anticipate clarification questions specific to international and collaborative analysis use cases. Regulatory sandboxes (as in Recommendation 2) will be useful for testing scenarios, particularly for experimentation with PETs in structured transparency28 (such as in open research or credit scoring systems) and as accountability tools29.

25 The Information Commissioner’s Office (ICO consults health organisation to shape thinking on privacy-enhancing
technologies). See https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2022/02/ico-consults-health-
organisations-to-shape-thinking-on-privacy-enhancing-technologies/ (accessed 20 March 2022).
26 Nguyen T, Sun K, Wang S, Guitton F, Guo Y. 2021. Privacy preservation in federate learning An insightful survey from
the GDPR perspective. Computers & Security 110. (https://doi.org/10.1016/j.cose.2021.102402)
27 See for example: Koerner K. 2021 Legal perspectives on PETs: Homomorphic encryption. Medium. 20 July 2021. See
https://medium.com/golden-data/legal-perspectives-on-pets-homomorphic-encryption-9ccfb9a334f (accessed 30
June 2022).
28 Trask A, Bluemke E, Garfinkel B, Cuervas-Mons CG, Dafoe A. 2020 Beyond Privacy Trade-offs with Structured
Transparency. See https://arxiv.org/ftp/arxiv/papers/2012/2012.08347.pdf (accessed 6 February 2022).
29 See for example: Meta AI (Assessing fairness of our products while protecting peoples privacy). See https://
ai.facebook.com/blog/assessing-fairness-of-our-products-while-protecting-peoples-privacy/ (accessed 15
August 2022).

RECOMMENDATION 6 (CONTINUED)

The ICO could expand on its PETs guidance, for example, through developing self-assessment guides. Data ethics organisations, such as the CDEI, might also develop impact assessment tools, for example, a PETs impact assessment protocol that considers downstream implications on human rights. The Alliance for Data Science Professionals certification scheme30, which defines standards for ethical and well-governed approaches to data use, could specifically consider the role of PETs in evidencing Skill Areas A (Data Privacy and Stewardship) and E (Evaluation and Reflection).

30 Alliance for Data Science Professionals (Homepage). See https://alliancefordatascienceprofessionals.co.uk/ (accessed


20 September 2022).

AREA FOR ACTION: FOUNDATIONAL SCHOLARSHIP AND PROFESSIONALISATION TO ENCOURAGE MATURATION OF PETS, FOSTER TRUST AND DRIVE UPTAKE OF PETS IN DATA-USING ORGANISATIONS

RECOMMENDATION 7

Universities, businesses and science funders should fund foundational scholarship in PETs-related fields, such as cryptography and statistics.

Foundational training and fellowships in PETs fundamentals (such as cryptography) for graduate level study will create the skilled workforce required for widespread development and implementation of PETs. Critical future-proofing questions could be addressed through fellowships and research posts (for example, evaluating the security guarantees of PETs in a post-quantum context, or the energy proportionality, sustainability and scalability of energy-intensive, cryptography-based PETs). Internships and work placement programmes in organisations developing PETs could assist new graduates in moving from academic fields into applied PETs research and development.

RECOMMENDATION 8

Organisations providing certifications and continuing professional development courses in data science, cybersecurity and related fields should incorporate PETs modules to raise awareness among data professionals.

Professional certifications and Continuing Professional Development opportunities (including British Computer Society Professional Certifications such as the Alliance for Data Science Professionals certification, Data Science Professional Certificates offered by Microsoft or IBM, or (ISC)² Certifications31) should include a primer on PETs to raise awareness and encourage baseline knowledge of PETs amongst in-house data professionals. For example, the International Association of Privacy Professionals now includes a module on PETs in their Certified Information Privacy Technologist Certification32.

31 (ISC)² ((ISC)² Information Security Certifications). See https://www.isc2.org/Certifications# (accessed 13 May 2022).
32 International Association of Privacy Professionals (Privacy Technology Certification). See https://iapp.org/media/pdf/
certification/CIPT_BOK_v.3.0.0.pdf (accessed 30 June 2022).

TABLE 1

Summary table of PETs explored in this report

Trusted execution environment
• Context of data use: Securely outsourcing to a server, or cloud, computations on sensitive data
• Privacy risk addressed: Revealing sensitive attributes present in a dataset during computation
• Data protected: in storage – yes; during computation – yes; on release – no
• Benefits: Commercial solutions widely available; zero loss of information; efficient computation of any operations
• Current limitations: Many side-channel attacks possible; current commercial solutions limited with regard to distributed computation on big datasets
• Readiness level: Product
• Qualification criteria: Could be exclusive to established research groups

Homomorphic encryption
• Context of data use: Securely outsourcing specific operations on sensitive data; safely providing access to sensitive data
• Privacy risk addressed: Revealing sensitive attributes present in a dataset during computation
• Data protected: in storage – yes; during computation – yes; on release – no*
• Benefits: Can allow zero loss of information; FHE can support the computation of any operation
• Current limitations: FHE, SHE and PHE are usable but highly computationally intensive; bandwidth, latency and running-time issues; PHE and SHE support the computation of limited functions; standardisation in progress; possibility for side-channel attacks (current understanding is limited)
• Readiness level: PHE / SHE / FHE in use (FHE on a smaller scale)
• Qualification criteria: Specialist skills; custom protocols; computing resources

Secure multi-party computation (PSI / PIR)
• Context of data use: Enabling joint analysis on sensitive data held by several organisations
• Privacy risk addressed: Revealing sensitive attributes present in a dataset during computation
• Data protected: in storage – no; during computation – yes; on release – no
• Benefits: No need for a trusted third party – sensitive information is not revealed to anyone; the parties obtain only the resulting analysis or model
• Current limitations: Highly compute and communication intensive; requires expertise in design that meets compute requirements and security models
• Readiness level: PSI / PIR – product; proof of concept to pilot
• Qualification criteria: Specialist skills; custom protocols; computing resources

Federated learning / federated machine learning
• Context of data use: Enables the use of remote data for training algorithms; data is not centralised
• Privacy risk addressed: Revealing sensitive information, including an individual's presence in a dataset
• Data protected: in storage – no; during computation – yes; on release – no
• Benefits: Very little loss of information
• Current limitations: Model inversion and membership inference attacks may be vulnerabilities
• Readiness level: Product, in use
• Qualification criteria: May require scale of data within each dataset (cross-silo federated learning); distributed systems are complex and difficult to manage

Differential privacy
• Context of data use: Prevents disclosure about individuals when releasing statistics or derived information
• Privacy risk addressed: Revealing sensitive information, including an individual's presence in a dataset; dataset or output disclosing sensitive information about an entity included in the dataset
• Data protected: in storage – yes (and at point of data collection); during computation – yes (with limitations); on release – yes (with limitations)
• Benefits: Formal mathematical proof / privacy guarantee; level of privacy protection may be quantifiable; relative to other PETs, it is computationally inexpensive
• Current limitations: Noise and loss of information, unless datasets are large enough; setting the level of protection requires expertise; precision of analysis limited inversely to level of protection
• Readiness level: Proof of concept, in use
• Qualification criteria: Specialist skills; custom protocols; very large datasets; as yet, no standards for setting privacy parameters

Privacy-preserving synthetic data
• Context of data use: Prevents disclosure about individuals when releasing statistics or derived information
• Privacy risk addressed: Revealing sensitive attributes or presence in a dataset
• Data protected: in storage – no; during computation – yes (with limitations); on release – yes (with limitations)
• Benefits: Applications beyond privacy; level of privacy protection may be quantifiable (eg with differentially private synthetic data)
• Current limitations: Noise and loss of information; setting the level of protection requires expertise; privacy enhancement unclear
• Readiness level: Proof of concept, in use
• Qualification criteria: Specialist skills required; as yet, no standards for generation or setting privacy parameters

KEY
FHE: Fully Homomorphic Encryption; SHE: Somewhat Homomorphic Encryption; PHE: Partial Homomorphic Encryption; PIR: Private Information Retrieval; PSI: Private Set Intersection

* If the client encrypts their data and sends it to a server for homomorphic computation, only the client is able to access the results (by using their secret decryption key).
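To make the homomorphic encryption workflow summarised above (and in the footnote on client-side decryption) more concrete, the short sketch below follows that flow using the open-source python-paillier package (imported as phe), an additively homomorphic (PHE) scheme. The choice of library, the variable names and the example values are illustrative assumptions rather than tools examined in this report.

    # Illustrative sketch only: assumes the open-source `phe` (python-paillier)
    # package is installed. Paillier is partially homomorphic, supporting
    # addition of ciphertexts and multiplication by plaintext scalars.
    from phe import paillier

    # 1. The client (data owner) generates a keypair and encrypts its values.
    public_key, private_key = paillier.generate_paillier_keypair()
    readings = [12.5, 9.0, 14.25]
    ciphertexts = [public_key.encrypt(x) for x in readings]

    # 2. The server sees only the public key and ciphertexts, yet can still
    #    compute an encrypted total and mean without decrypting anything.
    encrypted_total = sum(ciphertexts[1:], ciphertexts[0])
    encrypted_mean = encrypted_total * (1 / len(ciphertexts))

    # 3. Only the client, holding the secret key, can read the result
    #    (the situation described in the table footnote).
    print(private_key.decrypt(encrypted_mean))  # approximately 11.92

A fully homomorphic scheme would also allow ciphertexts to be multiplied together, enabling arbitrary computations at the much higher computational cost noted in the table.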

Introduction
Background
Data about individuals, their unique characteristics, preferences and behaviours, is ubiquitous and the power to deliver data-driven insights using this information is rapidly accelerating33, 34. This unprecedented availability of data, coupled with new capabilities to use data, drives the frontiers of research and innovation – addressing challenges from the climate crisis to the COVID-19 pandemic35, 36. However, the greater collection, transfer and use of data – particularly data which is personal, commercially sensitive or otherwise confidential – also entails increased risks. The tension between maximising data utility (where data is used) and managing risk (where data is hidden) poses a significant challenge to anyone using data to make decisions.

This report, undertaken in close collaboration with the Alan Turing Institute, considers the potential for tools and approaches collectively known as Privacy Enhancing Technologies (PETs) to revolutionise the safe and rapid use of sensitive data for wider public benefit. It examines the possibilities and limitations for PETs in responsible data governance and identifies steps required to realise their benefits.

This work follows the Royal Society's 2019 report Protecting privacy in practice: The current use, development and limits of Privacy Enhancing Technologies in data analysis37, which highlighted the role of PETs in enabling the derivation of useful results from data without providing wider access to datasets. Protecting privacy in practice presented a high-level overview of PETs and identified how these potentially disruptive technologies could play a role in addressing tensions around privacy and utility.

The 2019 report made several observations for how the UK could realise the potential of PETs, including:
• The research and development of PETs can be accelerated through collaborative, cross-sector research challenges developed by government, industry and the third sector, alongside fundamental research support for advancing PETs;
• Government can be an important influencer in the adoption of PETs by demonstrating their use and sharing their experience around how PETs unlock new opportunities for data analysis. At the same time, public sector organisations should be given the level of expertise and assurance required to utilise new technological solutions;

33 The Royal Society. 2017 Machine learning: the power and promise of computers that learn by example. See https://
royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf (accessed 30
May 2022).
34 The British Academy and the Royal Society. 2017 Data management and use: Governance in the 21st century. See
https://royalsociety.org/-/media/policy/projects/data-governance/data-management-governance.pdf (accessed 28
July 2022).
35 Alsunaidi A J et al. 2021 Applications of big data analytics to control COVID-19 pandemic. Sensors (Basel) 21, 2282.
(https://doi.org/10.3390/s21072282 s21072282)
36 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero. See https://
royalsociety.org/-/media/policy/projects/digital-technology-and-the-planet/digital-technology-and-the-planet-report.
pdf (accessed 20 September 2022).
37 The Royal Society. 2019 Protecting privacy in practice: The current use, development and limits of Privacy Enhancing
Technologies in data analysis. See https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/
privacy-enhancing-technologies-report.pdf (accessed 30 June 2022).

• PETs can promote human flourishing through enabling new and innovative ways of governing data, as well as promoting safe and secure data use. The Department for Digital, Culture, Media and Sport (DCMS), the Centre for Data Ethics and Innovation (CDEI), Office for AI, regulators and civil society can consider how PETs play a role in wider data governance structures, including how they operate alongside new data governance models such as 'data trusts'.

Key terms and definitions
This report draws on multidisciplinary concepts from cryptography, business, cybersecurity, ethics and analytics. Included here is a quick reference glossary of key terms used throughout.

Differential privacy: security definition which means that, when a statistic is released, it should not give much more information about a particular individual than if that individual had not been included in the dataset. See also 'privacy budget'.

Distributed Ledger Technology (DLT): an open, distributed database that can record transactions between several parties efficiently and in a verifiable and permanent way. DLTs are not considered PETs, though they can be used (as some PETs) to promote transparency by documenting data provenance.

Epsilon (ε): see 'privacy budget'.

Homomorphic encryption (HE): a property that some encryption schemes have, so that it is possible to compute on encrypted data without deciphering it.

Metadata: data that describes or provides information about other data, such as time and location of a message (rather than the content of the message).

Noise: a random alteration of data/values in a dataset so that the true data points (such as personal identifiers) are not as easy to identify.

Privacy budget (also differential privacy budget, or epsilon): a quantitative measure of the change in confidence of an individual having a given attribute.

Privacy-preserving synthetic data (PPSD): synthetic data generated from real-world data to a degree of privacy that is deemed acceptable for a given application.

Private Set Intersection (PSI): secure multiparty computation protocol where two parties compare datasets without revealing them in an unencrypted form. At the conclusion of the computation, each party knows which items they have in common with the other. There are some scalable open-source implementations of PSI available.

Secure multi-party computation (SMPC or MPC): a subfield of cryptography concerned with enabling private distributed computations. MPC protocols allow computation or analysis on combined data without the different parties revealing their own private inputs to the computation.

Synthetic data: data that is modelled to represent the statistical properties of original data; new data values are created which, taken as a whole, reproduce the statistical properties of the 'real' dataset.

Trusted Execution Environment (TEE): secure area of a processor that allows code and data to be isolated and protected from the rest of the system such that it cannot be accessed or modified even by the operating system or admin users. Trusted execution environments are also known as secure enclaves.
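The 'Differential privacy', 'Privacy budget (epsilon)' and 'Noise' entries above can be tied together with the standard formal definition: a randomised mechanism M is ε-differentially private if, for any two datasets D and D′ that differ in one individual's record and for every set of possible outputs S,

    \Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S]

so that smaller values of ε (a tighter privacy budget) reveal less about whether any particular individual contributed to the data. One common way to meet this definition is to add calibrated random noise to a statistic before release. The minimal sketch below illustrates the Laplace mechanism for a simple count; it assumes numpy is available and is included as an illustration rather than drawn from the report.

    # Minimal, illustrative Laplace mechanism: releases a noisy count whose
    # distribution satisfies epsilon-differential privacy (assumes numpy).
    import numpy as np

    def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
        # A count changes by at most 1 when one person is added or removed,
        # so its sensitivity is 1; the noise scale is sensitivity / epsilon.
        noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
        return true_count + noise

    print(noisy_count(412, epsilon=0.5))  # eg 409.7; a smaller epsilon means more noise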

Chapter one

The role of technology in privacy-preserving data flows
The ever-growing quantity of data collected in contemporary life, coupled with increasing power to compute, is opening new possibilities for data-driven solutions38. At the same time, there is unprecedented potential for the misuse of data – whether intentional or unintentional – leading to downstream harms at individual, community, corporate and national scales39, 40.

The Royal Society's 2019 report focused on the role of PETs in addressing data privacy. Acknowledging that privacy is a term with multiple meanings41, 42, it referenced Daniel Solove's taxonomy of privacy. Solove's approach considers privacy violation as resulting from problematic data actions pertaining to personal data, including:
• Aggregation: the gathering together of information about an individual, which could be used to generate insights for reidentification or profiling43;
• Identification: the linking of data (which may otherwise be anonymised) to a specific individual;
• Insecurity: the potential for data to be accessed by an intruder due to glitches, cybersecurity breach or intentional misuse of information;
• Exclusion: the use of personal data without notice to individuals;
• Disclosure: the revelation of personal data to others;
• Exposure: the revelation of an individual's physical or emotional attributes to others;
• Intrusion: invasive acts that interfere with an individual's physical or virtual life (such as junk mail).

Data privacy tools can include technologies, legal instruments or physical components (such as hardware keys) that mitigate the risk of problematic data actions. However, data privacy can mean many things, and can be subjective or contextual44. Broadly, privacy may be considered the right of individuals to selectively express themselves or be known. Data privacy entails a degree of control and influence over personal data, including its use. It may therefore be described as 'the authorized, fair, and legitimate processing of personal information'45.

38 The British Academy and the Royal Society. 2017 Data management and use: Governance in the 21st century. See
https://royalsociety.org/-/media/policy/projects/data-governance/data-management-governance.pdf (accessed 28
July 2022).
39 Wolf LE 2018. Risks and Legal Protections in the World of Big-Data. Asia Pac J Health Law Ethics. 11, 1-15. https://www.
ncbi.nlm.nih.gov/pmc/articles/PMC6863510/
40 Jain P, Gyanchandani M, Khare N. 2016 Big data privacy: a technological perspective and review. Journal of Big Data
3, 25.
41 The British Academy and the Royal Society. 2017 Data management and use: Governance in the 21st century. See
https://royalsociety.org/-/media/policy/projects/data-governance/data-management-governance.pdf (accessed 28
July 2022).
42 The Israel Academy of Sciences and Humanities and The Royal Society. 2017 Israel-UK privacy and technology
workshop note of discussions. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/
(accessed 20 September).
43 This is distinct from aggregation across a population or group.
44 Nissenbaum H. 2010 Privacy In Context: Technology, Policy, and the Integrity of Social Life. Stanford: Stanford
Law Books.
45 Bhajaria N. 2022 Data privacy: A runbook for engineers. Shelter Island: Manning.

A specific definition of privacy may be less useful than considering what privacy is for46 and what is at stake by examining potential downstream harms. The loss of privacy may also be considered intrinsically harmful to an individual.

Data privacy, data protection and information security
Data privacy is related to information security, but there are important differences. Information security focuses on external adversaries and the prevention of undesired access to information47. Security is a necessary condition for data privacy, but privacy also entails the legitimate and fair use of (secure) data. Data security relates to protecting data as an asset, whereas data privacy is more concerned with protecting people: ensuring the rights of data subjects follow their data.

The unauthorised use of data shared for a given purpose is loss of privacy (a violation of intention). This suggests that data privacy tools should address accountability and transparency in data collection and use, in addition to helping meet security requirements. Data protection, on the other hand, refers to the legal safeguards in place to ensure data rights are upheld while data is collected, stored or processed.

What are privacy enhancing technologies (PETs)?
PETs are an emerging set of technologies and approaches that enable the derivation of useful results from data without providing full access to the data. In many cases, they are tools for controlling the likelihood of breach or disclosure. This potentially disruptive suite of tools could create new opportunities where the risks of using data currently outweigh the benefits. PETs can reduce the threats typically associated with collaboration48, motivating new partnerships – for example, between otherwise competing organisations. For this reason, PETs have more recently been described as Partnership Enhancing Technologies49 and Trust Technologies50.

46 Zimmermann C. 2022 Part 1: What is Privacy Engineering? The Privacy Blog. 10 May 2022. See https://the-privacy-
blog.eu/2022/05/10/part1-what-is-privacy-engineering/ (accessed 20 September 2022).
47 According to NIST, security is ‘[t]he protection of information and information systems from unauthorized access, use,
disclosure, disruption, modification, or destruction in order to provide confidentiality, integrity, and availability.’ National
Institute of Standards and Technology (Computer security resource center). See https://csrc.nist.gov/glossary/term/is
(accessed 20 September 2022).
48 World Economic Forum. 2019 The next generation of data-sharing in financial services: Using privacy enhancing
technologies to unlock new value. See https://www3.weforum.org/docs/WEF_Next_Gen_Data_Sharing_Financial_
Services.pdf (accessed 20 September 2022).
49 Lunar Ventures, Lundy-Bryan L. 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
50 Infocomm Media Development Authority (Singapore grows trust in the digital environment). See https://www.imda.gov.
sg/news-and-events/Media-Room/Media-Releases/2022/Singapore-grows-trust-in-the-digital-environment (accessed
5 June 2022).

The term Privacy Enhancing Technologies originates in a 1995 report co-authored by the Information and Privacy Commissioner of Ontario and the Dutch Data Protection Authority, which described technologies that allowed online transactions to remain anonymous51. Since then, PETs have evolved in different fields with limited coordination, and there is no consensus around a single definition of PETs. This report follows the European Union Agency for Cybersecurity (ENISA) definition of PETs: a group of technologies that support data minimisation, anonymisation and pseudonymisation as well as other privacy and security principles central to data protection52.

A downstream harms-based approach: Taxonomy of harms
This report considers PETs beyond data security mitigation. However, a framework for data protection and risk is useful in understanding the drivers of data governance decisions (including reluctance to partner or share data). PETs can help prevent downstream harms through bolstering data protection practices. A taxonomy of harms (Figure 1) provides a conceptual overview of how data might be used or shared, alongside the harms that may follow problematic data actions. It classifies harms into domains (individual, organisation, societal, national) and types (physical/psychological, relational, reputational, personal, economic, security).

To demonstrate the interconnectedness of risk factors and harms, the model shows both practical elements that may result in harm, as well as downstream effects – including damage that can occur far outside the perceived system53. It is important to note that, while there are general trade-offs between privacy and utility, the relationship is rarely a simple or linear one.

Threats to privacy are not always external to a data-holding institution. Internal actors may intentionally or unwittingly disclose personal data or other sensitive information. Additionally, there is no simple one-to-one mapping between an attack and the target (type of information release) or an outcome. Multiple attacks may be used in a sequence to reveal information.

The taxonomy is not an exhaustive list of all potential attacks and harms, but provides an illustrative tool designed to encourage a harms-based approach to data protection risks.

51 Information and Privacy Commissioner of Ontario and Registratiekamer (Netherlands) 2008. Privacy-Enhancing
Technologies: The Path to Anonymity. Volume 1.
52 European Union Agency for Cybersecurity (Data Protection: Privacy enhancing technologies). See https://www.enisa.
europa.eu/topics/data-protection/privacy-enhancing-technologies (accessed 20 September 2022).
53 National Institute of Standards and Technology (NIST Privacy Engineering Objectives and Risk Model Discussion
Draft). See https://www.nist.gov/system/files/documents/itl/csd/nist_privacy_engr_objectives_risk_model_discussion_
draft.pdf (accessed 20 September 2022).

Recent international developments in PETs
Beyond data security applications, PETs are gaining attention for their role in facilitating data use across national borders. In 2019, the World Economic Forum published a comprehensive review of PETs in financial services, a sector that is among the most cited in emerging PETs uptake54. In 2020 the Organisation for Economic Cooperation and Development (OECD) recommended data sharing arrangements that use technological access controls, such as PETs, in guidance on cross-border data flows and international trade. For international data use, they suggest PETs may be complemented with 'legally binding and enforceable obligations to protect the rights and interests of data subjects and other stakeholders'55.

In January 2022, the United Nations Committee of Experts on Big Data and Data Science for Official Statistics launched a pilot PET lab programme, which aims to enhance international data use with PETs56. The UN PET Lab is currently working with four National Statistical Offices (NSOs) and collaborating with PETs providers to safely experiment with PETs and identify barriers to their implementation.

In June 2022 Singapore's Minister for Communications and Information launched the new Digital Trust Centre, which will lead research and development in 'Trust Technologies', including PETs and explainable artificial intelligence57.

Also in June 2022, the US Office for Science and Technology Policy and DCMS in the UK launched a joint PETs prize challenge to accelerate the adoption of PETs as tools for democracy58. Both governments are working closely with NIST (US) and the US National Science Foundation in developing the challenge. The transatlantic initiative is deemed an 'expression of our shared vision: a world where our technologies reflect our values and innovation opens the door to solutions that make us more secure'59.
PETs providers to safely experiment with PETs
and identify barriers to their implementation.

54 World Economic Forum. 2019 The Next Generation of Data-Sharing in Financial Services: Using Privacy Enhancing
Techniques to Unlock New Value). See https://www3.weforum.org/docs/WEF_Next_Gen_Data_Sharing_Financial_
Services.pdf (accessed 20 September 2022).
55 Organisation for Economic Co-operation and Development (Recommendation of the Council on Enhancing Access
to and Sharing of Data). See https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0463 (accessed 20
September 2022).
56 Hurst A. 2022 UN launches privacy lab pilot to unlock cross-border data sharing benefits. Information Age. 25
January 2022. See https://www.information-age.com/un-launches-privacy-lab-pilot-to-unlock-cross-border-data-
sharing-benefits-19414/ (accessed 20 March 2022).
57 Infocomm Media Development Authority (Singapore grows trust in the digital environment). See https://www.imda.gov.
sg/news-and-events/Media-Room/Media-Releases/2022/Singapore-grows-trust-in-the-digital-environment (accessed
5 June 2022).
58 HM Government (U.K. and U.S. governments collaborate on prize challenges to accelerate development and
adoption of privacy-enhancing technologies). See https://www.gov.uk/government/news/uk-and-us-governments-
collaborate-on-prize-challenges-to-accelerate-development-and-adoption-of-privacy-enhancing-technologies
(accessed 13 June 2022).
59 Ibid.

FIGURE 1

Taxonomy of harms

[Figure: a flow diagram tracing how held and released data can lead to downstream harms.]

Inside the internal trust boundary, data is held and used by others for processing and analytics; insider threats (data disclosed or reused for intended or unintended purposes) constitute a security violation. Information crosses the external trust boundary when it is released as: anonymised dataset(s), aggregated statistics, a machine learning model, or when external code is executed on a model.

Example attacks: de-anonymisation (individuals' identities are revealed); tracing attacks; reconstruction; data poisoning / classifier influence / trojan attacks (the dataset is altered to disrupt robustness or integrity, warping outcomes); model inversion / reconstruction attacks (the data used to train a model is reconstructed).

Example outcomes: sensitive information of individuals is released or widely known; personal information, especially related to protected categories, is revealed; personal information is used for profiling and predictive purposes; unintended disclosure of information, eg as a by-product of training neural networks.

Downstream harms, by domain:
• Individual: identity theft; financial harm; discrimination; loss of life; anxiety / worry; embarrassment; wrongful accusation (eg of illegal activity); detriment to personal reputation / public perception.
• Organisational: loss of revenue; loss of profits; operations disrupted; punitive damages (legal); loss of competitive advantage; damaged reputation; damaged trust relationship (with customers, research participants).
• Societal: democratic processes are undermined; detriment to public services; discrimination against groups; less trust in research/industry; less willingness to share data or participate.
• National: security compromised; national intelligence is breached; less trust in justice system; less trust in democracy.

KEY (types of harm): physical/psychological, economic, reputational, relational, security, personal.

Source: Royal Society meetings with Working Group for Privacy Enhancing Technologies, November 2021 and April 2022.


Interest in PETs for international data transfer and use
A fragmented array of legal requirements covers data use across the globe. As of March 2022, there are 157 countries with data protection laws, entailing various stipulations for data transfer and use60. PETs can provide means for secure collaboration across borders, preventing unauthorised access to datasets; however, data use is still subject to local legal requirements. PETs do not provide 'loopholes' to data protection laws in the UK. Rather, PETs can be used as tools to help data users comply with regulatory requirements, such as anonymisation. While this report refers primarily to current UK GDPR, it restricts legal commentary to high-level observations, noting ongoing data reform in the UK and the international relevance of PETs in other jurisdictions.

Accelerating PETs development: Sprints, challenges and international collaboration
Other PETs development initiatives include the PRIViLEDGE project, funded by the EU's Horizon 2020 programme between 2017 and 2021. The project aimed to develop cryptographic protocols in support of privacy, anonymity and efficient decentralised consensus using distributed ledger technologies (DLTs).

As well as online voting (see Use case 5.3, page 95), PRIViLEDGE developed a number of toolkits and prototypes61, including privacy-preserving data storage using ledgers (data residing on a blockchain) and secure multi-party computation (SMPC) on distributed ledgers, which allows two or more parties to compute using a ledger as a communication channel. Many of these resources have been opened for further development.

State-level collaborations to accelerate PETs include the Digital Trust Centre (DTC), launched in 2022 in Singapore62, 63. The DTC is set to lead Singapore's efforts in research and development for 'trust technologies', such as PETs, which provide solutions for data sharing and evaluation of trustworthy AI systems. This national effort includes sandbox environments, academic-enterprise partnerships, and national and international collaborations between research institutes. As a founding member of the Global Partnership for AI (GPAI), Singapore intends to use this platform to enhance its contributions to GPAI.

These initiatives have the potential to drive innovation and are raising the profile of PETs for privacy, partnership and trust. This will be key in motivating new users and creating a wider marketplace for PETs. The following section focuses on the UK public sector, describing enabling factors and barriers in the adoption of PETs.

60 Greenleaf G. 2022 Now 157 Countries: Twelve Data Privacy Laws in 2021/22. Privacy Laws & Business International
Report 1, 3—8. See https://ssrn.com/abstract=4137418 (accessed 24 May 2022).
61 Livin L. 2021 Achievements of the priviledge project. Priviledge blog. 30 June 2021. See https://priviledge-project.eu/
news/achievements-of-the-priviledge-project (accessed 30 June 2022).
62 Infocomm Media Development Authority (Singapore grows trust in the digital environment). See https://www.imda.gov.
sg/news-and-events/Media-Room/Media-Releases/2022/Singapore-grows-trust-in-the-digital-environment (accessed
5 June 2022).
63 The DTC will serve as implementation partner for an international collaboration between the Centre of Expertise of
Montreal for the Advancement of Artificial Intelligence (CEIMIA) and the Infocomm Media Development Authority
(IMDA) in Singapore. This partnership seeks to develop solutions to demonstrate how PETs can help organisations
leverage cross-institution and cross-border data.


BOX 1

PETs in financial services

A series of challenges, technology sprints and collaborative projects have propelled the development of PETs in financial services. The World Economic Forum has outlined potential uses for PETs in determining creditworthiness, identifying collusion, or flagging fraudulent transactions between multiple banks64. Financial information sharing is key in tackling financial crime, which amounts to around $1.6 trillion annually (between 2 and 5% of global GDP). This requires collaboration and data sharing in a way that safeguards client data, adheres to legal requirements and does not compromise the competitive advantage of banking institutions.

In the UK, the Financial Conduct Authority (FCA) explored potential use cases for PETs such as secure multi-party computation in enabling data-based financial crime detection and prevention, launching a TechSprint on Global Anti-Money Laundering and Financial Crime in July 201965, 66.

This event included over 140 active participants, and concluded with ten proofs of concept, including:
• Using homomorphic encryption to enable banks to share and analyse sensitive information in order to uncover money-laundering networks, or to support the identification of existing and new financial crime typologies, or to allow banks to distinguish good from bad actors through question-and-answer when onboarding new clients;
• Using secure multi-party computation to uncover patterns of suspicious transactions across networks involving multiple banking institutions, or to highlight transactional mismatches in risky categories, such as account names;
• Using federated learning to improve risk assessment between multiple banks by enabling sharing of typologies;
• Using pseudonymised and hashed customer data to enable sharing and cross-referencing, to highlight potential areas of concern or for further investigation.

These demonstrations illustrate how PETs can be used for a particular end goal: to identify criminal behaviour in order to target enforcement action. While this use case is applauded by those working to tackle financial crime, it is worth considering how the same methods might be used for surveillance of other behaviours (for example, to profile customers for targeted advertisements, or for enhanced credit scoring).
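To make the secure multi-party computation pattern referenced above more concrete, the sketch below uses additive secret sharing, a common SMPC building block. The banks, counts and three-party setting are hypothetical and purely illustrative; production systems would rely on audited SMPC frameworks rather than this toy protocol.

```python
import secrets

# Toy additive secret sharing: three hypothetical banks want the combined
# number of transactions flagged as suspicious for a shared customer,
# without any bank revealing its own count to the others.
PRIME = 2**61 - 1  # arithmetic is modulo a large prime, so each share looks random

def make_shares(secret: int, n_parties: int) -> list:
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Each bank's private input (never disclosed directly).
private_counts = {"bank_a": 17, "bank_b": 4, "bank_c": 9}

# Step 1: each bank splits its count and distributes one share to every party.
all_shares = {bank: make_shares(count, 3) for bank, count in private_counts.items()}

# Step 2: each party sums the shares it holds; a single partial sum reveals
# nothing about any individual bank's input.
partial_sums = [sum(all_shares[bank][i] for bank in private_counts) % PRIME
                for i in range(3)]

# Step 3: combining the partial sums reveals only the aggregate.
total = sum(partial_sums) % PRIME
print(total)  # 30 – the joint count, with no bank's individual figure exposed
```

In practice, schemes of this kind are combined with authenticated channels and protocols for multiplication and comparison, which is where dedicated SMPC libraries and the standards listed in Table 5 come in.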

64 World Economic Forum. 2019 The Next Generation of Data-Sharing in Financial Services: Using Privacy Enhancing
Techniques to Unlock New Value). See https://www3.weforum.org/docs/WEF_Next_Gen_Data_Sharing_Financial_
Services.pdf (accessed 20 September 2022).
65 Financial Conduct Authority (2019 Global AML and Financial Crime TechSprint). See https://www.fca.org.uk/events/
techsprints/2019-global-aml-and-financial-crime-techsprint (accessed 20 September 2022).
66 Cook N. 2019 It takes a network to defeat a network: tech in the fight against financial crime. Royal Society blog. 19
September 2022. See https://royalsociety.org/blog/2019/09/it-takes-a-network-to-defeat-a-network/ (accessed 16
February 2022).

Chapter two
Building the PETs marketplace



As highlighted in market research commissioned by the Royal Society and CDEI, the market for PETs is nascent67. However, a growing number of documented examples demonstrate PETs already being used in a range of contexts68, with a substantial number of large organisations expected to use one or more privacy-enhanced computation techniques by 2025, particularly in secure cloud infrastructures69. In addition to safeguarding personal data (which is required by data protection legislation), PETs are increasingly used wherever data is sufficiently valuable (for example, where data is tied to intellectual property or natural resource management).

PETs are rapidly evolving through private enterprise, as well as significant third sector and open initiatives. The development of the technology is thus greater than might be expected, given the modest size of the PETs market70. While this chapter explores the UK public sector market for PETs, it does not fully consider how PETs might shape future digital and data markets at large. In some cases, PETs negate the need to make copies of datasets, allowing data holders to provide insights as-a-service and potentially disincentivising open data approaches. Considering the potentially disruptive nature of PETs in this way, further research is required to understand the full implications of PETs in digital and data markets.

PETs for compliance and privacy
Neither EU nor UK data protection regulation explicitly mentions PETs (nor 'privacy'). However, compliance with data protection law is a substantial motivating factor for organisations using data protection approaches. One investment firm contends that the EU GDPR has 'created the enterprise privacy market'71. Data processors want to understand how PETs can help them in compliance (particularly where data analysis is a weakness in the data lifecycle).

While privacy challenges are risk-related, they are not always assessed as commercial problems72, particularly where the use of data is not commercially motivated (or where data use is altogether optional). Many data-holding organisations already use secure cloud services and analytics by default, and PETs are unlikely to be more cost-effective security tools in the near-term. In the wider marketplace, collaborative analysis may provide the most compelling business case for these technologies.

67 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
68 Centre for Data Ethics and Innovation (Privacy Enhancing Technologies Adoption Guide). See https://cdeiuk.github.io/
pets-adoption-guide/ (accessed 20 September 2022).
69 Gartner (Gartner identifies the top strategic technology trends for 2022). See https://www.gartner.com/en/newsroom/
press-releases/2021-10-18-gartner-identifies-the-top-strategic-technology-trends-for-2022 (accessed 20 September
2022). Note that in Gartner’s analysis PETs are defined similarly to this report.
70 Lunar Ventures (Lundy-Bryan L). 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
71 Ibid.
72 Ibid.


PETs in collaborative analysis
Collaborative analysis (including collaborative computing73 and collaborative learning74) is a growing area of interest in PETs applications. Researchers requiring data to generate insights, or to 'fuel' machine learning and other AI applications, can leverage PETs to establish data partnerships – effectively augmenting the data available to them. For example, organisations with a mandate to use data for public good are using PETs to make in-house data usable for external analysts75; cross-sector partnerships between crime agencies and human rights NGOs involve the pooling of datasets for analysis without revealing their contents to one another76, enabling efficient, collective intelligence between analysts who do not see the original data.

Data availability and access is a priority for public sector bodies with a remit to use data for public benefit, provision of services or to provide digital functions. For example, the Greater London Authority's London Datastore is designed to proactively link data assets to generate insights. Likewise, DataLoch – a service developed between the University of Edinburgh and NHS Lothian – aims to encourage 'non-typical researchers', such as charitable organisations, to use in-house health and social care data for the region of South-East Scotland. In interviews, PETs for collaborative analysis were seen by such public sector bodies as possible methods for reaching these aims; however, no examples of this application of PETs were identified by the UK organisations interviewed77.

73 Ibid.
74 Melis L, Song C, De Cristofaro E, Shmatikov V. 2018 Inference attacks against collaborative learning. Preprint. See
https://www.researchgate.net/publication/325074745_Inference_Attacks_Against_Collaborative_Learning (accessed
20 September 2022).
75 See Use case 1.1, page 57.
76 See Use case 6, page 97.
77 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/


Legal and technical friction points prevent timely and straightforward access to public sector data, limiting its value as a public resource. PETs that allow the sending or processing of datasets internationally could be key to realising the value of data use across institutions and borders, which has been estimated to be between $3 – 5 trillion USD annually78. Governments and data-holding organisations are beginning to understand this value in terms of both economic and social benefits, and are seeking technology-based tools to enable collaboration79. The same PETs could also enhance data use across departments within an organisation, whether for reuse or when subject to further restrictions (as with International Traffic in Arms Regulations compliance in the US).

For these reasons, collaborative analysis has been predicted by one firm to be the largest new technology market to develop in the current decade80. Cloud services are one substantial market already being impacted through the widespread use of Trusted Execution Environments (TEEs), which allow for data processing and analysis in a secure environment with restricted access81. TEEs can provide an application domain for SMPC, enabling collaborative analysis of confidential datasets82. Given its role in secure and collaborative analysis, confidential cloud could be an area of significant market growth in the near future83, 84.

78 McKinsey. 2013 Collaborating for the common good: Navigating public-private data partnerships. See https://
www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/collaborating-for-the-common-
good#:~:text=Overall%2C%20McKinsey%20estimates%20that%20connecting (accessed 18 July 2022).
79 World Economic Forum. 2019 The Next Generation of Data-Sharing in Financial Services: Using Privacy Enhancing
Techniques to Unlock New Value). See https://www3.weforum.org/docs/WEF_Next_Gen_Data_Sharing_Financial_
Services.pdf (accessed 20 September 2022).
80 Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
81 Gartner (Gartner Top Strategic Technology Trends for 2021). See https://www.gartner.com/smarterwithgartner/gartner-
top-strategic-technology-trends-for-2021 (accessed 26 September 2022).
82 Geppert T, Deml S, Sturzenegger D, Ebert N. 2022 Trusted Execution Environments: Applications and Organizational
Challenges. Front. Comput. Sci. 4 (https://doi.org/10.3389/fcomp.2022.930741)
83 Gartner (Gartner Top Strategic Technology Trends for 2021). See https://www.gartner.com/smarterwithgartner/gartner-
top-strategic-technology-trends-for-2021 (accessed 26 September 2022).
84 The Confidential Computing Consortium, which is run by the Linux Foundation, is promoting the use of TEEs in cloud
services internationally. The Consortium includes every large cloud provider (Alibaba, Baidu, Google Cloud, Microsoft,
Tencent), demonstrating confidential computing as a priority to leaders in digital technology. Confidential Computing
Consortium Defining and Enabling Confidential Computing (Overview). See https://confidentialcomputing.io/wp-
content/uploads/sites/85/2019/12/CCC_Overview.pdf (accessed 15 March 2022).


Barriers to PETs adoption: User awareness and understanding in the UK public sector
A number of barriers prevent the widespread use of PETs for data protection and collaborative data analysis in the UK public sector. The first obstacle is general knowledge and awareness of PETs, their benefits and potential use cases85, 86. Researchers and analysts are often familiar with traditional privacy techniques (such as anonymisation, pseudonymisation, encryption and data minimisation); for some, it is unclear what PETs can add to these approaches.

PETs that enable collaborative analysis include some of the most technically complex and least used to date (such as secure multi-party computation and federated learning). While these PETs may be some of the most promising, the risk inherent to using new and poorly understood technologies is a strong disincentive to adoption: few organisations, particularly in the public sector, are prepared to experiment with privacy87.

A lack of understanding around PETs within wider data protection requirements means stakeholders are hesitant to adopt them88. For example, anonymised personal data is not subject to the principles of data protection requirements detailed in the UK GDPR or EU GDPR89, 90; however, in the UK, there is no universal test of anonymity. Technology-specific guidance may be useful in interpreting requirements and best practices in emerging technologies, for example, how archived synthetic data should be handled91. Currently, organisations must turn to assessments by internal or external parties for guidance. These uncertainties lead to a culture of risk-aversion described by some UK public bodies92. Without assurance or technical standards, some question the genuine security PETs offer, particularly where privacy threats and adversaries are undefined or hypothetical93.

85 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
86 Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
87 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
88 Ibid.
89 Information Commissioner’s Office (What is personal data?). See https://ico.org.uk/for-organisations/guide-to-data-
protection/guide-to-the-general-data-protection-regulation-gdpr/key-definitions/what-is-personal-data/#:~:text=If%20
personal%20data%20can%20be,subject%20to%20the%20UK%20GDPR (accessed 20 September 2022).
90 GDPR Info (EU GDPR Recital 26). See https://gdpr-info.eu/recitals/no-26/ (accessed 20 September 2022).
91 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
92 Ibid.
93 Ibid.


Where organisations are unable to assess privacy trade-offs for a given PET or application, cost-benefit analysis becomes impractical. As a result, the PETs value proposition remains speculative and the business case for adopting PETs is unclear. Demonstrations are needed to establish the potential benefit of PETs, for example, through case studies that include cost-benefit analyses94. The use cases and examples in Chapter Four (page 56) provide a starting point for such an approach.

According to those interviewed, market confidence could be enhanced through better data readiness and the development of standards (Chapter Three)95. PETs are subject to relevant legal frameworks and existing regulators, such as the ICO in the UK. However, they are not specifically regulated as technologies, and their efficacy is 'illegible' to non-experts. Standards could be followed by assurance and certifications. Implementation frameworks for PETs would allow some elements of decision-making to be outsourced, although additional expertise will likely be required in practice96.

Other barriers are institutional in nature. For example, where technical expertise does exist in-house, these individuals are often organisationally removed from decision-makers97. Foundational data governance issues, such as data quality and interoperability, are primary concerns for many organisations and, as such, new, unknown technologies are deprioritised. Compute power is also a practical limiting factor, particularly with energy-intensive approaches such as homomorphic encryption98.

Barriers to PETs adoption: Vendors and expertise
The development of PETs requires a deep understanding of cryptography. However, unlike other computing-related fields (such as software engineering), the cutting edge of cryptography remains largely in academia. This leads to a gap between cryptography expertise and market drivers, such as cost and convenience. As a result, theoretical cryptography 'risks over-serving the market on security'99. Bridging the gap between cryptography talent and entrepreneurs could create viable PETs vendors.

94 Ibid.
95 Ibid.
96 Ibid.
97 Ibid.
98 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
99 Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).


Professional certifications and online courses for privacy professionals could integrate a PETs primer into existing courses to raise awareness and expertise in the profession. For example, the Alliance for Data Science Professionals100, which defines standards to ensure ethical and well-governed data use, could consider PETs in designing standards around data stewardship and analysis.

Modules on general and specific PETs are appearing in university syllabuses, particularly at the postgraduate study level. Several of the universities within the Academic Centres of Excellence in Cyber Security Research have a focus on privacy, and PETs and privacy are a remit of the doctoral training. In more informal education, online courses are starting to appear, such as OpenMined's 'Our Privacy Opportunity', 'Foundations of Private Computation' and 'Introduction to Remote Data Science'101. These can go a long way in raising general awareness and inspiring use cases.

Conclusions
A flourishing PETs market will require both trust in the technology and users' ability to discern appropriate applications. PETs vendors can help address scepticism by integrating PETs in wider data governance approaches, rather than promoting one-size-fits-all solutions. Where public sentiment around the use of PETs is unknown, further research – including focus groups or public dialogues – could be used toward ensuring end-user acceptance of (and demand for) the technologies102.

Today, businesses are incentivised to accumulate data for exclusive use. PETs may engender new business models, for example data or analytics as-a-service. This could entail a data-holding organisation allowing clients to query or run analyses on in-house datasets. This could be done using PETs that do not reveal the data, only the insights or solutions gathered from the query or analysis. Data is not transferred and remains unseen by the external client.

In this way, PETs may enable a shift from data sharing (through agreements or otherwise) to a dynamic data processing and analytics market103, such as through 'commissioned analyses'104. It will be important to consider this potential shift and incentivise organisations to utilise PETs for collaboration, rather than data gatekeeping.

100 British Computing Society (The Alliance for Data Science Professionals: Memorandum of Understanding July 2021).
See https://www.bcs.org/media/7536/alliance-data-science-mou.pdf (accessed 2 September 2022).
101 OpenMined (The Private AI Series). See https://courses.openmined.org/ (accessed 7 October 2022).
102 The Royal Society. Creating trusted and resilient data systems: The public perspective. (to be published online
in 2023)
103 Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
104 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/


TABLE 2

PETs described in this report and their function with regard to security and collaborative analysis105.

Homomorphic encryption (HE)
• What does this PET do? Allows the use, or analysis, of encrypted data without decrypting it.
• In what circumstances would it be used? To create meaningful insights in computation without revealing the contents of a dataset to those running the analysis (which could be done by a trusted third party).
• Whose data is being protected and from whom? The data held by the institution running the computation is protected from whoever runs the analysis, whether a third party or the institution themselves. If the third party were to act in bad faith, they would not have access to the data in question.
• Whose interests are being protected and what are they? The data controller: they have an interest in carrying out their computation in the safest and most effective way possible. The data subjects: those who the data is about have an interest in making sure their data is not accessed by bad actors.
• Relevance to security and collaborative analysis: Security – data is protected from unauthorised access.

Trusted Execution Environments (TEEs)
• What does this PET do? Allows data to be used or analysed within a secure, isolated environment.
• In what circumstances would it be used? When data needs to be stored securely, or to generate insights from data without revealing the dataset to the party running the analysis or hosting the TEE.
• Whose data is being protected and from whom? The data held by the institution running the research can only be decrypted and used within the TEE, and only by approved code. The TEE is protected from the outside environment, including the operating system and admin users.
• Whose interests are being protected and what are they? The data controller: they have an interest in carrying out their research in the safest and most effective way possible. The data subjects: those who the data is about have an interest in making sure their data is not accessed by bad actors.
• Relevance to security and collaborative analysis: Security – data is protected from unauthorised access.

105 Modified from Hattusia 2022 The current state of assurance in establishing trust in PETs. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/ (accessed 20 September 2022).
106 A type of HE called 'multi-key FHE' can perform a similar function: several parties each have a secret key and can encrypt their own data, which is sent to a trusted third party for computation. The result can be decrypted by all parties who contributed data to the process.

Secure multi-party computation (SMPC)
• What does this PET do? Allows multiple parties to run analysis on their combined data, without revealing the contents of the data to each other106.
• In what circumstances would it be used? Removes the need for a trusted central authority that would have access to everyone's data. Rather, multiple organisations can keep their datasets private from each other, but still run joint analysis on the combined data.
• Whose data is being protected and from whom? Each collaborating organisation holds data about individuals (or other sensitive data), and that data is protected from those collaborating on analysis. The data is also protected from any potential misconduct or incompetence from any of the parties.
• Whose interests are being protected and what are they? The collaborating organisations: they have an interest in carrying out their research in the safest and most effective way possible. The data subjects: those who the data is about have an interest in making sure their data is not accessed by bad actors.
• Relevance to security and collaborative analysis: Security – data is protected from unauthorised access. Collaborative analysis – multiple parties can work on datasets held by parties of 'mutual distrust'; the data remains safe from unwarranted interference.

Differential privacy (DP)
• What does this PET do? Mostly for use with large datasets, DP allows institutions to reveal data or derived information to others without revealing sensitive information about the groups or individuals represented in the dataset.
• In what circumstances would it be used? An institution may want to share analytical insights that they have derived from their data with another group or with the public, but their dataset contains sensitive information which should be kept private.
• Whose data is being protected and from whom? Sensitive information about the groups or individuals present in the dataset is protected from whoever the data is being shared with or analysed by, whether that is a trusted third party, the general public, or the institution themselves.
• Whose interests are being protected and what are they? The data controller: they have an interest in carrying out their research and sharing data in the safest and most effective way possible. The data subjects: those who the data is about have an interest in making sure their data is not accessed by bad actors.
• Relevance to security and collaborative analysis: Security – data is protected from unauthorised access. Collaborative analysis – there is potential for open access to the data without revealing the presence or attributes of individuals.

Federated learning
• What does this PET do? Allows for the training of an algorithm across multiple devices or datasets held on servers.
• In what circumstances would it be used? An organisation wants to train a machine learning model, but has limited training data available. They 'send' the model to remote datasets for training; the model returns having benefitted from those datasets.
• Whose data is being protected and from whom? Each collaborating organisation holds data about individuals (or other sensitive data) and that data is protected from those collaborating on analysis. Only the trained model is exchanged.
• Whose interests are being protected and what are they? The collaborating organisations: they have an interest in carrying out their research in the safest and most effective way possible. The data subjects: those who the data is about have an interest in making sure their data is not accessed by bad actors.
• Relevance to security and collaborative analysis: Security – data is protected from unauthorised access. Collaborative analysis – federated learning is also called collaborative learning; multiple parties are required.
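As a concrete illustration of the collaborative-analysis column in Table 2, the sketch below shows the core loop of federated averaging, in which only model parameters – never raw records – leave each data holder. The data, model and number of parties are invented for illustration; real deployments typically add secure aggregation, differential privacy or TEEs on top of this basic pattern.

```python
import numpy as np

# Minimal federated averaging (FedAvg) sketch: three parties fit a shared
# linear model without pooling their raw data.
rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=20):
    """A few steps of least-squares gradient descent on one party's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Each party holds a private dataset drawn from the same underlying relationship.
true_w = np.array([2.0, -1.0])
parties = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    parties.append((X, y))

global_w = np.zeros(2)
for _ in range(10):  # communication rounds
    # Each party trains locally, starting from the current global model.
    local_models = [local_update(global_w, X, y) for X, y in parties]
    # The coordinator only ever sees parameters, which it averages.
    global_w = np.mean(local_models, axis=0)

print(np.round(global_w, 2))  # should approach [ 2. -1.] without sharing raw data
```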

Chapter three
Standards, assessments and assurance in PETs



The Royal Society's 2019 report, Protecting privacy in practice, suggested that a system of standards and certification for PETs may provide a pathway for assurance, leading to wider adoption of the technologies. Similar initiatives have shaped the development and uptake of emerging technologies (such as cybersecurity products) and global information sharing platforms (as with the protocols that continue to enable the internet). However, PETs are unlike cybersecurity in that they address highly contextual, often intersectional, privacy concerns107.

This chapter reviews the role of trust and assurance in PETs implementation108. The review finds that, given their current state of maturation, PETs are generally best used in a systems approach to data privacy by addressing the twin goals of compliance and trust109. Compliance is adherence to legal and statutory obligations (such as the UK GDPR) to avoid penalties, while trust enables data flows and collaboration.

The 2020 Edelman Trust Barometer110 identified two types of trust:
• Moral – the trustor believes the trustee can articulate and act on the best interests of the trustor and;
• Competence – the trustor believes the trustee has the ability to deliver on what has been agreed.

Trust in privacy systems is similarly twofold (see Table 3):
• Trust that the PET will be used in a way that protects the rights of the data subject (moral) and;
• Trust in the technical ability of the PET as a security tool (competence).

Currently, only technical standards exist for PETs (and these are few). These pertain to the technical capabilities of PETs in achieving security (trust in competency). The following sections explore data privacy frameworks, technical standards and assurances in fostering the rapid and responsible use of PETs.

PETs and assurance: The role of standards
Assurance in new technologies takes many forms. Certifications, Kitemarks and other formal guarantees for products are perhaps most well-known. These official marks of assurance require external audit based on formal standards, which set out requirements for a product or system.

Global standards have been effective in cybersecurity and privacy; likewise, encryption-based PETs may rely on encryption standards. Similar approaches may be feasible where risk of disclosure is quantifiable, such as with differential privacy.

107 Hattusia 2022 The current state of assurance in establishing trust in PETs. The Royal Society. See https://royalsociety.
org/topics-policy/projects/privacy-enhancing-technologies/ See also Table 3.
108 Ibid.
109 Zimmermann C. 2022 Part 1: What is Privacy Engineering? The Privacy Blog. 10 May 2022. See https://the-privacy-
blog.eu/2022/05/10/part1-what-is-privacy-engineering/ (accessed 20 September 2022).
110 Edelman (2020 Trust Barometer). See https://www.edelman.com/trust/2020-trust-barometer (accessed 15
February 2022).


TABLE 3

Assurances and trust relationships in the use of PETs in privacy-preserving data governance systems.

Trustors: PETs users (eg engineers or data scientists)
• Trustees: the technology itself; collaborators; external actors; the organisation's executives (decision-makers) or PETs vendors (if using).
• Moral trustworthiness: have the executives or PETs vendors prescribed the right PET for the application, such that it functions in a privacy-preserving way?
• Trust in competence: will the PET fulfil its expected technical function? Will the data remain secure from outside actors who want access to it?
• Assurances needed: Technical assurance – technological specifications demonstrating the PET will function as intended. Assurance in the application – the use of the PET is appropriate for the given use case; the PET is part of wider responsible data governance.

Trustors: Executives and PETs vendors (those 'diagnosing' use cases and deploying PETs)
• Trustees: PETs users; PETs vendors; PETs developers; the technology itself.
• Moral trustworthiness: N/A.
• Trust in competence: are the developers competent in delivering a fit-for-purpose technology? Will the PET fulfil its expected function?
• Assurances needed: Technical assurance – professional qualifications detailing the PET user's ability. Technical assurance – technological specifications demonstrating the PET will function as intended.

Trustors: Data subjects (the people whom the data is about)
• Trustees: the data governance ecosystem of organisations that collect and use their data.
• Moral trustworthiness: will personal data be used in accordance with intent, and not lead to increased surveillance and exploitation?
• Trust in competence: will data remain safe from interference from unauthorised users?
• Assurances needed: Assurance in the application – the PET is used as part of wider responsible data governance.

A standardisation of approach to PETs will be essential in:
• Developing higher-level guidance for 'best practice' and codes of conduct;
• Facilitating the early phases of PETs adoption;
• Incorporating PETs into privacy frameworks and impact assessments in an informed and responsible manner.

The National Institute of Standards and Technology (NIST) also highlights the need for technical standards111. NIST promotes the standardisation of technologies that underpin PETs (such as secret-sharing and encryption regimes), alongside a guidance-based approach to the standardised use of PETs themselves.

Process standards for data protection
Process standards can be used to assist in compliance with data protection law and general privacy protection. Privacy frameworks are one example; these are built around a set of questions or controls: points that must be considered and addressed in building an effective system. This structure allows frameworks to specifically address data protection laws, such as the UK GDPR.

A popular privacy framework approach entails:
1. mapping of information flows;
2. conducting a privacy risk assessment (or 'privacy impact assessment');
3. strategising to manage identified risks.

Frameworks do not prescribe methods or technologies for implementation; rather, the implementer may decide to use classic and emerging PETs to fulfil the framework requirements.

Existing standards, guidance and frameworks that address privacy systems are highlighted in Table 4.

The pathway to PETs standards
Standards for PETs are being developed through a range of international, national and sector-specific SDOs. In addition, there is an emergence of open standards initiatives. These initiatives seek to make standards on PETs accessible by anyone and can entail a collaborative approach to standards development, involving community-led groups and stakeholders from government, industry and academia. There is a growing movement for this standardisation approach, particularly within emerging technologies. An example of this is the UK's AI Standards Hub, which aims to create practical tools and standards to improve the governance of AI112.

111 National Institute of Standards and Technology (Roadmap to the Privacy Framework). See https://www.nist.gov/
privacy-framework/roadmap (accessed 15 March 2022).
112 The AI Standards Hub is led by the Alan Turing Institute with support from the British Standards Institution and the
National Physical Laboratory. HM Government (New UK initiative to shape global standards for Artificial Intelligence).
See https://www.gov.uk/government/news/new-uk-initiative-to-shape-global-standards-for-artificial-intelligence
(accessed 19 March 2022).


BOX 2

Lessons from standardisation: Open standards and the internet

The internet operates smoothly thanks to consensus-driven protocols that continue to be developed by a vast community of technologists. The Internet Engineering Task Force (IETF) is an informal, volunteer-led group that serves as the standards body for the internet. The IETF has played a critical role in the development of the internet without a formal, centralised standards body. It developed inter-domain standards such as HTTP (HyperText Transfer Protocol) and TCP (Transmission Control Protocol), allowing users to access the same internet and transfer data around the world.

Open standards can be led by technologists, who know what is technically possible and can propose standards to adapt and meet new legal or other requirements. They may also benefit from additional inputs from other stakeholders. In being 'open', standards are made available for anyone who wishes to use them. Innovators can then use these protocols in the development of new technology; assurance against such standards becomes a marketable added value to such organisations.

The development of open standards in PETs will be crucial in ensuring PETs work for everyone by allowing for the global and interoperable use of data.


TABLE 4

Example standards and guidance relevant to data privacy

Information technology – Security techniques – Privacy framework
• Number: ISO/IEC 29100:2011/AMD 1:2018. SDO: ISO and IEC. Published: June 2018. Training: certificate available.
• Description: privacy framework for Personal Identifiable Information (PII) use.

Information technology – Security techniques – Privacy architecture framework
• Number: ISO/IEC 29101:2018. SDO: ISO and IEC. Published: Nov 2018.
• Description: focus on ICT systems for PII.
• Reference to PETs: PETs used as privacy controls; refers to PETs 'such as pseudonymization, anonymization or secret sharing'. Briefly mentions HE in regards to encryption.

Information technology – Security techniques – Information security management systems – Requirements
• Number: ISO/IEC 27001:2013. SDO: ISO and IEC. Published: Oct 2013; will be replaced by ISO/IEC FDIS 27001 (under development). Training: certificate available.
• Description: cyber security focussed standard with related standards that include guidance for auditing.

Information security, cybersecurity and privacy protection – Information security controls
• Number: ISO/IEC 27002:2022. SDO: ISO and IEC. Published: Feb 2022. Training: courses available113.
• Description: includes reference materials for security controls and implementation guidance, used regularly in conjunction with ISO/IEC 27001.

Security techniques – Extension to ISO/IEC 27001 and ISO/IEC 27002 for privacy information management – Requirements and guidelines
• Number: ISO/IEC 27701:2019. SDO: ISO and IEC. Published: Aug 2019. Training: certificate available.
• Description: guidance for Privacy Information Management Systems (PIMS), building on ISO 27001.

Information technology – Security techniques – Privacy capability assessment model
• Number: ISO/IEC 29190:2015. SDO: ISO and IEC. Published: Aug 2015, reviewed 2021.
• Description: provides high-level guidance for organisations to assess their management of privacy-related processes.

Data protection – Specification for a personal information management system
• Number: BS 10012:2017+A1:2018. SDO: BSI. Published: Jul 2018. Training: yes, no certificate.
• Description: guidance for PIMS with specific application to UK law (a mapping to ISO/IEC 27701 also exists). Training covered under GDPR implementer / self assessor training.

IEEE Standard for Data Privacy Process
• Number: IEEE 7002-2022. SDO: IEEE. Published: Apr 2022.
• Description: requirements for systems/software engineering for privacy.
• Reference to PETs: 'Organizations should also put in place policies on the following: Privacy enhancing technologies and techniques: Which technologies the organization uses, and how and when these technologies are used.'

Privacy enhancing data de-identification terminology and classification of techniques
• Number: ISO/IEC 20889:2018. SDO: ISO and IEC. Published: Nov 2018.
• Description: description of privacy enhancing data de-identification techniques and measures to be used in accordance with ISO/IEC 29100.
• Reference to PETs: content on homomorphic encryption, differential privacy and synthetic data.

Privacy enhancing data de-identification framework
• Number: ISO/IEC DIS 27559. SDO: ISO and IEC. Publication date: TBC.
• Description: framework for identifying and mitigating re-identification risks, building on ISO/IEC 20889.

Anonymisation, pseudonymisation and privacy enhancing technologies guidance
• SDO: ICO. Publication date: TBC.
• Description: upcoming guidance on anonymisation and PETs; suggests motivated intruder tests.
• Reference to PETs: forthcoming.

Information technology – Security techniques – Code of practice for personally identifiable information protection
• Number: ISO/IEC 29151:2017. SDO: ISO and IEC. Published: Aug 2017.
• Description: information security guidelines specifically for PII.
• Reference to PETs: recommends to 'consider whether, and which, privacy enhancing technologies (PETs) may be used.'

De-Identification of Personal Information
• Number: NISTIR 8053. SDO: NIST. Published: Oct 2015.
• Description: guidance on de-identification; suggests motivated intruder tests.
• Reference to PETs: suggestions of use of differential privacy and synthetic data.

The Anonymisation Decision-Making Framework: European Practitioners' Guide
• SDO: UK Anonymisation Network. Published: Jul 2012.
• Description: framework for anonymisation by an open group, led by academics at the University of Manchester.
• Reference to PETs: suggestions of use of differential privacy and synthetic data.

The NIST Privacy Framework: A Tool for Improving Privacy through Enterprise Risk Management
• SDO: NIST. Published: Jan 2020.
• Description: privacy framework based on NIST's successful cybersecurity framework.

Roadmap for Advancing the NIST Privacy Framework: A Tool for Improving Privacy through Enterprise Risk Management
• SDO: NIST. Published: Jan 2020.
• Description: roadmap for NIST's Privacy Framework highlighting challenges.
• Reference to PETs: passing reference to differential privacy.

PETs Adoption Guide
• SDO: CDEI. Published: Jul 2021.
• Description: guidance including a flowchart for identifying appropriate PETs.

113 See for example the Professional Evaluation and Certification Board training courses https://pecb.com/en/education-and-certification-for-individuals.


TABLE 5

Existing and forthcoming initiatives related to PETs standards development

Homomorphic encryption (HE)
• IT Security techniques – Encryption algorithms – Part 6: Homomorphic encryption (ISO/IEC 18033-6:2019, ISO/IEC). Standard, May 2019. Looks at two PHE algorithms, appropriate parameters and the process of homomorphically operating on the encrypted data.
• Information security – Encryption algorithms – Part 8: Fully Homomorphic Encryption (ISO/IEC AWI 18033-8, ISO/IEC). Standard. Continuation of ISO/IEC 18033-6:2019 for FHE.
• Homomorphic Encryption Security Standard (Open). Standard, Mar 2018. Produced by an open consortium of industry, government and academia.

Trusted Execution Environments (TEEs)
• Advanced Trusted Environment (OMTP TR1, OMTP). Standard, May 2009. Originally made for mobile phone TEEs, but applicable more generally, setting out core requirements, best practice and examples.
• TEE Trusted User Interface Low-level API (GPD_SPE_055, GlobalPlatform). Standard, Oct 2018. Highly technical standard used extensively in industry products.
• PSA Certified IoT Security Framework (PSA Certified). Standard. Internet of Things (IoT) certification for hardware, software and devices; used in the standardisation of TEE hardware (eg ARM TrustZone).
• IEEE Standard for Technical Framework and Requirements of Trusted Execution Environment based Shared Machine Learning (IEEE 2830-2021, IEEE). Standard, Oct 2021. Standard on the applied use of TEEs in privacy-preserving machine learning done using third parties and MPC.
• Standard for Secure Computing Based on Trusted Execution Environment (P2952, IEEE). Project. Standard on the cyber security application of TEEs.
• Information technology – Trusted platform module library (ISO/IEC 11889-1:2015, ISO/IEC). Standard, Aug 2015. A four-part standard on trusted platform modules, a related technology, developed by an industry collaboration and later adopted by ISO/IEC.

Differential privacy (DP)
• Privacy enhancing data de-identification terminology and classification of techniques (ISO/IEC 20889:2018, ISO/IEC). Guidance, Nov 2018. Discusses differential privacy as a metric and also related noise addition methods.
• NIST blog series (NIST). Project, Dec 2021. General explainer on DP in 12 parts, concluding with a statement that they have plans to use it as a foundation on which to develop technical guidelines.
• εKTELO: A Framework for Defining Differentially-Private Computations (academic). Guidance, May 2018. Example academic paper sharing a framework for developing DP algorithms.

Secure multi-party computation (SMPC)
• Information technology – Security techniques – Secret sharing – Part 1: General (ISO/IEC 19592-1:2016, ISO/IEC). Guidance, Nov 2016. Sets out terminology.
• Information technology – Security techniques – Secret sharing – Part 2: Fundamental mechanisms (ISO/IEC 19592-2:2017, ISO/IEC). Standard, Oct 2017. Covers five secret sharing algorithms that meet requirements of message confidentiality and recoverability.
• Information security – Secure multi-party computation – Part 1: General (ISO/IEC CD 4922-1.2, ISO/IEC). Standard. Incoming standard on SMPC.
• Information security – Secure multi-party computation – Part 2: Mechanisms based on secret sharing (ISO/IEC WD 4922-2.3, ISO/IEC). Standard. Incoming standard on SMPC specifically where it uses secret sharing.
• IEEE Recommended Practice for Secure Multi-Party Computation (IEEE 2842-2021, IEEE). Standard, Nov 2021. A 'technical framework' for SMPC including security levels and use cases.

Synthetic data (SD)
• Synthetic Data – Industry Connections (IC21-013-01, IEEE). Project. Industry (and academic) collaboration; sets out goals to produce best practice and terminology guidance for a standard project authorization request for a synthetic data privacy and accuracy standard.
• Synthetic Data – what, why and how? (The Alan Turing Institute). Guidance, May 2022. An academic review of synthetic data as a technology, highlighting some of the challenges.


Measuring privacy and utility in PETs
One potential barrier in developing PETs standards is achieving consensus on metrics for privacy and utility. There are many different metrics that can be used for privacy; one review categorises over 80 privacy metrics and suggests a method of how to choose them114.

The cybersecurity community uses security metrics. Encryption, for example, has security metrics such as key length, which estimate the computing power it would take to break encryption and therefore the degree of security provided. SDOs are also interested in privacy metrics, as in Privacy enhancing data de-identification terminology and classification of techniques (ISO/IEC 20889), which concerns differential privacy and its use as a measure. However, privacy-utility trade-offs vary according to context, making metrics and thresholds difficult to generalise115, 116.

Using a single privacy metric also risks over-simplification, failing to adequately address all relevant harms (as privacy metrics can only account for one harm at a time).

Threat modelling can be used to identify potential risks, attacks or vulnerabilities in a data governance system. Threat models are constantly evolving as attacks reach new levels of sophistication. For example, anonymisation originally meant zero risk of reidentification. However, increasingly sophisticated reidentification techniques, such as those that make use of statistical approaches and publicly available datasets, are changing the requirements of adequate anonymisation117.

Considering these constraints, the best approach may be technical standards and metrics where feasible (as with encryption or noise addition algorithms), complemented by scenario-based guidance, assessment protocols and codes of conduct.

114 Wagner I, Eckhoff D. 2018 Technical Privacy Metrics: A Systematic Survey. See https://arxiv.org/abs/1512.00327
(accessed 20 September 2022). Note that more general mathematical approaches also exist, which aim for a
definition of privacy more like that of epsilon in differential privacy. One example of this is Pufferfish, a self-professed
framework for mathematical privacy definitions, which can be used in the context of PETs: Kifer D, Machanavajjhala
A. 2014 Pufferfish: a framework for mathematical privacy definitions. ACM Transactions on Database Systems 39,
1—36. (https://doi.org/10.1145/2514689).
115 Lee J, Clifton C. 2011 How Much Is Enough? Choosing ε for Differential Privacy (conference paper). See https://link.
springer.com/chapter/10.1007/978-3-642-24861-0_22 (accessed 23 April 2022).
116 Abowd JM, Schmutte IM. 2019 An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices.
Am Econ Rev. 109, 171—202.
117 A useful analysis of the UK’s approach to anonymisation in data protection regulation can be found in: Bristow and
Privitar. 2021 Introduction to Anonymisation. See https://www.bristows.com/app/uploads/2021/07/Introduction-to-
Anonymisation-Privitar-and-Bristows.pdf (accessed 20 September 2022).


BOX 3

Motivated intruder tests

A type of attack-based risk assessment, this is an established method for assessing the efficacy of a privacy regime (or PET). This requires anticipation of 1) the technologies and methods that might be used to attack the data; 2) the vulnerabilities of a given PET to various attacks; 3) the kinds of knowledge that could enable the attack; 4) the goals of potential attacks and how they might cause harms. An exhaustive list and test of every attack is not feasible. Rather, it is important to know what kind of attacks are most possible and most likely. Because it is impossible to anticipate every scenario, even motivated intruder testing does not provide a guarantee of privacy. Nonetheless, it has been the primary legal test in determining whether data is identifiable or not.

Motivated intruder testing can provide a degree of assurance. However, this approach cannot provide a quantitative measure of assurance. In the past, a motivated intruder has been defined as someone without specialist skills or computing power, which may not be a realistic adversary for some data sets (such as highly desirable datasets). More explicit guidance on testing, including choosing what and how to test a PET, could be included either in process standards or PETs guidance.

Testing does not remove the need for expert users and developers. Social and educational infrastructure must be in place to educate data scientists (and privacy professionals) on PETs and risk assessment.



Chapter four

Use cases for PETs

This chapter comprises a set of use cases highlighting the various roles PETs are playing – or could play – in real-world data governance scenarios. The use cases are intended to represent broad scenarios where PETs could help reach a wider data objective.

As these examples demonstrate, the role of PETs is not exclusively one of protecting privacy – rather, they can serve to enhance transparency, increase collaboration and strengthen data partnerships.

Considerations and approach

These use cases were chosen for their relevance to significant real-world data-driven challenges. The choice of scenarios was informed by two workshops with a PETs Contact Group and validated through further discussions with stakeholders. They were developed with technical and legal input of the report's Working Group, as well as invited external experts and desk-based review.

The intention is for these cases to be an aid for anyone who relies on information flows to imagine how PETs could enhance a systems approach to data governance. The use cases are meant to explain PETs in various scenarios; they are not intended to be an endorsement or recommendation for action.

The efficacy and appropriateness of a PET in data governance is highly dependent on context. Therefore, the aim of these use cases is not to prescribe reproducible solutions, but rather:
1. To inspire discussions between UK government, regulators and organisations that use data to consider how technology may play an enabling role in data governance, along with allowing faster and safer ways of partnering to find data-driven solutions to multidisciplinary challenges;

2. To illustrate the importance of context-based solutions and a privacy by design approach by including various types of data and circumstantial sensitivities (individual, commercial, national);

3. To showcase where PETs could make a critical difference in data-driven problem solving, allowing for data use that would otherwise be legally, technically or socially prohibitive.


USE CASE 1.1

Privacy in biometric data for health research and diagnostics

The challenge
Recent advances in medical imaging, audio and
AI have led to unprecedented possibilities in
healthcare and research. This is especially true
of the UK, where the public health system is
replete with population-scale electronic patient
records. These conditions, coupled with strong
academic and research programmes, mean
that the UK is well positioned to deliver timely
and impactful health research and its translation
to offer more effective treatments, track and
prevent public health risks, utilising health data

to improve and save lives118.

Internationally, hospitals produce an estimated 50 petabytes119 of data annually120, though only 20% is structured for digitisation121, let alone further research or analysis. The public benefit of utilising this joint resource is substantial, and AI-assisted analytics are essential for realising the value of big health data. Because patient-level health data is inherently personal, there is potential for public distrust if health data is misused and privacy is compromised.

Anonymous data is not covered by current data protection law in the UK and EU. However, it is difficult to be certain that health data is anonymous, particularly in biometric and other non-textual data. Health data is subject to specific legal requirements in the UK, as well as the common law duty of confidentiality. The following three examples illustrate how PETs could help in meeting best practice standards in non-textual health data use, while making data more readily available for researchers and innovators122.

118 HM Government (Life sciences industrial strategy update). See https://www.gov.uk/government/publications/life-


sciences-industrial-strategy-update (accessed 15 March 2022).
119 One petabyte is roughly the equivalent of 500 billion pages of standard printed text.
120 InfoDocket (How Large is the Digital Universe? How Fast is It Growing?). See https://www.infodocket.com/2014/04/16/
how-large-is-the-digital-universe-how-fast-is-it-growing-2014-emc-digital-universe-study-now-available/ (accessed 20
September 2022).
121 HIT Consultant (Why unstructured data holds the key to intelligent healthcare systems). See https://hitconsultant.
net/2015/03/31/tapping-unstructured-data-healthcares-biggest-hurdle-realized/#.XFvZ1lwvOUk (accessed 20
September 2022).
122 HM Government (National Data Strategy). See https://www.gov.uk/government/publications/uk-national-data-strategy/
national-data-strategy (accessed 9 September 2022).


FIGURE 2

Federated machine learning

[Diagram: local sites A, B, C and D send local updates to a central server for aggregation; the server combines them and returns a new global model to each site.]

Preserving privacy in medical imaging for research and diagnostics
Magnetic Resonance Imaging (MRI) is a type of scan that produces detailed images of the inside of the body and internal organs by using strong magnetic fields and radio waves. The images produced by MRI scanning provide critical information in the diagnosis and staging of disease progression. Sets of MRI images can be used to train machine learning algorithms to detect certain features or abnormalities in images. This technology can be deployed to screen large numbers of images for research purposes: identifying patterns that link variables like patient behaviour, genetics, or environmental factors with brain function.

MRI imaging and metadata can reveal sensitive information about a patient. Indeed, even an individual's presence in a dataset may be sensitive. While the images themselves may be de-identified through removal of names, addresses and scan date, neuroimages can sometimes be reidentified (as demonstrated in a 2019 Mayo Clinic study)123.

123 Schwarz C G et al. 2019 Identification of Anonymous MRI Research Participants with Face-Recognition Software. N
Engl J Med. 381, 1684—1686. (https://doi.org/10.1056/nejmc1908881


Privacy solutions that enable collaboration
Federated learning is a type of remote execution in which models are 'sent' to remote data-holding machines (eg, servers) for local training. This can allow researchers to use data at other sites for training models without accessing those data sets. For example, if researchers at different universities hold neuroimaging data, a federated approach would allow them to train models on all participants' imaging data, even as that data remains 'invisible' to analysts. This is an example of Federated Machine Learning (see Figure 2).

There are two approaches to accomplishing Federated Machine Learning in this case (the first is illustrated in the sketch following this section):
• In one approach, each site analyses its own data and builds a model; the model is then shared to a remote, centralised location (a node) common to all researchers involved. This node then combines all models into one 'global' model and shares it back to each site, where researchers can use the new, improved model124;
• In a second approach, the model is built iteratively, where the remote node and local nodes take turns sending and returning information125.

In either approach, all users' models are improved by 'learning' from remote datasets, which are themselves never revealed. By using federated learning, raw data is not shared, which rules out the most common issues associated with data protection. At the same time, federated learning does not offer perfect privacy; models are still vulnerable to some advanced attacks. These attacks may be of a sufficiently low risk to be acceptable to the parties such that they can proceed. Other safeguards may also be put in place. These could include detecting when repeated queries are made of an MRI dataset, which could be cross-referenced with public data to reidentify subjects.
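To make the first ('single-shot') approach concrete, the sketch below shows each site training a model locally and a central node averaging only the model parameters. It is a minimal illustration under assumed conditions: the sites, data, linear model and sample-size weighting are hypothetical and are not drawn from any particular neuroimaging study.

```python
# Minimal sketch of single-shot federated learning: raw data stays at each
# site; only fitted model parameters travel to the aggregating node.
import numpy as np

def train_local_model(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Fit a simple linear model locally; the raw data never leaves the site."""
    # Ordinary least squares as a stand-in for any local training routine.
    weights, *_ = np.linalg.lstsq(features, labels, rcond=None)
    return weights

def aggregate(local_weights: list, sample_counts: list) -> np.ndarray:
    """Central node combines local models, weighting by each site's data size."""
    total = sum(sample_counts)
    return sum(w * (n / total) for w, n in zip(local_weights, sample_counts))

# Example with three synthetic 'sites' (eg universities holding imaging data).
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(n, 5)), rng.normal(size=n)) for n in (120, 80, 200)]

local_models = [train_local_model(X, y) for X, y in sites]
global_model = aggregate(local_models, [len(y) for _, y in sites])
print("Global model parameters:", global_model)
```

In practice, the local training step would be a neuroimaging-specific model, and the aggregation rule and number of rounds would be agreed between the collaborating sites.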

124 In this approach, a single-shot algorithm can be used.


125 Each participant sends a gradient on its data set until the algorithm converges. Iterations use an optimisation routine
(such as stochastic gradient descent). In this approach, a multi-shot algorithm can be used.


BOX 4

Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation (COINSTAC)

COINSTAC126, an open-source, cross-platform application created by the Center for Translational Research in Neuroimaging and Data Science (TReNDS) in Atlanta, Georgia, is one example illustrating how to overcome data access barriers in neuroimaging through federated learning and privacy-preserving algorithms.

COINSTAC allows users who cannot directly share their data to collaboratively run open, reproducible federated learning and coordinated pre-processing using software packages that can run in any environment (such as personal devices, private data centres, or public clouds). It uses containerised software (software which runs all necessary code within one environment that is executable regardless of host operating system and is therefore consistent across platforms). This software is available on GitHub under an MIT license127.

COINSTAC developers have documented several case studies. In one study, a federated analysis using datasets from Europe and India found structural changes in brain grey matter linked to age, smoking, and body mass index (BMI) in adolescents128. Another case study uses a federated neural network classifier to differentiate smokers from non-smokers in resting-state functional MRI (fMRI) data. The federated models typically achieve results similar to those using pooled data and better than those drawing data only from isolated sites.

Additionally, TReNDS researchers are developing optimised algorithms for deep learning to reduce transmission bandwidth without sacrificing accuracy. In a third example, brain age estimation algorithms were trained to predict actual subject age using neuroimaging; this was then applied to estimate the biological brain age of new subjects129. This is useful because large gaps between estimation of biological brain age and actual age are potential biomarkers of brain disorders such as Alzheimer's disease. This model achieved results that were statistically equivalent to centralised models.

TReNDS is also currently developing a network of COINSTAC vaults, which will allow researchers to perform federated analysis with multiple large, curated datasets. This open science infrastructure will enable rapid data reuse, create more generalisable models on diverse datasets, and democratise research by removing barriers to entry for small or under-resourced groups.

126 Coinstac (Homepage). See https://coinstac.org/ (accessed 30 March 2022).


127 Github (Coinstac release v6.5.3). See https://github.com/trendscenter/coinstac (accessed 20 September 2022).
128 Gazula H et al. 2021 Decentralized Multisite VBM Analysis During Adolescence Shows Structural
Changes Linked to Age, Body Mass Index, and Smoking: a COINSTAC Analysis. Neuroinformatics. 19,
553—566. (https://doi.org/10.1007/s12021-020-09502-7)
129 Basodi S et al. 2022 Decentralized Brain Age Estimation using MRI Data. Neuroinform 20,
981–990. (https://doi.org/10.1007/s12021-022-09570-x)


Differential privacy can also be applied to prevent reidentification of neuroimages.
Differential privacy entails the addition of ‘noise’
(irrelevant or unwanted data items, features, or
records) to the results; this makes the task of
cross-referencing with public data more difficult.
Differential privacy also allows for risk to be
quantified as the probability of reidentification,
allowing the controller to ‘dial up or down’ and
adjust for performance-privacy trade-offs by
referring to a set ‘privacy budget’, or how much
data is determined acceptable to be leaked
from the site130.
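As a minimal illustration of how a privacy budget can be 'dialled up or down', the sketch below applies the Laplace mechanism to a simple counting query. The dataset, query and epsilon values are hypothetical; real deployments would also track the cumulative budget spent across queries.

```python
# Minimal sketch of differential privacy via the Laplace mechanism: noise is
# added to a query result, and epsilon acts as the privacy-budget dial.
import numpy as np

def private_count(flags: list, epsilon: float) -> float:
    """Return a noisy count of matching records.

    A counting query has sensitivity 1 (one person changes the count by at
    most 1), so Laplace noise with scale 1/epsilon satisfies epsilon-DP.
    """
    true_count = sum(flags)
    noise = np.random.default_rng().laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# 'Dialling down' epsilon gives stronger privacy but noisier answers.
matches = [True] * 40 + [False] * 60    # hypothetical per-record flags
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: noisy count = {private_count(matches, eps):.1f}")
```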

Conclusions
Large, robust, international neuroimaging
datasets are required for training machine
learning models. These datasets exist around
the world in various institutions. Securely
using remote datasets to train machine
learning models could transform research in
this field. Further, safeguarding the privacy of
imaging subjects could increase participation
in research, enhancing the diverse, large-
scale data required to make future strides
in neuroscience.

130 Differential privacy and federated learning can be combined in two ways: output perturbation (where noise is added
to the output of an optimisation algorithm) and objective perturbation (noise is added at every step of the optimisation
algorithm). The latter may hold more functionality but requires identical pre-processing across sites and good local
feature mapping.


BOX 5

PETs for machine learning with medical images: Emerging challenges

Radiology uses medical imaging to diagnose, treat disease and monitor in clinical and research settings. High-quality machine learning-based models can provide a second reading of images, acting as a 'digital peer' to medical researchers and clinicians. Once the model can identify patterns of disease, it can be exported for use by other clinicians and researchers (if the model is transferable). The potential public benefit of using these trained models is significant and currently being investigated, for instance, by Health Data Research UK and other stakeholders131.

Once the model is exported, the original creators relinquish control. While a model is not 'raw data', there are potential vulnerabilities. Over-trained models may remain so faithful to the training dataset that they risk revealing granular details about the training data. Linkage attacks could harvest information derived from the model which, when linked with third-party data, result in the exposure of personal data132.

Lastly, model inversion or reconstruction attacks may allow an attacker to reverse engineer the training dataset from a model133. As a relatively new possibility134, risk-benefit assessment in model inversion is relatively immature.

Data protection regulation (such as the UK GDPR) can lack clarity regarding models trained on sensitive data. Traditionally, models have been treated as intellectual property or trade secrets, rather than personal data. However, 'trained models can transform seemingly non-sensitive data, such as gait or social media use, into sensitive data, such as information on an individual's fitness or medical conditions.' Legally, the possibility of revealing training data 'might render models as personal data in the sense of European protection law [...]'135. Recent publications demonstrate how inference attacks present real threats for collaborative analysis136.
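As a simplified illustration of the leakage risks described above, the sketch below mounts a basic confidence-thresholding 'membership guess' against a deliberately over-fitted classifier. The synthetic data, model and threshold are assumptions made for illustration only; practical assessments would use dedicated attack libraries rather than this toy example.

```python
# Minimal sketch of a membership inference risk: an over-fitted model tends to
# be more confident on records it was trained on, so thresholding confidence
# lets an attacker guess which records were in the training set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=400) > 0).astype(int)
X_train, y_train, X_out = X[:200], y[:200], X[200:]   # members vs non-members

# Deliberately over-fitted model (deep, unpruned trees).
model = RandomForestClassifier(n_estimators=50, max_depth=None).fit(X_train, y_train)

def guess_membership(records: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Guess 'member' wherever the model's top-class confidence is very high."""
    confidence = model.predict_proba(records).max(axis=1)
    return confidence >= threshold

print("Fraction of training records guessed as members:", guess_membership(X_train).mean())
print("Fraction of unseen records guessed as members:  ", guess_membership(X_out).mean())
```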

131 Health Data Research UK (HDR UK Strategic Delivery Plan 2021/22). See https://www.hdruk.ac.uk/wp-content/
uploads/2021/02/Strategic-Delivery-Plan-2021_22.pdf (accessed 7 October 2022).
132 White T, Blok E, Calhoun V D. 2020 Data sharing and privacy issues in neuroimaging research:
Opportunities, obstacles, challenges, and monsters under the bed. Hum Brain Mapp. 43,
278—291. (https://doi.org/10.1002/hbm.25120)
133 Veale M, Binns R, Edwards L. 2018 Algorithms that remember: model inversion attacks and data protection law. Philos
T R Soc A. 376. (https://doi.org/10.1098/rsta.2018.0083).
134 First proposed by Fredrikson M, Jha S, Ristenpart T. 2015 Model Inversion Attacks that Exploit Confidence Information
and Basic Countermeasures. See https://rist.tech.cornell.edu/papers/mi-ccs.pdf (accessed 6 September 2022).
135 Veale M, Binns R, Edwards L. 2018 Algorithms that remember: model inversion attacks and data protection law. Philos
T R Soc A. 376. (https://doi.org/10.1098/rsta.2018.0083)
136 Melis L, Song C, De Cristofaro E, Shmatikov V. 2018 Exploiting Unintended Feature Leakage in Collaborative
Learning. See https://arxiv.org/abs/1805.04049 (accessed 10 October 2022).


BOX 6

Privacy and compliance concerns around trained models

This should be addressed by considering:
• accessibility (who will use, or have access to, the trained model);
• identification of adversaries and their incentives (in the case of model inversion, the 'honest but curious' persona and deliberate reverse engineering of models for commercial gain)137;
• legality (what contractual obligations or data sharing regimes model users are subject to);
• public acceptability of proposed model usage (for example, public health application versus commercial enterprise; UK implementation versus international humanitarian applications);
• potential for model reuse or repurposing for other tasks beyond original intentions138; and
• commercial sensitivities inherent to the model (such as the risk of a private actor improving upon and re-selling the model back to a public entity).

ICO guidance on model inversion and model inferencing attacks entails a series of actions to be documented. These include reviewing trade-offs on a regular basis, establishing clear lines of accountability with a risk-based approval process, and consideration of available technical approaches that minimise trade-offs139. Data management could include best practices for deidentification and removal of metadata. Legal instruments, such as Data Transfer Agreements (DTAs) or contracts, provide further safeguarding. For example, NHS Digital has implemented, in collaboration with Privitar, a de-identification tool using a variety of pseudonymisation techniques and a form of homomorphic encryption to ensure safer linkage of data140, 141.

137 The honest-but-curious adversary is ‘a legitimate participant in a communication protocol who will not deviate from
the defined protocol but will attempt to learn all possible information from legitimately received messages’, as defined
in Paverd A, Martin A, Brown I. Modelling and Automatically Analysing Privacy Properties for Honest-but-Curious
Adversaries. See https://www.cs.ox.ac.uk/people/andrew.paverd/casper/casper-privacy-report.pdf (accessed 10
September 2022).
138 Melis L, Song C, De Cristofaro E, Shmatikov V. 2018 Exploiting Unintended Feature Leakage in Collaborative
Learning. See https://arxiv.org/abs/1805.04049 (accessed 10 October 2022).
139 The Information Commissioner’s Office. Guidance on the AI auditing framework: Draft guidance for consultation.
See https://ico.org.uk/media/2617219/guidance-on-the-ai-auditing-framework-draft-for-consultation.pdf (accessed 20
September 2022).
140 The Royal Society. 2019 Protecting privacy in practice: The current use, development and limits of Privacy Enhancing
Technologies in data analysis. See https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/
privacy-enhancing-technologies-report.pdf (accessed 30 June 2022).
141 National Health Service Digital (Improving our Data Processing Services). See https://digital.nhs.uk/data-and-
information/data-insights-and-statistics/improving-our-data-processing-services (accessed 15 May 2022).


BOX 6 (continued)

Trained models could be submitted to robust motivated intruder testing, where
potential attacks are simulated. For
especially sensitive data, stronger attacker
profiles (such as criminal groups) may be
preferable. Libraries of attacks, which detail
potential attacks and relative risks, provide
resources for simulating and testing against
potential attacks142.

Biometric and imaging data shared within the usual research and clinical settings
is handled by professionals who are
incentivised to protect patient confidentiality.
With the right data sharing practices
in place, the risk of patient data being
compromised is greatly reduced. Similar
protocols could be followed in model
training and use. For example, the model
could be restricted to sharing in a trusted
research environment143 with access limited
to approved researchers. Perhaps the most secure option is for the researcher to retain control of the model and provide analysis as a service.

142 Github (Privacy Trust Lab Privacy Meter). See https://github.com/privacytrustlab/ml_privacy_meter (accessed 10
September 2022).
143 Trusted research environments can vary significantly in scope and security guarantee.


USE CASE 1.2

Preserving privacy in audio data for health research and diagnostics

The opportunity
Audio data containing verbal content and nonverbal vocalisations (coughing, breathing, speech pauses) can be used to train machine learning models for predicting disease144. Alzheimer's Disease (AD) is a type of dementia that affects an individual's memory, motor skills and cognition. AD researchers seek non-invasive techniques for screening and detecting AD. Speech and audio data are growing areas for research and diagnosis of AD and other diseases145, 146, 147. AD may affect the content of speech – such as the range of a person's vocabulary – or the cadence of speech – such as increased hesitation due to difficulty finding words. Other neurological conditions such as Parkinson's disease may alter speech characteristics including pitch, cadence and articulation. Vocal biomarkers are thus a promising avenue for AD and other research, particularly when coupled with AI148, 149.

The challenge
Vocal data is vulnerable because there are large open datasets of identifiable audio publicly available (eg on YouTube)150, making reidentification relatively straightforward. Beyond the content of verbal data, the very presence of an individual within a dataset reveals sensitive information.

As biometric data, vocal data is personal data. It is not considered anonymous under UK and EU GDPR. Additionally, audio data may also be transcribed, doubling the data used (vocal and textual data)151. In this case, data minimisation would mitigate risk of information leakage, for example, only retaining transcripts or audio.

144 Wroge TJ, Özkanca Y, Demiroglu C, Si D, Atkins D C, Ghomi RH. 2018 Parkinson’s disease diagnosis using machine
learning and voice. See https://www.ieeespmb.org/2018/papers/l01_01.pdf (accessed 23 April 2022).
145 König A, Satt A, Sorin A, Hoory R, Toledo-Ronen O, Derreumaux A, Manera V, Verhey F, Aalten P, Robert PH, David
R. Automatic speech analysis for the assessment of patients with predementia and Alzheimer’s disease. Alzheimers
Dement. 1, 112—124. (https://doi.org/10.1016/j.dadm.2014.11.012)
146 Haulcy R, Glass J. 2021 Classifying Alzheimer’s Disease Using Audio and Text-Based Representations of Speech.
Front. Psychol. Sec. Human-Media Interaction. 11 (https://doi.org/10.3389/fpsyg.2020.624137)
147 University College London (Meet the C-PLACID Audio-Recording Research Team). See https://www.ucl.ac.uk/drc/c-
placid-study/audio-recording-c-placid/meet-c-placid-audio-recording-research-team (accessed 1 September 2022).
148 Fagherazzi G, Fischer A, Ismael M, Despotovic V. 2021 Voice for health: The use of vocal biomarkers from research to
clinical practice. Digit Biomark. 5, 78—88. (https://doi.org/10.1159/000515346)
149 Arora A, Baghai-Ravary L, Tsanas A. 2019 Developing a large scle population screening tool for the assessment of
Parkinson’s disease using telephone-quality voice. J Acoust Soc Am. 145 5 2871. (https://doi.org/10.1121/1.5100272)
150 Examples include: Mozilla Labs (Common Voice). See https://labs.mozilla.org/projects/common-voice/ (accessed 15
August 2022); Google Audio Set: Gemmeke JF et al. 2017 Audio Set: An ontology and human-labaled dataset for
audio events. Proc. IEEE ICASSP 2017 New Orleans. See https://research.google/pubs/pub45857/, https://research.
google.com/audioset/dataset/index.html (accessed 2 June 2022), and open data sets such as Oxford University’s
VoxCeleb: Oxford University (VoxCeleb). See https://www.robots.ox.ac.uk/~vgg/data/voxceleb/ (accessed 14
May 2022).
151 Haulcy R, Glass J. 2021 Classifying Alzheimer’s Disease Using Audio and Text-Based Representations of Speech.
Front. Psychol. Sec. Human-Media Interaction. 11 (https://doi.org/10.3389/fpsyg.2020.624137)


Privacy preservation in biometric audio data
When using biometric audio data, PETs should be layered with audio-specific approaches to anonymisation. For example, voice transformation techniques may be used to alter a patient's voice quality152. Transcription of audio data can be automated using AI-based applications (eg Google Cloud's Speech API), then scanned using a machine learning algorithm that tags identifiers such as names, dates, ages, or geographical location. By highlighting identifiable elements, identifiers can be swiftly redacted (a simple illustrative sketch of this redaction step is included at the end of this use case).

Audio data collection techniques may include phone or web-based recording153; these can entail potential for eavesdropping. Voice Over IP (VOIP) can include end-to-end homomorphic encryption, ensuring that no other parties listen during data collection154. It is also possible to encrypt voice data for cloud storage155, or to split voice data into random fragments, which are each processed separately.

Privacy-preserving synthetic data (PPSD) may be generated from audio recordings prior to sharing or querying156. However, this is an emerging application of PPSD157, 158. New synthetic datasets may need to be created specific to various research queries159, 160, which could become costly.

Conclusions
Voice recognition technology is becoming ever more sophisticated, such that speaker identification is now feasible even under noisy conditions. These methods may be applied even where masking techniques such as transformation have been used161. Without greater sharing of audio data there is a risk that audio-trained models become biased according to language-, accent-, age-, and culture-specific biomarkers. This could be countered through open and crowd-sourced initiatives162, which could be rolled out most safely with PETs.
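The sketch below illustrates the tagging-and-redaction step described under 'Privacy preservation in biometric audio data' above. It substitutes simple pattern matching for the machine learning tagger mentioned in the text; the patterns and example transcript are hypothetical, and a real pipeline would use a trained named-entity recogniser.

```python
# Minimal sketch of transcript redaction: identifiers found in a transcript are
# replaced with placeholder tags before the text is shared for research.
import re

PATTERNS = {
    "[DATE]": re.compile(r"\b\d{1,2} (January|February|March|April|May|June|July|"
                         r"August|September|October|November|December) \d{4}\b"),
    "[AGE]": re.compile(r"\b\d{1,3} years old\b"),
    "[POSTCODE]": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b"),
}

def redact(transcript: str) -> str:
    """Replace identifiable elements in a transcript with placeholder tags."""
    for tag, pattern in PATTERNS.items():
        transcript = pattern.sub(tag, transcript)
    return transcript

example = ("Patient, 72 years old, seen on 3 March 2022 near SW1A 1AA, "
           "reports increased hesitation when finding words.")
print(redact(example))
```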

152 Jin Q, Toth AR, Schultz T, Black AW. 2009 Voice convergin: Speaker de-identification by
voice transformation. 2009 IEEE International Conference on Acoustics, Speech and Signal
Processing. (https://doi.org/10.1109/icassp.2009.4960482)
153 Fagherazzi G, Fischer A, Ismael M, Despotovic V. 2021 Voice for health: The use of vocal biomarkers from research to
clinical practice. Digit Biomark. 5, 78—88. (https://doi.org/10.1159/000515346)
154 In this case, VOIP signals from multiple parties are mixed at a central server, improving the scalability of the solution
and protecting the data held on the central server, were the server to be compromised. See: Rohloff K, Cousins D
B, Sumorok D. 2017 Scalable, Practical VoIP Teleconferencing with End-to-End Homomorphic Encryption. IEEE T Inf
Foren Sec. 12, 1031—1041. (https://doi.org/10.1109/tifs.2016.2639340)
155 Shi C, Wang H, Hu Y, Qian Q, Zhao H. 2019 A speech homomorphic encryption scheme with less data expansion in
cloud computing. KSII T Internet Inf. 13, 2588—2609. (https://doi.org/10.3837/tiis.2019.05.020)
156 Fazel A et al. 2021. SynthASR: Unlocking Synthetic Data for Speech
Recognition. (https://doi.org/10.48550/arXiv.2106.07803)
157 Tomashenko N et al. 2020 Introducing the VoicePrivacy initiative. See https://doi.org/10.48550/arXiv.2005.01387
(accessed 30 March 2022).
158 Shevchyk A, Hu R, Thandiackal K, Heizmann M, Brunschwiler T. 2022 Privacy preserving synthetic respiratory sounds
for class incremental learning. Smart Health. 23. (https://doi.org/10.1016/j.smhl.2021.100232)
159 Fazel A et al. 2021 SynthASR: Unlocking Synthetic Data for Speech Recognition. See https://doi.org/10.48550/
arXiv.2106.07803 (accessed 10 October 2022).
160 Rossenbach N, Zeyer A, Schlüter R, Ney H. 2020 Generating synthetic audio data for attention-based speech
recognition systems. See https://doi.org/10.48550/arXiv.1912.09257 (accessed 10 October 2022).
161 Chung J S, Nagrani A, Zisserman A. 2018 VoxCeleb2: Deep speaker recognition. See https://doi.org/10.48550/
arXiv.1806.05622 (accessed 2 September 2022).
162 Fagherazzi G, Fischer A, Ismael M, Despotovic V. 2021 Voice for health: The use of vocal biomarkers from research to
clinical practice. Digit Biomark. 5, 78—88. (https://doi.org/10.1159/000515346)


USE CASE 2

PETs and the internet of things: enabling digital twins for net zero

The opportunity
The UK has committed to reach net zero carbon emissions by the year 2050 as part of a wider effort to mitigate climate change. Data-driven digital technologies are poised to play a key role in meeting these targets163. Digitalising energy systems will be an important step in decarbonising sectors such as energy, heat, and transport, as well as supporting a greener, circular economy.

Digital twins are an emerging area of focus in climate technologies. A digital twin is a relevant, virtual counterpart of a physical object (such as a wind turbine or electric motor) or process (such as patterns of economic transactions). When integrated with other models and physical-virtual systems through sensors, digital twins can function as decision-support tools.

While small-scale digital twins are in use, large-scale digital twins are at a relatively early stage of development, where security and privacy are emerging concerns164. Establishing best practice and privacy solutions will be key to the acceptability of digital twins, as well as ensuring interoperability and other technical requirements are met. A digital twin of the UK's energy system would help balance real-time energy 'smart' grids. This will be important alongside wider uptake of decentralised and intermittent sources of renewable energy.

163 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero. See https://
royalsociety.org/-/media/policy/projects/digital-technology-and-the-planet/digital-technology-and-the-planet-report.
pdf (accessed 20 September 2022).
164 Dietz M, Putz B, Pernul G. 2019 A Distributed Ledger approach to Digital Twin secure data sharing. See https://core.
ac.uk/download/pdf/237410573.pdf (accessed 27 September 2022).


FIGURE 3

A digital twin of the UK energy system

Data is needed from a range of sources to develop, evaluate, and ‘fuel’ a digital twin of the UK
energy system. Emerging privacy and security concerns must be addressed to allow the safe flow
of data between digital twin models and real-world assets.

[Diagram: a digital twin model sits within an energy management system and exchanges data with energy generation, e-storage, buildings, hospitals, agriculture, industry, e-vehicle charge points, solar panels, smart meters and transport.]


The challenges
Data is needed to develop, evaluate, and 'fuel' digital twins. Presuming energy data as open165 will help unlock research and innovation potential (such as through digital twinning). At the same time, emerging privacy and security concerns must be addressed.

In this case, data sharing issues concern several stakeholder groups:
• Individuals: UK energy consumers' metering data was once only read monthly, but readings can now be taken at a more granular level (typically half-hourly), meaning energy usage patterns can be used to track household activities166;
• Industry: Energy sector actors may be disincentivised to share data that is commercially sensitive (eg algorithmically derived pricing models);
• Government: Data pertaining to the built environment could expose vital infrastructure or utilities to attack, leading to national security concerns;
• Regulators: Perception of data misuse could lead to loss of public trust, compromising efforts to use data for net zero (for example, leading to low uptake of smart meters).

A flow of data must enable communication between digital twins and real-world assets167. Data infrastructure must be able to link physical assets, accounting for different data types, components, technical standards, and analytical capabilities – a lightweight 'digital spine'168. Some steps have already been taken, as with the creation of the Information Management Framework within the National Digital Twin Programme169, 170.
165 Catapult Energy Systems. 2019 A strategy for a Modern Digitalised Energy System: Energy Data Taskforce report. See
https://esc-production-2021.s3.eu-west-2.amazonaws.com/2021/07/Catapult-Energy-Data-Taskforce-Report-A4-v4AW-
Digital.pdf (accessed 27 September 2022).
166 In one smart heating example, analysts demonstrated the ability to uncover users’ sleeping patterns, location within a
home, even whether a user was sitting or standing. While this level of detail goes beyond what is possible with typical
smart metering, it is one example where perceived potential invasiveness of smart fixtures in the home may prevent
uptake of this technology: Morgner P, Müller C, Ring M, Eskofier BM. 2017 Privacy Implications of Room Climate Data.
Lecture Notes in Computer Science vol 10493. See https://doi.org/10.1007/978-3-319-66399-9_18 (accessed 27
September 2022).
167 Dietz M, Putz B, Pernul G. 2019 A Distributed Ledger approach to Digital Twin secure data sharing. See https://core.
ac.uk/download/pdf/237410573.pdf (accessed 27 September 2022).
168 Catapult Energy Systems (Energy Digitalisation Taskforce publishes recommendations for a digitalised Net Zero
energy system). See https://es.catapult.org.uk/news/energy-digitalisation-taskforce-publishes-recommendations-for-a-
digitalised-net-zero-energy-system/ (accessed 22 September 2022).
169 University of Cambridge (Centre for Digital Built Britain). See https://www.cdbb.cam.ac.uk/subject/information-
management-framework-imf (accessed 20 September 2022).
170 More specifically, one of CReDo’s aims is to trial the MFI to evaluate the framework’s capacity to operate at a
national level.


Privacy solutions should be implemented at several critical points in the coupled digital twin-asset ecosystem. This use case focusses on energy consumption, where private data may disclose:
• What appliances are used and when171;
• What behaviour patterns might be revealed by consumers' energy usage – particularly occupancy patterns172;
• What information might be inferred about the building / utilities and other features, leading to security risks in national energy systems assets173;
• How energy companies' processing algorithms might give away proprietary knowledge and commercially sensitive behavioural insights;
• What billing or other pseudonymised records might reveal private information about consumers174, including consumer responsiveness to changes in price;
• Appliance and usage patterns that might be used in unsolicited targeted marketing, for example, ads or messages prompting consumers to have their boiler serviced.

While these inferences could be made using contemporary smart meter data, future versions may take readings at shorter intervals, allowing for detection of which appliances are used, or which TV channels are watched (through discernible electromagnetic interference signatures)175.

Individual privacy solutions: Smart meter data privacy
Smart meter data is personal data176. Privacy concerns around smart meter data have gained attention with the roll-out of devices in Europe and the UK177. However, smart meter data holds substantial value for renewable energy integration: there is no other way of measuring energy consumption in real time, or so close to consumer end-use.

171 Molina-Markham A, Shenoy P, Fu K, Cecchet E, Irwin D. 2010 Private memoirs of a smart meter. Proceedings
of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building. See https://doi.
org/10.1145/1878431.1878446 (accessed 2 September 2022).
172 Lisovich MA, Mulligan DK, Wicker SB. 2010 Inferring Personal Information from Demand-Response Systems. IEEE
Secur Priv. 8, 11—20. (http://dx.doi.org/10.1109/MSP.2010.40)
173 Beckel C, Sadamori L, Staake T, Santini S. 2014 Revealing household characteristics from smart meter data. Energy.
78 397—410. (http://dx.doi.org/10.1016/j.energy.2014.10.025)
174 Jawurek M, Johns M, Rieck K. 2011 Smart metering de-pseudonymization. ACSAC 2011 Proceedings of the 27th
Annual Computer Security Applications Conference. See http://doi.acm.org/10.1145/2076732.2076764 (accessed 20
March 2022).
175 Enev M, Gupta S, Kohno T. 2011 Televisions, video privacy, and powerline electromagnetic interference. See http://doi.
acm.org/10.1145/2046707.2046770 (accessed 2 September 2022).
176 The UK government’s Smart Metering Implementation Programme (2018) outlined the smart metering Data Access
and Privacy Framework, which aimed to ‘safeguard consumers’ privacy, whilst enabling proportionate access to
energy consumption data’. HM Government. 2018 Smart metering implementation programme: Review of data
access and privacy framework. See https://assets.publishing.service.gov.uk/government/uploads/system/uploads/
attachment_data/file/758281/Smart_Metering_Implementation_Programme_Review_of_the_Data_Access_and_
Privacy_Framework.pdf (accessed 22 September 2022).
177 For example: Pöhls HC, Staudemeyer RC. 2015 Privacy enhancing techniques in Smart City applications. See https://
cordis.europa.eu/docs/projects/cnect/4/609094/080/deliverables/001-RERUMdeliverableD32Ares20153669911.pdf
(accessed 26 September 2022).


A number of solutions have been proposed in handling smart meter data. Non-cryptographic methods may meet the resource constraints of smart meters most effectively; for example, differential privacy could be used to add 'noise' to datasets. Other approaches include spatial aggregation, where smart meters are geographically clustered (such as in a block of houses), allowing for load balancing without collecting household-level information178.

BOX 7

Secure multi-party computation for smart meter data privacy

The Netherlands implemented smart metering in 2006, along with mandates for data sharing at 15-minute intervals. This was subsequently found to violate Article 8 of the European Convention on Human
Rights (respect for private and family life)179.
As a result, new legislation was passed
allowing Dutch customers to opt out entirely
or retain smart meter administrative and
shutdown capabilities.

More recently, the privacy officer of the DSA has approved the use of smart meter data
in cohorts of six neighbouring households.
However, this requires averaging six
numbers without an analyst seeing those
six numbers.

Secure multi-party computation (SMPC) is being piloted in the Netherlands through
a public-private partnership with Roseman
Labs. SMPC is used to total and average
the energy use of six neighbouring houses
‘in the blind’. This provides mid-level
network views of power consumption
for the first time. This solution is currently
in trial phase using hardware, which is
retrofitted onto smart meters. In the future,
the SMPC software could be run as part of
software built into smart meters, with data
encrypted locally before being sent to the
secured server.
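The idea of totalling readings 'in the blind' can be sketched with additive secret sharing, one building block used in SMPC protocols. The example below is a minimal illustration only: the number of computing parties, the modulus and the readings are assumptions and do not describe the Roseman Labs implementation.

```python
# Minimal sketch of a secure sum via additive secret sharing: each household
# splits its reading into random shares, one per computing party; no party
# sees a real reading, yet the combined partial sums reveal the cohort total.
import secrets

MODULUS = 2**61 - 1          # arithmetic is done modulo a large prime
NUM_PARTIES = 3              # independent computing parties

def share(value: int) -> list:
    """Split a reading into NUM_PARTIES random additive shares."""
    shares = [secrets.randbelow(MODULUS) for _ in range(NUM_PARTIES - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

# Six neighbouring households' half-hourly readings (watt-hours, hypothetical).
readings = [512, 430, 617, 288, 705, 390]

# Each household sends one share to each party.
party_inputs = [[] for _ in range(NUM_PARTIES)]
for r in readings:
    for party, s in zip(party_inputs, share(r)):
        party.append(s)

# Each party locally sums the shares it received (learning nothing on its own).
partial_sums = [sum(p) % MODULUS for p in party_inputs]

# Only the combination of all partial sums reveals the cohort total and average.
total = sum(partial_sums) % MODULUS
print("Cohort total:", total, "average:", total / len(readings))
```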

178 UN Conference of European Statisticians. 2019 Protecting Consumer Privacy in Smart Metering by Randomized
Response. See https://unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.46/2019/mtg1/SDC2019_S4_URV_
Protecting_Consumer_Privacy_AD.pdf (accessed 24 August 2022).
179 Cuijpers C, and Koops B-J. 2013 Smart Metering and Privacy in Europe: Lessons from the Dutch Case. In: Gutwirth S,
Leenes R, de Hert P, Poullet Y. (eds) European Data Protection: Coming of Age. Berlin: Springer, Dordrecht.


Government, regulators and national security harms
Combined summary statistics of energy data sets will be key to maximising the benefits of an energy digital twin. Privacy-preserving synthetic data (PPSD) could be used to share relevant properties of rich microdata – in essence, how the datasets relate to one another – collected through smart systems. Simpler, differentially private summary statistics could be shared (where the privacy-utility trade-off would be more transparent). This would enable decision-making by government and regulators without releasing full datasets. However, the utility and privacy trade-offs of PPSD must be better understood and will be highly case-dependent180.

Data coming from physical assets may be used to control the grid and national power distributions. TEEs – potentially coupled with homomorphic encryption – could safeguard collaborative cloud computing from attacks, protecting security to critical national infrastructure181. Homomorphic encryption can be highly compute-intensive and would require significant development to be used at a large scale.
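As a minimal illustration of how homomorphic encryption lets an untrusted server aggregate values it cannot read, the sketch below uses the additively homomorphic Paillier scheme via the open-source python-paillier ('phe') library. The choice of library, key size and readings are assumptions for illustration; fully homomorphic schemes and TEE integration would look different in practice.

```python
# Minimal sketch of additively homomorphic encryption: ciphertexts can be
# summed by a server that never sees the underlying readings.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each asset encrypts its own reading before sending it to the cloud.
readings = [512, 430, 617]
ciphertexts = [public_key.encrypt(r) for r in readings]

# The server sums ciphertexts without ever decrypting individual values.
encrypted_total = sum(ciphertexts[1:], ciphertexts[0])

# Only the key holder can recover the aggregate.
print("Decrypted total:", private_key.decrypt(encrypted_total))
```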

FIGURE 4

Trusted Execution Environment (TEE)

TEEs are secure areas inside a processor, which are isolated from the rest of the system. The
code contained in the TEE cannot be read by the operating system, nor the hypervisor (a
process that separates a computer’s operating system and applications from the underlying
physical hardware).

[Diagram: a stack of apps, operating system, hypervisor and hardware, with the TEE's code and data held in an isolated area that the operating system and hypervisor cannot read.]

180 Jordon J et al. 2022 Synthetic data: What, why and how? See https://arxiv.org/pdf/2205.03257.pdf (accessed 2
September 2022).
181 Archer et al. 2017 Applications of homomorphic encryption. See https://www.researchgate.net/
publication/320976976_APPLICATIONS_OF_HOMOMORPHIC_ENCRYPTION/link/5a051f4ca6fdcceda0303e3f/
download (accessed 23 April 2022).


Commercially sensitive data solutions for digital twins
Energy providers could use insights from smart meter data to provide new service models (eg heating as a service). In addition to SMPC, federated learning could allow users' data to stay localised while training models are used by energy providers. For example, a machine learning model could be sent to individual smart home systems and 'learn' locally about certain energy consumption patterns in order to predict demand182.

Ofgem and other regulatory bodies should ensure that data usage reflects consumer interests. In a digital twin, this could entail allowing users to audit and challenge their smart meters' outputs, for example183. Where algorithms are trained on real-time data, every effort must be made to ensure sections of the population are not over- or under-represented, as this could reproduce systemic biases and promote inaccuracies. A consumer consent dashboard, such as the one proposed by the Energy Digitalisation Taskforce184 in the UK, may provide a greater sense of control and encourage consumer trust.

Conclusions
Digital twins hold significant potential in enabling the net zero transition. A privacy-enhanced digital twin using PETs should be bolstered with basic security measures, including the physical restriction of access to critical infrastructure, servers and computers (eg using hardware keys). For PETs to be embedded into the realisation of an energy digital twin, data protection regulation and related guidance should consider what mandates or advice would be effective and ethical in promoting the uptake of smart meters.

182 Fuller A, Fan Z, Day C, Barlow C. 2020. Digital Twin: Enabling Technologies, Challenges and Open Research. IEEE
Access. 8, 108952—108971. (https://doi.org/10.1109/access.2020.2998358)
183 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero. See https://
royalsociety.org/-/media/policy/projects/digital-technology-and-the-planet/digital-technology-and-the-planet-report.
pdf (accessed 20 September 2022).
184 HM Government 2022. Energy Digitalisation Taskforce report: joint response by BEIS, Ofgem and Innovate UK. See
https://www.gov.uk/government/publications/digitalising-our-energy-system-for-net-zero-strategy-and-action-plan/
energy-digitalisation-taskforce-report-joint-response-by-beis-ofgem-and-innovate-uk (accessed 24 August 2022).


USE CASE 3

Social media data: PETs for researcher access and transparency

The opportunity
Over 4 billion people use social media –
including networking platforms, online games,
wellbeing applications, and budgeting tools
– to upload and share media and messages,
log activities and access entertainment185.
The extent to which people interact with, and
generate content on, these platforms has
made social media services an increasingly
valuable source of data for research. The Royal
Society has recommended that social media
platforms establish ways to allow independent
researchers to access data in a secure and
privacy compliant manner186, particularly for
audit and to encourage the accountability
of platforms.

Data generated through social media includes


content posted and shared (such as text
posts or photos) as well as metadata (such
as demographic information, location, time of
upload, and behavioural patterns – for example,
how often a user opens a fitness app or inferred
relationships depicted in a user’s photos)187.
User data is often volunteered by users, such
as an uploaded profile photo, or self-described
location. Most metadata is logged automatically,
such as the geotag on an image, or the
timestamp on a message.

185 Statista (Number of internet and social media users worldwide as of July 2022). See https://www.statista.com/
statistics/617136/digital-population-worldwide/ (accessed 18 August 2022).
186 The Royal Society. 2022 The online information environment: Understanding how the internet shapes people’s
engagement with scientific information. See https://royalsociety.org/-/media/policy/projects/online-information-
environment/the-online-information-environment.pdf?la=en-GB&hash=691F34A269075C0001A0E647C503DB8F
(accessed 30 March 2022).
187 See Lomborg S, Bechmann A. 2014 Using APIs for Data Collection on Social Media. The Information Society 30 4
256—265. (https://doi.org/10.1080/01972243.2014.915276)


BOX 8

Social media data for research and decision making

An increasing number of studies use social media platforms and mobile applications as rich data sources188. Interdisciplinary research using social media data includes:
• Disaster management and emergency response189;
• Social patterns of influence and the dynamics of social movements;
• Information cascades (how information propagates in social media sites, understanding the spread and impact of misinformation)190;
• Event monitoring by location to enhance physical safety and security;
• Vulnerability management191, identifying and communicating with communities most at risk of natural disaster, climate emergencies or disease outbreak;
• Studying online harms (including bullying and harassment, toxicity, radicalisation);
• Political research and opinion forecasting192.

However informative, the use of social media data can be resource intensive and invasive. Accessing and curating social media data is hindered by technical capabilities and public distrust.

Researcher access: APIs and PETs
Researchers typically use an API (Application Programming Interface) to access social media data logs (or data streams), which can be analysed for patterns. An API is a backend interface that connects social media services and their data to third parties. Making APIs available to researchers can be part of a social media company's business model. Some large companies like Facebook and Twitter provide free, if restricted, access to datasets of public-facing data. Private user data is released through APIs only to approved researchers, who may submit queries for specific datasets193 or use the Twitter 1% sampled stream, which delivers a random selection of roughly 1% of public Tweets in real-time194.
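As an illustration of the API-based researcher access described above, the sketch below reads a handful of posts from a sampled-stream endpoint. The endpoint shown follows Twitter's public v2 sampled stream as documented at the time of writing, but the token handling and item limit are assumptions for illustration only, and any real study must also honour deletions and the platform's researcher terms.

```python
# Minimal sketch of consuming a platform's sampled-stream research API.
import json
import os
import requests

BEARER_TOKEN = os.environ["BEARER_TOKEN"]   # issued to approved researchers
URL = "https://api.twitter.com/2/tweets/sample/stream"

def stream_sample(max_items: int = 10) -> None:
    """Read a handful of public posts from the ~1% sampled stream."""
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    with requests.get(URL, headers=headers, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        count = 0
        for line in resp.iter_lines():
            if not line:
                continue            # keep-alive newlines
            post = json.loads(line)
            print(post["data"]["id"], post["data"]["text"][:80])
            count += 1
            if count >= max_items:
                break

if __name__ == "__main__":
    stream_sample()
```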

188 For example: Giglietto F, Rossi L, Bennato D. 2012 The Open Laboratory: Limits and Possibilities of Using
Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services. 30,
145–159. (https://doi.org/10.1080/15228835.2012.743797)
189 Teodorescu H-N. 2015 Using analytics and social media for monitoring and mitigation of social disasters. Procedia
Engineer. 107 325—334. (https://doi.org/10.1016/j.proeng.2015.06.088)
190 Harvard Kennedy School Misinformation Review (Tackling misinformation: What researchers could do with social
media data). See https://misinforeview.hks.harvard.edu/article/tackling-misinformation-what-researchers-could-do-
with-social-media-data/ (accessed 20 November 2021).
191 Gundecha P, Barbier G, Huan L. 2011 Exploiting Vulnerability to Secure User Privacy on a Social Networking Site.,
Proceedings of the 17th ACM SIGKDD international conference on Knowledge Discovery and Data Mining, KDD,
2011. 511–519.
192 Sobkowic P; Kaschesky M; Bouchard G. 2012 Opinion mining in social media: Modeling, simulating, and forecasting
political opinions in the web. Gov Inform Q. 29, 470–479. (https://doi.org/10.1016/j.giq.2012.06.005)
193 For example, Meta’s Graph API. Meta for Developers (Graph API Overview). See https://developers.facebook.com/
docs/graph-api/overview/ (accessed 17 July 2022).
194 Twitter Developer Platform (Volume streams). See https://developer.twitter.com/en/docs/twitter-api/tweets/volume-
streams/introduction (accessed 27 September 2022).


BOX 9

Public health is an emerging and growing area for research using social media data

For example:
• Economists are using mobile game scores and geolocation data from Lumocity, a brain-training game, to see whether local air pollution spikes correlate with declines in cognitive function and productivity. This research could establish exposure to particulate matter as a mechanism for inequality in the workforce195;
• Following the hashtag #cheatmeal on Instagram, kinesiologists and psychologists analysed tagged images to characterise an emerging dietary trend, which they linked to binge eating196;
• In April 2020, Facebook's Data for Good programme released new visualisations and datasets including Movement Range Maps, Co-Location Maps and symptom surveys to enable researchers, international agencies, non-profits and public sector institutions to track and combat COVID-19197. The usage of this data influenced international public policy responses and helped researchers identify economic, health and social impacts in communities198. Researchers may now access recent survey datasets on the future of business199 and gender equality at home200.

Greater visibility is needed around the full lifecycle of social media data for researchers to
fully utilise social media data201. Transparency in
social media data, including how it is used by
platforms, would also promote the rights of data
subjects to exercise informed consent around
how their data is used.

195 La Nauze A, Severnini ER. 2021 Air pollution and adult cognition: Evidence from brain training. See https://www.nber.
org/system/files/working_papers/w28785/w28785.pdf (accessed 30 April 2022).
196 Pila E, Mond JM, Griffiths S, Mitchison D, Murray SB. 2017 A thematic content analysis of #cheatmeals images on
social media: Characterizing an emerging dietary trend. Int J Eat Disord. (https://doi.org/10.1002/eat.22671)
197 Meta (Data for Good: New Tools to Help Health Researchers Track and Combat COVID-19). See https://about.fb.com/
news/2020/04/data-for-good/ (accessed 27 September 2022).
198 Office for National Statistics Data Science Campus (Using Facebook data to understand changing mobility patterns).
See https://datasciencecampus.ons.gov.uk/using-facebook-data-to-understand-changing-mobility-patterns/ (accessed
24 August 2022).
199 Humanitarian Data Exchange (Future of Business Survey—Aggregated Data). See https://data.humdata.org/dataset/
future-of-business-survey-aggregated-data (accessed 21 February 2022).
200 Humanitarian Data Exchange (Survey on Gender Equality At Home). See https://data.humdata.org/dataset/survey-on-
gender-equality-at-home (accessed 21 February 2022).
201 COVID-19 Mobility Data Network (Facebook Data for Good Mobility Dashboard). See https://visualization.
covid19mobility.org/?date=2021-09-24&dates=2021-06-24_2021-09-24&region=WORLD (accessed 27
September 2022).


The challenge
Social media data entails a variety of personal data, including a user’s age, gender, political orientation202 and moods203. Images can expose location204, residence205 and relationship status between individuals in a photo. These privacy issues are related to contextual downstream harms, for example, inferring sexual orientation in countries where homosexuality is a criminal offence. Mobility data can provide detailed history of whereabouts, leading to novel inferences (eg cultural background)206.

Under the UK GDPR, an identifiable person includes someone who can be identified indirectly. In this sense, social media metadata is personal data. Using metadata, even pseudonymised datasets can be reidentified – for example, by comparing the structure of social networks207 to uncover a third party’s approximate whereabouts208. Inferring an individual’s identity or location through metadata without consent – for example, with targeted advertising – violates the UK Data Protection Act 2018.

There are technical challenges around collecting and using social media data in a privacy-preserving way. One challenge is deletion: data shared on social media platforms as unrestricted (available to anyone without logging onto the platform) may be collected for research purposes without violation of terms of use. However, data subjects have the right to request their data be excluded from studies, regardless of how it was shared or accessed. For example, researchers using a Twitter stream must also verify whether Tweets used in analysis have been deleted. This can be particularly difficult in longitudinal studies. Social media users posting anonymously or using pseudonyms may not be matched across platforms, making cross-platform studies at user level difficult or impossible.

202 Rao D, Yarowsky D, Shreevats A, Gupta M. 2010 Classifying latent user attributes in twitter. See https://www.cs.jhu.
edu/~delip/smuc.pdf (accessed 30 March 2022).
203 Tang J, Zhang Y, Sun J, Rao J, Yu W, Chen Y, and Fong A C M. 2012 Quantitative Study of Individual Emotional States
in Social Networks. IEEE T Affect Comput. 3, 132–144.
204 Hays J, Efros A. 2008 Im2gps: estimating geographic information from a single image. Proceedings of the IEEE Conf.
on Computer Vision and Pattern Recognition (CVPR) 2008. http://graphics.cs.cmu.edu/projects/im2gps/im2gps.pdf
(accessed 27 September 2022).
205 Jahanbakhsh K, King V, Shoja GC 2012. They Know Where You Live! See https://arxiv.org/abs/1202.3504 (accessed 10
October 2022).
206 Silva TH, de Melo POSV, Almeida JM, Musolesi M, Loureiro AA F 2014. You are What you Eat (and Drink): Identifying
Cultural Boundaries by Analyzing Food & Drink Habits in Foursquare. See https://arxiv.org/abs/1404.1009 (accessed
27 September 2022).
207 Narayanan A, Shmatikov V 2009. De-anonymizing social networks. See https://www.cs.utexas.edu/~shmat/shmat_
oak09.pdf (accessed 15 August 2022).
208 Li R, Wang S, Deng H, Wang R, Chang K C C. 2012 Towards social user profiling: Unified and discriminative influence
model for inferring home locations. KDD 2012: Proceedings of the 18th ACM SIGKDD International conference on
Knowledge Discovery and Data Mining. 1023–1031. (https://doi.org/10.1145/2339530.2339692)


BOX 10

Facebook data: Researcher access and Cambridge Analytica

Researchers can request access to non-public Facebook data by creating a Facebook app using Facebook’s Open Graph API. Apps created by researchers often take the form of games, which can then be installed by Facebook users who agree to the app’s terms and conditions. These terms list which types of data the app will collect from a user’s Facebook activity and share with the app developers.

Cambridge Analytica used this method in creating the thisisyourdigitallife app. The app included in its terms and conditions access to the data of app users as well as their friends’ data. While just over 300,000 consenting users installed the thisisyourdigitallife app, data from 87 million profiles was collected209, 210.

The Cambridge Analytica controversy highlights how amalgamated data used for research can be experienced as an invasion of collective and individual privacy. The result was a tightening of access to APIs and reformed policies, particularly at large social media companies211. This case also demonstrates how so-called privacy mechanisms, such as APIs that restrict access to approved researchers, can be applied in ways that do not preserve privacy. The API performed to its technical specifications, but the use case violated the intent of data subjects. Facebook and Google have since experimented with homomorphic encryption, federated learning and differential privacy to enable advertising and market research212. In these ways, PETs are being used to support business-as-usual, enhancing user profiling and targeted advertising.

Preserving privacy in social media data use


Privacy by design in social media data should address two primary concerns. The first is poor information scoping, where access to a user’s private information may expose more than is required (eg sharing a user’s entire calendar rather than one calendar event). The second is the tracking of individuals, for example, through user ‘fingerprinting’ or cookies, or by logging the user’s unique metadata (eg screen resolution, plugins installed, list of fonts and time zone).

APIs can be designed with consideration of user interface and data minimisation approaches. API users could mediate access themselves, for example, through prompts that contextualise the data request. The request could be embedded in the flow of the data subject’s intended action (not diverting their attention). Data minimisation can be used to expose minimal information by limiting queries to specifics.

209 Rosenberg M, Dance GJX. 2018 You Are the Product’: Targeted by Cambridge Analytica on Facebook. New York
Times. 8 April 2018. See https://www.nytimes.com/2018/04/08/us/facebook-users-data-harvested-cambridge-
analytica.html (accessed 14 May 2022).
210 Lawmakers publish evidence that Cambridge Analytica work helped Brexit group. Reuters. 16 April 2018. See https://
www.reuters.com/article/us-facebook-cambridge-analytica-britain/lawmakers-publish-evidence-that-cambridge-
analytica-work-helped-brexit-group-idUSKBN1HN2H5 (accessed 2 March 2022).
211 Kelly H. 2018 California just passed the nation’s toughest data privacy law. CNN. 29 June 2018. See https://money.
cnn.com/2018/06/28/technology/california-consumer-privacy-act/index.html (accessed 16 March 2022).
212 Ion M et al. 2017 Private Intersection-Sum Protocol with Applications to Attributing Aggregate Ad Conversions. See
https://eprint.iacr.org/2017/738.pdf (accessed 25 March 2022).


Differential privacy can be used to safeguard datasets for release to researchers by obscuring information pertaining to specific users in a dataset. In social media datasets, this could mean sharing regional or other cohort-based data to prevent reidentification of individuals. There are limitations around combining data (such as layering spatial data using maps) from multiple sources, alongside the addition of noise. This is one area for further research213.

Facebook’s Data for Good programme214, launched in 2017, has used differential privacy to provide access to researchers studying crucial topics, including disease transmission, humanitarian responses to natural disasters and extreme weather events. Where public datasets are considered sensitive in aggregation, noise is added to prevent reidentification using a Differential Privacy Framework215, 216, 217. Facebook’s Data for Good programme has received criticism for its execution; researchers have been denied access to the programme, or provided with inaccurate data, invalidating months of research218.

PETs may also be used to share social media data between researchers, or to enable open access social media databases without compromising privacy. For example, centralised data stores could be built and queried. This could include specific attributes, keywords, locations or other demographics in a centralised model. Homomorphic encryption or other cryptographic tools may be applied to social network data, allowing researchers to submit queries to the data holders without requesting data. The data holder could then run the query and release differentially private results. Synthetic data may also be used to release versions of datasets.
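To make the differentially private release described above concrete, the sketch below shows how a data holder might apply the Laplace mechanism to regional user counts before releasing them. The dataset, regions and epsilon value are illustrative assumptions only, not a description of any platform’s actual implementation.

```python
# A minimal sketch of releasing differentially private regional counts.
# All data and parameters here are illustrative assumptions.
import random
from collections import Counter

def laplace_noise(scale: float) -> float:
    # The difference of two Exp(1) samples follows a Laplace(0, scale) distribution.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_region_counts(records, epsilon: float = 1.0) -> dict:
    """Release per-region user counts with epsilon-differential privacy.

    Each user appears at most once, so each count has sensitivity 1 and
    Laplace noise with scale 1/epsilon is sufficient.
    """
    counts = Counter(region for _user, region in records)
    return {region: max(0, round(count + laplace_noise(1.0 / epsilon)))
            for region, count in counts.items()}

# Toy (user, region) records standing in for platform-held data.
records = [("u1", "London"), ("u2", "London"), ("u3", "Cardiff"), ("u4", "Belfast")]
print(dp_region_counts(records, epsilon=0.5))
```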

213 With regard to mobility data, for example, ‘as various providers stack up different sources of data in a collaborative
project such as the Network, it often erodes corrections made for differential privacy noise in a single dataset.’ Open
Data Institute COVID-19 Mobility Data Network’s Use of Facebook Data for Good Mobility Data. See http://theodi.
org/wp-content/uploads/2021/04/5-COVID-19-Mobility-Data-Networks-Use-of-Facebook-Data_v2.pdf (accessed 7
October 2022).
214 Facebook (Data for Good). See https://dataforgood.fb.com/ (accessed 18 August 2022).
215 Facebook Research (Privacy protected data for independent research on social media data). See https://research.
fb.com/blog/2020/02/new-privacy-protected-facebook-data-for-independent-research-on-social-medias-impact-on-
democracy/ (accessed 2 September 2022).
216 Jin KX, McGorman L. Data for Good: New tools to help health researchers track and combat COVID-19. Facebook
News. 6 April 2020. See https://about.fb.com/news/2020/04/data-for-good/ (accessed 15 March 2022).
217 Facebook Research (Protecting privacy in Facebook mobility data during the Covid-19 response). See https://
research.fb.com/blog/2020/06/protecting-privacy-in-facebook-mobility-data-during-the-covid-19-response/ (accessed
23 September 2022).
218 Moon M. Facebook has been giving misinformation researchers incomplete data. Engadget. See https://www.
engadget.com/facebook-misinformation-researchers-incomplete-data-050143486.html (accessed 30 August 2022).


BOX 11

PETs for transparency: Twitter and OpenMined partnership for algorithmic accountability

In January 2022, Twitter’s ML Ethics, Transparency, and Accountability (META) team announced a partnership with OpenMined to explore the use of PETs for public accountability over social media data. OpenMined is an open-source non-profit organisation that aims to build and promote the use of PETs through educating data owners and making privacy-preserving technologies more accessible to private and public organisations.

The Twitter-OpenMined partnership proposes the use of PETs as a tool for accountability. Currently, one barrier to algorithmic accountability is that external researchers and third parties lack access to proprietary algorithms and the data they use, rendering it difficult to conduct independent investigations and audits. PETs in this instance may allow companies to share internal algorithms and datasets for algorithmic audits and replicating research, while avoiding concerns around privacy, security or intellectual property.

The first project will involve developing a method of replicating internal research findings on algorithmic amplification of political content on Twitter by using a synthetic dataset. Long-term, Twitter suggests they will share their actual internal data through PETs to enable external researchers to conduct their own investigations on currently non-public data.

Conclusions
In this use case, PETs are used as tools for privacy and confidentiality, as well as accountability and transparency through external audit. While social media data is not usually sold, social media business models depend on personal data – and derived insights – collected and analysed through opaque processes. A privacy-enhanced strategy for enhancing access to data and increasing transparency will improve user trust and mitigate legal or reputational risks for social media platforms. Furthermore, the amount of compute power required to analyse large social media datasets may motivate platforms to use networked PETs to provide analysis as a service219.

As the types and scale of personal data shared on social media continue to expand, novel privacy concerns will emerge. For example, the linking of consumer genomics products with social media platforms is increasingly popular on sites like Ancestry.com, or open-source genetics databases such as GEDmatch or Promethease. While open DNA databases have prompted some users to consider the risks associated with making their genome public220, the implications of linking an individual’s DNA to social media metadata (such as location, behavioural patterns or social networks) are less understood.

219 The Royal Society. 2022 The online information environment: Understanding how the internet shapes people’s
engagement with scientific information. See https://royalsociety.org/-/media/policy/projects/online-information-
environment/the-online-information-environment.pdf?la=en-GB&hash=691F34A269075C0001A0E647C503DB8F
(accessed 30 March 2022).
220 Mittos A, Malin B, De Cristofaro E. 2018 Systematizing genome privacy research: A privacy-enhancing technologies
perspective. See https://arxiv.org/abs/1712.02193 (accessed 23 March 2022).


USE CASE 4

Synthetic data for population-scale insights

The opportunity
A vast amount of national-scale data and microdata is held in various public records controlled by different institutions. This data enables greater understanding of population-level behaviour, forecasting and ‘nowcasting’ important metrics (such as GDP or disease prevalence) and monitoring regional development across the UK.

The UK’s Office for National Statistics (ONS) is the UK’s largest independent producer of national statistics and serves as the national statistical institute. As the body responsible for collecting and sharing official statistics relevant to the UK economy and population, the ONS stores and controls a wealth of high-value data and microdata221 and substantial national datasets, including census data222.

There is significant appetite across the UK public sector to use national data to drive innovation and growth, to support better policy and decision-making and to use AI to improve service efficiencies. In 2017, an Office for Statistics Regulation investigation found that the UK’s statistical system’s capacity to link data and provide insights to users was lacking, causing a significant loss of value to society223.

The ONS is currently exploring how PETs might help reverse this trend by supporting anonymisation at population scale.

221 The Digital Economy Act 2017 provides a gateway for the ONS to access the data of all public authorities and Crown
bodies in support of the production of National Statistics and other official statistics, including the census. It also
entails powers to mandate data from some UK businesses. In some (limited) circumstances, ONS-held data may also
be shared with devolved administrations for statistical purposes. HM Government (Digital Economy Act 2017). See
https://www.legislation.gov.uk/ukpga/2017/30/contents/enacted (accessed 13 May 2022).
222 HM Government (Census Act 1920). See https://www.legislation.gov.uk/ukpga/Geo5/10-11/41/contents (accessed 23
April 2022).
223 The Office for Statistics Regulation (Joining up data for better statistics). See https://osr.statisticsauthority.gov.uk/
publication/joining-up-data/ (accessed 30 March 2022).


The challenge: Anonymisation in big data
Data controllers are often unable to share datasets without compromising legal or ethical requirements to protect confidentiality. The growing availability of population-scale data, linked datasets, access to powerful analytical techniques and compute power means that the risk of ‘hacking’ or ‘reverse-engineering’ anonymised datasets is growing224.

Privacy-preserving synthetic data
Synthetic data is data that is modelled to represent the statistical properties of original data. New data values are created which, taken as a whole, reproduce the statistical properties of the ‘real’ dataset without including any original datapoints. Users of synthetic datasets optimised for privacy may be virtually unable to identify any information pertaining to original datapoints. For this reason, synthetic data has significant privacy-preserving potential.

Privacy-preserving synthetic data (PPSD) is synthetic data generated from real-world data to a degree of privacy that is deemed acceptable for a given application225. PPSD may be used to enable broader access to high-value datasets to drive exploration and innovation. It may also reduce the time for development of new data products by allowing early access to ‘good enough’ models, and to develop models and build pipelines while access to ‘real’ data is negotiated. This could also unlock sensitive datasets, for example, by synthesising microdata currently held in the Secure Research Service to provide access to a wider range of users226.

High-value synthetic population-level datasets
The Data Science Campus at the ONS is working in partnership with the Alan Turing Institute to explore the role of PPSD in using national datasets for public benefit227. This does not include the use of PPSD for decision-making, but rather to supply provisional datasets to researchers who wish to test systems or develop methods in non-secure environments. It may also be used to educate, promoting the use of ONS data sources228. The programme is exploring the generation of PPSD with an aim to develop a robust framework for assessing privacy-utility trade-offs.

224 For example Rocher L, Hendrickx JM, de Montjoye Y-A. 2019 Estimating the success of re-identifications in
incomplete datasets using generative models. Nat Commun 10 3069. (https://doi.org/10.1038/s41467-019-10933-3)
225 Gartner Research 2022. Top strategic technology trends for 2022: Privacy-Enhancing Computation. See https://
www.gartner.co.uk/en/information-technology/insights/top-technology-trends#:~:text=Trend%203%3A%20
Privacy%2Denhancing%20Computation,well%20as%20growing%20consumer%20concerns (accessed 23
September 2022).
226 Government Statistical Service (Examples of data linking within the government statistical service). See https://
gss.civilservice.gov.uk/examples-of-data-linking-within-the-government-statistical-service/ (accessed 23
September 2022).
227 Office for National Statistics (Office for National Statistics and the Alan Turing Institute join forces
to produce better and faster estimates of changes to our economy). See https://www.ons.gov.
uk/methodology/methodologicalpublications/generalmethodology/onsworkingpaperseries/
onsmethodologyworkingpaperseriesnumber16syntheticdatapilot (accessed 23 September 2022).
228 Office for National Statistics (Synthetic data pilot working paper). See https://www.ons.gov.
uk/methodology/methodologicalpublications/generalmethodology/onsworkingpaperseries/
onsmethodologyworkingpaperseriesnumber16syntheticdatapilot (accessed 23 September 2022).


Synthetic data can also be used to improve the quality of data. This is achieved through data augmentation and other techniques229 that address incompleteness in datasets, particularly where populations are small or less represented. There are potential issues with skew or bias in these cases, which must be addressed.

Although synthetic data techniques may be applied to virtually any data, ranging from imagery to text, three high-value datasets illustrate the potential for this technology:

• A high-quality synthetic version of the Census-Health-Mortality dataset (the ‘health asset’) would allow the ONS to share realistic data quickly with many research partners, speeding up research and innovation by allowing a wide variety of users to rapidly develop models and hypotheses, and build pipelines which can then be applied to the real data for decision-making;

• Synthetic versions of telecoms mobility data would enable ONS and cross-government partners to fully assess the opportunities for this data, before going to procurement. This would provide better value for money and would improve official mobility-based statistics such as those relating to COVID-19 analysis;

• Synthesis of administrative data would allow for off-line exploration of synthetic data, so that a single, well-defined data extract request can be made to the data owners. If this is not practical, a fully tested and robust data pipeline could be developed to process and analyse the sensitive data in situ.

There are several prerequisites to implementing PPSD. The first is a consistent and comprehensive way to evaluate synthetic datasets. The ONS is addressing this issue through a framework, which will be in the form of a Python library. The framework will assess the performance of synthetic datasets in terms of both utility and privacy.

Second is the investigation and assessment of synthetic data generation methods. This means exploring off-the-shelf methods such as Synthpop230, as well as more sophisticated machine and deep learning methods such as Generative Adversarial Networks and Evolutionary Optimisation. This requires a great deal of technical expertise in implementation, as well as deep knowledge of the context, risk factors (adversaries and threat models) and potential for downstream harms.

A synthetic dataset with all the utility of the original dataset cannot offer privacy. For this reason, high-dimensional datasets (which contain many variables) may not be suitable for PPSD generation. Rather, an external researcher or client might request a custom synthesised dataset pertaining to a specific question (calling on a limited number of attributes or variables). In this way, greater utility may be offered without higher risk of privacy loss231.

PPSD may also be layered with other PETs to enhance its privacy-preserving potential. For example, synthetic data can be generated with differential privacy guarantees, offering greater assurance of privacy. However, further erosion of utility must be considered when adding noise to a synthetic dataset232.
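As an illustration of this layering, the sketch below generates synthetic records by sampling from a low-dimensional contingency table to which Laplace noise has been added, and then compares one marginal distribution as a crude utility check. The attributes, records and epsilon are hypothetical assumptions; a production pipeline (and the frameworks described above) would use far more rigorous generation and evaluation methods.

```python
# A minimal sketch of differentially private synthetic data generation:
# noise a low-dimensional contingency table, then sample records from it.
# Data, attributes and epsilon are illustrative assumptions only.
import random
from collections import Counter

def laplace_noise(scale: float) -> float:
    # Difference of two Exp(1) samples follows a Laplace(0, scale) distribution.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_synthetic(records, epsilon: float = 1.0, n_synthetic: int = 1000):
    """Sample synthetic records from a Laplace-noised histogram of the data."""
    histogram = Counter(records)  # each person contributes one record: sensitivity 1
    # NB a complete implementation would also noise the empty cells of the full
    # attribute domain, so the set of observed combinations itself leaks nothing.
    noisy = {cell: max(0.0, count + laplace_noise(1.0 / epsilon))
             for cell, count in histogram.items()}
    cells, weights = zip(*noisy.items())
    return random.choices(cells, weights=weights, k=n_synthetic)

def marginal(data, index):
    """Proportion of records taking each value of one attribute."""
    counts = Counter(record[index] for record in data)
    total = sum(counts.values())
    return {value: round(count / total, 3) for value, count in counts.items()}

# Toy microdata: (age band, region, long-term condition flag).
real = [("30-39", "North East", True), ("30-39", "North East", False),
        ("60-69", "London", True), ("20-29", "Wales", False)] * 50

synthetic = dp_synthetic(real, epsilon=0.5)
print("real age-band marginal:     ", marginal(real, 0))
print("synthetic age-band marginal:", marginal(synthetic, 0))
```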

229 For example, missing value imputation and removing class imbalances.
230 Synthpop (Homepage). See https://cran.r-project.org/web/packages/synthpop/vignettes/synthpop.pdf (accessed 23
September 2022).
231 Jordon J et al. 2022 Synthetic data: What, why and how? See https://arxiv.org/pdf/2205.03257.pdf (accessed 2
September 2022).
232 Jordon J, Yoon J, van der Schaar M. 2019 PATE-GAN: Generating synthetic data with differential privacy guarantees.
See https://openreview.net/pdf?id=S1zk9iRqF7 (accessed 26 September 2022).


BOX 12

Privacy-preserving synthetic data framework for population-scale patient data

The Clinical Practice Research Datalink (CPRD)233 is the Medicines and Healthcare products Regulatory Agency’s (MHRA’s) real world data research service created to support retrospective and prospective public health and clinical studies. CPRD is jointly sponsored by the MHRA and the National Institute for Health Research (NIHR) as part of the Department of Health and Social Care.

CPRD collects anonymised patient data from a network of GP practices across the UK. Since 2018, CPRD has been working on the development of synthetic datasets based on GP patient data to maximise the benefit of this valuable data, while balancing privacy concerns and preventing downstream harm to data subjects. These synthetic datasets can be used as sample datasets, enabling third parties to develop, validate and test analytic tools. They can also be used for training purposes, and for improving algorithms and machine learning workflows.

CPRD has now made two high-fidelity synthetic datasets available234: a cardiovascular disease synthetic dataset and a COVID-19 symptoms and risk factors synthetic dataset. Both synthetic datasets are generated from anonymised real primary care patient data extracted from the CPRD Aurum database235 and are available to researchers for a nominal administrative fee.

The MHRA was motivated to explore synthetic data generation methods to support regulatory requirements for external validation of machine learning (ML) and AI algorithms. Anonymised health datasets have high utility, but still carry residual privacy risks which limit their wider access236; a fully synthetic approach can substantially mitigate these risks237. In some cases, synthetic data may even improve the utility of anonymised data – that is, its potential to be clinically meaningful. This is because anonymised data may entail gaps, which can lead to biased inferences. Synthetic data can be used in these cases to supplement real data by filling the gaps or boosting underrepresented subgroups in the dataset238.

233 Clinical Practice Research Datalink (Homepage). See https://cprd.com/ (accessed 17 September 2022).
234 Clinical Practice Research Datalink (Synthetic data CPRD cardiovascular disease synthetic dataset). See https://cprd.
com/synthetic-data#CPRD%20cardiovascular%20disease%20synthetic%20dataset (accessed 23 September 2022).
235 CPRD Aurum contains routinely collected data from practices using EMIS Web® electronic patient record system software. Clinical Practice Research Datalink (Primary care data public health research). See https://cprd.com/primary-care-data-public-health-research (accessed 23 September 2022).
236 Sweeney L. 2000 Simple Demographics Often Identify People Uniquely. Carnegie Mellon University, Data Privacy
Working Paper 3.
237 Park Y, Ghosh J. 2014 PeGS: perturbed Gibbs samplers that generate privacy-compliant synthetic data. Trans Data
Privacy. 7, 253—282.
238 Wu, L., He, H., Zaïane, O. R. 2013 Utility of privacy preservation for health data publishing. Proceedings of the 26th
IEEE International Symposium on Computer-Based Medical Systems. 510—511.


CPRD uses the Synthetic Data Generation and Evaluation Framework239 to guide synthetic data generation. It consists of a set of procedures, including a ground truth selection process as input, a synthetic data generation procedure, and an evaluation process.

The Synthetic Data Generation Framework has been proven to produce effective synthetic alternatives to ‘real’ health data. This is particularly beneficial when 1) access to the ground truth data is restricted; 2) the sample size is not large enough, or not representative of a population; 3) machine learning or AI training and testing datasets are lacking. There are limitations and challenges to consider during synthetic data generation outside the framework, including data missingness and the complex interactions between variables. The Synthetic Data Generation Framework used by CPRD is flexible enough to allow for generation of different types of synthetic datasets, while at the same time enabling researchers to demonstrate that they have balanced data utility with patient privacy needs.

Access to the synthetic datasets requires a data sharing agreement with the applicant’s organisation (this is in line with advice received from the ICO Innovation Hub)240.

Conclusions
Synthetic data can be useful for expediting data projects and enabling partnerships. For example, organisations can test whether a partnership is worthwhile and start building models while waiting for access (such as through data sharing agreements or other means). Whether or not synthetic data will provide a stand-in for useful and sufficiently private data for analytical use cases remains an open question.

The generation of synthetic datasets, even ‘good enough’ synthetic versions, is challenging. As yet, there are no standards related to privacy in PPSD generation, though emerging synthetic data standards may include privacy metrics241. Further research is required to quantify the privacy-utility trade-offs242. To these ends, the ONS plans to test with data owners and the wider data community as part of their synthetic data project.

239 Wang Z, Myles P, Tucker A. 2021 Generating and evaluating cross-sectional synthetic electronic healthcare data:
Preserving data utility and patient privacy. Comput Intell. 37, 819—851.
240 The Synthetic Data Generation and Evaluation Framework, owned by the MHRA, was developed through a grant from
the Regulators’ Pioneer Fund launched by BEIS and managed by Innovate UK. Further development of the COVID-19
synthetic data and refinement of synthetic data generation methods was funded by NHSX.
241 Institute of Electrical and Electronics Engineers (Synthetic data standards). See https://standards.ieee.org/industry-
connections/synthetic-data/ (accessed 18 August 2022).
242 One recent publication finds the privacy gain is highly variable, and utility loss unpredictable, when used in high-
dimensional datasets: Stadler T, Oprisanu B, Troncoso C et al. 2021 Synthetic Data—Anonymisation Groundhog Day.
See https://arxiv.org/abs/2011.07018 (accessed 27 September 2022).


USE CASE 5.1

PETs in the public sector: Collective intelligence, crime prevention and online voting

Collaborative analysis
for collective intelligence
The opportunity
A wealth of data is collected and stored
across government departments and non-
public bodies in the UK and abroad. This
data potentially holds insights that could save
substantial money, make government services
more efficient and effective, drive the transition
to net zero by 2050 (see page 67), guide
life-saving choices during a pandemic or
understand the effect of regional policies.

Much of the data required to tackle social challenges is sensitive. Particularly where politically sensitive data is used, there are inherent security risks. While there are some special provisions for using health data during emergencies, collaboration between departments must adhere to privacy legislation, including data protection. As such, the risk of collaboration between departments may be deemed larger than the potential benefits.

Collaborative analysis with SMPC
Secure multi-party computation (SMPC) allows multiple parties to jointly compute a function using inputs from all parties, while keeping those inputs private. In this way, SMPC is a tool for securely generating insights using data held by different departments or organisations. For example, in a health study, patient data may be input from different hospitals, or even combined with other datasets – such as social demographic data – without researchers ever seeing or accessing the data directly.

SMPC has been demonstrated using large-scale studies on government data since 2015243. The performance of SMPC relates to the analysis, or functions, to be computed. Summations (adding numbers together) are faster than more complex computations244. This is a rapidly advancing technology with the potential for use in long-term data governance; this is because SMPC depends on access control by all parties involved, meaning analysis can only be performed if all parties agree. SMPC protocols ensure input privacy (no information can be obtained or inferred by any party aside from their own input and the output). As such, SMPC may provide a generic, standardised – and potentially certifiable – method for computation on encrypted data245.
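The summation case mentioned above can be illustrated with additive secret sharing, one common building block of SMPC. In the hedged sketch below, each department splits its private count into random shares; no single share reveals anything, and only the combined total is learned. The department names and values are invented for illustration.

```python
# A minimal sketch of secure summation via additive secret sharing.
# Departments and values are illustrative assumptions only.
import secrets

P = 2**61 - 1  # prime modulus for share arithmetic

def make_shares(value: int, n_parties: int):
    """Split a value into n additive shares that sum to it modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Each department's private input (eg a count it does not want to reveal).
private_inputs = {"health": 1200, "education": 845, "transport": 310}
n = len(private_inputs)

# Each department sends one share to each computing party.
received = [[] for _ in range(n)]
for value in private_inputs.values():
    for party, share in enumerate(make_shares(value, n)):
        received[party].append(share)

# Each computing party sums the shares it holds; individual inputs stay hidden.
partial_sums = [sum(shares) % P for shares in received]

# Combining the partial sums reveals only the joint total.
total = sum(partial_sums) % P
assert total == sum(private_inputs.values())
print("Joint total:", total)
```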

243 Bogdanov D, Kamm L, Kubo B, Rebane R, Sokk V. 2015 Students and taxes: a privacy-preserving social study using
secure computation. See https://eprint.iacr.org/2015/1159.pdf (accessed 25 September 2022).
244 UN PET Lab Handbook. See https://unstats.un.org/bigdata/task-teams/privacy/UN%20Handbook%20for%20Privacy-
Preserving%20Techniques.pdf (accessed 17 July 2022).
245 Archer DW et al. 2018 From keys to databases: real-world applications of secure multi-party computation. See https://
eprint.iacr.org/2018/450 (accessed 10 October 2022).


There are a number of potential uses of SMPC in the public sector and in cross-sector partnerships; a few examples include:

• Combining cyber intelligence housed in various government departments and ministries to identify cyber threats and incidents (such as the Dutch Ministry of Justice and National Cyber Security Centre partnership, Secure Net)246;

• Combining data from different social domains to ensure government funds and interventions are well targeted; for example, combining detailed social statistics with healthcare costs to see where government actions on prevention should be targeted; and

• Establishing a decentralised register for businesses, law enforcement and banks to log fraud incidents (including details on company name, individuals and account numbers). Parties would only be able to test whether a name or account number has been registered in a previous fraud, such as during due diligence checks. Parties could also run analysis to identify trends in modus operandi, allowing them to take preventative measures.

BOX 13

Improving data quality and accuracy: Collaborative analysis for compliance

Part of Société Générale’s London-based Greenhouse incubator programme, Secretarium is an ‘integrity and confidentiality platform’ that uses PETs to help financial institutions meet EU reporting requirements247. The EU Markets in Financial Instruments Directive II (MiFID II) requires financial organisations to report trade data to regulators. This entails using reference data of varying quality, making the task difficult and potentially ineffective. Data quality can be improved if multiple firms compare their client reference data to identify inaccuracies. However, financial institutions are not inclined to share data with competitors, as this would include client lists and sensitive personal data.

Secretarium uses a distributed, confidential computing platform to enable multiple institutions to compare data in a ‘blind’ fashion. It uses a distributed confidential computing format to benchmark reference data quality. A group of secured computers contain the organisations’ reference data in an encrypted form, and the computers process the data without providing access to any individuals or organisations, even Secretarium itself.

246 Hazebroek E, Jonkers K, Segers T. 2021 Secure net NCSC partnership for rapid and safe data sharing. See https://
emagazine.one-conference.nl/2021/secure-net-ncscs-partnership-for-rapid-and-safe-information-sharing/ (accessed
23 September 2022).
247 Secretarium (Homepage) See https://secretarium.com/ (accessed 27 September 2022).


SMPC can drive public sector efficiency by allowing for safe and rapid collective intelligence. While performance and compute power were once primary challenges to implementation, this is no longer the case. One of the biggest challenges around SMPC is the understanding of legal implications, for instance the impact of EU and UK GDPR requirements. Other challenges include alignment of data structures and formats (interoperability), reliability and auditability, data availability and the complexity of ongoing management of SMPC248.

Only registered parties can contribute to SMPC analyses. Registered parties should not have intent to input information that is invalid (eg reporting false information).

SMPC applications may be purchased as a software package, which enables different parties to collaborate on sensitive data through analysis ‘in the blind’. While open frameworks require deep knowledge of SMPC, suppliers are trialling software that will be usable by data scientists with no previous experience with SMPC.

248 The Financial Action Taskforce. 2021 Stocktake on data pooling, collaborative analytics and data protection. See
https://www.fatf-gafi.org/media/fatf/documents/Stocktake-Datapooling-Collaborative-Analytics.pdf (accessed 22
September 2022).


BOX 13

Public-private partnerships for PETs in the Dutch public sector

Roseman Labs in the Netherlands is encouraging the uptake of PETs in the Dutch public sector through creative, low-risk collaborations that demonstrate the value of SMPC.

First, they identify use cases for SMPC relevant to a given public body. In some instances, a use case idea is generated through scoping conversations between public sector stakeholders. The use case idea is then formulated into a pilot project, or proof-of-concept, which can be carried out within the low-risk public procurement threshold (for example, conducting a six-month trial). Once the economic and social value of the SMPC solution becomes clear, the public sector organisation may begin an informed RFI process with an aim to scale up the solution long-term.

This has resulted in successful applications, including:

• Increasing digital resilience with the Dutch National Cyber Security Centre (NCSC). The NCSC collects cybersecurity intelligence from organisations across the Netherlands, which report risks such as hacking or ransomware incidents. Organisations are not motivated to publish data on security breaches, which could compromise their reputation and marketability. An SMPC system now allows the NCSC to collect intelligence on cyber security risks from tens of organisations (scaling to 15,000 over time) in the Netherlands in a private fashion: each organisation inputs data on cyber attacks and breaches in a fully anonymous and confidential way on a weekly basis. The NCSC does not see provenance information but is able to identify trends and take action accordingly;

• Reducing money laundering. Where multiple banks are able to generate graphs using transaction data, these graphs can be compared using SMPC for patterns that suggest money laundering (namely, money going in circles, or ‘smurfing’, where many small transactions are ultimately deposited with one entity). This cross-bank identification of patterns is far more reliable than each individual bank looking at their own data, which often generates a very large number of false positives. With this cross-bank approach, the likelihood of spotting true positives increases and banks and law enforcement agencies (LEAs) can then focus resources. This allows law enforcement to set priorities. This should open up private partnerships between banks, for example, where thousands of employees are dedicated to identifying potential money laundering incidents (compared to just hundreds at the national public sector level)249.

Roseman Labs has technical and in-house legal expertise (complemented with external privacy experts), meaning they are able to prescribe a data-use solution that meets current data protection requirements, helping clients to complete the Data Protection Impact (DPI) process together. This added value bolsters their work with public sector clients.

249 Roseman Labs (Secure data collaboration in financial services). See https://rosemanlabs.com/blog/financial_services.
html (accessed 10 October 2022).


USE CASE 5.2

Online safety: Harmful content detection on encrypted platforms
The challenge
In April 2019, the UK Government published the Online Harms White Paper250; this paper identified the need to address the negative consequences that arise from individuals being online, both for social cohesion and for democratic society. The paper set out a programme of action to tackle content or activity that harms individual users, particularly children, either by undermining national security, or by destabilising shared rights and responsibilities. Many of the measures suggested in the white paper require social media platforms to take action251; social media companies are required to identify and prevent the sharing of harmful and illegal content for a number of legal reasons252. Other motivations for regulating harmful content on private platforms include fear of new legislation and potential public relations backlash influencing their user base253, 254.

One of the most serious forms of online offending is child sexual exploitation and abuse (CSEA). Since CSEA material can be shared and disseminated through social media platforms, the Online Harms White Paper identified social media companies as responsible for protecting their users from harm.

Detection of harmful and illegal content online
Many social media companies use live moderation, employ teams of human moderators, or use automated detection systems to help combat harmful and illegal content on their platforms. Automated detection systems range from simple approaches such as matching images (using hashes) to the use of deep learning models trained on material that may be illegal. This poses a particular challenge to an automated approach to detecting harmful content255.

Efforts to detect CSEA material can be severely restricted by end-to-end encryption: a secure method of transferring information (including messages or images). Due to the privacy afforded, encrypted messaging platforms can mask the sharing of illegal content.

250 HM Government. Online Harms White Paper. See https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/973939/Online_Harms_White_Paper_V2.pdf (accessed 23 January 2022).
251 HM Government (Online Harms White Paper: consultation outcome). See https://www.gov.uk/government/
consultations/online-harms-white-paper (accessed 15 March 2022).
252 Current legislation is complex, and includes the Malicious Communications Act 1988, the Communications Act 2003,
the Public Order Act 1986, and the Investigatory Powers Act 2016.
253 Internet Watch Foundation (Our MOU, the law and assessing content). See https://www.iwf.org.uk/what-we-do/how-
we-assess-and-remove-content/laws-and-assessment-levels (accessed 28 July 2022).
254 House of Commons Library. 2022 Regulating online harms (research briefing). See https://researchbriefings.files.
parliament.uk/documents/CBP-8743/CBP-8743.pdf (accessed 27 September 2022).
255 Gillespie T. 2020 Content moderation, AI and the question of scale. Big Data & Society.
7. (https://doi.org/10.1177/2053951720943234)


FIGURE 7

Homomorphic encryption depicted in the context of a client-server model.

The client sends encrypted data to a server, where a specific analysis is performed on the
encrypted data, without decrypting that data. The encrypted result is then sent to the client,
who can decrypt it to obtain the result of the analysis they wished to outsource.

KEY: [x] denotes the encryption of x; [F(x)] denotes the encryption of F(x). The client sends [x] to the server; the server performs the analysis F on the encrypted data and returns [F(x)] to the client.

While the UK Government has considered the banning of end-to-end encryption in efforts to stymie CSEA material sharing256, end-to-end encryption offers critical benefits to private citizens and must be preserved and promoted257. Recent technical advances may provide a solution to detecting harmful content without ending end-to-end encryption or the privacy of individual users.

Homomorphic encryption (HE) has been demonstrated as a PET that allows for the analysis of encrypted data, and which could be used as a tool for identifying CSEA material on encrypted platforms. Apple’s planned roll-out of a very similar programme received criticism from privacy rights groups. The Apple case illustrates how PETs may be applied in ways perceived to violate, rather than preserve, privacy. This use case is intended to provide an explanation, rather than an endorsement, of how PETs could be used to detect illegal material on encrypted platforms.
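The additively homomorphic behaviour shown in Figure 7 can be illustrated with a toy Paillier cryptosystem, as in the sketch below. The key sizes are far too small for real use and the code is only a sketch of the client–server flow, not a production scheme or the system discussed in this use case.

```python
# A minimal sketch of additively homomorphic encryption (toy Paillier).
# Key sizes are illustrative only; real deployments use keys of 2048 bits or more.
import math
import random

def keygen(p: int = 293, q: int = 433):
    n = p * q
    lam = math.lcm(p - 1, q - 1)          # lambda = lcm(p-1, q-1)
    mu = pow(lam, -1, n)                  # valid because the generator is n + 1
    return (n,), (lam, mu, n)             # (public key), (private key)

def encrypt(pub, m: int) -> int:
    (n,) = pub
    n2 = n * n
    while True:                           # fresh randomness hides repeated plaintexts
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c: int) -> int:
    lam, mu, n = priv
    u = pow(c, lam, n * n)
    return (((u - 1) // n) * mu) % n      # L(u) = (u - 1) / n

def add_encrypted(pub, c1: int, c2: int) -> int:
    (n,) = pub
    return (c1 * c2) % (n * n)            # multiplying ciphertexts adds the plaintexts

# The 'server' adds two encrypted values without ever decrypting them.
pub, priv = keygen()
c = add_encrypted(pub, encrypt(pub, 17), encrypt(pub, 25))
assert decrypt(priv, c) == 42
```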

256 HM Government (International statement: End-to-end encryption and public safety). See https://www.gov.
uk/government/publications/international-statement-end-to-end-encryption-and-public-safety (accessed 20
September 2022).
257 The Royal Society. 2016 Progress and research in cybersecurity: Supporting a resilient and trustworthy system for
the UK. See https://royalsociety.org/-/media/policy/projects/cybersecurity-research/cybersecurity-research-report.pdf
(accessed 27 September 2022).


Underpinning technology: Image matching through hashing
Image hashing is a simple way to detect a specific image being shared, or to identify an image contained in high volumes of data. Hashing algorithms produce an output called a message digest, which is a unique text ‘fingerprint’ derived from an image (or other input). Message digests are short and easy to compare, yet unique to individual images. Message digests cannot be reversed from their text form back into an image. As such, these ‘fingerprints’ are well suited for detecting matching images held digitally without revealing the images themselves. First, an example image is hashed to a message digest, then this is compared to the digests of candidate files. If a match is found, this means the image with an identical digest is the same as the example image. Such lists of illicit or illegal hashes are known as ‘matching databases’.

Typical hashing does not account for similarity between images. If one bit in an image is changed, the hash will change completely. This property helps ensure a low false detection rate; however, this means that small alterations – like subtle changes in colour, rotations, skewing, or mirroring – could enable the image to evade typical hashing.

Alternative hashing techniques may address these issues. One alternative is Locality Sensitive Hashing (LSH), which accounts for visually similar images (it intentionally hashes similar inputs to close or identical outputs). LSH and similar alternatives are useful where small variations in the input image are expected. However, transformations of the image, such as mirroring, could result in completely distinct raw data and would not be matched. The digital definition of ‘similar’ is not necessarily comparable to human perceptions of similarity.

PETs and image matching
PETs can help to ensure that image matching is done securely. A system used to detect CSEA material may be held within the image library of a mobile device. A verified third party (such as a law enforcement agency) would retain a matching database with hashes of images that have been categorised as CSEA material. The system could check whether the hashes of the images stored on the mobile device match any of the known illegal material in the matching database without sending the matching database out to the mobile device or revealing the user’s photos.

Private Set Intersection (PSI) allows two parties who independently hold data elements to find the intersection of their data – that is, the elements held in common between two parties. In this system, PSI could be used to allow a third party to detect any image hashes which match their matching database without sharing the hashes. In this way, the third party learns only about any images which match their own hash list, but nothing about any images which do not match. Security is preserved for all non-matching elements. In not sharing the hash lists, the risk of bad actors being able to circumvent detection using this knowledge is eliminated.

The ability to securely detect CSEA on mobile devices requires the combination of several techniques. Perceptual hashing can help to match images with a matching database of illicit material, even images with small perceptual changes. Combining this with private set intersection preserves the security of the matching database, whilst providing privacy for individuals. Together, these technologies could be used to develop a robust CSEA detection system that does not compromise end-to-end encryption.
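The sketch below shows how hashing and a Diffie–Hellman-style private set intersection could fit together in the semi-honest setting described above: the device and the authority each blind the other’s hashed values with a secret exponent, and only double-blinded matches are revealed. The prime, filenames and protocol arrangement are simplified assumptions; a real system would use a vetted PSI protocol (typically over elliptic curves) and perceptual rather than exact hashes.

```python
# A minimal sketch of Diffie-Hellman-style private set intersection over image
# hashes. Parameters and data are illustrative; not a production protocol.
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (far too small for real use)

def to_group(item: bytes) -> int:
    """Hash an item and map it to a group element modulo P."""
    digest = hashlib.sha256(item).digest()
    return pow(int.from_bytes(digest, "big") % P, 2, P)

def blind(elements, secret: int) -> set:
    """Raise each element to a secret exponent, hiding the underlying values."""
    return {pow(e, secret, P) for e in elements}

# The device holds hashes of its photos; the authority holds a matching database.
device_photos = [b"holiday.jpg", b"cat.png", b"known_bad_123"]
authority_db = [b"known_bad_123", b"known_bad_456"]

a = secrets.randbelow(P - 2) + 1  # device's secret exponent
b = secrets.randbelow(P - 2) + 1  # authority's secret exponent

# Round 1: the device blinds its hashed photos and sends them to the authority.
device_blinded = blind((to_group(x) for x in device_photos), a)

# Round 2: the authority re-blinds the device's values and blinds its own database.
double_blinded_device = blind(device_blinded, b)                   # {H(x)^(ab)}
authority_blinded = blind((to_group(y) for y in authority_db), b)

# Round 3: the device re-blinds the authority's values and returns them.
double_blinded_db = blind(authority_blinded, a)                    # {H(y)^(ab)}

# The authority learns only how many of the device's images match its database;
# nothing is revealed about non-matching images on either side.
matches = double_blinded_device & double_blinded_db
print(f"{len(matches)} matching image(s) detected")
```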


Risks and challenges
One challenge is the potential for ‘scope creep’ – the adding of additional functionality above and beyond the detection of CSEA material. This may include, for example, state actors using the technology to counter digital piracy, digital rights management, or for national security and surveillance purposes. Platforms may face reputational risk and loss of users if systems were perceived as disproportionate surveillance tools.

While perceptual hashing algorithms allow for modified images to be matched against the database, they are also more likely to flag false positives. Innocent images may appear close enough to illicit images to be flagged by the machine learning system. This could lead to innocent people being identified as possessing CSEA material, entailing negative impact for the individual. The performance of the perceptual hashing system would need to be closely tested and monitored to measure the false positive rates. Ultimately, a human moderator should always verify whether flagged material is illegal or harmful; a user should never be charged based on the automated system detection alone.

Legal challenges and public trust must also be addressed. Law enforcement agencies would need to be clear on the legal basis for running such systems, which may constitute a passing on of their legal duties to third parties. This has implications for public trust, particularly where on-device screening is used.

The UK government aims to minimise the existence of spaces online where illegal material can be securely shared. Likewise, social media companies are motivated to ensure users are not breaching their terms of use, even in encrypted spaces. A PETs-enabled system for identifying illegal material is an alternative to privacy rollbacks such as the outright banning of end-to-end encryption.


BOX 14

Apple Tech child safety features

In August 2021 Apple announced new child safety features to be implemented on its US devices. Three planned changes aimed to mitigate child sexual abuse. One change related to iCloud Photos, which would scan images to find CSEA. While cloud service companies such as Google, Microsoft and Dropbox already scan material for CSEA, Apple planned to conduct scans on personal iPhone devices using a technology called NeuralHash.

NeuralHash scans images without revealing them to moderators. It translates the image into a unique number (a hash) based on its features. Before uploading to iCloud Photos, the hash is compared on-device against a database of known CSEA hashes provided by child safety organisations. Any matches prompt the creation of a cryptographic safety voucher. If a user reaches a threshold of safety vouchers, they are decrypted and shared with Apple moderators for review258.

The proposals were welcomed by many, including child safety organisations259. However, the image hashing feature faced criticism from privacy advocates, cryptographers and other tech companies, who viewed Apple’s proposals as introducing a backdoor on their devices. Critics argued this could make the system vulnerable to state censorship of political dissent or LGBTQ+ content, or flagging of innocent images, causing unnecessary distress. Further criticism targeted the efficacy of the system: researchers reverse-engineered the hashing algorithm and were able to create images that were falsely flagged by the system260.

In September 2021, Apple announced it was pausing implementation of CSAM scanning to collect feedback and make improvements. In April 2022, Apple announced its intention to introduce the parental control safety feature on the Messages app on iPhones in the UK261.

It is unclear how an image hashing program would operate under UK and EU data protection law. On-device screening would likely entail explicit consent and user opt-in (rather than opt-out)262. User images are not necessarily personal data under the GDPR; they must depict identifiable living people, or be linked to a living person, to constitute personal data. However, neural hashes may constitute personal data. These emerging legal questions, as well as general public scepticism, suggest that an on-device detection system may face barriers in the UK or EU contexts.

258 Apple. CSAM detection: technical summary. See https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf (accessed 20 March 2022).
259 O’Neill PH. 2021 Apple defends its new anti-child-abuse tech against privacy concerns. MIT Technology Review. 6
August 2021. See https://www.technologyreview.com/2021/08/06/1030852/apple-child-abuse-scanning-surveillance/
(accessed 22 March 2022).
260 Brandom R. 2021 Apple says collision in child-abuse hashing system is not a concern. The Verge. 18 August 2021.
See https://www.theverge.com/2021/8/18/22630439/apple-csam-neuralhash-collision-vulnerability-flaw-cryptography
(accessed 10 October 2022).
261 Hern A. 2022 Apple to roll out child safety feature that scans messages for nudity to UK iPhones. The Guardian. 20
April 2022. See https://www.theguardian.com/technology/2022/apr/20/apple-says-new-child-safety-feature-to-be-
rolled-out-for-uk-iphones (accessed 23 April 2022).
262 Cobbe J. 2021 Data protection, ePrivacy, and the prospects for Apple’s on-device CSAM Detection system in Europe.
SocArXiv Papers. See 10.31235/osf.io/rhw8c (accessed 10 October 2022).


USE CASE 5.3

Privacy and verifiability in online voting and electronic public consultation

The opportunity
Remote online voting offers to bring the ballot to the voter, allowing convenience, flexibility and greater access to the democratic process263.

Cryptography plays a critical role in electronic voting and counting264. Online voting has been used in elections in Estonia since Cybernetica's IT Lab built the first online voting solution in 2005265. Following the launch of multi-channel voting (in which votes can be cast using mail, traditional written ballots or online), voter participation has risen in Estonia, with 47% of voters voting online in 2021. Online voting has reportedly reduced public spending on elections266.

The challenge
Democratic elections depend on security and auditability for the fair and accurate collection and counting of votes. In online voting, the familiar analogue threats to these properties become digital, requiring solutions that can ensure votes are kept accurate, secret, anonymous and auditable simultaneously.

Many approaches can be used for electronic and internet voting, some of which include PETs. For example, homomorphic encryption can be used in electronic voting to achieve security; election results can be tallied without decrypting the votes267. This works well in small-scale elections; however, the compute power required entails high costs when scaled up.

Hybrid schemes, which layer cryptographic tools with other approaches (such as blockchain), may prove the most robust. One such solution has recently been prototyped by the Smartmatic-Cybernetica Centre of Excellence for Internet Voting (SCCEIV)268.
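As a rough illustration of additive homomorphic tallying, the sketch below assumes the open-source python-paillier (phe) library: encrypted ballots can be summed without being decrypted, and only the final total is revealed to the election key holder. This is a minimal sketch of the principle, not a production voting protocol; real systems also require verifiability, robust key management and protection against malformed ballots.

```python
from phe import paillier  # python-paillier: pip install phe

# The election authority generates a keypair; only it can decrypt the tally.
public_key, private_key = paillier.generate_paillier_keypair()

# Each voter encrypts a 0/1 ballot for a candidate under the public key.
ballots = [1, 0, 1, 1, 0]
encrypted_ballots = [public_key.encrypt(b) for b in ballots]

# Anyone can tally the encrypted ballots: adding ciphertexts corresponds to
# adding the underlying votes (additive homomorphism), without decryption.
encrypted_total = encrypted_ballots[0]
for ciphertext in encrypted_ballots[1:]:
    encrypted_total = encrypted_total + ciphertext

# Only the key holder decrypts, and only the aggregate total is revealed.
print(private_key.decrypt(encrypted_total))  # 3
```

Schemes of this kind reveal only the aggregate; the cost noted above arises when scaling such cryptographic operations to national electorates and richer ballot rules.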

263 WebRoots Democracy. 2020 The Cratos Principles. See https://webrootsdemocracy.files.wordpress.com/2020/04/the-cratos-principles-webroots-democracy-v2.pdf (accessed 20 August 2022).
264 National Democratic Institute (The important uses of cryptography in electronic voting and counting). See https://www.
ndi.org/e-voting-guide/examples/cryptography-in-e-voting (accessed 2 September 2022).
265 Smartmatic (Estonia: the world’s longest standing, most advanced voting solution). See https://www.smartmatic.
com/case-studies/article/estonia-the-worlds-longest-standing-most-advanced-internet-voting-solution/ (accessed 10
October 2022).
266 Krimmer R, Duenas-Cid D, Krivonosova I, Vinkel P, Koitmae A. 2018 How much does an e-vote cost? Cost
comparison per vote in multichannel elections in Estonia. Lecture Notes in Computer Science (Conference paper)
117—131. (https://doi.org/10.1007/978-3-030-00419-4_8)
267 National Democratic Institute (The important uses of cryptography in electronic voting and counting). See https://www.
ndi.org/e-voting-guide/examples/cryptography-in-e-voting (accessed 2 September 2022).
268 Smartmatic (Smartmatic—Cybernetica awarded European Commission blockchain research project). See https://www.
smartmatic.com/media/article/smartmatic-cybernetica-awarded-european-commission-blockchain-research-project/
(accessed 10 October 2022).


In 2014, Cybernetica partnered with Smartmatic to develop TIVI, an online voting solution that aims to guarantee end-to-end integrity in remote voting. TIVI has been used in Estonia, Norway, Chile and parts of the US. With this technology, a voter verifies their identity using a digital or mobile ID. i-Voting is optional. A voter can cast multiple i-Votes, with only the final vote counted; a paper vote always takes precedence over an i-Vote.

PETs and distributed ledgers: Tiviledge
SCCEIV more recently developed Tiviledge, a prototype for privacy-preserving, auditable i-voting. It can be used with the TIVI platform and includes PETs. It focuses on making election data available for independent audits while meeting the condition of a secret ballot269.

Tiviledge uses zero knowledge proofs and secure multi-party computation to verify votes and summate totals. It writes the results on an immutable, auditable distributed ledger. A distributed ledger is a shared database, which can serve as a public record. It may only be added to: any tampering attempt is made obvious because it is synchronised and distributed across multiple hosts. This guarantees integrity of the record. Today, elections rely heavily on a central organisation, and trusting the integrity of an election means trusting a single entity. A distributed ledger means election results are verifiable to external auditors.

Tiviledge is currently a research prototype for experimental and developmental use; it is not open source. Several key areas must be addressed prior to any legally binding use of the technology for voting. First, compatibility with legal requirements must be considered within a given jurisdiction, particularly where there are complex voting protocols (eg beyond standard 'winner takes all' models). Second, a more robust system of verification will be key to avoid fraud or breach of voter privacy. Third, the protection of the election privacy key should be considered (for example, using hardware security) to prevent an attacker from gaining access to information.

Tiviledge is one prototype developed by the PRIViLEDGE project270, funded by Horizon Europe (see page 28).

269 Archer DW et al. 2018 From keys to databases: Real-world applications of secure multi-party computation. Comput J.
61, 1749—1771. (https://doi.org/10.1093/comjnl/bxy090)
270 PRIViLEDGE Project (Homepage). See https://priviledge-project.eu/ (accessed 30 March 2022).


USE CASE 6

PETs and the mosaic effect: Sharing humanitarian data in emergencies and fragile contexts

The opportunity
In the last ten years there has been a substantial rise in the volume and variety of data produced during and for humanitarian responses and development programmes. Humanitarian data may contain telecommunications, messaging and other ICT data, information from mobile money or cash transfer applications, banking or smart cards, as well as social media data271. It can include contextual data (such as damage assessment or geospatial data), information about people affected by a crisis (including their needs) or information related to response efforts (such as transportation infrastructure, food prices, or the availability of education facilities)272. Emergency or crisis-related data may include traditional humanitarian data, as well as user generated data, such as social media posts or locations entered through GPS tracking apps.

At a larger scale, and over time, humanitarian and crisis data can inform understandings of patterns – such as environmental catastrophes, or cycles of social conflict – assisting in anticipatory action and the reduction of negative impacts. The use of long-term crisis insights during the COVID-19 pandemic has prompted wider reflection on data governance in the early stages of the outbreak and the role PETs might have played273, 274.

271 Privacy International. 2018 The humanitarian data problem: ‘doing no harm’ in the digital era. See https://
privacyinternational.org/sites/default/files/2018-12/The%20Humanitarian%20Metadata%20Problem%20-%20
Doing%20No%20Harm%20in%20the%20Digital%20Era.pdf (accessed 10 October 2022).
272 OCHA Centre for Humanitarian Data 2021. Data Responsibility Guidelines. See https://data.humdata.org/
dataset/2048a947-5714-4220-905b-e662cbcd14c8/resource/60050608-0095-4c11-86cd-0a1fc5c29fd9/download/
ocha-data-responsibility-guidelines_2021.pdf (accessed 10 October 2022).
273 El Emam K. 2020 Viewpoint: Implementing privacy-enhancing technologies in the time of a pandemic. Journal of
Data Protection & Privacy. 3, 344—352.
274 Shainski R, Dixon W. 2020 How privacy enhancing technologies can help COVID-19 tracing efforts. World Economic
Forum Agenda. 22 May 2020. See https://www.weforum.org/agenda/2020/05/how-privacy-enhancing-technologies-
can-help-covid-19-tracing-efforts/ (accessed 10 October 2022).


The challenge
Today, big data plays a fundamental role in responses to humanitarian crises and other emergency scenarios. At the same time, new technologies – such as biometrics, mobile banking and drones – simultaneously provide new avenues for security and privacy risks.

Humanitarian datasets contain information about some of the world's most at-risk people, including refugees and internally displaced people fleeing their homes due to persecution, conflict, and disaster275. The risk of reidentification of an individual, or the disclosure of a personal attribute or characteristic, often entails magnified harms in these fragile contexts: as the Resolution on Privacy and International Humanitarian Action outlines, 'data that would normally not be considered as sensitive under data protection laws may be very sensitive in humanitarian emergencies' context'276. Responsible data stewardship must ensure the safety of these groups by understanding potential harms and working toward prevention.

The UN Global Pulse recommended in 2015 that risks in humanitarian data use be assessed according to level of data security and availability of PETs; they also recommended using PETs in conjunction with other methods (such as anonymisation) to employ privacy by design principles from the outset. However, few examples of PETs in humanitarian data exist, possibly because they are highly technical and so can be expensive to deploy.

The mosaic effect risk
The mosaic effect risk is described as the potential for 'disparate items of information, though individually of limited or no utility to their possessor, [to] take on added significance when combined with other items of information'277. The mosaic effect suggests that even de-identified or pseudonymous data can be reidentified if other datasets or complementary information are combined, revealing significant new information. This could disclose, for example, the identity and location of people from minoritised groups. While such information could be used to inform effective humanitarian responses, it could also be used to do harm.

As datasets accumulate, so too does the likelihood of content overlap between them. This is especially true in concentrated settings, such as refugee camps. The more information that is common across multiple datasets, the higher the disclosure risk posed by the 'mosaic effect': the potential for individuals or groups to be re-identified through using datasets in combination, even though each dataset has been made individually safe.
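A minimal, hypothetical illustration of the mosaic effect is sketched below: two datasets that each look 'safe' on their own (one with names removed, one with no sensitive attributes) can be joined on shared quasi-identifiers to re-identify individuals. The pandas code and example records are entirely invented for illustration.

```python
import pandas as pd

# Dataset A: a 'de-identified' needs-assessment extract (names removed).
needs = pd.DataFrame({
    "camp": ["Zone 1", "Zone 1", "Zone 2"],
    "birth_year": [1987, 1992, 1987],
    "arrival_date": ["2023-01-04", "2023-01-04", "2023-02-11"],
    "medical_condition": ["diabetes", "none", "PTSD"],
})

# Dataset B: a separate registration list with names but no health data.
registry = pd.DataFrame({
    "name": ["A. Example", "B. Example", "C. Example"],
    "camp": ["Zone 1", "Zone 1", "Zone 2"],
    "birth_year": [1987, 1992, 1987],
    "arrival_date": ["2023-01-04", "2023-01-04", "2023-02-11"],
})

# Joining on shared quasi-identifiers re-attaches names to sensitive attributes,
# even though neither dataset looked risky on its own.
mosaic = registry.merge(needs, on=["camp", "birth_year", "arrival_date"])
print(mosaic[["name", "medical_condition"]])
```

PETs such as those discussed later in this use case aim to prevent exactly this kind of linkage from re-identifying individuals while still allowing useful aggregate analysis.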

275 Inter-Agency Standing Committee 2021. Operational Guidance on Data Responsibility in Humanitarian Action. See
https://interagencystandingcommittee.org/system/files/2021-02/IASC%20Operational%20Guidance%20on%20
Data%20Responsibility%20in%20Humanitarian%20Action-%20February%202021.pdf (accessed 10 October 2022).
276 Global Privacy Assembly. 2015 37th International Conference of Data Protection and Privacy Commissioners. See
http://globalprivacyassembly.org/wp-content/uploads/2015/02/Resolution-on-Privacy-and-International-Humanitarian-
Action.pdf (accessed 10 October 2022).
277 Pozen DE. 2005 The mosaic theory, national security, and the freedom of information act. Yale L J. 115.


BOX 15

Collective intelligence in disaster management and emergency response

Emergency and disaster management are significant areas of research that use social media data. Using near-real time data, researchers identify communities impacted, the geographical spread of an event, and gain understanding of public behaviour during a disaster278. In these cases, social media data represents the collective intelligence of users on the ground who act as 'sensors', relaying real-time observations from the field.

Some examples include:
• A team of international researchers used Twitter data and location metadata following the 2012 Hurricane Sandy evacuation in Florida, USA. They found a correlation between per-capita hurricane-related Twitter activity and per-capita economic hurricane damage, suggesting disaster-related social media could be used for rapid damage assessments279;
• A machine learning algorithm and semantic analysis were used to classify tweets to detect earthquakes in Japan. The team at The University of Tokyo was able to detect earthquakes registering magnitude 3+ with high probability (93% of those detected by the Japan Meteorological Agency) simply by monitoring tweets, delivering notifications much faster than national broadcast announcements280; and
• Researchers at the University of Georgia analysed images of lost-and-found tornado debris shared on social media following a 2011 outbreak in the south-eastern US. Using Geographic Information System mapping and trajectory modelling techniques, this was the most comprehensive study to date on debris trajectory from a tornado outbreak281.

278 Chae J, Thom D, Jang Y, Kim S, Ertl T, Ebert DS. 2014 Public behavior response analysis in disaster events utilizing
visual analytics of microblog data. Comput. Graph. 38, 51–60. (https://doi.org/10.1016/j.cag.2013.10.008)
279 Kryvasheyeu Y et al. Rapid assessment of disaster damage using social media activity. Sci Adv.
2. (https://doi.org/10.1126/sciadv.1500779)
280 Sakaki T, Okazaki M, Matsuo Y. Tweet analysis for real-time event detection and earthquake reporting system
development. IEEE Trans. Knowl. Data Eng., 25, 919–931. (https://doi.org/10.1109/tkde.2012.29)
281 Knox AJ et al. 2013 Tornado debris characteristics and trajectories during the 27 April 2011 super outbreak as
determined using social media data. Bulletin of the American Meteorological Society. 94, 1371—1380.


The mosaic effect risk is related to the increased use of metadata, or data about other data, in humanitarian contexts. This could be the time and location of a message sent, rather than the content of the message itself. Communications with people affected by crises can include social media or SMS messaging, sharing information-as-aid, mobile cash transfer programmes, and monitoring and evaluation systems (such as those used to detect fraud), all of which entail rich and potentially compromising metadata282. Privacy International has therefore recommended that humanitarian organisations practise 'do no harm' principles by understanding how the data and metadata they store and use may be employed for purposes beyond aid – such as for profiling, surveillance or political repression. This highlights the need for mitigation tools to be developed (such as PETs) and the importance of data minimisation.

The Centre for Humanitarian Data is focused on increasing the use and impact of data in the humanitarian sector and is interested in the potential for PETs to address the mosaic effect and enhance collaboration283. It makes the following recommendations:
• Technical actions: Humanitarian organisations should invest in further strengthening metadata standards and interoperability, enabling monitoring of related datasets to counter mosaic effect risks;
• Procedural actions: A data asset registry and data ecosystem mapping assessment should be completed as per the recommendations included in the IASC Operational Guidance on Data Responsibility in Humanitarian Action (2021);
• Governance actions: Sector-wide fora should be used to ensure that datasets are not shared on different platforms at different levels of aggregation, and to determine consistent standards for approaches such as anonymisation; and
• Legal actions: Humanitarian organisations can also improve licensing for datasets by adding clauses that prohibit joining datasets or analysing data with the purpose of re-identification or attribute disclosure. While this will not prevent intentional misuse, it will help explain what type of use goes against the use allowed by the sharing organisation.

282 Privacy International. Humanitarian Metadata Problem Doing No Harm in the Digital Era. See https://
privacyinternational.org/sites/default/files/2018-12/The%20Humanitarian%20Metadata%20Problem%20-%20
Doing%20No%20Harm%20in%20the%20Digital%20Era.pdf (accessed 28 September 2022).
283 Weller S. 2022 Minimizing privacy risks in humanitarian data. Privitar blog. 9 March 2022. See https://www.privitar.
com/blog/fragility-forum-minimizing-privacy-risks-in-humanitarian-data/ (accessed 10 October 2022).


The role of PETs in countering the risk of the mosaic effect
PETs could help safeguard personal data while still allowing researchers to utilise it in humanitarian efforts. Differential privacy could be used to add 'noise', to make any one true datapoint more difficult to trace to a real individual. The resulting 'noisy' dataset can then be shared between organisations more safely. The noise can be adjusted for extra privacy (and reduced utility), allowing data controllers to make contextual privacy-utility trade-offs.

Federated learning could be used on geospatial datasets, such as people's locations, without sharing the data used to train the model. This would entail training a model, or a predictive algorithm, on a local geospatial dataset. The model would then be shared for training on remote datasets at other organisations, which are never revealed to the model owner. External organisations holding relevant data might include telecoms, other humanitarian organisations, or social media sites, all of which may not have established data partnerships or sharing agreements.

The model would return to the owner with improved ability to predict people's movements or locations. This type of model would be incredibly valuable for humanitarian organisations making decisions about where to direct resources during crises. There are already examples of federated learning being used in medical research (see Use case 1.1, page 57). Homomorphic encryption has been used to perform large-scale studies on cross-border health data (see Use case 1.1, page 57), including multiple institutions in collaboration and crowdsourced materials (such as genomics)284.
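To make the federated pattern described above concrete, the toy Python sketch below performs a single round of federated parameter averaging: each of three hypothetical organisations fits a simple linear model to data that never leaves it, and only the fitted coefficients are pooled. The datasets, model and numbers are illustrative assumptions, not a description of any deployed humanitarian system; real federated learning typically iterates over many rounds and adds further privacy protections.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_fit(X, y):
    """Each organisation fits a linear model on its own data, which never leaves it."""
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients

# Three hypothetical organisations hold private (X, y) datasets, eg area
# features -> expected movement of people (toy, randomly generated numbers).
true_coefficients = np.array([2.0, -1.0])
local_datasets = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_coefficients + rng.normal(scale=0.1, size=100)
    local_datasets.append((X, y))

# One round of federated averaging: only model parameters are shared and pooled.
local_models = [local_fit(X, y) for X, y in local_datasets]
global_model = np.mean(local_models, axis=0)
print(global_model)  # close to [2.0, -1.0], without pooling any raw data
```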

284 Blatt M, Gusev A, Polyakov Y, Goldwasser S. 2020 Secure large-scale genome-wide association studies using
homomorphic encryption. P Natl Acad Sci USA. 117, 11608—11613. (https://doi.org/10.1073/pnas.1918257117)
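The noise-addition step described earlier in this use case can likewise be sketched in a few lines. The example below applies the standard Laplace mechanism to a counting query: the released figure is perturbed with noise scaled to sensitivity divided by epsilon, so a smaller epsilon gives stronger privacy and a noisier answer. It is a minimal, hypothetical illustration rather than a complete differential privacy implementation; in practice the cumulative privacy budget across queries must also be tracked.

```python
import numpy as np

rng = np.random.default_rng()

def noisy_count(true_count, epsilon, sensitivity=1.0):
    """Release a count perturbed with Laplace noise of scale sensitivity / epsilon.

    For a counting query, adding or removing one person changes the result
    by at most 1, so the sensitivity is 1.
    """
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Eg the number of people recorded in a given area, released at two privacy levels.
print(noisy_count(412, epsilon=1.0))   # modest noise
print(noisy_count(412, epsilon=0.1))   # much more noise, stronger privacy
```

The parameter epsilon here corresponds to the privacy budget defined in Appendix 1.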


BOX 16

Tackling human trafficking through cooperation between law enforcement and NGOs

A partnership was established between Dutch law enforcement and NGOs working against
human trafficking. The law enforcement agency (LEA) wanted to shadow potential trafficking
victims from a long list of identified individuals.

However, local human trafficking NGOs also held informant lists, with potential overlap between their lists and that of the LEA. The NGOs were concerned that their informants would feel confidentiality had been breached by the NGO if they were shadowed by the LEA. The long list from law enforcement was compared to the short list from the NGO 'in the blind' using SMPC285.

The result was a random list of 20 people who were candidates for LEA shadowing. A future SMPC application may include tracking the movements of potential trafficking victims across agencies and NGOs without sharing their names, to identify trends and potential trafficking routes. This approach could also shed light on the extent of human trafficking crimes more widely – an issue otherwise impossible to measure.

FIGURE 8

Private multi-party machine learning with MPC

Using MPC, different parties send encrypted messages to each other, and obtain the model
F(A,B,C) they wanted to compute without revealing their own private input, and without the need
for a trusted central authority.

[Diagram: the left panel ('Central trusted authority') shows parties A, B and C sending their inputs to a single central party, which returns F(A,B,C); the right panel ('Secure multi-party machine learning') shows the parties exchanging encrypted messages with one another, each obtaining F(A,B,C) without any central authority.]
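A minimal sketch of the idea in Figure 8 is shown below, using additive secret sharing over a large prime modulus: each party splits its private value into random shares, distributes one share to every party, and only the combined result F(A,B,C) (here, simply the sum of the inputs) is ever reconstructed. This toy Python example assumes honest-but-curious parties, omits networking, and uses hypothetical inputs.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split a private value into n additive shares that sum to it (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

# Parties A, B and C each hold a private value (eg a local count).
private_inputs = {"A": 12, "B": 30, "C": 7}
parties = list(private_inputs)
n = len(parties)

# Each party splits its value and sends one share to every party (itself included).
all_shares = {p: share(v, n) for p, v in private_inputs.items()}

# Each party locally adds up the shares it received from everyone...
partial_sums = [
    sum(all_shares[p][i] for p in parties) % MODULUS for i in range(n)
]

# ...and the partial sums are combined to reveal only F(A, B, C) = A + B + C.
print(sum(partial_sums) % MODULUS)  # 49; no individual input is ever revealed
```

Because every party sees only uniformly random shares of the others' inputs, no single party learns another's private value; the parties obtain only the joint result, as in the right-hand panel of Figure 8.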

285 Pinsent Masons. Data sharing coalition helps flag victims of human trafficking. See https://www.pinsentmasons.com/
out-law/news/data-sharing-coalition-helps-flag-victims-of-human-trafficking (accessed 7 July 2022).


Conclusions
There are no known cases of the mosaic
effect causing harm in humanitarian, crisis or
development scenarios. At the same time, there
is a cost to not sharing or linking data in such
cases, particularly where lives may be saved.
Humanitarian organisations seek to consolidate
and strengthen approaches to reduce risk
through data responsibility practices and
increasing cross-organisational work (including
with NSOs [Use Case 4]).

Some humanitarian organisations reserve the


right to nondisclosure286; however, in fast-
developing situations data flows become
difficult to control. Anticipatory tools, such
as Privacy Impact Assessments or Data
Protection Impact Assessments, could be
used to better mitigate downstream harms
stemming from the mosaic effect. Further, crisis
situations can change the calculus of harm vs
potential benefit287.

PETs may expand the bounds of possibility


in providing strong privacy and high utility
from a dataset. However, a broader approach
should also consider the ethics of humanitarian
projects, training and vetting of trusted
researchers, robust data sharing agreements
and other legal controls, as well as security
access controls and locked down physical
hardware. Historically, organisations have used
approaches such as vetting an internal team
of trusted researchers, drafting bilateral data
sharing agreements, or other legal tools. Even
with additional technical safeguards, these non-
technical solutions remain important.

286 Council of the European Union. 2012 Applicability of the General Data Protection Regulation to the activities of the
International Committee of the Red Cross. See http://data.consilium.europa.eu/doc/document/ST-7355-2015-INIT/en/
pdf (accessed 26 June 2022).
287 Veale M, Binns R, Edwards L. 2018 Algorithms that remember: model inversion attacks and data protection law. Philos
T R Soc A. 376. (https://doi.org/10.1098/rsta.2018.0083)


Conclusions
This report sets out to refresh perspectives on PETs following the Society's 2019 report Protecting privacy in practice. In doing so, it considers the role of PETs beyond data protection and highlights the secondary effects of PETs in motivating partnerships and enabling collaboration across sectors and international borders. The risk of personal data use is considered in terms of privacy attack (what is technically possible) as well as the severity of potential downstream harms of compromised data (which is contextual).

Several questions remain beyond the scope of this report and suggest areas for further research. First, very little is known about the potential market value of PETs as discrete technologies, or their true significance in data use in collaborative scenarios. It is therefore difficult to estimate what value would be unlocked with widespread uptake of PETs, whether in economic terms or in social benefit. The market value of PETs may also depend on trends in use cases, whether PETs are employed as security tools or for increased collaborative learning and analysis.

Second, this report has not explored the full range of potential follow-on effects of PETs adoption. These include potential harms which may stem from greater monitoring and surveillance on the part of governments and private sector actors, leading to enhanced profiling and resulting in increased distrust of public services and loss of privacy in online spaces (such as through highly targeted advertisement). In some cases, PETs are already being used to facilitate business-as-usual in online advertising288, easing companies' access to, and use of, customer data to the usual ends.

Given their multipurpose nature, networked PETs that allow for collaborative analysis might be viewed as an upgrade to traditional systems of information sharing, such as the internet, rather than new privacy compliance tools. For this reason, in the future, PETs may be used for any sufficiently valuable data, not just sensitive category data (such as personal or commercially advantageous data). Rather, PETs may be used in any scenario where data benefits those with exclusive access, or where open access could cause harm. This could include, for example, data pertaining to natural resources (to prevent over-exploitation).

Finally, more work is required to integrate PETs into wider data governance systems. The tendency for PETs to be developed as discrete technologies has led users to approach PETs as a set of tools, each with unique problem-solving capabilities. In the future, PETs may operate more like complementary pieces of machinery which, when combined with other technological, legal and physical mechanisms, will amount to automated data governance systems. These systems could help to enact an organisation's data policy and facilitate responsible information flows at unprecedented scales. This next level of PETs abstraction will require collaboration between PETs developers and leading organisations to develop and test use cases.

PETs can play an important role in a privacy by design approach to data governance when considered carefully, informed by appropriate guidance and assurances. Given the rapid development of these technologies, it is a critical time to consider how PETs will be used and governed for the promotion of human flourishing.

288 See Box 10, page 78 on Cambridge Analytica.



Appendices


APPENDIX 1: Definitions
Differential privacy: security definition which means that, when a statistic is released, it should not give much more information about a particular individual than if that individual had not been included in the dataset. See also privacy budget.

Distributed Ledger Technology (DLT): an open, distributed database that can record transactions between several parties efficiently and in a verifiable and permanent way. DLTs are not considered PETs, though they can be used (as some PETs) to promote transparency by documenting data provenance.

Epsilon (Ɛ): see privacy budget.

Fully homomorphic encryption (FHE): a type of encryption scheme which allows for any polynomial function to be computed on encrypted data, which means both additions and multiplications.

Homomorphic encryption (HE): a property that some encryption schemes have, so that it is possible to compute on encrypted data without deciphering it.

Metadata: data that describes or provides information about other data, such as the time and location of a message (rather than the content of the message).

Mosaic effect: the potential for individuals or groups to be re-identified through using datasets in combination, even though each dataset has been made individually safe.

Noise: a random alteration of data/values in a dataset so that the true data points (such as personal identifiers) are not as easy to identify.

Privacy budget (also differential privacy budget, or epsilon): a quantitative measure of the change in confidence of an individual having a given attribute.

Privacy-preserving synthetic data (PPSD): synthetic data generated from real-world data to a degree of privacy that is deemed acceptable for a given application.

Private Set Intersection (PSI): secure multiparty computation protocol where two parties compare datasets without revealing them in an unencrypted form. At the conclusion of the computation, each party knows which items they have in common with the other. There are some scalable open-source implementations of PSI available.

Secure multi-party computation (SMPC or MPC): a subfield of cryptography concerned with enabling private distributed computations. MPC protocols allow computation or analysis on combined data without the different parties revealing their own private inputs to the computation.

Somewhat Homomorphic Encryption (SHE): a type of encryption scheme which supports a limited number of computations (both additions and multiplications) on encrypted data.

Synthetic data: data that is modelled to represent the statistical properties of original data; new data values are created which, taken as a whole, reproduce the statistical properties of the 'real' dataset.

Trusted Execution Environment (TEE): secure area of a processor that allows code and data to be isolated and protected from the rest of the system such that it cannot be accessed or modified even by the operating system or admin users. Trusted execution environments are also known as secure enclaves.


APPENDIX 2: Acknowledgements
Working Group members
The members of the Working Group involved in this report are listed below. Members acted in
an individual and not a representative capacity and declared any potential conflicts of interest.
Members contributed to the project on the basis of their own expertise and good judgement.

Chair
Professor Alison Noble FRS FREng OBE, Technikos Professor of Biomedical Engineering and
Department of Engineering Science, University of Oxford

Members
Professor Jon Crowcroft FRS FREng, Marconi Professor of Communications Systems in the
Computer Lab, University of Cambridge; Alan Turing Institute
George Balston, Co-Director, Defence and Security, Alan Turing Institute
Professor Sir Anthony Finkelstein CBE FREng, President, City, University of London

Guy Cohen, Independent

Dr Benjamin R Curtis, Senior Researcher, Zama

Professor Emiliano de Cristofaro, Professor of Security and Privacy Enhancing Technologies,


University College London
Dr Marion Oswald, Associate Professor in Law, University of Northumbria

Professor Carsten Maple, Professor of Cyber Systems Engineering, University of Warwick Cyber
Security Centre
Dr Suzanne Weller, Head of Research, Privitar

Royal Society staff

Royal Society secretariat


Dr June Brawner, Senior Policy Adviser and Project Lead

Areeq Chowdhury, Head of Policy, Data

Dr Natasha McCarthy, Head of Policy, Data (until February 2022)

Dr Franck Fourniol, Senior Policy Adviser (until July 2021)

Royal Society staff who contributed to the development of the project


Dr Rupert Lewis, Chief Science Policy Officer

Dr Mahi Hardalupas, Project Coordinator (until July 2022)

Helena Gellersen, Patricia Jimenez, Louise Parkes, UKRI work placement (various periods)


Reviewers
This report has been reviewed by expert readers and by an independent Panel of experts, before
being approved by Officers of the Royal Society. The Review Panel members were not asked to
endorse the conclusions or recommendations of the report, but to act as independent referees of
its technical content and presentation. Panel members acted in a personal and not a representative
capacity. The Royal Society gratefully acknowledges the contribution of the reviewers.

Reviewers
Dr Clifford Cocks CB FRS

Andrew Trask, Founder and Leader, OpenMined

Alex van Someren FREng, Chief Scientific Adviser for National Security, UK Government

Event participants
The Royal Society would like to thank all those who contributed to the development of this project,
in particular through participation in the following events.

PETs Contact Group Session One: Evidence and advice needs (21 April 2021)
15 participants from UK government, regulators and civil society.

PETs Contact Group Session Two: Use cases and outputs development (18 October 2021)
13 participants from UK government, regulators and civil society.

Contributors of use cases and standards


The use cases and standards chapters received domain-specific input from a range of experts in
research, industry and civil society. Domain experts were not asked to endorse the conclusions or
recommendations of the report, but to act as independent referees of the use cases and standards
chapters, their technical content and presentation. Contributors acted in a personal and not a
representative capacity. The Royal Society gratefully acknowledges their contributions.

Domain experts who consulted on use cases and standards


Gerry Reilly, Health Data Research UK

Greg A Johnston, Energy Systems Catapult

Alex Howard, Octopus Energy Centre for Net Zero (until September 2021)
Dr Louisa Nolan, Office for National Statistics (until May 2022)

Dr Sergey M Plis, Dr Vince D Calhoun and Eric Verner, COINSTAC

Sahar Danesh, British Standards Institution

Annemarie Büttner, Independent expert


Commissioned evidence-gathering and reviews


• Hattusia 2022. The current state of assurance in establishing trust in PETs. The Royal Society.
See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/

• Jordon J et al. 2022 Synthetic data: What, why and how?


See https://arxiv.org/pdf/2205.03257.pdf (accessed 2 September 2022).

• London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market
readiness, enabling and limiting factors. The Royal Society. See https://royalsociety.org/topics-
policy/projects/privacy-enhancing-technologies/ This project was partly funded by a grant from
the Centre for Data Ethics and Innovation.

Market research: London Economics / Open Data Institute PETs market research
The Royal Society worked with London Economics and the Open Data Institute on exploratory
research into the state of PETs adoption, barriers and incentives within key UK public sector data
institutions. A sample of seven public sector organisations were interviewed by invitation, chosen to
represent a cross-section of criteria including function relevant to data (eg storage, processing) and
type of data used.

Public sector organisations profiled for PETs market readiness research


Competition and Markets Authority

DataLoch

Department for Transport

Government Digital Service


Greater London Authority

National Archives

Office for National Statistics



The Royal Society is a self-governing Fellowship of many
of the world’s most distinguished scientists drawn from all
areas of science, engineering, and medicine. The Society’s
fundamental purpose, as it has been since its foundation
in 1660, is to recognise, promote, and support excellence
in science and to encourage the development and use of
science for the benefit of humanity.

The Society’s strategic priorities emphasise its commitment


to the highest quality science, to curiosity-driven research,
and to the development and use of science for the benefit
of society. These priorities are:
• The Fellowship, Foreign Membership and beyond
• Influencing
• Research system and culture
• Science and society
• Corporate and governance

For further information


The Royal Society
6 – 9 Carlton House Terrace
London SW1Y 5AG

T +44 20 7451 2500


W royalsociety.org

Registered Charity No 207043


ISBN: 978-1-78252-627-8
Issued: January 2023 DES7924
