From privacy to partnership
The role of privacy enhancing technologies in data governance and collaborative analysis
Issued: January 2023 DES7924
ISBN: 978-1-78252-627-8
© The Royal Society
Contents
Foreword 4
Executive summary 5
Scope 5
Methodology 5
Key findings 6
Recommendations 8
Introduction 18
Background 18
Key terms and definitions 19
Chapter one: The role of technology in privacy‑preserving data flows 22
Data privacy, data protection and information security 23
What are privacy enhancing technologies (PETs)? 23
A downstream harms-based approach: Taxonomy of harms 24
Recent international developments in PETs 25
Interest in PETs for international data transfer and use 28
Accelerating PETs development: Sprints, challenges and international collaboration 28
Chapter two: Building the PETs marketplace 32
PETs for compliance and privacy 32
PETs in collaborative analysis 33
Barriers to PETs adoption: User awareness and understanding in the UK public sector 35
Barriers to PETs adoption: Vendors and expertise 36
Chapter three: Standards, assessments and assurance in PETs 42
PETs and assurance: The role of standards 42
Chapter four: Use cases for PETs 56
Considerations and approach 56
Privacy in biometric data for health research and diagnostics 57
Preserving privacy in audio data for health research and diagnostics 65
PETs and the internet of things: enabling digital twins for net zero 67
Social media data: PETs for researcher access and transparency 74
Synthetic data for population-scale insights 81
Collaborative analysis for collective intelligence 86
Online safety: Harmful content detection on encrypted platforms 90
Privacy and verifiability in online voting and electronic public consultation 95
PETs and the mosaic effect: Sharing humanitarian data in emergencies and fragile contexts 97
Conclusions 106
Appendices 108
Appendix 1: Definitions 108
Appendix 2: Acknowledgements 109
Foreword
The widespread collection and use of data is transforming all facets of society, from scientific research to communication and commerce. The benefits of using data in decision making are increasingly evident in tackling societal problems and understanding the world around us. At the same time, there are inherent vulnerabilities when sensitive data is stored, used or shared.

From privacy to partnership sets out how an emerging set of privacy enhancing technologies (PETs) might help to balance the risks and rewards of data use, leading to wider social benefit. It follows the Royal Society’s Protecting privacy in practice: The current use, development and limits of Privacy Enhancing Technologies in data analysis, which gave a snapshot of this rapidly developing field in 2019. This new publication offers a refreshed perspective on PETs, not only as security tools, but as novel means to establish collaborative analysis and data partnerships that are ethical, legal and responsible.

We have three objectives for this report. Our first objective is that the use cases inspire those collecting and using data to consider the potential benefits of PETs for their own work, or in new collaborations with others. Second, for the evidence we present on barriers to adoption and standardisation to help inform policy decisions to encourage a marketplace for PETs. Finally, through our recommendations, we hope the UK will maximise the opportunity to be a global leader in PETs – both for data security and collaborative analysis – alongside emerging, coordinated efforts to implement PETs in other countries.

Our report arrives at a time of rapid innovation in PETs, as well as data protection legislation reform in the United Kingdom. The intention is not to provide a comprehensive view of all technologies under the broad umbrella of PETs; rather, we have chosen to focus on a subset of promising and emerging tools with demonstrable potential in data governance. In demonstrating this value, we cite examples from the UK and international contexts. Realising the full potential of PETs across national borders will require further harmonisation, including consideration of data protection laws in various jurisdictions.

Artificial intelligence and machine learning are transforming our capacity to assess and confront our greatest challenges, but these tools require data to ‘fuel’ them. As a biomedical engineer using AI-assistive technologies to detect disease, I recognise that the greatest research problems of our time – from cancer diagnostics to the climate crisis – are, in a sense, data problems.

The value of data is most fully realised through aggregation and collaboration, whether between individuals or institutions. I hope this report will inspire new approaches to data protection and collaboration, encouraging further research in – and testing of – PETs in various scenarios. PETs are not a silver bullet, but they could play a key role in unlocking the value of data without compromising privacy. By enabling new data partnerships, PETs could spark a research transformation: a new paradigm for information sharing and data analysis with real promise for tackling future challenges.

Alison Noble OBE FREng FRS
Executive summary
Privacy Enhancing Technologies (PETs) are a suite of tools that can help maximise the use of data by reducing risks inherent to data use. Some PETs provide new techniques for anonymisation, while others enable collaborative analysis on privately-held datasets, allowing data to be used without disclosing copies of data. PETs are multi-purpose: they can reinforce data governance choices, serve as tools for data collaboration or enable greater accountability through audit. For these reasons, PETs have also been described as ‘Partnership Enhancing Technologies’1 or ‘Trust Technologies’2.

This report builds on the Royal Society’s 2019 publication Protecting privacy in practice: The current use, development and limits of Privacy Enhancing Technologies in data analysis3, which presented a high-level overview of PETs and identified how these technologies could play a role in addressing privacy in applied data science research, digital strategies and data-driven business.

Scope
From privacy to partnership outlines the current PETs landscape and considers the role of these technologies in addressing data governance issues beyond data security. The aim of this report is to address the following questions:
• How can PETs support data governance and enable new, innovative uses of data for public benefit?
• What are the primary barriers and enabling factors around the adoption of PETs in data governance, and how might these be addressed or amplified?
• How might PETs be factored into frameworks for assessing and balancing risks, harms and benefits when working with personal data?

Methodology
This work was steered by an expert Working Group as well as two closed contact group sessions with senior civil servants and regulators in April and October 2021 (on the scope and remit of the report, and on the use case topics and emerging themes, respectively).
1 Trask A. in Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of
collaborative computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-
99e8-12a92d5d88a0 (accessed 20 September 2022).
2 Infocomm Media Development Authority (Singapore grows trust in the digital environment). See https://www.imda.gov.
sg/news-and-events/Media-Room/Media-Releases/2022/Singapore-grows-trust-in-the-digital-environment (accessed
5 June 2022).
3 The Royal Society. 2019 Protecting privacy in practice: The current use, development and limits of Privacy Enhancing
Technologies in data analysis. See https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/
privacy-enhancing-technologies-report.pdf (accessed 30 June 2022).
The findings in this report are the result of consultations with a wide range of data and privacy stakeholders from academia, government, third sector, and industry, as well as three commissioned research projects on the role of assurance in enabling the uptake of PETs4, PETs market readiness in the public sector5, and a survey of synthetic data: data that is artificially generated based on real-world data, but which produces new data points6. The use cases were drafted with input from domain specialists, and the report was reviewed by expert readers as well as invited reviewers. The details of contributors, Working Group members, expert readers and reviewers are provided in the Appendix.

Key findings
General knowledge and awareness of PETs remains low amongst many potential PETs users7, 8, with the inherent risk of using new and poorly understood technologies acting as a disincentive to adoption. Few organisations, particularly in the public sector, are prepared to experiment with data protection9. Without in-house expertise, external assurance mechanisms or standards, organisations are unable to assess privacy trade-offs for a given PET or application. As a result, the PETs value proposition remains abstract and the business case for adopting PETs is unclear for potential users.

Standardisation for PETs, including data standards, is lacking and is cited as a hindrance to adoption by potential users in the UK public sector10. Technical standards are required to ensure the underpinning technologies work as intended, while process standards are needed to ensure users know how and when to deploy them. While few PETs-specific standards exist to date, standards in adjacent fields (such as cybersecurity and AI) will be relevant. In the future, PETs-specific standards could provide the basis for assurance schemes to bolster user confidence.
4 Hattusia 2022 The current state of assurance in establishing trust in PETs. The Royal Society.
See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/
5 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/. This project was partly funded by a grant from CDEI.
6 Jordon J et al. 2022 Synthetic data: What, why and how? See https://arxiv.org/pdf/2205.03257.pdf (accessed 2
September 2022).
7 London Economics and the Open Data Institute. 2022 Privacy Enhancing
Technologies: Market readiness, enabling and limiting factors. The Royal Society.
See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/.
8 Lunar Ventures, Lundy-Bryan L. 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
9 London Economics and the Open Data Institute. 2022 Privacy Enhancing
Technologies: Market readiness, enabling and limiting factors. The Royal Society.
See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/.
10 Ibid.
A significant barrier to the widespread use of PETs is a lack of clear use cases for wider public benefit. To address this, Chapter 4 illustrates the potential benefit of PETs in the contexts of:
• Using biometric data for health research and diagnostics;
• Enhancing privacy in the Internet of Things and in digital twins;
• Increasing safe access to social media data and accountability on social media platforms;

Data protection is only one aspect of the right to privacy. In most cases, PETs address this one aspect but do not address how data or the output of data analysis is used, although this could change as PETs mature. Some recent applications utilise PETs as tools for accountability and transparency, or to distribute decision-making power over a dataset across multiple collaborators11, suggesting their potential in addressing elements of privacy beyond data security.
11 For example, Meta recently conducted a survey collecting personal data, which was encrypted and split into shares
between third-party facilitators, namely universities. Analyses can be run using secure multi-party computation;
requests for analysis must be approved by all third-party shareholders. See https://ai.facebook.com/blog/assessing-
fairness-of-our-products-while-protecting-peoples-privacy/ (accessed 10 October 2022).
Recommendations
AREA FOR ACTION: COORDINATED INTERNATIONAL ACTION TO ENSURE THE
RESPONSIBLE DEVELOPMENT OF PETS FOR PUBLIC BENEFIT
RECOMMENDATION 1
RECOMMENDATION 2
RECOMMENDATION 3
RECOMMENDATION 4
PETs could reform the way data is used domestically and across borders, offering potential solutions to longstanding problems of siloed and underutilised data across sectors. To ensure the use of PETs for public good, PETs-driven information networks should be stewarded by public sector and civil society organisations using data infrastructure for public good. A coordinated national strategy for the development and adoption of PETs for public good will ensure the timely and responsible deployment of these technologies, with the public sector leading by example.

The PETs strategy should offer a vision that complements the Government’s National Data Strategy21 and National AI Strategy22. The PETs strategy should prioritise a roadmap for public sector PETs adoption, addressing public awareness and the PETs marketplace (Chapter 2), technological maturity, appropriate regulatory mechanisms and responsibilities, alongside standards and codes of conduct for PETs users (Chapter 3).
20 US Office for Science and Technology Policy (Request for Information on Advancing Privacy-Enhancing Technologies).
https://public-inspection.federalregister.gov/2022-12432.pdf (accessed 17 July 2022).
21 HM Government (National Data Strategy). See https://www.gov.uk/government/publications/uk-national-data-strategy/
national-data-strategy (accessed 9 September 2022).
22 HM Government (National AI Strategy). See https://www.gov.uk/government/publications/national-ai-strategy
(accessed 9 September 2022).
RECOMMENDATION 5
Public sector organisations could partner with small and medium-sized enterprises (SMEs) developing PETs to identify use cases, which could then be tested through low-cost, low-risk pilot projects. Legal experts and interdisciplinary policy professionals should be involved from project inception, ensuring PETs meet data protection requirements and that outcomes and implications are properly communicated to non-technical decision-makers.

Use cases illustrated in Chapter 4 highlight areas of significant potential public benefit in healthcare and medical research, for reaching net zero through national digital twins and for population-level data collaboration.

Communication of PETs and their appropriate use in various contexts will be key to building trust with potential users23, encouraging the PETs marketplace (Chapter 2). The ICO should continue its work on using PETs for wider good and communicating the implications – including barriers and potential benefits. The CDEI should continue to provide practical examples that will help organisations understand and build a business case for PETs’ adoption. Proof of concept and pilot studies should be communicated to the wider public to demonstrate the value of PETs, foster trust in public sector data use and demonstrate value-for-money24.
23 The Royal Society. Creating trusted and resilient data systems: The public perspective. (to be published online
in 2023)
24 This is in line with the Digital Economy Act 2017. See: The Information Commissioner’s Office (Data sharing across
the public sector: the Digital Economy Act codes). See https://ico.org.uk/for-organisations/guide-to-data-protection/
ico-codes-of-practice/data-sharing-a-code-of-practice/data-sharing-across-the-public-sector-the-digital-economy-act-
codes/ (accessed 2 September 2022).
RECOMMENDATION 6
While data protection legislation should remain technology neutral so as to be adaptable, current plans to review UK data protection laws provide an opportunity to consider the novel and multipurpose nature of these emerging technologies, particularly as they provide the technical means for new types of collaborative analysis. The ICO should continue its work to provide clarity around PETs and data protection law, encouraging the use of PETs for wider public good25 and drawing from parallel work on AI guidance where relevant (such as privacy-preserving machine learning).

Further interpretation may be required to help users understand how PETs might serve as tools for meeting data protection requirements. For example, it may be necessary to clarify data protection obligations where machine learning models are trained on personal data in federated learning scenarios26, or the degree to which differentially private or homomorphically encrypted data meets anonymisation requirements27. Where PETs enable information networks and international data collaborations, the ICO might anticipate clarification questions specific to international and collaborative analysis use cases. Regulatory sandboxes (as in Recommendation 2) will be useful for testing scenarios, particularly for experimentation with PETs in structured transparency28 (such as in open research or credit scoring systems) and as accountability tools29.
25 The Information Commissioner’s Office (ICO consults health organisation to shape thinking on privacy-enhancing
technologies). See https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2022/02/ico-consults-health-
organisations-to-shape-thinking-on-privacy-enhancing-technologies/ (accessed 20 March 2022).
26 Nguyen T, Sun K, Wang S, Guitton F, Guo Y. 2021 Privacy preservation in federated learning: An insightful survey from
the GDPR perspective. Computers & Security 110. (https://doi.org/10.1016/j.cose.2021.102402)
27 See for example: Koerner K. 2021 Legal perspectives on PETs: Homomorphic encryption. Medium. 20 July 2021. See
https://medium.com/golden-data/legal-perspectives-on-pets-homomorphic-encryption-9ccfb9a334f (accessed 30
June 2022).
28 Trask A, Bluemke E, Garfinkel B, Cuervas-Mons CG, Dafoe A. 2020 Beyond Privacy Trade-offs with Structured
Transparency. See https://arxiv.org/ftp/arxiv/papers/2012/2012.08347.pdf (accessed 6 February 2022).
29 See for example: Meta AI (Assessing fairness of our products while protecting peoples privacy). See https://
ai.facebook.com/blog/assessing-fairness-of-our-products-while-protecting-peoples-privacy/ (accessed 15
August 2022).
RECOMMENDATION 6 (CONTINUED)
The ICO could expand on its PETs guidance, for
example, through developing self-assessment
guides. Data ethics organisations, such as the
CDEI, might also develop impact assessment
tools, for example, a PETs impact assessment
protocol that considers downstream
implications on human rights. The Alliance
for Data Science Professionals certification
scheme30, which defines standards for ethical
and well-governed approaches to data use,
could specifically consider the role of PETs
in evidencing Skill Areas A (Data Privacy and
Stewardship) and E (Evaluation and Reflection).
RECOMMENDATION 7
RECOMMENDATION 8
31 (ISC)² ((ISC)² Information Security Certifications). See https://www.isc2.org/Certifications# (accessed 13 May 2022).
32 International Association of Privacy Professionals (Privacy Technology Certification). See https://iapp.org/media/pdf/
certification/CIPT_BOK_v.3.0.0.pdf (accessed 30 June 2022).
TABLE 1
Trusted Execution Environments (TEEs)
Privacy risk addressed: Revealing sensitive attributes present in a dataset during computation.
Benefits: Commercial solutions widely available; zero loss of information; efficient computation of any operations.
Current limitations: Many side-channel attacks possible; current commercial solutions limited with regard to distributed computation on big datasets.
Readiness level: Product.

Homomorphic encryption (HE)*
Privacy risk addressed: Revealing sensitive attributes present in a dataset during computation.
Benefits: Can allow zero loss of information; FHE can support the computation of any operation.
Current limitations: FHE, SHE and PHE are usable but highly computationally intensive, with bandwidth, latency and running time issues; PHE and SHE support the computation of limited functions; standardisation in progress; possibility for side-channel attacks (current understanding is limited).
Readiness level: PHE / SHE / FHE in use (FHE on a smaller scale).

Secure multi-party computation (SMPC), including PSI and PIR
Privacy risk addressed: Revealing sensitive attributes present in a dataset during computation.
Benefits: No need for a trusted third party – sensitive information is not revealed to anyone; the parties obtain only the resulting analysis or model.
Current limitations: Highly compute and communication intensive; requires expertise in design that meets compute requirements and security models.
Readiness level: PSI / PIR: Product; Proof of concept – Pilot.

KEY
FHE: Fully Homomorphic Encryption SHE: Somewhat Homomorphic Encryption PHE: Partial Homomorphic Encryption
PIR: Private Information Retrieval PSI: Private Set Intersection

* If the client encrypts their data and sends it to a server for homomorphic computation, only the client is able to access the results (by using their secret decryption key).
Federated learning / federated machine learning
Enables the use of remote data for training algorithms; data is not centralised.
Privacy risk addressed: Revealing sensitive information, including an individual’s presence in a dataset.
Benefits: Very little loss of information.
Current limitations: Model inversion and membership inference attacks may be vulnerabilities; may require scale of data within each dataset (cross-silo federated learning); distributed systems are complex and difficult to manage.

Differential privacy
Prevents disclosure about individuals when releasing statistics or derived information.
Privacy risk addressed: Revealing sensitive information, including an individual’s presence in a dataset; dataset or output disclosing sensitive information about an entity included in the dataset.
Benefits: Formal mathematical proof / privacy guarantee; level of privacy protection may be quantifiable; relative to other PETs, it is computationally inexpensive.
Current limitations: Noise and loss of information, unless datasets are large enough; setting the level of protection requires expertise; precision of analysis limited inversely to level of protection; specialist skills and very large datasets required; as yet, no standards for setting privacy parameters.

Privacy-preserving synthetic data
Prevents disclosure about individuals when releasing statistics or derived information.
Privacy risk addressed: Revealing sensitive attributes or presence in a dataset.
Benefits: Applications beyond privacy; level of privacy protection may be quantifiable (eg, with differentially private synthetic data).
Current limitations: Noise and loss of information; setting the level of protection requires expertise; privacy enhancement unclear; specialist skills required; custom protocols; as yet, no standards for generation or setting privacy parameters.
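To make the federated learning entry above more concrete, the following is a minimal illustrative sketch (not taken from the report) of federated averaging in Python with NumPy: each simulated data holder trains a small linear model on data that never leaves it, and only the model weights are pooled. The client datasets, learning rate and round counts are invented values chosen purely for the example.

```python
import numpy as np

def local_update(X, y, w, lr=0.05, epochs=5):
    """One client's local training: a few steps of gradient descent for linear
    regression. The raw data (X, y) never leaves the client."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(clients, w, rounds=20):
    """Federated averaging: the coordinator sends the current weights to each
    client, collects locally updated weights and averages them, weighted by
    the number of examples each client holds."""
    for _ in range(rounds):
        updates = [local_update(X, y, w) for X, y in clients]
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        w = np.average(np.stack(updates), axis=0, weights=sizes)
    return w

# Three hypothetical data holders share the same underlying relationship.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (40, 60, 80):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=n)))

print(federated_average(clients, w=np.zeros(2)))  # approaches [2.0, -1.0]
```

The weight-pooling step is where the limitations noted above arise: the exchanged weights can still leak information through model inversion or membership inference attacks, which is why federated learning is often combined with other protections such as differential privacy.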
Introduction
Background
Data about individuals, their unique characteristics, preferences and behaviours, is ubiquitous and the power to deliver data-driven insights using this information is rapidly accelerating33, 34. This unprecedented availability of data, coupled with new capabilities to use data, drives the frontiers of research and innovation – addressing challenges from the climate crisis to the COVID-19 pandemic35, 36. However, the greater collection, transfer and use of data – particularly data which is personal, commercially sensitive or otherwise confidential – also entails increased risks. The tension between maximising data utility (where data is used) and managing risk (where data is hidden) poses a significant challenge to anyone using data to make decisions.

This report, undertaken in close collaboration with the Alan Turing Institute, considers the potential for tools and approaches collectively known as Privacy Enhancing Technologies (PETs) to revolutionise the safe and rapid use of sensitive data for wider public benefit. It examines the possibilities and limitations for PETs in responsible data governance and identifies steps required to realise their benefits.

This work follows the Royal Society’s 2019 report Protecting privacy in practice: The current use, development and limits of Privacy Enhancing Technologies in data analysis37, which highlighted the role of PETs in enabling the derivation of useful results from data without providing wider access to datasets. Protecting privacy in practice presented a high-level overview of PETs and identified how these potentially disruptive technologies could play a role in addressing tensions around privacy and utility.

The 2019 report made several observations for how the UK could realise the potential of PETs, including:
• The research and development of PETs can be accelerated through collaborative, cross-sector research challenges developed by government, industry and the third sector, alongside fundamental research support for advancing PETs;
• Government can be an important influencer in the adoption of PETs by demonstrating their use and sharing their experience around how PETs unlock new opportunities for data analysis. At the same time, public sector organisations should be given the level of expertise and assurance required to utilise new technological solutions;
33 The Royal Society. 2017 Machine learning: the power and promise of computers that learn by example. See https://
royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf (accessed 30
May 2022).
34 The British Academy and the Royal Society. 2017 Data management and use: Governance in the 21st century. See
https://royalsociety.org/-/media/policy/projects/data-governance/data-management-governance.pdf (accessed 28
July 2022).
35 Alsunaidi A J et al. 2021 Applications of big data analytics to control COVID-19 pandemic. Sensors (Basel) 21, 2282.
(https://doi.org/10.3390/s21072282)
36 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero. See https://
royalsociety.org/-/media/policy/projects/digital-technology-and-the-planet/digital-technology-and-the-planet-report.
pdf (accessed 20 September 2022).
37 The Royal Society. 2019 Protecting privacy in practice: The current use, development and limits of Privacy Enhancing
Technologies in data analysis. See https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/
privacy-enhancing-technologies-report.pdf (accessed 30 June 2022).
• PETs can promote human flourishing through enabling new and innovative ways of governing data, as well as promoting safe and secure data use. The Department for Digital, Culture, Media and Sport (DCMS), the Centre for Data Ethics and Innovation (CDEI), Office for AI, regulators and civil society can consider how PETs play a role in wider data governance structures, including how they operate alongside new data governance models such as ‘data trusts’.

Key terms and definitions
This report draws on multidisciplinary concepts from cryptography, business, cybersecurity, ethics and analytics. Included here is a quick reference glossary of key terms used throughout.

Differential privacy: security definition which means that, when a statistic is released, it should not give much more information about a particular individual than if that individual had not been included in the dataset. See also ‘privacy budget’.

Distributed Ledger Technology (DLT): an open, distributed database that can record transactions between several parties efficiently and in a verifiable and permanent way. DLTs are not considered PETs, though they can be used (as some PETs) to promote transparency by documenting data provenance.

Epsilon (Ɛ): see ‘privacy budget’.

Homomorphic encryption (HE): a property that some encryption schemes have, so that it is possible to compute on encrypted data without deciphering it.

Metadata: data that describes or provides information about other data, such as time and location of a message (rather than the content of the message).

Noise: noise refers to a random alteration of data/values in a dataset so that the true data points (such as personal identifiers) are not as easy to identify.

Privacy budget (also differential privacy budget, or epsilon): a quantitative measure of the change in confidence of an individual having a given attribute.

Privacy-preserving synthetic data (PPSD): synthetic data generated from real-world data to a degree of privacy that is deemed acceptable for a given application.

Private Set Intersection (PSI): secure multiparty computation protocol where two parties compare datasets without revealing them in an unencrypted form. At the conclusion of the computation, each party knows which items they have in common with the other. There are some scalable open-source implementations of PSI available.

Secure multi-party computation (SMPC or MPC): a subfield of cryptography concerned with enabling private distributed computations. MPC protocols allow computation or analysis on combined data without the different parties revealing their own private inputs to the computation.

Synthetic data: data that is modelled to represent the statistical properties of original data; new data values are created which, taken as a whole, reproduce the statistical properties of the ‘real’ dataset.

Trusted Execution Environment (TEE): secure area of a processor that allows code and data to be isolated and protected from the rest of the system such that it cannot be accessed or modified even by the operating system or admin users. Trusted execution environments are also known as secure enclaves.
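To illustrate how the ‘differential privacy’, ‘noise’, ‘epsilon’ and ‘privacy budget’ entries above fit together, the following is a minimal illustrative Python sketch (not taken from the report) of the Laplace mechanism applied to a counting query; the dataset and epsilon values are invented for the example.

```python
import numpy as np

def dp_count(records, predicate, epsilon, rng=None):
    """Release a count under epsilon-differential privacy via the Laplace
    mechanism. A counting query has sensitivity 1 (one person's presence or
    absence changes the count by at most 1), so noise is drawn from
    Laplace(0, 1/epsilon): a smaller privacy budget means more noise."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 71, 29, 68, 45, 82, 19, 77]  # invented example data
for eps in (1.0, 0.1):
    noisy = dp_count(ages, lambda a: a >= 65, epsilon=eps)
    print(f"epsilon={eps}: reported count of over-65s is roughly {noisy:.1f}")
```

Lower values of epsilon give stronger protection but noisier answers, echoing the trade-off noted in Table 1 between the level of protection and the precision of analysis.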
38 The British Academy and the Royal Society. 2017 Data management and use: Governance in the 21st century. See
https://royalsociety.org/-/media/policy/projects/data-governance/data-management-governance.pdf (accessed 28
July 2022).
39 Wolf LE 2018. Risks and Legal Protections in the World of Big-Data. Asia Pac J Health Law Ethics. 11, 1-15. https://www.
ncbi.nlm.nih.gov/pmc/articles/PMC6863510/
40 Jain P, Gyanchandani M, Khare N. 2016 Big data privacy: a technological perspective and review. Journal of Big Data
3, 25.
41 The British Academy and the Royal Society. 2017 Data management and use: Governance in the 21st century. See
https://royalsociety.org/-/media/policy/projects/data-governance/data-management-governance.pdf (accessed 28
July 2022).
42 The Israel Academy of Sciences and Humanities and The Royal Society. 2017 Israel-UK privacy and technology
workshop note of discussions. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/
(accessed 20 September).
43 This is distinct from aggregation across a population or group.
44 Nissenbaum H. 2010 Privacy In Context: Technology, Policy, and the Integrity of Social Life. Stanford: Stanford
Law Books.
45 Bhajaria N. 2022 Data privacy: A runbook for engineers. Shelter Island: Manning.
A specific definition of privacy may be less useful than considering what privacy is for46 and what is at stake by examining potential downstream harms. The loss of privacy may also be considered intrinsically harmful to an individual.

Data privacy, data protection and information security
Data privacy is related to information security, but there are important differences. Information security focuses on external adversaries and the prevention of undesired access to information47. Security is a necessary condition for data privacy, but privacy also entails the legitimate and fair use of (secure) data. Data security relates to protecting data as an asset, whereas data privacy is more concerned with protecting people: ensuring the rights of data subjects follow their data.

What are privacy enhancing technologies (PETs)?
PETs are an emerging set of technologies and approaches that enable the derivation of useful results from data without providing full access to the data. In many cases, they are tools for controlling the likelihood of breach or disclosure. This potentially disruptive suite of tools could create new opportunities where the risks of using data currently outweigh the benefits. PETs can reduce the threats typically associated with collaboration48, motivating new partnerships – for example, between otherwise competing organisations. For this reason, PETs have more recently been described as Partnership Enhancing Technologies49 and Trust Technologies50.
46 Zimmermann C. 2022 Part 1: What is Privacy Engineering? The Privacy Blog. 10 May 2022. See https://the-privacy-
blog.eu/2022/05/10/part1-what-is-privacy-engineering/ (accessed 20 September 2022).
47 According to NIST, security is ‘[t]he protection of information and information systems from unauthorized access, use,
disclosure, disruption, modification, or destruction in order to provide confidentiality, integrity, and availability.’ National
Institute of Standards and Technology (Computer security resource center). See https://csrc.nist.gov/glossary/term/is
(accessed 20 September 2022).
48 World Economic Forum. 2019 The next generation of data-sharing in financial services: Using privacy enhancing
technologies to unlock new value. See https://www3.weforum.org/docs/WEF_Next_Gen_Data_Sharing_Financial_
Services.pdf (accessed 20 September 2022).
49 Lunar Ventures, Lundy-Bryan L. 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
50 Infocomm Media Development Authority (Singapore grows trust in the digital environment). See https://www.imda.gov.
sg/news-and-events/Media-Room/Media-Releases/2022/Singapore-grows-trust-in-the-digital-environment (accessed
5 June 2022).
51 Information and Privacy Commissioner of Ontario and Registratiekamer (Netherlands) 2008. Privacy-Enhancing
Technologies: The Path to Anonymity. Volume 1.
52 European Union Agency for Cybersecurity (Data Protection: Privacy enhancing technologies). See https://www.enisa.
europa.eu/topics/data-protection/privacy-enhancing-technologies (accessed 20 September 2022).
53 National Institute of Standards and Technology (NIST Privacy Engineering Objectives and Risk Model Discussion
Draft). See https://www.nist.gov/system/files/documents/itl/csd/nist_privacy_engr_objectives_risk_model_discussion_
draft.pdf (accessed 20 September 2022).
54 World Economic Forum. 2019 The Next Generation of Data-Sharing in Financial Services: Using Privacy Enhancing
Techniques to Unlock New Value). See https://www3.weforum.org/docs/WEF_Next_Gen_Data_Sharing_Financial_
Services.pdf (accessed 20 September 2022).
55 Organisation for Economic Co-operation and Development (Recommendation of the Council on Enhancing Access
to and Sharing of Data). See https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0463 (accessed 20
September 2022).
56 Hurst A. 2022 UN launches privacy lab pilot to unlock cross-border data sharing benefits. Information Age. 25
January 2022. See https://www.information-age.com/un-launches-privacy-lab-pilot-to-unlock-cross-border-data-
sharing-benefits-19414/ (accessed 20 March 2022).
57 Infocomm Media Development Authority (Singapore grows trust in the digital environment). See https://www.imda.gov.
sg/news-and-events/Media-Room/Media-Releases/2022/Singapore-grows-trust-in-the-digital-environment (accessed
5 June 2022).
58 HM Government (U.K. and U.S. governments collaborate on prize challenges to accelerate development and
adoption of privacy-enhancing technologies). See https://www.gov.uk/government/news/uk-and-us-governments-
collaborate-on-prize-challenges-to-accelerate-development-and-adoption-of-privacy-enhancing-technologies
(accessed 13 June 2022).
59 Ibid.
FIGURE 1
Taxonomy of harms
[Diagram: maps how data that is used – by insiders or by others for processing and analytics – can lead to security violations (data disclosed or reused for intended or unintended purposes) and to example attacks, including de-anonymisation and tracing attacks (individuals’ identities are revealed), model inversion / reconstruction attacks (the data used to train a model is reconstructed) and data poisoning / classifier influence / trojan attacks (a dataset is altered to disrupt robustness or integrity, warping outcomes), each linked to downstream harms and domains of harm.]
Source: Royal Society meetings with the Working Group for Privacy Enhancing Technologies, November 2021 and April 2022.
Interest in PETs for international data transfer and use
A fragmented array of legal requirements covers data use across the globe. As of March 2022, there are 157 countries with data protection laws, entailing various stipulations for data transfer and use60. PETs can provide means for secure collaboration across borders, preventing unauthorised access to datasets; however, data use is still subject to local legal requirements. PETs do not provide ‘loopholes’ to data protection laws in the UK. Rather, PETs can be used as tools to help data users comply with regulatory requirements, such as anonymisation. While this report refers primarily to current UK GDPR, it restricts legal commentary to high-level observations, noting ongoing data reform in the UK and the international relevance of PETs in other jurisdictions.

Accelerating PETs development: Sprints, challenges and international collaboration
Other PETs development initiatives include the PRIViLEDGE project, funded by Horizon 2020 between 2017 and 2021. The project aimed to develop cryptographic protocols in support of privacy, anonymity and efficient decentralised consensus using distributed ledger technologies (DLTs).

As well as online voting (see Use case 5.3, page 95), PRIViLEDGE developed a number of toolkits and prototypes61, including privacy-preserving data storage using ledgers (data residing on a blockchain) and secure multi-party computation (SMPC) on distributed ledgers, which allows two or more parties to compute using a ledger as a communication channel. Many of these resources have been opened for further development.

State-level collaborations to accelerate PETs include the Digital Trust Centre (DTC), launched in 2022 in Singapore62, 63. The DTC is set to lead Singapore’s efforts in research and development for ‘trust technologies’, such as PETs, which provide solutions for data sharing and the evaluation of trustworthy AI systems. This national effort includes sandbox environments, academic-enterprise partnerships and national and international collaborations between research institutes. As a founding member of the Global Partnership on AI (GPAI), Singapore intends to use this platform to enhance its contributions to GPAI.

These initiatives have the potential to drive innovation and are raising the profile of PETs for privacy, partnership and trust. This will be key in motivating new users and creating a wider marketplace for PETs. The following section focuses on the UK public sector, describing enabling factors and barriers in the adoption of PETs.
60 Greenleaf G. 2022 Now 157 Countries: Twelve Data Privacy Laws in 2021/22. Privacy Laws & Business International
Report 1, 3—8. See https://ssrn.com/abstract=4137418 (accessed 24 May 2022).
61 Livin L. 2021 Achievements of the priviledge project. Priviledge blog. 30 June 2021. See https://priviledge-project.eu/
news/achievements-of-the-priviledge-project (accessed 30 June 2022).
62 Infocomm Media Development Authority (Singapore grows trust in the digital environment). See https://www.imda.gov.
sg/news-and-events/Media-Room/Media-Releases/2022/Singapore-grows-trust-in-the-digital-environment (accessed
5 June 2022).
63 The DTC will serve as implementation partner for an international collaboration between the Centre of Expertise of
Montreal for the Advancement of Artificial Intelligence (CEIMIA) and the Infocomm Media Development Authority
(IMDA) in Singapore. This partnership seeks to develop solutions to demonstrate how PETs can help organisations
leverage cross-institution and cross-border data.
BOX 1
A series of challenges, technology sprints and collaborative projects have propelled the
development of PETs in financial services. The World Economic Forum has outlined potential
uses for PETs in determining creditworthiness, identifying collusion, or flagging fraudulent
transactions between multiple banks64. Financial information sharing is key in tackling financial
crime, which amounts to around $1.6 trillion annually (between 2-5% of the global GDP). This
requires collaboration and data sharing in a way that safeguards client data, adheres to legal
requirements and does not compromise competitive advantage of banking institutions.
In the UK, the Financial Conduct Authority (FCA) explored potential use cases for PETs such as secure multi-party computation in enabling data-based financial crime detection and prevention, launching a TechSprint on Global Anti-Money Laundering and Financial Crime in July 201965, 66.

This event included over 140 active participants, and concluded with ten proofs of concept, including:
• Using homomorphic encryption to enable banks to share and analyse sensitive information in order to uncover money-laundering networks, or to support the identification of existing and new financial crime typologies, or to allow banks to distinguish good from bad actors through question-and-answer when onboarding new clients;
• Using secure multi-party computation to uncover patterns of suspicious transactions across networks involving multiple banking institutions, or to highlight transactional mismatches in risky categories, such as account names;
• Using federated learning to improve risk assessment between multiple banks by enabling sharing of typologies;
• Using pseudonymised and hashed customer data to enable sharing and cross-referencing, to highlight potential areas of concern or for further investigation (a simplified sketch of this pattern follows below).

These demonstrations illustrate how PETs can be used for a particular end goal: to identify criminal behaviour in order to target enforcement action. While this use case is applauded by those working to tackle financial crime, it is worth considering how the same methods might be used for surveillance of other behaviours (for example, to profile customers for targeted advertisements, or for enhanced credit scoring).
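As a purely illustrative sketch of the last pattern in the list above, the Python snippet below shows two hypothetical banks cross-referencing salted hashes of customer identifiers rather than raw identifiers. This is not how the TechSprint teams implemented it, and salted hashing alone is much weaker than cryptographic private set intersection (hashes of low-entropy identifiers can be brute-forced); it is intended only to make the cross-referencing idea concrete. All names and values are invented.

```python
import hashlib

def pseudonymise(customer_ids, shared_salt):
    """Map each raw identifier to a salted SHA-256 digest. Only the digests
    are exchanged between institutions; raw identifiers stay in-house."""
    return {hashlib.sha256((shared_salt + cid).encode()).hexdigest(): cid
            for cid in customer_ids}

SALT = "agreed-between-banks"  # hypothetical shared value

bank_a = pseudonymise({"CUST-001", "CUST-007", "CUST-042"}, SALT)
bank_b_digests = set(pseudonymise({"CUST-007", "CUST-099"}, SALT))

# Bank A learns which of its own customers also appear at Bank B,
# without receiving Bank B's customer list in the clear.
flagged = [cid for digest, cid in bank_a.items() if digest in bank_b_digests]
print(flagged)  # ['CUST-007']
```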
64 World Economic Forum. 2019 The Next Generation of Data-Sharing in Financial Services: Using Privacy Enhancing
Techniques to Unlock New Value). See https://www3.weforum.org/docs/WEF_Next_Gen_Data_Sharing_Financial_
Services.pdf (accessed 20 September 2022).
65 Financial Conduct Authority (2019 Global AML and Financial Crime TechSprint). See https://www.fca.org.uk/events/
techsprints/2019-global-aml-and-financial-crime-techsprint (accessed 20 September 2022).
66 Cook N. 2019 It takes a network to defeat a network: tech in the fight against financial crime. Royal Society blog. 19
September 2022. See https://royalsociety.org/blog/2019/09/it-takes-a-network-to-defeat-a-network/ (accessed 16
February 2022).
67 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
68 Centre for Data Ethics and Innovation (Privacy Enhancing Technologies Adoption Guide). See https://cdeiuk.github.io/
pets-adoption-guide/ (accessed 20 September 2022).
69 Gartner (Gartner identifies the top strategic technology trends for 2022). See https://www.gartner.com/en/newsroom/
press-releases/2021-10-18-gartner-identifies-the-top-strategic-technology-trends-for-2022 (accessed 20 September
2022). Note that in Gartner’s analysis PETs are defined similarly to this report.
70 Lunar Ventures (Lundy-Bryan L). 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
71 Ibid.
72 Ibid.
73 Ibid.
74 Melis L, Song C, De Cristofaro E, Shmatikov V. 2018 Inference attacks against collaborative learning. Preprint. See
https://www.researchgate.net/publication/325074745_Inference_Attacks_Against_Collaborative_Learning (accessed
20 September 2022).
75 See Use case 1.1, page 57.
76 See Use case 6, page 97.
77 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
Legal and technical friction points prevent timely and straightforward access to public sector data, limiting its value as a public resource. PETs that allow the sending or processing of datasets internationally could be key to realising the value of data use across institutions and borders, which has been estimated to be between $3-5 trillion USD annually78. Governments and data-holding organisations are beginning to understand this value in terms of both economic and social benefits, and are seeking technology-based tools to enable collaboration79. The same PETs could also enhance data use across departments within an organisation, whether for reuse or when subject to further restrictions (as with International Traffic in Arms Regulations compliance in the US).

For these reasons, collaborative analysis has been predicted by one firm as the largest new technology market to develop in the current decade80. Cloud services are one substantial market already being impacted through the widespread use of Trusted Execution Environments (TEEs), which allow for data processing and analysis in a secure environment with restricted access81. TEEs can provide an application domain for SMPC, enabling collaborative analysis of confidential datasets82. Given its role in secure and collaborative analysis, confidential cloud could be an area of significant market growth in the near future83, 84.
78 McKinsey. 2013 Collaborating for the common good: Navigating public-private data partnerships. See https://
www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/collaborating-for-the-common-
good#:~:text=Overall%2C%20McKinsey%20estimates%20that%20connecting (accessed 18 July 2022).
79 World Economic Forum. 2019 The Next Generation of Data-Sharing in Financial Services: Using Privacy Enhancing
Techniques to Unlock New Value). See https://www3.weforum.org/docs/WEF_Next_Gen_Data_Sharing_Financial_
Services.pdf (accessed 20 September 2022).
80 Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
81 Gartner (Gartner Top Strategic Technology Trends for 2021). See https://www.gartner.com/smarterwithgartner/gartner-
top-strategic-technology-trends-for-2021 (accessed 26 September 2022).
82 Geppert T, Deml S, Sturzenegger D, Ebert N. 2022 Trusted Execution Environments: Applications and Organizational
Challenges. Front. Comput. Sci. 4 (https://doi.org/10.3389/fcomp.2022.930741)
83 Gartner (Gartner Top Strategic Technology Trends for 2021). See https://www.gartner.com/smarterwithgartner/gartner-
top-strategic-technology-trends-for-2021 (accessed 26 September 2022).
84 The Confidential Computing Consortium, which is run by the Linux Foundation, is promoting the use of TEEs in cloud
services internationally. The Consortium includes every large cloud provider (Alibaba, Baidu, Google Cloud, Microsoft,
Tencent), demonstrating confidential computing as a priority to leaders in digital technology. Confidential Computing
Consortium Defining and Enabling Confidential Computing (Overview). See https://confidentialcomputing.io/wp-
content/uploads/sites/85/2019/12/CCC_Overview.pdf (accessed 15 March 2022).
Barriers to PETs adoption: User awareness and understanding in the UK public sector
A number of barriers prevent the widespread use of PETs for data protection and collaborative data analysis in the UK public sector. The first obstacle is general knowledge and awareness of PETs, their benefits and potential use cases85, 86. Researchers and analysts are often familiar with traditional privacy techniques (such as anonymisation, pseudonymisation, encryption and data minimisation); for some, it is unclear what PETs can add to these approaches.

PETs that enable collaborative analysis include some of the most technically complex and least used to date (such as secure multi-party computation and federated learning). While these PETs may be some of the most promising, the risk inherent to using new and poorly understood technologies is a strong disincentive to adoption: few organisations, particularly in the public sector, are prepared to experiment with privacy87.

A lack of understanding around PETs within wider data protection requirements means stakeholders are hesitant to adopt them88. For example, anonymised personal data is not subject to the principles of data protection requirements detailed in the UK GDPR or EU GDPR89, 90; however, in the UK, there is no universal test of anonymity. Technology-specific guidance may be useful in interpreting requirements and best practices in emerging technologies, for example, how archived synthetic data should be handled91. Currently, organisations must turn to assessments by internal or external parties for guidance. These uncertainties lead to a culture of risk-aversion described by some UK public bodies92. Without assurance or technical standards, some question the genuine security PETs offer, particularly where privacy threats and adversaries are undefined or hypothetical93.
85 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
86 Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
87 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
88 Ibid.
89 Information Commissioner’s Office (What is personal data?). See https://ico.org.uk/for-organisations/guide-to-data-
protection/guide-to-the-general-data-protection-regulation-gdpr/key-definitions/what-is-personal-data/#:~:text=If%20
personal%20data%20can%20be,subject%20to%20the%20UK%20GDPR (accessed 20 September 2022).
90 GDPR Info (EU GDPR Recital 26). See https://gdpr-info.eu/recitals/no-26/ (accessed 20 September 2022).
91 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
92 Ibid.
93 Ibid.
Where organisations are unable to assess privacy trade-offs for a given PET or application, cost-benefit analysis becomes impractical. As a result, the PETs value proposition remains speculative and the business case for adopting PETs is unclear. Demonstrations are needed to establish the potential benefit of PETs, for example, through case studies that include cost-benefit analyses94. The use cases and examples in Chapter Four (page 56) provide a starting point for such an approach.

According to those interviewed, market confidence could be enhanced through better data readiness and the development of standards (Chapter Three)95. PETs are subject to relevant legal frameworks and existing regulators, such as the ICO in the UK. However, they are not specifically regulated as technologies, and their efficacy is ‘illegible’ to non-experts. Standards could be followed by assurance and certifications. Implementation frameworks for PETs would allow some elements of decision-making to be outsourced, although additional expertise will likely be required in practice96.

Other barriers are institutional in nature. For example, where technical expertise does exist in-house, these individuals are often organisationally removed from decision-makers97. Foundational data governance issues, such as data quality and interoperability, are primary concerns for many organisations and, as such, new, unknown technologies are deprioritised. Compute power is also a practical limiting factor, particularly with energy-intensive approaches such as homomorphic encryption98.

Barriers to PETs adoption: Vendors and expertise
The development of PETs requires a deep understanding of cryptography. However, unlike other computing-related fields (such as software engineering), the cutting edge of cryptography remains largely in academia. This leads to a gap between cryptography expertise and market drivers, such as cost and convenience. As a result, theoretical cryptography ‘risks over-serving the market on security’99. Bridging the gap between cryptography talent and entrepreneurs could create viable PETs vendors.
94 Ibid.
95 Ibid.
96 Ibid.
97 Ibid.
98 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
99 Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
Conclusions
A flourishing PETs market will require both trust
in the technology and users’ ability to discern
appropriate applications. PETs vendors can
help address scepticism by integrating PETs
in wider data governance approaches, rather
than promoting one-size-fits-all solutions. Where
public sentiment around the use of PETs is
unknown, further research – including focus
groups or public dialogues – could help to ensure end-user acceptance of (and demand for) the technologies102.
100 British Computing Society (The Alliance for Data Science Professionals: Memorandum of Understanding July 2021).
See https://www.bcs.org/media/7536/alliance-data-science-mou.pdf (accessed 2 September 2022).
101 OpenMined (The Private AI Series). See https://courses.openmined.org/ (accessed 7 October 2022).
102 The Royal Society. Creating trusted and resilient data systems: The public perspective. (to be published online
in 2023)
103 Lunar Ventures (Lundy-Bryan L.) 2021 Privacy Enhancing Technologies: Part 2—the coming age of collaborative
computing. See https://docsend.com/view/db577xmkswv9ujap?submissionGuid=650e684f-93eb-4cee-99e8-
12a92d5d88a0 (accessed 20 September 2022).
104 London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market readiness, enabling
and limiting factors. The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-
technologies/
TABLE 2
PETs described in this report and their function with regard to security and collaborative analysis105.
Homomorphic encryption (HE)
What does this PET do? Allows the use, or analysis, of encrypted data without decrypting it.
In what circumstances would it be used? To create meaningful insights in computation without revealing the contents of a dataset to those running the analysis (which could be done by a trusted third party).
Whose data is being protected and from whom? The data held by the institution running the computation is being protected from whoever runs the analysis, whether a third party or the institution themselves. If the third party were to act in bad faith, they would not have access to the data in question.
Whose interests are being protected and what are they? The data controller: they have an interest in carrying out their computation in the safest and most effective way possible. The data subjects: those who the data is about have an interest in making sure their data is not accessed by bad actors.
Relevance to security and collaborative analysis: Security – data is protected from unauthorised access.

Trusted execution environments (TEEs)
What does this PET do? Allows data to be used or analysed within a secure, isolated environment.
In what circumstances would it be used? When data needs to be stored securely, or to generate insights from data without revealing the dataset to the party running the analysis or hosting the TEE.
Whose data is being protected and from whom? The data held by the institution running the research can only be decrypted and used within the TEE, and only by approved code. The TEE is protected from the outside environment, including the operating system and admin users.
Whose interests are being protected and what are they? The data controller: they have an interest in carrying out their research in the safest and most effective way possible. The data subjects: those who the data is about have an interest in making sure their data is not accessed by bad actors.
Relevance to security and collaborative analysis: Security – data is protected from unauthorised access.
105 Modified from Hattusia 2022 The current state of assurance in establishing trust in PETs.
The Royal Society. See https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/ (accessed 20 September 2022).
106 A type of HE called ‘multi-key FHE’ can perform a similar function: several parties each have a secret key and can encrypt their own data, which is sent to a trusted third party for computation. The result can be decrypted by all parties who contributed data to the process.
Secure multi-party computation (SMPC)
What does this PET do? Allows multiple parties to run analysis on their combined data, without revealing the contents of the data to each other106.
In what circumstances would it be used? It removes the need for a trusted central authority that would have access to everyone’s data. Rather, multiple organisations can keep their datasets private from each other, but still run joint analysis on the combined data.
Whose data is being protected and from whom? Each collaborating organisation holds data about individuals (or other sensitive data), and that data is protected from those collaborating on analysis. The data is also protected from any potential misconduct or incompetence from any of the parties.
Whose interests are being protected and what are they? The collaborating organisations: they have an interest in carrying out their research in the safest and most effective way possible.
Relevance to security and collaborative analysis: Security – data is protected from unauthorised access. Collaborative analysis – multiple parties can work on datasets held by parties of ‘mutual distrust’; the data remains safe from unwarranted interference.

Differential privacy (DP)
What does this PET do? Mostly for use with large datasets, DP allows institutions to reveal data or derived information to others without revealing sensitive information about the groups or individuals represented in the dataset.
In what circumstances would it be used? An institution may want to share analytical insights derived from their data with another group or with the public, but their dataset contains sensitive information which should be kept private.
Whose data is being protected and from whom? Sensitive information about the groups or individuals present in the dataset is protected from whoever the data is being shared with or analysed by, whether that is a trusted third party, the general public, or the institution themselves.
Whose interests are being protected and what are they? The data controller: they have an interest in carrying out their research and sharing data in the safest and most effective way possible.
Relevance to security and collaborative analysis: Security – data is protected from unauthorised access. Collaborative analysis – there is potential for open access to the data without revealing the presence or attributes of individuals.

Federated learning (FL)
What does this PET do? Allows an algorithm to be trained across multiple devices or datasets held on remote servers.
In what circumstances would it be used? An organisation wants to train a machine learning model but has limited training data available. They ‘send’ the model to remote datasets for training; the model returns having benefitted from those datasets.
Whose data is being protected and from whom? Each collaborating organisation holds data about individuals (or other sensitive data), and that data is protected from those collaborating on analysis. Only the trained model is exchanged.
Whose interests are being protected and what are they? The collaborating organisations: they have an interest in carrying out their research in the safest and most effective way possible.
Relevance to security and collaborative analysis: Security – data is protected from unauthorised access. Collaborative analysis – federated learning is also called collaborative learning; multiple parties are required.
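To make the SMPC entry above more concrete, the following is a minimal sketch of additive secret sharing, one building block of SMPC. The three hospitals and their patient counts are invented for illustration, and production systems use hardened SMPC protocols and libraries rather than ad hoc code like this.

```python
import secrets

PRIME = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split a secret integer into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three hypothetical hospitals each secret-share their local patient count.
counts = {"hospital_A": 120, "hospital_B": 340, "hospital_C": 86}
all_shares = {name: share(v, 3) for name, v in counts.items()}

# Each party i receives one share from every hospital and sums them locally;
# no single party ever sees another hospital's true count.
partial_sums = [sum(all_shares[name][i] for name in counts) % PRIME for i in range(3)]

# Recombining the partial sums reveals only the joint total.
print(sum(partial_sums) % PRIME)  # 546
```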
The 2020 Edelman Trust Barometer110 identified two types of trust:

• Moral – the trustor believes the trustee can articulate and act on the best interests of the trustor; and

• Competence – the trustor believes the trustee has the ability to deliver on what has been agreed.

Global standards have been effective in cybersecurity and privacy; likewise, encryption-based PETs may rely on encryption standards. Similar approaches may be feasible where risk of disclosure is quantifiable, such as with differential privacy.
107 Hattusia 2022 The current state of assurance in establishing trust in PETs. The Royal Society. See https://royalsociety.
org/topics-policy/projects/privacy-enhancing-technologies/ See also Table 3.
108 Ibid.
109 Zimmermann C. 2022 Part 1: What is Privacy Engineering? The Privacy Blog. 10 May 2022. See https://the-privacy-
blog.eu/2022/05/10/part1-what-is-privacy-engineering/ (accessed 20 September 2022).
110 Edelman (2020 Trust Barometer). See https://www.edelman.com/trust/2020-trust-barometer (accessed 15
February 2022).
TABLE 3
Assurances and trust relationships in the use of PETs in privacy-preserving data governance systems.
Trustor: PETs users (eg, engineers or data scientists)
Trust is placed in: the technology itself; collaborators; external actors; the organisation’s executives (decision-makers) or PETs vendors (if using).
Trust questions: Have the executives or PETs vendors prescribed the right PET for the application, such that it functions in a privacy-preserving way? Will the PET fulfil its expected technical function? Will the data remain secure from outside actors who want access to it?
Assurance needed: Technical assurance – technological specifications demonstrating the PET will function as intended. Assurance in the application – the use of the PET is appropriate for the given use case; the PET is part of wider responsible data governance.

Trustor: Executives and PETs vendors (those ‘diagnosing’ use cases and deploying PETs)
Trust is placed in: PETs users; PETs vendors; PETs developers; the technology itself.
Trust questions: Are the developers competent in delivering a fit-for-purpose technology? Will the PET fulfil its expected function?
Assurance needed: Technical assurance – professional qualifications detailing the PET user’s ability. Technical assurance – technological specifications demonstrating the PET will function as intended.

Trustor: Data subjects (the people whom the data is about)
Trust is placed in: the data governance ecosystem of organisations that collect and use their data.
Trust questions: Will personal data be used in accordance with intent, and not lead to increased surveillance and exploitation? Will data remain safe from interference from unauthorised users?
Assurance needed: Assurance in the application – the PET is used as part of wider responsible data governance.
111 National Institute of Standards and Technology (Roadmap to the Privacy Framework). See https://www.nist.gov/
privacy-framework/roadmap (accessed 15 March 2022).
112 The AI Standards Hub is led by the Alan Turing Institute with support from the British Standards Institution and the
National Physical Laboratory. HM Government (New UK initiative to shape global standards for Artificial Intelligence).
See https://www.gov.uk/government/news/new-uk-initiative-to-shape-global-standards-for-artificial-intelligence
(accessed 19 March 2022).
BOX 2
TABLE 4
Name | Number | Standards development organisation | Date published
• Information security, cybersecurity and privacy protection – Information security controls | ISO/IEC 27002:2022 | ISO and IEC | Feb 2022
• Security techniques – Extension to ISO/IEC 27001 and ISO/IEC 27002 for privacy information management – Requirements and guidelines | ISO/IEC 27701:2019 | ISO and IEC | Aug 2019
• Information technology – Security techniques – Code of practice for personally identifiable information protection | ISO/IEC 29151:2017 | ISO and IEC | Aug 2017
113 See for example the Professional Evaluation and Certification board training courses https://pecb.com/en/education-and-certification-for-individuals.
Training available | Description | Reference to PETs
• Focus on ICT systems for PII. | PETs used as privacy controls; refers to PETs ‘such as pseudonymization, anonymization or secret sharing’. Briefly mentions HE in regards to encryption.
• Requirements for a systems/software engineering for privacy. | ‘Organizations should also put in place policies on the following: Privacy enhancing technologies and techniques: Which technologies the organization uses, and how and when these technologies are used.’
• Description of privacy enhancing data de-identification techniques and measures to be used in accordance with ISO/IEC 29100. | Content on homomorphic encryption, differential privacy and synthetic data.
• Information security guidelines specifically for PII. | Recommends to ‘consider whether, and which, privacy enhancing technologies (PETs) may be used.’
• Guidance on de-identification, suggests motivated intruder tests. | Suggestions of use of differential privacy and synthetic data.
TABLE 4 (continued)
Name | Number | Standards development organisation | Date published
• The Anonymisation Decision-Making Framework: European Practitioners’ Guide | – | UK Anonymisation Network | Jul 2012
• The NIST Privacy Framework: A Tool for Improving Privacy through Enterprise Risk Management | – | NIST | Jan 2020
• Roadmap for Advancing the NIST Privacy Framework: A Tool for Improving Privacy through Enterprise Risk Management | – | NIST | Jan 2020
Training available | Description | Reference to PETs
• Framework for anonymisation by an open group, led by academics at the University of Manchester. | Suggestions of use of differential privacy and synthetic data.
TABLE 5
PET | Name | Number | Standards development organisation
• HE | IT Security techniques – Encryption algorithms – Part 6: Homomorphic encryption | ISO/IEC 18033-6:2019 | ISO/IEC
• HE | Information security – Encryption algorithms – Part 8: Fully Homomorphic Encryption | ISO/IEC AWI 18033-8 | ISO/IEC
• TEEs | IEEE Standard for Technical Framework and Requirements of Trusted Execution Environment based Shared Machine Learning | IEEE 2830-2021 | IEEE
• TEEs | Standard for Secure Computing Based on Trusted Execution Environment | P2952 | IEEE
• TEEs | Information technology – Trusted platform module library | ISO/IEC 11889-1:2015 | ISO/IEC
• DP | Privacy enhancing data de-identification terminology and classification of techniques | ISO/IEC 20889:2018 | ISO/IEC
• SMPC | Information technology – Security techniques – Secret sharing – Part 1: General | ISO/IEC 19592-1:2016 | ISO/IEC
• SMPC | Information technology – Security techniques – Secret sharing – Part 2: Fundamental mechanisms | ISO/IEC 19592-2:2017 | ISO/IEC
• SMPC | Information security – Secure multi-party computation – Part 1: General | ISO/IEC CD 4922-1.2 | ISO/IEC
• SMPC | Information security – Secure multi-party computation – Part 2: Mechanisms based on secret sharing | ISO/IEC WD 4922-2.3 | ISO/IEC
• SMPC | IEEE Recommended Practice for Secure Multi-Party Computation | IEEE 2842-2021 | IEEE
Date published | Type | Description
• May 2019 | Standard | Looks at two PHE algorithms, appropriate parameters and the process of homomorphically operating on the encrypted data.
• Mar 2018 | Standard | Standard produced by an open consortium of industry, government and academia.
• May 2009 | Standard | Originally made for mobile phone TEEs, but applicable more generally, setting out core requirements, best practice and examples.
• Oct 2018 | Standard | Highly technical standard used extensively in industry products.
• – | Standard | Internet of Things (IoT) certification for hardware, software and devices. This is used in the standardisation of TEE hardware (eg ARM TrustZone).
• Oct 2021 | Standard | Standard on the applied use of TEEs in privacy-preserving machine learning done using third parties and MPC.
• Aug 2015 | Standard | A four-part standard on trusted platform modules, a related technology, developed by an industry collaboration and later adopted by ISO/IEC.
• Nov 2018 | Guidance | Discusses differential privacy as a metric and also related noise addition methods.
• Dec 2021 | Project | General explainer on DP in 12 parts, concluding with a statement that they have plans to use it as a foundation on which to develop technical guidelines.
• May 2018 | Guidance | Example academic paper sharing a framework for developing DP algorithms.
• Oct 2017 | Standard | Covers five secret sharing algorithms that meet requirements of message confidentiality and recoverability.
• Nov 2021 | Standard | A ‘technical framework’ for SMPC including security levels and use cases.
• – | Project | Industry (and academic) collaboration, sets out goals to produce best practice and terminology guidance for a standard project authorization request for a synthetic data privacy and accuracy standard.
• May 2022 | Guidance | An academic review of synthetic data as a technology highlighting some of the challenges.
Measuring privacy and utility in PETs

One potential barrier in developing PETs standards is achieving consensus on metrics for privacy and utility. There are many different metrics that can be used for privacy; one review categorises over 80 privacy metrics and suggests a method for choosing between them114.

The cybersecurity community uses security metrics. Encryption, for example, has security metrics such as key length, which estimate the computing power it would take to break encryption and therefore the degree of security provided. SDOs are also interested in privacy metrics, as in Privacy enhancing data de-identification terminology and classification of techniques (ISO/IEC 20889), which concerns differential privacy and its use as a measure. However, privacy-utility trade-offs vary according to context, making metrics and thresholds difficult to generalise115, 116. Using a single privacy metric also risks over-simplification, failing to adequately address all relevant harms (as privacy metrics can only account for one harm at a time).

Threat modelling can be used to identify potential risks, attacks or vulnerabilities in a data governance system. Threat models are constantly evolving as attacks reach new levels of sophistication. For example, anonymisation originally meant zero risk of reidentification. However, increasingly sophisticated reidentification techniques, such as those that make use of statistical approaches and publicly available datasets, are changing the requirements of adequate anonymisation117.

Considering these constraints, the best approach may be technical standards and metrics where feasible (as with encryption or noise addition algorithms), complemented by scenario-based guidance, assessment protocols and codes of conduct.
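As one concrete example of a quantifiable privacy metric, the sketch below releases a differentially private mean using the Laplace mechanism, with epsilon as the privacy parameter. It is a minimal illustration using NumPy; the dataset, bounds and epsilon values are invented, but it shows why smaller epsilon buys stronger protection at the cost of utility.

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng=np.random.default_rng()):
    """Differentially private mean via the Laplace mechanism.

    Each value is clipped to [lower, upper], so one person's record can change
    the sum by at most (upper - lower); dividing by n gives the sensitivity of
    the mean, and Laplace noise is scaled to sensitivity / epsilon.
    """
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    return clipped.mean() + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Illustrative privacy-utility trade-off: smaller epsilon means a stronger
# guarantee but a noisier (less useful) answer.
ages = np.random.default_rng(0).integers(18, 90, size=1_000)
for eps in (0.1, 1.0, 10.0):
    estimate = dp_mean(ages, lower=18, upper=90, epsilon=eps)
    print(f"epsilon={eps:<4} true mean={ages.mean():.2f} DP estimate={estimate:.2f}")
```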
114 Wagner I, Eckhoff D. 2018 Technical Privacy Metrics: A Systematic Survey. See https://arxiv.org/abs/1512.00327
(accessed 20 September 2022). Note that more general mathematical approaches also exist, which aim for a
definition of privacy more like that of epsilon in differential privacy. One example of this is Pufferfish, a self-professed
framework for mathematical privacy definitions, which can be used in the context of PETs: Kifer D, Machanavajjhala
A. 2014 Pufferfish: a framework for mathematical privacy definitions. ACM Transactions on Database Systems 39,
1—36. (https://doi.org/10.1145/2514689).
115 Lee J, Clifton C. 2011 How Much Is Enough? Choosing ε for Differential Privacy (conference paper). See https://link.
springer.com/chapter/10.1007/978-3-642-24861-0_22 (accessed 23 April 2022).
116 Abowd JM, Schmutte IM. 2019 An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices.
Am Econ Rev. 109, 171—202.
117 A useful analysis of the UK’s approach to anonymisation in data protection regulation can be found in: Bristow and
Privitar. 2021 Introduction to Anonymisation. See https://www.bristows.com/app/uploads/2021/07/Introduction-to-
Anonymisation-Privitar-and-Bristows.pdf (accessed 20 September 2022).
BOX 3
Motivated intruder testing is a type of attack-based risk assessment and an established method for assessing the efficacy of a privacy regime (or PET). It requires anticipation of: 1) the technologies and methods that might be used to attack the data; 2) the vulnerabilities of a given PET to various attacks; 3) the kinds of knowledge that could enable the attack; and 4) the goals of potential attacks and how they might cause harm. An exhaustive list and test of every attack is not feasible. Rather, it is important to know what kinds of attack are most possible and most likely.

Because it is impossible to anticipate every scenario, even motivated intruder testing does not provide a guarantee of privacy. Nonetheless, it has been the primary legal test in determining whether data is identifiable or not.

Motivated intruder testing can provide a degree of assurance. However, this approach cannot provide a quantitative measure of assurance. In the past, a motivated intruder has been defined as someone without specialist skills or computing power, which may not be a realistic adversary for some datasets (such as highly desirable datasets). More explicit guidance on testing, including choosing what and how to test a PET, could be included either in process standards or PETs guidance.

Testing does not remove the need for expert users and developers. Social and educational infrastructure must be in place to educate data scientists (and privacy professionals) on PETs and risk assessment.
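A toy illustration of the kind of linkage a motivated intruder test anticipates is sketched below: a ‘de-identified’ release is joined to public information on quasi-identifiers. All records, fields and names are fabricated for illustration.

```python
# Toy linkage attack: join a 'de-identified' release to a public register on
# quasi-identifiers (age and postcode district). A unique match reidentifies
# the person behind a released record.

deidentified_release = [
    {"age": 34, "postcode_district": "CB2", "diagnosis": "asthma"},
    {"age": 61, "postcode_district": "M14", "diagnosis": "diabetes"},
]

public_register = [
    {"name": "A. Example", "age": 61, "postcode_district": "M14"},
    {"name": "B. Sample", "age": 29, "postcode_district": "CB2"},
]

def link(release, register, keys=("age", "postcode_district")):
    """Return released records that match exactly one person in the register."""
    matches = []
    for record in release:
        candidates = [p for p in register if all(p[k] == record[k] for k in keys)]
        if len(candidates) == 1:  # unique match, so the record is reidentified
            matches.append({**record, "name": candidates[0]["name"]})
    return matches

print(link(deidentified_release, public_register))
# [{'age': 61, 'postcode_district': 'M14', 'diagnosis': 'diabetes', 'name': 'A. Example'}]
```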
to improve and save lives118.
FIGURE 2
Federated machine learning: local model updates from sites A, B, C and D are aggregated into a shared global model.
Preserving privacy in medical imaging for research and diagnostics

Magnetic Resonance Imaging (MRI) is a type of scan that produces detailed images of the inside of the body and internal organs by using strong magnetic fields and radio waves. The images produced by MRI scanning provide critical information in the diagnosis and staging of disease progression. Sets of MRI images can be used to train machine learning algorithms to detect certain features or abnormalities in images. This technology can be deployed to screen large numbers of images for research purposes: identifying patterns that link variables like patient behaviour, genetics, or environmental factors with brain function.

MRI imaging and metadata can reveal sensitive information about a patient. Indeed, even an individual’s presence in a dataset may be sensitive. While the images themselves may be de-identified through removal of names, addresses and scan date, neuroimages can sometimes be reidentified (as demonstrated in a 2019 Mayo Clinic study)123.
123 Schwarz C G et al. 2019 Identification of Anonymous MRI Research Participants with Face-Recognition Software. N
Engl J Med. 381, 1684—1686. (https://doi.org/10.1056/nejmc1908881)
Privacy solutions that enable collaboration

Federated learning is a type of remote execution in which models are ‘sent’ to remote data-holding machines (eg, servers) for local training. This can allow researchers to use data at other sites for training models without accessing those datasets. For example, if researchers at different universities hold neuroimaging data, a federated approach would allow them to train models on all participants’ imaging data, even as that data remains ‘invisible’ to analysts. This is an example of federated machine learning (see Figure 2).

There are two approaches to accomplishing federated machine learning in this case:

• In one approach, each site analyses its own data and builds a model; the model is then shared to a remote, centralised location (a node) common to all researchers involved. This node then combines all models into one ‘global’ model and shares it back to each site, where researchers can use the new, improved model124 (a minimal sketch of this aggregation step is given below);

In either approach, all users’ models are improved by ‘learning’ from remote datasets, which are themselves never revealed. By using federated learning, raw data is not shared, which rules out the most common issues associated with data protection. At the same time, federated learning does not offer perfect privacy; models are still vulnerable to some advanced attacks. These attacks may be of a sufficiently low risk to be acceptable to the parties such that they can proceed. Other safeguards may also be put in place. These could include detecting when repeated queries are made of an MRI dataset, which could be cross-referenced with public data to reidentify subjects.
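The centralised approach described in the bullet above can be sketched as follows. This is a minimal illustration using NumPy in which each ‘model’ is reduced to a weight vector; the sites, dataset sizes and weighting scheme are invented for illustration rather than taken from any system described in this report.

```python
import numpy as np

def federated_average(site_models, site_sizes):
    """Combine locally trained model weights into one 'global' model.

    site_models: list of weight vectors, one per site.
    site_sizes:  number of local training examples at each site, used to
                 weight the average so larger sites contribute more.
    Only these weight vectors ever leave a site; the raw imaging data does not.
    """
    weights = np.asarray(site_sizes, dtype=float)
    weights /= weights.sum()
    return np.average(np.stack(site_models), axis=0, weights=weights)

# One illustrative round of federated learning with four sites, A to D.
rng = np.random.default_rng(1)
local_models = [rng.normal(size=10) for _ in range(4)]  # locally trained weights
local_sizes = [120, 340, 80, 200]                       # local dataset sizes

global_model = federated_average(local_models, local_sizes)
# The node would now send `global_model` back to each site for the next round.
```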
BOX 4
COINSTAC allows users who cannot directly share their data to collaboratively run open, reproducible federated learning and coordinated pre-processing using software packages that can run in any environment (such as personal devices, private data centres, or public clouds). It uses containerised software (software which runs all necessary code within one environment that is executable regardless of host operating system and is therefore consistent across platforms). This software is available on GitHub under an MIT license127.

COINSTAC developers have documented several case studies. In one study, a federated analysis using datasets from Europe and India found structural changes in brain grey matter linked to age, smoking, and body mass index (BMI) in adolescents128. Another case study uses a federated neural network classifier to differentiate smokers from non-smokers in resting-state functional MRI (fMRI) data. The federated models typically achieve results similar to those using pooled data and better than those drawing data only from isolated sites.

Additionally, TReNDS researchers are developing optimised algorithms for deep learning to reduce transmission bandwidth without sacrificing accuracy. In a third example, brain age estimation algorithms were trained to predict actual subject age using neuroimaging; this was then applied to estimate the biological brain age of new subjects129. This is useful because large gaps between estimated biological brain age and actual age are potential biomarkers of brain disorders such as Alzheimer’s disease. This model achieved results that were statistically equivalent to centralised models.

TReNDS is also currently developing a network of COINSTAC vaults, which will allow researchers to perform federated analysis with multiple large, curated datasets. This open science infrastructure will enable rapid data reuse, create more generalisable models on diverse datasets, and democratise research by removing barriers to entry for small or under-resourced groups.
Conclusions
Large, robust, international neuroimaging
datasets are required for training machine
learning models. These datasets exist around
the world in various institutions. Securely
using remote datasets to train machine
learning models could transform research in
this field. Further, safeguarding the privacy of
imaging subjects could increase participation
in research, enhancing the diverse, large-
scale data required to make future strides
in neuroscience.
130 Differential privacy and federated learning can be combined in two ways: output perturbation (where noise is added
to the output of an optimisation algorithm) and objective perturbation (noise is added at every step of the optimisation
algorithm). The latter may hold more functionality but requires identical pre-processing across sites and good local
feature mapping.
BOX 5
131 Health Data Research UK (HDR UK Strategic Delivery Plan 2021/22). See https://www.hdruk.ac.uk/wp-content/
uploads/2021/02/Strategic-Delivery-Plan-2021_22.pdf (accessed 7 October 2022).
132 White T, Blok E, Calhoun V D. 2020 Data sharing and privacy issues in neuroimaging research:
Opportunities, obstacles, challenges, and monsters under the bed. Hum Brain Mapp. 43,
278—291. (https://doi.org/10.1002/hbm.25120)
133 Veale M, Binns R, Edwards L. 2018 Algorithms that remember: model inversion attacks and data protection law. Philos
T R Soc A. 376. (https://doi.org/10.1098/rsta.2018.0083).
134 First proposed by Fredrikson M, Jha S, Ristenpart T. 2015 Model Inversion Attacks that Exploit Confidence Information
and Basic Countermeasures. See https://rist.tech.cornell.edu/papers/mi-ccs.pdf (accessed 6 September 2022).
135 Veale M, Binns R, Edwards L. 2018 Algorithms that remember: model inversion attacks and data protection law. Philos
T R Soc A. 376. (https://doi.org/10.1098/rsta.2018.0083)
136 Melis L, Song C, De Cristofaro E, Shmatikov V. 2018 Exploiting Unintended Feature Leakage in Collaborative
Learning. See https://arxiv.org/abs/1805.04049 (accessed 10 October 2022).
BOX 6
137 The honest-but-curious adversary is ‘a legitimate participant in a communication protocol who will not deviate from
the defined protocol but will attempt to learn all possible information from legitimately received messages’, as defined
in Paverd A, Martin A, Brown I. Modelling and Automatically Analysing Privacy Properties for Honest-but-Curious
Adversaries. See https://www.cs.ox.ac.uk/people/andrew.paverd/casper/casper-privacy-report.pdf (accessed 10
September 2022).
138 Melis L, Song C, De Cristofaro E, Shmatikov V. 2018 Exploiting Unintended Feature Leakage in Collaborative
Learning. See https://arxiv.org/abs/1805.04049 (accessed 10 October 2022).
139 The Information Commissioner’s Office. Guidance on the AI auditing framework: Draft guidance for consultation.
See https://ico.org.uk/media/2617219/guidance-on-the-ai-auditing-framework-draft-for-consultation.pdf (accessed 20
September 2022).
140 The Royal Society. 2019 Protecting privacy in practice: The current use, development and limits of Privacy Enhancing
Technologies in data analysis. See https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/
privacy-enhancing-technologies-report.pdf (accessed 30 June 2022).
141 National Health Service Digital (Improving our Data Processing Services). See https://digital.nhs.uk/data-and-
information/data-insights-and-statistics/improving-our-data-processing-services (accessed 15 May 2022).
BOX 6 (continued)
142 Github (Privacy Trust Lab Privacy Meter). See https://github.com/privacytrustlab/ml_privacy_meter (accessed 10
September 2022).
143 Trusted research environments can vary significantly in scope and security guarantee.
144 Wroge TJ, Özkanca Y, Demiroglu C, Si D, Atkins D C, Ghomi RH. 2018 Parkinson’s disease diagnosis using machine
learning and voice. See https://www.ieeespmb.org/2018/papers/l01_01.pdf (accessed 23 April 2022).
145 König A, Satt A, Sorin A, Hoory R, Toledo-Ronen O, Derreumaux A, Manera V, Verhey F, Aalten P, Robert PH, David
R. Automatic speech analysis for the assessment of patients with predementia and Alzheimer’s disease. Alzheimers
Dement. 1, 112—124. (https://doi.org/10.1016/j.dadm.2014.11.012)
146 Haulcy R, Glass J. 2021 Classifying Alzheimer’s Disease Using Audio and Text-Based Representations of Speech.
Front. Psychol. Sec. Human-Media Interaction. 11 (https://doi.org/10.3389/fpsyg.2020.624137)
147 University College London (Meet the C-PLACID Audio-Recording Research Team). See https://www.ucl.ac.uk/drc/c-
placid-study/audio-recording-c-placid/meet-c-placid-audio-recording-research-team (accessed 1 September 2022).
148 Fagherazzi G, Fischer A, Ismael M, Despotovic V. 2021 Voice for health: The use of vocal biomarkers from research to
clinical practice. Digit Biomark. 5, 78—88. (https://doi.org/10.1159/000515346)
149 Arora A, Baghai-Ravary L, Tsanas A. 2019 Developing a large scale population screening tool for the assessment of
Parkinson’s disease using telephone-quality voice. J Acoust Soc Am. 145 5 2871. (https://doi.org/10.1121/1.5100272)
150 Examples include: Mozilla Labs (Common Voice). See https://labs.mozilla.org/projects/common-voice/ (accessed 15
August 2022); Google Audio Set: Gemmeke JF et al. 2017 Audio Set: An ontology and human-labeled dataset for
audio events. Proc. IEEE ICASSP 2017 New Orleans. See https://research.google/pubs/pub45857/, https://research.
google.com/audioset/dataset/index.html (accessed 2 June 2022), and open data sets such as Oxford University’s
VoxCeleb: Oxford University (VoxCeleb). See https://www.robots.ox.ac.uk/~vgg/data/voxceleb/ (accessed 14
May 2022).
151 Haulcy R, Glass J. 2021 Classifying Alzheimer’s Disease Using Audio and Text-Based Representations of Speech.
Front. Psychol. Sec. Human-Media Interaction. 11 (https://doi.org/10.3389/fpsyg.2020.624137)
Privacy preservation in biometric audio data

When using biometric audio data, PETs should be layered with audio-specific approaches to anonymisation. For example, voice transformation techniques may be used to alter a patient’s voice quality152. Transcription of audio data can be automated using AI-based applications (eg Google Cloud’s Speech API), then scanned using a machine learning algorithm that tags identifiers such as names, dates, ages, or geographical locations. By highlighting identifiable elements, identifiers can be swiftly redacted.

Audio data collection techniques may include phone or web-based recording153; these can entail potential for eavesdropping. Voice over IP (VOIP) can include end-to-end homomorphic encryption, ensuring that no other parties can listen in during data collection154. It is also possible to encrypt voice data for cloud storage155, or to split voice data into random fragments, which are each processed separately.

Privacy-preserving synthetic data (PPSD) may be generated from audio recordings prior to sharing or querying156. However, this is an emerging application of PPSD157, 158. New synthetic datasets may need to be created specific to various research queries159, 160, which could become costly.

Conclusions

Voice recognition technology is becoming ever more sophisticated, such that speaker identification is now feasible even under noisy conditions. These methods may be applied even where masking techniques such as voice transformation have been used161. Without greater sharing of audio data, there is a risk that audio-trained models become biased according to language-, accent-, age- and culture-specific biomarkers. This could be countered through open and crowd-sourced initiatives162, which could be rolled out most safely with PETs.
152 Jin Q, Toth AR, Schultz T, Black AW. 2009 Voice convergin: Speaker de-identification by
voice transformation. 2009 IEEE International Conference on Acoustics, Speech and Signal
Processing. (https://doi.org/10.1109/icassp.2009.4960482)
153 Fagherazzi G, Fischer A, Ismael M, Despotovic V. 2021 Voice for health: The use of vocal biomarkers from research to
clinical practice. Digit Biomark. 5, 78—88. (https://doi.org/10.1159/000515346)
154 In this case, VOIP signals from multiple parties are mixed at a central server, improving the scalability of the solution
and protecting the data held on the central server, were the server to be compromised. See: Rohloff K, Cousins D
B, Sumorok D. 2017 Scalable, Practical VoIP Teleconferencing with End-to-End Homomorphic Encryption. IEEE T Inf
Foren Sec. 12, 1031—1041. (https://doi.org/10.1109/tifs.2016.2639340)
155 Shi C, Wang H, Hu Y, Qian Q, Zhao H. 2019 A speech homomorphic encryption scheme with less data expansion in
cloud computing. KSII T Internet Inf. 13, 2588—2609. (https://doi.org/10.3837/tiis.2019.05.020)
156 Fazel A et al. 2021. SynthASR: Unlocking Synthetic Data for Speech
Recognition. (https://doi.org/10.48550/arXiv.2106.07803)
157 Tomashenko N et al. 2020 Introducing the VoicePrivacy initiative. See https://doi.org/10.48550/arXiv.2005.01387
(accessed 30 March 2022).
158 Shevchyk A, Hu R, Thandiackal K, Heizmann M, Brunschwiler T. 2022 Privacy preserving synthetic respiratory sounds
for class incremental learning. Smart Health. 23. (https://doi.org/10.1016/j.smhl.2021.100232)
159 Fazel A et al. 2021 SynthASR: Unlocking Synthetic Data for Speech Recognition. See https://doi.org/10.48550/
arXiv.2106.07803 (accessed 10 October 2022).
160 Rossenbach N, Zeyer A, Schlüter R, Ney H. 2020 Generating synthetic audio data for attention-based speech
recognition systems. See https://doi.org/10.48550/arXiv.1912.09257 (accessed 10 October 2022).
161 Chung J S, Nagrani A, Zisserman A. 2018 VoxCeleb2: Deep speaker recognition. See https://doi.org/10.48550/
arXiv.1806.05622 (accessed 2 September 2022).
162 Fagherazzi G, Fischer A, Ismael M, Despotovic V. 2021 Voice for health: The use of vocal biomarkers from research to
clinical practice. Digit Biomark. 5, 78—88. (https://doi.org/10.1159/000515346)
USE CASE 2
163 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero. See https://
royalsociety.org/-/media/policy/projects/digital-technology-and-the-planet/digital-technology-and-the-planet-report.
pdf (accessed 20 September 2022).
164 Dietz M, Putz B, Pernul G. 2019 A Distributed Ledger approach to Digital Twin secure data sharing. See https://core.
ac.uk/download/pdf/237410573.pdf (accessed 27 September 2022).
FIGURE 3
Data is needed from a range of sources to develop, evaluate, and ‘fuel’ a digital twin of the UK
energy system. Emerging privacy and security concerns must be addressed to allow the safe flow
of data between digital twin models and real-world assets.
(The figure depicts data sources including energy generation, buildings, e-storage, agriculture, industry, hospitals and e-vehicle charge points.)
165 Catapult Energy Systems. 2019 A strategy for a Modern Digitalised Energy System: Energy Data Taskforce report. See
https://esc-production-2021.s3.eu-west-2.amazonaws.com/2021/07/Catapult-Energy-Data-Taskforce-Report-A4-v4AW-
Digital.pdf (accessed 27 September 2022).
166 In one smart heating example, analysts demonstrated the ability to uncover users’ sleeping patterns, location within a
home, even whether a user was sitting or standing. While this level of detail goes beyond what is possible with typical
smart metering, it is one example where perceived potential invasiveness of smart fixtures in the home may prevent
uptake of this technology: Morgner P, Müller C, Ring M, Eskofier BM. 2017 Privacy Implications of Room Climate Data.
Lecture Notes in Computer Science vol 10493. See https://doi.org/10.1007/978-3-319-66399-9_18 (accessed 27
September 2022).
167 Dietz M, Putz B, Pernul G. 2019 A Distributed Ledger approach to Digital Twin secure data sharing. See https://core.
ac.uk/download/pdf/237410573.pdf (accessed 27 September 2022).
168 Catapult Energy Systems (Energy Digitalisation Taskforce publishes recommendations for a digitalised Net Zero
energy system). See https://es.catapult.org.uk/news/energy-digitalisation-taskforce-publishes-recommendations-for-a-
digitalised-net-zero-energy-system/ (accessed 22 September 2022).
169 University of Cambridge (Centre for Digital Built Britain). See https://www.cdbb.cam.ac.uk/subject/information-
management-framework-imf (accessed 20 September 2022).
170 More specifically, one of CReDo’s aims is to trial the IMF to evaluate the framework’s capacity to operate at a
national level.
Privacy solutions should be implemented at several critical points in the coupled digital twin-asset ecosystem. This use case focusses on energy consumption, where private data may disclose:

• What appliances are used and when171;

• What behaviour patterns might be revealed by consumers’ energy usage – particularly occupancy patterns172;

• What information might be inferred about the building / utilities and other features, leading to security risks in national energy systems assets173;

• How energy companies’ processing algorithms might give away proprietary knowledge and commercially sensitive behavioural insights;

• What billing or other pseudonymised records might reveal private information about consumers174, including consumer responsiveness to changes in price;

• Appliance and usage patterns that might be used in unsolicited targeted marketing, for example, ads or messages prompting consumers to have their boiler serviced.

While these inferences could be made using contemporary smart meter data, future versions may take readings at shorter intervals, allowing for detection of which appliances are used, or which TV channels are watched (through discernible electromagnetic interference signatures)175.

Individual privacy solutions: Smart meter data privacy

Smart meter data is personal data176. Privacy concerns around smart meter data have gained attention with the roll-out of devices in Europe and the UK177. However, smart meter data holds substantial value for renewable energy integration: there is no other way of measuring energy consumption in real time, or so close to consumer end-use.
171 Molina-Markham A, Shenoy P, Fu K, Cecchet E, Irwin D. 2010 Private memoirs of a smart meter. Proceedings
of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building. See https://doi.
org/10.1145/1878431.1878446 (accessed 2 September 2022).
172 Lisovich MA, Mulligan DK, Wicker SB. 2010 Inferring Personal Information from Demand-Response Systems. IEEE
Secur Priv. 8, 11—20. (http://dx.doi.org/10.1109/MSP.2010.40)
173 Beckel C, Sadamori L, Staake T, Santini S. 2014 Revealing household characteristics from smart meter data. Energy.
78 397—410. (http://dx.doi.org/10.1016/j.energy.2014.10.025)
174 Jawurek M, Johns M, Rieck K. 2011 Smart metering de-pseudonymization. ACSAC 2011 Proceedings of the 27th
Annual Computer Security Applications Conference. See http://doi.acm.org/10.1145/2076732.2076764 (accessed 20
March 2022).
175 Enev M, Gupta S, Kohno T. 2011 Televisions, video privacy, and powerline electromagnetic interference. See http://doi.
acm.org/10.1145/2046707.2046770 (accessed 2 September 2022).
176 The UK government’s Smart Metering Implementation Programme (2018) outlined the smart metering Data Access
and Privacy Framework, which aimed to ‘safeguard consumers’ privacy, whilst enabling proportionate access to
energy consumption data’. HM Government. 2018 Smart metering implementation programme: Review of data
access and privacy framework. See https://assets.publishing.service.gov.uk/government/uploads/system/uploads/
attachment_data/file/758281/Smart_Metering_Implementation_Programme_Review_of_the_Data_Access_and_
Privacy_Framework.pdf (accessed 22 September 2022).
177 For example: Pöhls HC, Staudemeyer RC. 2015 Privacy enhancing techniques in Smart City applications. See https://
cordis.europa.eu/docs/projects/cnect/4/609094/080/deliverables/001-RERUMdeliverableD32Ares20153669911.pdf
(accessed 26 September 2022).
178 UN Conference of European Statisticians. 2019 Protecting Consumer Privacy in Smart Metering by Randomized
Response. See https://unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.46/2019/mtg1/SDC2019_S4_URV_
Protecting_Consumer_Privacy_AD.pdf (accessed 24 August 2022).
179 Cuijpers C, and Koops B-J. 2013 Smart Metering and Privacy in Europe: Lessons from the Dutch Case. In: Gutwirth S,
Leenes R, de Hert P, Poullet Y. (eds) European Data Protection: Coming of Age. Berlin: Springer, Dordrecht.
Government, regulators and national security harms

Combined summary statistics of energy datasets will be key to maximising the benefits of an energy digital twin. Privacy-preserving synthetic data (PPSD) could be used to share relevant properties of rich microdata – in essence, how the datasets relate to one another – collected through smart systems. Simpler, differentially private summary statistics could be shared (where the privacy-utility trade-off would be more transparent). This would enable decision-making by government and regulators without releasing full datasets. However, the utility and privacy trade-offs of PPSD must be better understood and will be highly case-dependent180.

Data coming from physical assets may be used to control the grid and national power distribution. TEEs – potentially coupled with homomorphic encryption – could safeguard collaborative cloud computing from attacks, protecting the security of critical national infrastructure181. Homomorphic encryption can be highly compute-intensive and would require significant development to be used at a large scale.
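To give a flavour of the PPSD idea discussed above, the toy sketch below fits a very simple per-time-slot model to invented consumption microdata and samples a synthetic table from it. Real PPSD pipelines use far richer generative models, and a sketch like this carries no formal privacy guarantee unless the fitted parameters are themselves released with calibrated noise.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative 'real' microdata: half-hourly consumption (kWh) for 1,000 households.
real = rng.gamma(shape=2.0, scale=0.25, size=(1_000, 48))

# Fit a deliberately simple generative model: per-slot mean and standard deviation.
mu, sigma = real.mean(axis=0), real.std(axis=0)

# Sample synthetic households from the fitted model. Only (mu, sigma) are derived
# from the real data; no individual household record is released.
synthetic = np.clip(rng.normal(mu, sigma, size=(1_000, 48)), 0, None)

# The synthetic table preserves aggregate structure (eg the daily load shape)...
print(np.abs(synthetic.mean(axis=0) - mu).max())  # small deviation from per-slot means
# ...but it is not formally private: a differentially private version would add
# calibrated noise to mu and sigma before sampling.
```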
FIGURE 4
TEEs are secure areas inside a processor, which are isolated from the rest of the system. The
code contained in the TEE cannot be read by the operating system, nor the hypervisor (a
process that separates a computer’s operating system and applications from the underlying
physical hardware).
(The figure depicts apps and the operating system sitting above the hypervisor and hardware, with the TEE’s code and data held in an isolated area.)
180 Jordon J et al. 2022 Synthetic data: What, why and how? See https://arxiv.org/pdf/2205.03257.pdf (accessed 2
September 2022).
181 Archer et al. 2017 Applications of homomorphic encryption. See https://www.researchgate.net/
publication/320976976_APPLICATIONS_OF_HOMOMORPHIC_ENCRYPTION/link/5a051f4ca6fdcceda0303e3f/
download (accessed 23 April 2022).
Commercially sensitive data solutions for digital twins

Energy providers could use insights from smart meter data to provide new service models (eg heating as a service). In addition to SMPC, federated learning could allow users’ data to stay localised while training models are used by energy providers. For example, a machine learning model could be sent to individual smart home systems and ‘learn’ locally about certain energy consumption patterns in order to predict demand182.

Ofgem and other regulatory bodies should ensure that data usage reflects consumer interests. In a digital twin, this could entail allowing users to audit and challenge their smart meters’ outputs, for example183. Where algorithms are trained on real-time data, every effort must be made to ensure sections of the population are not over- or under-represented, as this could reproduce systemic biases and promote inaccuracies. A consumer consent dashboard, such as the one proposed by the Energy Digitalisation Taskforce184 in the UK, may provide a greater sense of control and encourage consumer trust.

Conclusions

Digital twins hold significant potential in enabling the net zero transition. A privacy-enhanced digital twin using PETs should be bolstered with basic security measures, including the physical restriction of access to critical infrastructure, servers and computers (eg using hardware keys). For PETs to be embedded into the realisation of an energy digital twin, data protection regulation and related guidance should consider what mandates or advice would be effective and ethical in promoting the uptake of smart meters.
182 Fuller A, Fan Z, Day C, Barlow C. 2020. Digital Twin: Enabling Technologies, Challenges and Open Research. IEEE
Access. 8, 108952—108971. (https://doi.org/10.1109/access.2020.2998358)
183 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero. See https://
royalsociety.org/-/media/policy/projects/digital-technology-and-the-planet/digital-technology-and-the-planet-report.
pdf (accessed 20 September 2022).
184 HM Government 2022. Energy Digitalisation Taskforce report: joint response by BEIS, Ofgem and Innovate UK. See
https://www.gov.uk/government/publications/digitalising-our-energy-system-for-net-zero-strategy-and-action-plan/
energy-digitalisation-taskforce-report-joint-response-by-beis-ofgem-and-innovate-uk (accessed 24 August 2022).
USE CASE 3
185 Statista (Number of internet and social media users worldwide as of July 2022). See https://www.statista.com/
statistics/617136/digital-population-worldwide/ (accessed 18 August 2022).
186 The Royal Society. 2022 The online information environment: Understanding how the internet shapes people’s
engagement with scientific information. See https://royalsociety.org/-/media/policy/projects/online-information-
environment/the-online-information-environment.pdf?la=en-GB&hash=691F34A269075C0001A0E647C503DB8F
(accessed 30 March 2022).
187 See Lomborg S, Bechmann A. 2014 Using APIs for Data Collection on Social Media. The Information Society 30 4
256—265. (https://doi.org/10.1080/01972243.2014.915276)
188 For example: Giglietto F, Rossi L, Bennato D. 2012 The Open Laboratory: Limits and Possibilities of Using
Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services. 30,
145–159. (https://doi.org/10.1080/15228835.2012.743797)
189 Teodorescu H-N. 2015 Using analytics and social media for monitoring and mitigation of social disasters. Procedia
Engineer. 107 325—334. (https://doi.org/10.1016/j.proeng.2015.06.088)
190 Harvard Kennedy School Misinformation Review (Tackling misinformation: What researchers could do with social
media data). See https://misinforeview.hks.harvard.edu/article/tackling-misinformation-what-researchers-could-do-
with-social-media-data/ (accessed 20 November 2021).
191 Gundecha P, Barbier G, Huan L. 2011 Exploiting Vulnerability to Secure User Privacy on a Social Networking Site.
Proceedings of the 17th ACM SIGKDD international conference on Knowledge Discovery and Data Mining, KDD,
2011. 511–519.
192 Sobkowic P; Kaschesky M; Bouchard G. 2012 Opinion mining in social media: Modeling, simulating, and forecasting
political opinions in the web. Gov Inform Q. 29, 470–479. (https://doi.org/10.1016/j.giq.2012.06.005)
193 For example, Meta’s Graph API. Meta for Developers (Graph API Overview). See https://developers.facebook.com/
docs/graph-api/overview/ (accessed 17 July 2022).
194 Twitter Developer Platform (Volume streams). See https://developer.twitter.com/en/docs/twitter-api/tweets/volume-
streams/introduction (accessed 27 September 2022).
BOX 9
195 La Nauze A, Severnini ER. 2021 Air pollution and adult cognition: Evidence from brain training. See https://www.nber.
org/system/files/working_papers/w28785/w28785.pdf (accessed 30 April 2022).
196 Pila E, Mond JM, Griffiths S, Mitchison D, Murray SB. 2017 A thematic content analysis of #cheatmeals images on
social media: Characterizing an emerging dietary trend. Int J Eat Disord. (https://doi.org/10.1002/eat.22671)
197 Meta (Data for Good: New Tools to Help Health Researchers Track and Combat COVID-19). See https://about.fb.com/
news/2020/04/data-for-good/ (accessed 27 September 2022).
198 Office for National Statistics Data Science Campus (Using Facebook data to understand changing mobility patterns).
See https://datasciencecampus.ons.gov.uk/using-facebook-data-to-understand-changing-mobility-patterns/ (accessed
24 August 2022).
199 Humanitarian Data Exchange (Future of Business Survey—Aggregated Data). See https://data.humdata.org/dataset/
future-of-business-survey-aggregated-data (accessed 21 February 2022).
200 Humanitarian Data Exchange (Survey on Gender Equality At Home). See https://data.humdata.org/dataset/survey-on-
gender-equality-at-home (accessed 21 February 2022).
201 COVID-19 Mobility Data Network (Facebook Data for Good Mobility Dashboard). See https://visualization.
covid19mobility.org/?date=2021-09-24&dates=2021-06-24_2021-09-24&region=WORLD (accessed 27
September 2022).
202 Rao D, Yarowsky D, Shreevats A, Gupta M. 2010 Classifying latent user attributes in twitter. See https://www.cs.jhu.
edu/~delip/smuc.pdf (accessed 30 March 2022).
203 Tang J, Zhang Y, Sun J, Rao J, Yu W, Chen Y, and Fong A C M. 2012 Quantitative Study of Individual Emotional States
in Social Networks. IEEE T Affect Comput. 3, 132–144.
204 Hays J, Efros A. 2008 Im2gps: estimating geographic information from a single image. Proceedings of the IEEE Conf.
on Computer Vision and Pattern Recognition (CVPR) 2008. http://graphics.cs.cmu.edu/projects/im2gps/im2gps.pdf
(accessed 27 September 2022).
205 Jahanbakhsh K, King V, Shoja GC 2012. They Know Where You Live! See https://arxiv.org/abs/1202.3504 (accessed 10
October 2022).
206 Silva TH, de Melo POSV, Almeida JM, Musolesi M, Loureiro AA F 2014. You are What you Eat (and Drink): Identifying
Cultural Boundaries by Analyzing Food & Drink Habits in Foursquare. See https://arxiv.org/abs/1404.1009 (accessed
27 September 2022).
207 Narayanan A, Shmatikov V 2009. De-anonymizing social networks. See https://www.cs.utexas.edu/~shmat/shmat_
oak09.pdf (accessed 15 August 2022).
208 Li R, Wang S, Deng H, Wang R, Chang K C C. 2012 Towards social user profiling: Unified and discriminative influence
model for inferring home locations. KDD 2012: Proceedings of the 18th ACM SIGKDD International conference on
Knowledge Discovery and Data Mining. 1023–1031. (https://doi.org/10.1145/2339530.2339692)
BOX 10
209 Rosenberg M, Dance GJX. 2018 ‘You Are the Product’: Targeted by Cambridge Analytica on Facebook. New York
Times. 8 April 2018. See https://www.nytimes.com/2018/04/08/us/facebook-users-data-harvested-cambridge-
analytica.html (accessed 14 May 2022).
210 Lawmakers publish evidence that Cambridge Analytica work helped Brexit group. Reuters. 16 April 2018. See https://
www.reuters.com/article/us-facebook-cambridge-analytica-britain/lawmakers-publish-evidence-that-cambridge-
analytica-work-helped-brexit-group-idUSKBN1HN2H5 (accessed 2 March 2022).
211 Kelly H. 2018 California just passed the nation’s toughest data privacy law. CNN. 29 June 2018. See https://money.
cnn.com/2018/06/28/technology/california-consumer-privacy-act/index.html (accessed 16 March 2022).
212 Ion M et al. 2017 Private Intersection-Sum Protocol with Applications to Attributing Aggregate Ad Conversions. See
https://eprint.iacr.org/2017/738.pdf (accessed 25 March 2022).
Differential privacy can be used to safeguard datasets for release to researchers by obscuring information pertaining to specific users in a dataset. In social media datasets, this could mean sharing regional or other cohort-based data to prevent reidentification of individuals. There are limitations around combining data (such as layering spatial data using maps) from multiple sources, alongside the addition of noise. This is one area for further research213.

Facebook’s Data for Good programme214, launched in 2017, has used differential privacy to provide access to researchers studying crucial topics, including disease transmission, humanitarian responses to natural disasters and extreme weather events. Where public datasets are considered sensitive in aggregation, noise is added to prevent reidentification using a Differential Privacy Framework215, 216, 217. Facebook’s Data for Good programme has received criticism for its execution; researchers have been denied access to the programme, or provided with inaccurate data, invalidating months of research218.

PETs may also be used to share social media data between researchers, or to enable open access social media databases without compromising privacy. For example, centralised data stores could be built and queried. This could include specific attributes, keywords, locations or other demographics in a centralised model. Homomorphic encryption or other cryptographic tools may be applied to social network data, allowing researchers to send queries to the data holders without requesting the data itself. The data holder could then run the query and release differentially private results. Synthetic data may also be used to release versions of datasets.
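A rough sketch of the encrypted-query pattern described above is given below, assuming the open-source python-paillier (phe) package for additively homomorphic encryption; the per-user engagement counts, the cohort and the privacy parameters are invented for illustration. Here the data holder keeps the private key, an untrusted analytics service aggregates ciphertexts without seeing any individual’s count, and only a noisy aggregate is released to the researcher.

```python
import numpy as np
from phe import paillier  # python-paillier: additively homomorphic encryption

# The data holder generates the keypair and keeps the private key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Hypothetical per-user engagement counts for one researcher-defined cohort.
counts = [3, 0, 7, 1, 4, 2]

# Row-level values are encrypted, so an untrusted analytics service can sum
# them without ever seeing any individual user's count.
encrypted = [public_key.encrypt(int(c)) for c in counts]
encrypted_total = sum(encrypted[1:], encrypted[0])  # addition on ciphertexts

# The data holder decrypts only the aggregate and adds Laplace noise
# (sensitivity here is an assumed per-user cap of 10, with epsilon = 1)
# before releasing the result to the researcher.
total = private_key.decrypt(encrypted_total)
released = total + np.random.default_rng().laplace(scale=10 / 1.0)
print(released)
```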
213 With regard to mobility data, for example, ‘as various providers stack up different sources of data in a collaborative
project such as the Network, it often erodes corrections made for differential privacy noise in a single dataset.’ Open
Data Institute COVID-19 Mobility Data Network’s Use of Facebook Data for Good Mobility Data. See http://theodi.
org/wp-content/uploads/2021/04/5-COVID-19-Mobility-Data-Networks-Use-of-Facebook-Data_v2.pdf (accessed 7
October 2022).
214 Facebook (Data for Good). See https://dataforgood.fb.com/ (accessed 18 August 2022).
215 Facebook Research (Privacy protected data for independent research on social media data). See https://research.
fb.com/blog/2020/02/new-privacy-protected-facebook-data-for-independent-research-on-social-medias-impact-on-
democracy/ (accessed 2 September 2022).
216 Jin KX, McGorman L. Data for Good: New tools to help health researchers track and combat COVID-19. Facebook
News. 6 April 2020. See https://about.fb.com/news/2020/04/data-for-good/ (accessed 15 March 2022).
217 Facebook Research (Protecting privacy in Facebook mobility data during the Covid-19 response). See https://
research.fb.com/blog/2020/06/protecting-privacy-in-facebook-mobility-data-during-the-covid-19-response/ (accessed
23 September 2022).
218 Moon M. Facebook has been giving misinformation researchers incomplete data. Engadget. See https://www.
engadget.com/facebook-misinformation-researchers-incomplete-data-050143486.html (accessed 30 August 2022).
BOX 11

PETs for transparency: Twitter and OpenMined partnership for algorithmic accountability

In January 2022, Twitter’s ML Ethics, Transparency, and Accountability (META) team announced a partnership with OpenMined to explore the use of PETs for public accountability over social media data. OpenMined is an open-source non-profit organisation that aims to build and promote the use of PETs through educating data owners and making privacy-preserving technologies more accessible to private and public organisations.

The Twitter-OpenMined partnership proposes the use of PETs as a tool for accountability. Currently, one barrier to algorithmic accountability is that external researchers and third parties lack access to proprietary algorithms and the data they use, rendering it difficult to conduct independent investigations and audits. PETs in this instance may allow companies to share internal algorithms and datasets for algorithmic audits and replication of research, while avoiding concerns around privacy, security or intellectual property.

The first project will involve developing a method of replicating internal research findings on algorithmic amplification of political content on Twitter by using a synthetic dataset. Longer term, Twitter suggests it will share its actual internal data through PETs to enable external researchers to conduct their own investigations on currently non-public data.

Conclusions

In this use case, PETs are used as tools for privacy and confidentiality, as well as accountability and transparency through external audit. While social media data is not usually sold, social media business models depend on personal data – and derived insights – collected and analysed through opaque processes. A privacy-enhanced strategy for enhancing access to data and increasing transparency will improve user trust and mitigate legal or reputational risks for social media platforms. Furthermore, the amount of compute power required to analyse large social media datasets may motivate platforms to use networked PETs to provide analysis as a service219.

As the types and scale of personal data shared on social media continue to expand, novel privacy concerns will emerge. For example, the linking of consumer genomics products with social media platforms is increasingly popular on sites like Ancestry.com, or open-source genetics databases such as GEDmatch or Promethease. While open DNA databases have prompted some users to consider the risks associated with making their genome public220, the implications of linking an individual’s DNA to social media metadata (such as location, behavioural patterns or social networks) are less understood.
219 The Royal Society. 2022 The online information environment: Understanding how the internet shapes people’s
engagement with scientific information. See https://royalsociety.org/-/media/policy/projects/online-information-
environment/the-online-information-environment.pdf?la=en-GB&hash=691F34A269075C0001A0E647C503DB8F
(accessed 30 March 2022).
220 Mittos A, Malin B, De Cristofaro E. 2018 Systematizing genome privacy research: A privacy-enhancing technologies
perspective. See https://arxiv.org/abs/1712.02193 (accessed 23 March 2022).
USE CASE 4
221 The Digital Economy Act 2017 provides a gateway for the ONS to access the data of all public authorities and Crown
bodies in support of the production of National Statistics and other official statistics, including the census. It also
entails powers to mandate data from some UK businesses. In some (limited) circumstances, ONS-held data may also
be shared with devolved administrations for statistical purposes. HM Government (Digital Economy Act 2017). See
https://www.legislation.gov.uk/ukpga/2017/30/contents/enacted (accessed 13 May 2022).
222 HM Government (Census Act 1920). See https://www.legislation.gov.uk/ukpga/Geo5/10-11/41/contents (accessed 23
April 2022).
223 The Office for Statistics Regulation (Joining up data for better statistics). See https://osr.statisticsauthority.gov.uk/
publication/joining-up-data/ (accessed 30 March 2022).
224 For example Rocher L, Hendrickx JM, de Montjoye Y-A. 2019 Estimating the success of re-identifications in
incomplete datasets using generative models. Nat Commun 10 3069. (https://doi.org/10.1038/s41467-019-10933-3)
225 Gartner Research 2022. Top strategic technology trends for 2022: Privacy-Enhancing Computation. See https://
www.gartner.co.uk/en/information-technology/insights/top-technology-trends#:~:text=Trend%203%3A%20
Privacy%2Denhancing%20Computation,well%20as%20growing%20consumer%20concerns (accessed 23
September 2022).
226 Government Statistical Service (Examples of data linking within the government statistical service). See https://
gss.civilservice.gov.uk/examples-of-data-linking-within-the-government-statistical-service/ (accessed 23
September 2022).
227 Office for National Statistics (Office for National Statistics and the Alan Turing Institute join forces
to produce better and faster estimates of changes to our economy). See https://www.ons.gov.
uk/methodology/methodologicalpublications/generalmethodology/onsworkingpaperseries/
onsmethodologyworkingpaperseriesnumber16syntheticdatapilot (accessed 23 September 2022).
228 Office for National Statistics (Synthetic data pilot working paper). See https://www.ons.gov.
uk/methodology/methodologicalpublications/generalmethodology/onsworkingpaperseries/
onsmethodologyworkingpaperseriesnumber16syntheticdatapilot (accessed 23 September 2022).
Synthetic data can also be used to improve the quality of data. This is achieved through data augmentation and other techniques229 that address incompleteness in datasets, particularly where populations are small or less represented. There are potential issues with skew or bias in these cases, which must be addressed.

Although synthetic data techniques may be applied to virtually any data, ranging from imagery to textual, three high-value datasets illustrate the potential for this technology:

• A high-quality synthetic version of the Census-Health-Mortality dataset (the 'health asset') would allow the ONS to share realistic data quickly with many research partners, speeding up research and innovation by allowing a wide variety of users to rapidly develop models and hypotheses, and build pipelines which can then be applied to the real data for decision-making;

• Synthetic versions of telecoms mobility data would enable the ONS and cross-government partners to fully assess the opportunities for this data before going to procurement. This would provide better value for money and would improve official mobility-based statistics, such as those relating to COVID-19 analysis;

• Synthesis of administrative data would allow for the offline exploration of synthetic data, allowing for a single, well-defined data extract request to be made to the data owners. If this is not practical, a fully tested and robust data pipeline could be developed to process and analyse the sensitive data in situ.

There are several prerequisites to implementing PPSD. The first is a consistent and comprehensive way to evaluate synthetic datasets. The ONS is addressing this issue through a framework, which will take the form of a Python library. The framework will assess the performance of synthetic datasets in terms of both utility and privacy.

Second is the investigation and assessment of synthetic data generation methods. This means exploring off-the-shelf methods such as Synthpop230, as well as more sophisticated machine and deep learning methods such as Generative Adversarial Networks and evolutionary optimisation. This requires a great deal of technical expertise in implementation, as well as deep knowledge of the context, risk factors (adversaries and threat models) and potential for downstream harms.

A synthetic dataset with all the utility of the original dataset cannot offer privacy. For this reason, high-dimensional datasets (which contain many variables) may not be suitable for PPSD generation. Rather, an external researcher or client might request a custom synthesised dataset pertaining to a specific question (calling on a limited number of attributes or variables). In this way, greater utility may be offered without higher risk of privacy loss231.

PPSD may also be layered with other PETs to enhance its privacy-preserving potential. For example, synthetic data can be generated with differential privacy guarantees, offering greater assurance of privacy. However, further erosion of utility must be considered when adding noise to a synthetic dataset232.
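The ONS evaluation library was not published at the time of writing; the sketch below simply illustrates the two sides of such an assessment using stand-in metrics – a distributional distance as a crude utility measure and an exact-match rate as a weak privacy proxy – and assumes the real and synthetic records are held as pandas DataFrames.

import numpy as np
import pandas as pd

def utility_gap(real: pd.DataFrame, synth: pd.DataFrame, column: str, bins: int = 10) -> float:
    # Crude utility check: total variation distance between the binned
    # distributions of one numeric column in the real and synthetic data.
    edges = np.histogram_bin_edges(real[column], bins=bins)
    p, _ = np.histogram(real[column], bins=edges)
    q, _ = np.histogram(synth[column], bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()

def exact_match_rate(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    # Weak privacy proxy: the fraction of synthetic rows that exactly reproduce
    # a real record. Copied rows offer no privacy protection at all.
    real_rows = set(map(tuple, real.itertuples(index=False)))
    hits = sum(tuple(row) in real_rows for row in synth.itertuples(index=False))
    return hits / len(synth)

A fuller framework would combine many such metrics, covering joint distributions and model-based utility as well as attribute and membership inference risks.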
229 For example, missing value imputation and removing class imbalances.
230 Synthpop (Homepage). See https://cran.r-project.org/web/packages/synthpop/vignettes/synthpop.pdf (accessed 23
September 2022).
231 Jordon J et al. 2022 Synthetic data: What, why and how? See https://arxiv.org/pdf/2205.03257.pdf (accessed 2
September 2022).
232 Jordon J, Yoon J, van der Schaar M. 2019 PATE-GAN: Generating synthetic data with differential privacy guarantees.
See https://openreview.net/pdf?id=S1zk9iRqF7 (accessed 26 September 2022).
BOX 12
The Clinical Practice Research Datalink (CPRD)233 is the Medicines and Healthcare products Regulatory Agency's (MHRA's) real-world data research service, created to support retrospective and prospective public health and clinical studies. CPRD is jointly sponsored by the MHRA and the National Institute for Health Research (NIHR) as part of the Department of Health and Social Care.

CPRD collects anonymised patient data from a network of GP practices across the UK. Since 2018, CPRD has been working on the development of synthetic datasets based on GP patient data to maximise the benefit of this valuable data, while balancing privacy concerns and preventing downstream harm to data subjects. These synthetic datasets can be used as sample datasets, enabling third parties to develop, validate and test analytic tools. They can also be used for training purposes, and for improving algorithms and machine learning workflows.

CPRD has now made two high-fidelity synthetic datasets available234: a cardiovascular disease synthetic dataset and a COVID-19 symptoms and risk factors synthetic dataset. Both synthetic datasets are generated from anonymised real primary care patient data extracted from the CPRD Aurum database235 and are available to researchers for a nominal administrative fee.

The MHRA was motivated to explore synthetic data generation methods to support regulatory requirements for external validation of machine learning (ML) and AI algorithms. Anonymised health datasets have high utility, but still carry residual privacy risks which limit their wider access236; a fully synthetic approach can substantially mitigate these risks237. In some cases, synthetic data may even improve the utility of anonymised data – that is, its potential to be clinically meaningful. This is because anonymised data may entail gaps, which can lead to biased inferences. Synthetic data can be used in these cases to supplement real data by filling the gaps or boosting underrepresented subgroups in the dataset238.
233 Clinical Practice Research Datalink (Homepage). See https://cprd.com/ (accessed 17 September 2022).
234 Clinical Practice Research Datalink (Synthetic data CPRD cardiovascular disease synthetic dataset). See https://cprd.
com/synthetic-data#CPRD%20cardiovascular%20disease%20synthetic%20dataset (accessed 23 September 2022).
235 CPRD Aurum contains routinely collected data from practices using EMIS Web® electronic patient record system
software. Clinical Practice Research Datalink (Primary care data public health research). See https://cprd.com/primary-
care-data-public-health-research (accessed 23 September 2022).
236 Sweeney L. 2000 Simple Demographics Often Identify People Uniquely. Carnegie Mellon University, Data Privacy
Working Paper 3.
237 Park Y, Ghosh J. 2014 PeGS: perturbed Gibbs samplers that generate privacy-compliant synthetic data. Trans Data
Privacy. 7, 253—282.
238 Wu, L., He, H., Zaïane, O. R. 2013 Utility of privacy preservation for health data publishing. Proceedings of the 26th
IEEE International Symposium on Computer-Based Medical Systems. 510—511.
CPRD uses the Synthetic Data Generation and Evaluation Framework239 to guide synthetic data generation. It consists of a set of procedures, including a ground truth selection process as input, a synthetic data generation procedure, and an evaluation process.

The Synthetic Data Generation Framework has been proven to produce effective synthetic alternatives to 'real' health data. This is particularly beneficial when 1) access to the ground truth data is restricted; 2) the sample size is not large enough, or not representative of a population; or 3) machine learning or AI training or testing datasets are lacking. There are limitations and challenges to consider during synthetic data generation outside the framework, including data missingness and the complex interactions between variables.

The Synthetic Data Generation Framework used by CPRD is flexible enough to allow for generation of different types of synthetic datasets, while at the same time enabling researchers to demonstrate that they have balanced data utility with patient privacy needs.

Conclusions

Synthetic data can be useful for expediting data projects and enabling partnerships. For example, organisations can test whether a partnership is worthwhile and start building models while waiting for access (such as through data sharing agreements or other means). Whether or not synthetic data will provide a stand-in for useful and sufficiently private data for analytical use cases remains an open question.

The generation of synthetic datasets, even 'good enough' synthetic versions, is challenging. As yet, there are no standards related to privacy in PPSD generation, though emerging synthetic data standards may include privacy metrics241. Further research is required to quantify the privacy-utility trade-offs242. To these ends, the ONS plans to test with data owners and the wider data community as part of their synthetic data project.
239 Wang Z, Myles P, Tucker A. 2021 Generating and evaluating cross-sectional synthetic electronic healthcare data:
Preserving data utility and patient privacy. Comput Intell. 37, 819—851.
240 The Synthetic Data Generation and Evaluation Framework, owned by the MHRA, was developed through a grant from
the Regulators’ Pioneer Fund launched by BEIS and managed by Innovate UK. Further development of the COVID-19
synthetic data and refinement of synthetic data generation methods was funded by NHSX.
241 Institute of Electrical and Electronics Engineers (Synthetic data standards). See https://standards.ieee.org/industry-
connections/synthetic-data/ (accessed 18 August 2022).
242 One recent publication finds the privacy gain is highly variable, and utility loss unpredictable, when used in high-
dimensional datasets: Stadler T, Oprisanu B, Troncoso C et al. 2021 Synthetic Data—Anonymisation Groundhog Day.
See https://arxiv.org/abs/2011.07018 (accessed 27 September 2022).
PETs in the public sector: Collective intelligence, crime prevention and online voting
Collaborative analysis
for collective intelligence
The opportunity
A wealth of data is collected and stored
across government departments and non-
public bodies in the UK and abroad. This
data potentially holds insights that could save
substantial money, make government services
more efficient and effective, drive the transition
to net zero by 2050 (see page 67), guide
life-saving choices during a pandemic, or help to understand the effect of regional policies.
Much of the data relevant to these challenges is sensitive. Particularly where politically sensitive data is used, there are inherent security risks. While there are some special provisions for using health data during emergencies, intra-departmental collaboration must adhere to privacy legislation, including data protection. As such, the risk of collaboration between departments may be deemed larger than the potential benefits.

Collaborative analysis with SMPC

Secure multi-party computation (SMPC) allows multiple parties to jointly compute a function using inputs from all parties, while keeping those inputs private. In this way, SMPC is a tool for securely generating insights using data held by different departments or organisations. For example, in a health study, patient data may be input from different hospitals, or even combined with other datasets – such as social demographic data – without researchers ever seeing or accessing the data directly.

SMPC has been demonstrated in large-scale studies on government data since 2015243. The performance of SMPC relates to the analysis, or functions, to be computed. Summations (adding numbers together) are faster than more complex computations244. This is a rapidly advancing technology with the potential for use in long-term data governance; this is because SMPC depends on access control by all parties involved, meaning analysis can only be performed if all parties agree. SMPC protocols ensure input privacy (no information can be obtained or inferred by any party aside from their own input and the output). As such, SMPC may provide a generic, standardised – and potentially certifiable – method for computation on encrypted data245.
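The summation case can be illustrated with a minimal additive secret-sharing sketch; this is not a description of any particular SMPC product or protocol, and the department names and figures are invented.

import random

PRIME = 2_147_483_647  # arithmetic is done modulo a public prime

def share(secret: int, n_parties: int) -> list[int]:
    # Split a value into n random shares that sum to the secret (mod PRIME).
    # Any subset of fewer than n shares reveals nothing about the value.
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Three departments each hold a private figure (eg spend on a programme).
inputs = {"dept_a": 1200, "dept_b": 340, "dept_c": 905}

# Each department distributes one share to each computing party, so no single
# party ever sees a raw input.
all_shares = {name: share(value, 3) for name, value in inputs.items()}

# Each party adds the shares it holds; combining the partial sums gives the total.
partial_sums = [sum(all_shares[name][i] for name in inputs) % PRIME for i in range(3)]
total = sum(partial_sums) % PRIME
assert total == sum(inputs.values())  # 2445, computed without pooling the raw figures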
243 Bogdanov D, Kamm L, Kubo B, Rebane R, Sokk V. 2015 Students and taxes: a privacy-preserving social study using
secure computation. See https://eprint.iacr.org/2015/1159.pdf (accessed 25 September 2022).
244 UN PET Lab Handbook. See https://unstats.un.org/bigdata/task-teams/privacy/UN%20Handbook%20for%20Privacy-
Preserving%20Techniques.pdf (accessed 17 July 2022).
245 Archer DW et al. 2018 From keys to databases: real-world applications of secure multi-party computation. See https://
eprint.iacr.org/2018/450 (accessed 10 October 2022).
246 Hazebroek E, Jonkers K, Segers T. 2021 Secure net NCSC partnership for rapid and safe data sharing. See https://
emagazine.one-conference.nl/2021/secure-net-ncscs-partnership-for-rapid-and-safe-information-sharing/ (accessed
23 September 2022).
247 Secretarium (Homepage) See https://secretarium.com/ (accessed 27 September 2022).
SMPC can drive public sector efficiency by allowing for safe and rapid collective intelligence. While technical challenges such as performance and compute power were once primary barriers to implementation, this is no longer the case. One of the biggest challenges around SMPC is the understanding of legal implications, for instance the impact of EU and UK GDPR requirements. Other challenges include alignment of data structures and formats (interoperability), reliability and auditability, data availability, and the complexity of ongoing management of SMPC248.

Only registered parties can contribute to SMPC analyses. Registered parties should not have intent to input information that is invalid (eg reporting false information).

SMPC applications may be purchased as a software package, which enables different parties to collaborate on sensitive data through analysis 'in the blind'. While open frameworks require deep knowledge of SMPC, suppliers are trialling software that will be usable by data scientists with no previous experience with SMPC.
248 The Financial Action Taskforce. 2021 Stocktake on data pooling, collaborative analytics and data protection. See
https://www.fatf-gafi.org/media/fatf/documents/Stocktake-Datapooling-Collaborative-Analytics.pdf (accessed 22
September 2022).
BOX 13
Roseman Labs in the Netherlands is encouraging the uptake of PETs in the Dutch public sector through creative, low-risk collaborations that demonstrate the value of SMPC.

First, they identify use cases for SMPC relevant to a given public body. In some instances, a use case idea is generated through scoping conversations between public sector stakeholders. The use case idea is then formulated into a pilot project, or proof-of-concept, which can be carried out within the low-risk public procurement threshold (for example, conducting a six-month trial). Once the economic and social value of the SMPC solution becomes clear, the public sector organisation may begin an informed RFI process with an aim to scale up the solution long-term.

This has resulted in successful applications, including:

• Increasing digital resilience with the Dutch National Cyber Security Centre (NCSC). The NCSC collects cybersecurity intelligence from organisations across the Netherlands, which report risks such as hacking or ransomware incidents. Organisations are not motivated to publish data on security breaches, which could compromise their reputation and marketability. An SMPC system now allows the NCSC to collect intelligence on cyber security risks from tens of organisations (scaling to 15,000 over time) in the Netherlands in a private fashion: each organisation inputs data on cyber attacks and breaches in a fully anonymous and confidential way on a weekly basis. The NCSC does not see provenance information but is able to identify trends and take action accordingly;

• Reducing money laundering. Where multiple banks are able to generate graphs using transaction data, these graphs can be compared using SMPC for patterns that suggest money laundering (namely, money going in circles, or 'smurfing', where many small transactions are ultimately deposited with one entity). This cross-bank identification of patterns is far more reliable than each individual bank looking at their own data, which often generates a very large number of false positives. With this cross-bank approach, the likelihood of spotting true positives increases, and banks and law enforcement agencies (LEAs) can then focus resources. This allows law enforcement to set priorities. This should open up private partnerships between banks, for example, where thousands of employees are dedicated to identifying potential money laundering incidents (compared to just hundreds at the national public sector level)249.

Roseman Labs has technical and in-house legal expertise (complemented with external privacy experts), meaning they are able to prescribe a data-use solution that meets current data protection requirements, helping clients to complete the Data Protection Impact Assessment (DPIA) process together. This added value bolsters their work with public sector clients.
249 Roseman Labs (Secure data collaboration in financial services). See https://rosemanlabs.com/blog/financial_services.
html (accessed 10 October 2022).
FIGURE 7
The client sends encrypted data to a server, where a specific analysis is performed on the
encrypted data, without decrypting that data. The encrypted result is then sent to the client,
who can decrypt it to obtain the result of the analysis they wished to outsource.
Key: the client holds x and sends its encryption to the server; the server performs the analysis on the ciphertext and returns the encryption of F(x), which only the client can decrypt.
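The pattern in Figure 7 can be sketched with a toy additively homomorphic scheme. The code below is a textbook Paillier construction with deliberately tiny, insecure parameters, shown only to make the 'compute on ciphertexts' idea concrete; real deployments use dedicated HE libraries and far larger keys.

import math
import random

def generate_keys(p: int = 1009, q: int = 1013):
    # Toy Paillier key generation; the primes are far too small to be secure.
    n = p * q
    n2 = n * n
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator g is fixed to n + 1
    return (n, n2), (lam, mu, n, n2)

def encrypt(pub, m: int) -> int:
    n, n2 = pub
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c: int) -> int:
    lam, mu, n, n2 = priv
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

pub, priv = generate_keys()
c1, c2 = encrypt(pub, 41), encrypt(pub, 17)
c_sum = (c1 * c2) % pub[1]         # multiplying ciphertexts adds the plaintexts
assert decrypt(priv, c_sum) == 58  # the server never sees 41, 17 or their sum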
While the UK Government has considered the banning of end-to-end encryption in efforts to stymie CSEA material sharing256, end-to-end encryption offers critical benefits to private citizens and must be preserved and promoted257. Recent technical advances may provide a solution for detecting harmful content without ending end-to-end encryption or compromising the privacy of individual users.

Homomorphic encryption (HE) has been demonstrated as a PET that allows for the analysis of encrypted data, and which could be used as a tool for identifying CSEA material on encrypted platforms. Apple's planned roll-out of a very similar programme received criticism from privacy rights groups. The Apple case illustrates how PETs may be applied in ways perceived to violate, rather than preserve, privacy. This use case is intended to provide an explanation, rather than an endorsement, of how PETs could be used to detect illegal material on encrypted platforms.
256 HM Government (International statement: End-to-end encryption and public safety). See https://www.gov.
uk/government/publications/international-statement-end-to-end-encryption-and-public-safety (accessed 20
September 2022).
257 The Royal Society. 2016 Progress and research in cybersecurity: Supporting a resilient and trustworthy system for
the UK. See https://royalsociety.org/-/media/policy/projects/cybersecurity-research/cybersecurity-research-report.pdf
(accessed 27 September 2022).
Alternative hashing techniques may address these issues. One alternative is Locality Sensitive Hashing (LSH), which accounts for visually similar images (it intentionally hashes similar inputs to close or identical outputs). LSH and similar alternatives are useful where small variations in the input image are expected. However, transformations of the image, such as mirroring, could result in completely distinct raw data and would not be matched. The digital definition of 'similar' is not necessarily comparable to human perceptions of similarity.

The ability to securely detect CSEA on mobile devices requires the combination of several techniques. Perceptual hashing can help to match images against a database of known illicit material, even images with small perceptual changes. Combining this with private set intersection preserves the security of the matching database, whilst providing privacy for individuals. Together, these technologies could be used to develop a robust CSEA detection system that does not compromise end-to-end encryption.
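A minimal 'average hash' sketch shows why perceptual hashes tolerate small changes but not transformations such as mirroring. It assumes the Pillow imaging library and is not the algorithm used by any particular platform.

from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    # Downscale to a tiny greyscale image, then set one bit per pixel depending
    # on whether that pixel is brighter than the mean.
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(h1: int, h2: int) -> int:
    # Number of differing bits; a small distance indicates visually similar images.
    return bin(h1 ^ h2).count("1")

# Slightly re-compressed or resized copies of an image produce hashes within a few
# bits of each other, whereas a mirrored copy can produce a completely different hash.
# print(hamming(average_hash("a.jpg"), average_hash("b.jpg")))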
Risks and challenges

One challenge is the potential for 'scope creep' – the adding of additional functionality above and beyond the detection of CSEA material. This may include, for example, state actors using the technology to counter digital piracy, to enforce digital rights management, or for national security and surveillance purposes. Platforms may face reputational risk and loss of users if systems were perceived as disproportionate surveillance tools.

While perceptual hashing algorithms allow for modified images to be matched against the database, they are also more likely to flag false positives. Innocent images may appear close enough to illicit images to be flagged by the machine learning system. This could lead to innocent people being identified as possessing CSEA material, entailing negative impact for the individual. The performance of the perceptual hashing system would need to be closely tested and monitored to measure the false positive rates. Ultimately, a human moderator should always verify whether flagged material is illegal or harmful; a user should never be charged based on the automated system detection alone.

Legal challenges and public trust must also be addressed. Law enforcement agencies would need to be clear on the legal basis for running such systems, which may constitute a passing on of their legal duties to third parties. This has implications for public trust, particularly where on-device screening is used.

The UK government aims to minimise the existence of spaces online where illegal material can be securely shared. Likewise, social media companies are motivated to ensure users are not breaching their terms of use, even in encrypted spaces. A PETs-enabled system for identifying illegal material is an alternative to privacy rollbacks such as the outright banning of end-to-end encryption.
BOX 14
In August 2021 Apple announced new child safety features to be implemented on its US devices. Three planned changes aimed to mitigate child sexual abuse. One change related to iCloud Photos, which would scan images to find CSEA. While cloud service companies such as Google, Microsoft and Dropbox already scan material for CSEA, Apple planned to conduct scans on personal iPhone devices using a technology called NeuralHash.

NeuralHash scans images without revealing them to moderators. It translates the image into a unique number (a hash) based on its features. Before uploading to iCloud Photos, the hash is compared on-device against a database of known CSEA hashes provided by child safety organisations. Any matches prompt the creation of a cryptographic safety voucher. If a user reaches a threshold of safety vouchers, they are decrypted and shared with Apple moderators for review258.

The proposals were welcomed by many, including child safety organisations259. However, the image hashing feature faced criticism from privacy advocates, cryptographers and other tech companies, who viewed Apple's proposals as introducing a backdoor on their devices. Critics argued this could make the system vulnerable to state censorship of political dissent or LGBTQ+ content, or flagging of innocent images, causing unnecessary distress. Further criticism targeted the efficacy of the system: researchers reverse-engineered the hashing algorithm and were able to create images that were falsely flagged by the system260.

In September 2021, Apple announced it was pausing implementation of CSAM scanning to collect feedback and make improvements. In April 2022, Apple announced its intention to introduce the parental control safety feature on the Messages app on iPhones in the UK261.

It is unclear how an image hashing program would operate under UK and EU data protection law. On-device screening would likely entail explicit consent and user opt-in (rather than opt-out)262. User images are not necessarily personal data under the GDPR; they must depict identifiable living people, or be linked to a living person, to constitute personal data. However, neural hashes may constitute personal data. These emerging legal questions, as well as general public scepticism, suggest that an on-device detection system may face barriers in the UK or EU contexts.
269 Archer DW et al. 2018 From keys to databases: Real-world applications of secure multi-party computation. Comput J.
61, 1749—1771. (https://doi.org/10.1093/comjnl/bxy090)
270 PRIViLEDGE Project (Homepage). See https://priviledge-project.eu/ (accessed 30 March 2022).
USE CASE 6
271 Privacy International. 2018 The humanitarian data problem: ‘doing no harm’ in the digital era. See https://
privacyinternational.org/sites/default/files/2018-12/The%20Humanitarian%20Metadata%20Problem%20-%20
Doing%20No%20Harm%20in%20the%20Digital%20Era.pdf (accessed 10 October 2022).
272 OCHA Centre for Humanitarian Data 2021. Data Responsibility Guidelines. See https://data.humdata.org/
dataset/2048a947-5714-4220-905b-e662cbcd14c8/resource/60050608-0095-4c11-86cd-0a1fc5c29fd9/download/
ocha-data-responsibility-guidelines_2021.pdf (accessed 10 October 2022).
273 El Emam K. 2020 Viewpoint: Implementing privacy-enhancing technologies in the time of a pandemic. Journal of
Data Protection & Privacy. 3, 344—352.
274 Shainski R, Dixon W. 2020 How privacy enhancing technologies can help COVID-19 tracing efforts. World Economic
Forum Agenda. 22 May 2020. See https://www.weforum.org/agenda/2020/05/how-privacy-enhancing-technologies-
can-help-covid-19-tracing-efforts/ (accessed 10 October 2022).
275 Inter-Agency Standing Committee 2021. Operational Guidance on Data Responsibility in Humanitarian Action. See
https://interagencystandingcommittee.org/system/files/2021-02/IASC%20Operational%20Guidance%20on%20
Data%20Responsibility%20in%20Humanitarian%20Action-%20February%202021.pdf (accessed 10 October 2022).
276 Global Privacy Assembly. 2015 37th International Conference of Data Protection and Privacy Commissioners. See
http://globalprivacyassembly.org/wp-content/uploads/2015/02/Resolution-on-Privacy-and-International-Humanitarian-
Action.pdf (accessed 10 October 2022).
277 Pozen DE. 2005 The mosaic theory, national security, and the freedom of information act. Yale L J. 115.
BOX 15
278 Chae J, Thom D, Jang Y, Kim S, Ertl T, Ebert DS. 2014 Public behavior response analysis in disaster events utilizing
visual analytics of microblog data. Comput. Graph. 38, 51–60. (https://doi.org/10.1016/j.cag.2013.10.008)
279 Kryvasheyeu Y et al. Rapid assessment of disaster damage using social media activity. Sci Adv.
2. (https://doi.org/10.1126/sciadv.1500779)
280 Sakaki T, Okazaki M, Matsuo Y. Tweet analysis for real-time event detection and earthquake reporting system
development. IEEE Trans. Knowl. Data Eng., 25, 919–931. (https://doi.org/10.1109/tkde.2012.29)
281 Knox AJ et al. 2013 Tornado debris characteristics and trajectories during the 27 April 2011 super outbreak as
determined using social media data. Bulletin of the American Meteorological Society. 94, 1371—1380.
The mosaic effect risk is related to the increased use of metadata, or data about other data, in humanitarian contexts. This could be the time and location of a message sent, rather than the content of the message itself. Communications with people affected by crises can include social media or SMS messaging, sharing information-as-aid, mobile cash transfer programmes, and monitoring and evaluation systems (such as those used to detect fraud), all of which entail rich and potentially compromising metadata282. Privacy International has therefore recommended that humanitarian organisations practise do no harm principles by understanding how the data and metadata they store and use may be employed for purposes beyond aid – such as for profiling, surveillance or political repression. This highlights the need for mitigation tools to be developed (such as PETs) and the importance of data minimisation.

The Centre for Humanitarian Data is focused on increasing the use and impact of data in the humanitarian sector and is interested in the potential for PETs to address the mosaic effect and enhance collaboration283. They make the following recommendations:

• Technical actions Humanitarian organisations should invest in further strengthening metadata standards and interoperability, enabling monitoring of related datasets to counter mosaic effect risks;

• Procedural actions A data asset registry and data ecosystem mapping assessment should be completed as per the recommendations included in the IASC Operational Guidance on Data Responsibility in Humanitarian Action (2021);

• Governance actions Sector-wide fora should be used to ensure that datasets are not shared on different platforms at different levels of aggregation, and to determine consistent standards for approaches such as anonymisation; and
282 Privacy International. Humanitarian Metadata Problem Doing No Harm in the Digital Era. See https://
privacyinternational.org/sites/default/files/2018-12/The%20Humanitarian%20Metadata%20Problem%20-%20
Doing%20No%20Harm%20in%20the%20Digital%20Era.pdf (accessed 28 September 2022).
283 Weller S. 2022 Minimizing privacy risks in humanitarian data. Privitar blog. 9 March 2022. See https://www.privitar.
com/blog/fragility-forum-minimizing-privacy-risks-in-humanitarian-data/ (accessed 10 October 2022).
The role of PETs in countering the risk of the mosaic effect

PETs could help safeguard personal data while still allowing researchers to utilise it in humanitarian efforts. Differential privacy could be used to add 'noise', to make any one true datapoint more difficult to trace to a real individual. The resulting 'noisy' dataset can then be shared between organisations more safely. The noise can be adjusted for extra privacy (and reduced utility), allowing data controllers to make contextual privacy-utility trade-offs.

Federated learning could be used on geospatial datasets, such as people's locations, without sharing the data used to train the model. This would entail training a model, or a predictive algorithm, on a local geospatial dataset. The model would then be shared for training on remote datasets at other organisations, which are never revealed to the model owner. External organisations holding relevant data might include telecoms, other humanitarian organisations, or social media sites, all of which may not have established data partnerships or sharing agreements.

The model would return to the owner with improved ability to predict people's movements or locations. This type of model would be incredibly valuable for humanitarian organisations making decisions about where to direct resources during crises. There are already examples of federated learning being used in medical research (see Use case 1.1, page 57). Homomorphic encryption has been used to perform large-scale studies on cross-border health data (see Use case 1.1, page 57), including multiple institutions in collaboration and crowdsourced materials (such as genomics)284.
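The training loop described above follows the federated averaging pattern. The sketch below uses invented numeric data and a simple linear model purely to show how weights, rather than records, move between organisations.

import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    # One organisation refines the shared model on its own data; only the
    # updated weights leave the organisation, never the records themselves.
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

# Hypothetical local datasets held by three separate organisations.
datasets = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]

global_w = np.zeros(3)
for _ in range(10):
    local_ws = [local_update(global_w, X, y) for X, y in datasets]
    global_w = np.mean(local_ws, axis=0)  # federated averaging of the local updates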
284 Blatt M, Gusev A, Polyakov Y, Goldwasser S. 2020 Secure large-scale genome-wide association studies using
homomorphic encryption. P Natl Acad Sci USA. 117, 11608—11613. (https://doi.org/10.1073/pnas.1918257117)
BOX 16
A partnership was established between Dutch law enforcement and NGOs working against
human trafficking. The law enforcement agency (LEA) wanted to shadow potential trafficking
victims from a long list of identified individuals.
However, local human trafficking NGOs also held informant lists, with potential overlap between their lists and that of the LEA. The NGOs were concerned that their informants would feel confidentiality had been breached by the NGO if they were shadowed by the LEA. The long list from law enforcement was compared to the short list from the NGO 'in the blind' using SMPC285.

The result was a random list of 20 people who were candidates for LEA shadowing. A future SMPC application may include tracking the movements of potential trafficking victims across agencies and NGOs without sharing their names, to identify trends and potential trafficking routes. This approach could also shed light on the extent of human trafficking crimes more widely – an issue otherwise impossible to measure.
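Comparing lists 'in the blind' is typically done with a private set intersection protocol. The sketch below uses commutative exponentiation to find overlapping identifiers without either side revealing its list; the identifiers are invented and the construction is illustrative rather than production-grade (it omits, for example, protections against offline guessing of identifiers).

import hashlib
import math
import random

P = 2**127 - 1  # a public Mersenne prime; exponentiation modulo P commutes

def to_group(item: str) -> int:
    # Map an identifier to a group element via a hash (illustrative only).
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % (P - 2) + 2

def random_key() -> int:
    while True:
        k = random.randrange(2, P - 1)
        if math.gcd(k, P - 1) == 1:  # keep the blinding map invertible
            return k

lea_list = ["id-017", "id-203", "id-404"]  # hypothetical long list
ngo_list = ["id-203", "id-771"]            # hypothetical short list
a, b = random_key(), random_key()          # each side's secret exponent

lea_once = [pow(to_group(x), a, P) for x in lea_list]  # sent to the NGO
ngo_once = [pow(to_group(x), b, P) for x in ngo_list]  # sent to the LEA

# Each side applies its own exponent to the other's blinded values; because
# exponentiation commutes, shared items collide without any names being revealed.
overlap = {pow(v, b, P) for v in lea_once} & {pow(v, a, P) for v in ngo_once}
print(len(overlap))  # 1 – only the identifier appearing on both lists matches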
FIGURE 8
Using MPC, different parties send encrypted messages to each other, and obtain the model
F(A,B,C) they wanted to compute without revealing their own private input, and without the need
for a trusted central authority.
285 Pinsent Masons. Data sharing coalition helps flag victims of human trafficking. See https://www.pinsentmasons.com/
out-law/news/data-sharing-coalition-helps-flag-victims-of-human-trafficking (accessed 7 July 2022).
Conclusions
There are no known cases of the mosaic
effect causing harm in humanitarian, crisis or
development scenarios. At the same time, there
is a cost to not sharing or linking data in such
cases, particularly where lives may be saved.
Humanitarian organisations seek to consolidate
and strengthen approaches to reduce risk
through data responsibility practices and
increasing cross-organisational work (including
with NSOs [Use Case 4]).
286 Council of the European Union. 2012 Applicability of the General Data Protection Regulation to the activities of the
International Committee of the Red Cross. See http://data.consilium.europa.eu/doc/document/ST-7355-2015-INIT/en/
pdf (accessed 26 June 2022).
287 Veale M, Binns R, Edwards L. 2018 Algorithms that remember: model inversion attacks and data protection law. Philos
T R Soc A. 376. (https://doi.org/10.1098/rsta.2018.0083)
Conclusions
This report sets out to refresh perspectives on PETs following the Society's 2019 report Protecting privacy in practice. In doing so, it considers the role of PETs beyond data protection and highlights the secondary effects of PETs in motivating partnerships and enabling collaboration across sectors and international borders. The risk of personal data use is considered in terms of privacy attack (what is technically possible) as well as the severity of potential downstream harms of compromised data (which is contextual).

Several questions remain beyond the scope of this report and suggest areas for further research. First, very little is known about the potential market value of PETs as discrete technologies, or their true significance in data use in collaborative scenarios. It is therefore difficult to estimate what value would be unlocked with widespread uptake of PETs, whether in economic terms or in social benefit. The market value of PETs may also depend on trends in use cases, whether PETs are employed as security tools or for increased collaborative learning and analysis.

Second, this report has not explored the full range of potential follow-on effects of PETs adoption. These include potential harms which may stem from greater monitoring and surveillance on the part of governments and private sector actors, leading to enhanced profiling and resulting in increased distrust of public services and loss of privacy in online spaces (such as through highly targeted advertisement). In some cases, PETs are already being used to facilitate business-as-usual in online advertising288, easing companies' access to, and use of, customer data to the usual ends.

Given their multipurpose nature, networked PETs that allow for collaborative analysis might be viewed as an upgrade to traditional systems of information sharing, such as the internet, rather than new privacy compliance tools. For this reason, in the future, PETs may be used for any sufficiently valuable data, not just sensitive category data (such as personal or commercially advantageous data). Rather, PETs may be used in any scenario where data benefits those with exclusive access, or where open access could cause harm. This could include, for example, data pertaining to natural resources (to prevent over-exploitation).

Finally, more work is required to integrate PETs into wider data governance systems. The tendency for PETs to be developed as discrete technologies has led users to approach PETs as a set of tools, each with unique problem-solving capabilities. In the future, PETs may operate more like complementary pieces of machinery which, when combined with other technological, legal and physical mechanisms, will amount to automated data governance systems. These systems could help to enact an organisation's data policy and facilitate responsible information flows at unprecedented scales. This next level of PETs abstraction will require collaboration between PETs developers and leading organisations to develop and test use cases.

PETs can play an important role in a privacy by design approach to data governance when considered carefully, informed by appropriate guidance and assurances. Given the rapid development of these technologies, it is a critical time to consider how PETs will be used and governed for the promotion of human flourishing.
APPENDIX 1:
Definitions
Differential privacy: security definition which means that, when a statistic is released, it should not give much more information about a particular individual than if that individual had not been included in the dataset. See also privacy budget.

Distributed Ledger Technology (DLT): an open, distributed database that can record transactions between several parties efficiently and in a verifiable and permanent way. DLTs are not considered PETs, though they can be used (as some PETs) to promote transparency by documenting data provenance.

Epsilon (Ɛ): see privacy budget.

Fully homomorphic encryption (FHE): a type of encryption scheme which allows for any polynomial function to be computed on encrypted data, which means both additions and multiplications.

Homomorphic encryption (HE): a property that some encryption schemes have, so that it is possible to compute on encrypted data without deciphering it.

Metadata: data that describes or provides information about other data, such as the time and location of a message (rather than the content of the message).

Mosaic effect: the potential for individuals or groups to be re-identified through using datasets in combination, even though each dataset has been made individually safe.

Noise: a random alteration of data/values in a dataset so that the true data points (such as personal identifiers) are not as easy to identify.

Privacy budget (also differential privacy budget, or epsilon): a quantitative measure of the change in confidence of an individual having a given attribute.

Privacy-preserving synthetic data (PPSD): synthetic data generated from real-world data to a degree of privacy that is deemed acceptable for a given application.

Private Set Intersection (PSI): secure multiparty computation protocol where two parties compare datasets without revealing them in an unencrypted form. At the conclusion of the computation, each party knows which items they have in common with the other. There are some scalable open-source implementations of PSI available.

Secure multi-party computation (SMPC or MPC): a subfield of cryptography concerned with enabling private distributed computations. MPC protocols allow computation or analysis on combined data without the different parties revealing their own private inputs to the computation.

Somewhat Homomorphic Encryption (SHE): a type of encryption scheme which supports a limited number of computations (both additions and multiplications) on encrypted data.

Synthetic data: data that is modelled to represent the statistical properties of original data; new data values are created which, taken as a whole, reproduce the statistical properties of the 'real' dataset.

Trusted Execution Environment (TEE): secure area of a processor that allows code and data to be isolated and protected from the rest of the system such that it cannot be accessed or modified even by the operating system or admin users. Trusted execution environments are also known as secure enclaves.
APPENDIX 2:
Acknowledgements
Working Group members
The members of the Working Group involved in this report are listed below. Members acted in
an individual and not a representative capacity and declared any potential conflicts of interest.
Members contributed to the project on the basis of their own expertise and good judgement.
Chair
Professor Alison Noble FRS FREng OBE, Technikos Professor of Biomedical Engineering and
Department of Engineering Science, University of Oxford
Members
Professor Jon Crowcroft FRS FREng, Marconi Professor of Communications Systems in the
Computer Lab, University of Cambridge; Alan Turing Institute
George Balston, Co-Director, Defence and Security, Alan Turing Institute
Professor Sir Anthony Finkelstein CBE FREng, President, City University London
Professor Carsten Maple, Professor of Cyber Systems Engineering, University of Warwick Cyber
Security Centre
Dr Suzanne Weller, Head of Research, Privitar
Helena Gellersen, Patricia Jimenez, Louise Parkes, UKRI work placement (various periods)
Reviewers
This report has been reviewed by expert readers and by an independent Panel of experts, before
being approved by Officers of the Royal Society. The Review Panel members were not asked to
endorse the conclusions or recommendations of the report, but to act as independent referees of
its technical content and presentation. Panel members acted in a personal and not a representative
capacity. The Royal Society gratefully acknowledges the contribution of the reviewers.
Reviewers
Dr Clifford Cocks CB FRS
Alex van Someren FREng, Chief Scientific Adviser for National Security, UK Government
Event participants
The Royal Society would like to thank all those who contributed to the development of this project,
in particular through participation in the following events.
PETs Contact Group Session One: Evidence and advice needs (21 April 2021)
15 participants from UK government, regulators and civil society.
PETs Contact Group Session Two: Use cases and outputs development (18 October 2021)
13 participants from UK government, regulators and civil society.
• London Economics and the Open Data Institute. 2022 Privacy Enhancing Technologies: Market
readiness, enabling and limiting factors. The Royal Society. See https://royalsociety.org/topics-
policy/projects/privacy-enhancing-technologies/ This project was partly funded by a grant from
the Centre for Data Ethics and Innovation.
Market research London Economics / Open Data Institute PETs market research
The Royal Society worked with London Economics and the Open Data Institute on exploratory
research into the state of PETs adoption, barriers and incentives within key UK public sector data
institutions. A sample of seven public sector organisations were interviewed by invitation, chosen to
represent a cross-section of criteria including function relevant to data (eg storage, processing) and
type of data used.
DataLoch
National Archives