AI Privacy Risks AI
AI Privacy Risks AI
SUPPORT POOL
OF EXPERTS PROGRAMME
By Isabel BARBERÁ
1
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
As part of the SPE programme, the EDPB may commission contractors to provide reports and tools on
specific topics.
The views expressed in the deliverables are those of their authors and they do not necessarily reflect
the official position of the EDPB. The EDPB does not guarantee the accuracy of the information included
in the deliverables. Neither the EDPB nor any person acting on the EDPB’s behalf may be held
responsible for any use that may be made of the information contained in the deliverables.
Some excerpts may be redacted or removed from the deliverables as their publication would undermine
the protection of legitimate interests, including, inter alia, the privacy and integrity of an individual
regarding the protection of personal data in accordance with Regulation (EU) 2018/1725 and/or the
commercial interests of a natural or legal person.
2
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
TABLE OF CONTENTS
1 How To Use This Document .............................................................................................................. 5
Structure and Content Overview ........................................................................................................5
Guidance for Readers ..........................................................................................................................6
2 Background ........................................................................................................................................ 6
What Are Large Language Models?.....................................................................................................6
How Do Large Language Models Work? .............................................................................................7
Emerging LLM Technologies: The Rise of Agentic AI ........................................................................ 13
Common Uses of LLM Systems ........................................................................................................ 16
Performance Measures for LLMs ..................................................................................................... 19
3 Data Flow and Associated Privacy Risks in LLM Systems ................................................................ 25
The Importance of the AI Lifecycle in Privacy Risk Management .................................................... 25
Data Flow and Privacy Risks per LLM Service Model ....................................................................... 27
Roles in LLMs Service Models According to the AI Act and the GDPR ............................................. 44
4 Data Protection and Privacy Risk Assessment: Risk Identification .................................................. 49
Criteria to Consider when Identifying Risks ..................................................................................... 49
Examples of Privacy Risks in LLM Systems ....................................................................................... 52
5 Data Protection and Privacy Risk Assessment: Risk Estimation & Evaluation ................................ 58
From Risk Identification to Risk Evaluation ...................................................................................... 58
Criteria to Establish the Probability of Risks in LLM Systems ........................................................... 59
Criteria to Establish the Severity of Risks in LLM Systems ............................................................... 61
Risk Evaluation: Classification of Risks ............................................................................................. 66
6 Data Protection and Privacy Risk Control ........................................................................................ 67
Risk Treatment Criteria..................................................................................................................... 67
Example of Mitigation Measures Related to Risks of LLM Systems ................................................. 68
7 Residual Risk Evaluation .................................................................................................................. 77
Identify, Analyze and Evaluate Residual Risk ................................................................................... 77
8 Review & Monitor ........................................................................................................................... 78
Risk Management Process Review ................................................................................................... 78
Continuous Monitoring .................................................................................................................... 78
9 Examples of LLM Systems’ Risk Assessments.................................................................................. 83
First Use Case: A Virtual Assistant (Chatbot) for Customer Queries ................................................ 83
Second Use Case: LLM System for Monitoring and Supporting Student Progress .......................... 95
Third Use Case: AI Assistant for Travel and Schedule Management ............................................... 99
10 Reference to Tools, Methodologies, Benchmarks and Guidance ................................................. 102
Evaluation Metrics for LLMs ........................................................................................................... 102
Other Tools and Guidance .............................................................................................................. 104
3
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Disclaimer by the Author: The examples and references to companies included in this report are provided for illustrative purposes only and
do not imply endorsement or suggest that they represent the sole or best options available. While this report strives to provide thorough and
insightful information, it is not exhaustive. The technology analysis reflects the state of the art as of March 2025 and is based on extensive
research, referenced sources, and the author's expertise. For transparency reasons, the author wants to inform the reader that a LLM system
has been used for the exclusive purpose of improving the readability and formatting of parts of the text.
4
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
2. Background
This section introduces Large Language Models, how they work, and their common applications. It also
discusses performance evaluation measures, helping readers understand the foundational aspects of
LLM systems.
5. Data Protection and Privacy Risk Assessment: Risk Estimation & Evaluation
Guidance on how to analyse, classify and assess privacy risks is provided here, with criteria for evaluating
both the probability and severity of risks. This section explains how to derive a final risk evaluation to
prioritize mitigation efforts effectively.
5
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
2 BACKGROUND
What Are Large Language Models?
Large Language Models (LLMs) represent a transformative advancement in artificial intelligence. These
general purpose models are trained on extensive datasets, which often encompass publicly available
content, proprietary datasets, and specialized domain-specific data. Their applications are diverse,
ranging from text generation and summarization to coding assistance, sentiment analysis, and more.
Some LLMs are multimodal LLMs, capable of processing and generating multiple data modalities such
as image, audio or video.
The development of LLMs has been marked by key technological milestones that have shaped their
evolution. Early advancements in the 1960s and 1970s included rule-based systems like ELIZA, which
laid foundational principles for simulating human conversation through predefined patterns. In 2017,
the introduction of transformer architectures (see Figure 2) in the seminal paper "Attention Is All You
Need"1 revolutionized the field by enabling efficient handling of contextual relationships within text
sequences. Subsequent developments, such as OpenAI’s GPT series and Google’s BERT (see Figure 3),
have set benchmarks for natural language processing (NLP)2, culminating in models like GPT-4, LaMDA3,
and DeepSeek-V34 (see Figure 4) integrating multimodal capabilities.
6
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
1. Dataset Collection:
The foundation of LLM training lies in the use of extensive datasets (such as such as Common Crawl and
Wikipedia) that are carefully curated to ensure they are relevant, diverse, and high-quality. Filtering
eliminates low-quality or redundant content, aligning the training data with the intended goals of the
model.
2. Data Pre-processing:
Text is cleaned and normalized by removing inconsistencies (e.g., special characters) and
irrelevant content, ensuring uniformity in the training data.
Text data is broken into smaller units called tokens, which can be words, subwords, or even
individual characters. Tokenization algorithms transforms unstructured text into manageable
sequences for computational processing.
Tokens are converted into numerical IDs that represent their vocabulary position. These IDs are
then transformed into word embeddings9—dense vector representations that capture semantic
similarities and relationships between words. For instance, semantically related words like
“king” and “queen” will occupy nearby positions in the embedding space.
5
Wikipedia, ‘Deep Learning Architecture’ (2025) https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)
6
Artificial Intelligence, ‘Why does the transformer do better than RNN and LSTM in long-range context dependencies?’ (2020)
https://ai.stackexchange.com/questions/20075/why-does-the-transformer-do-better-than-rnn-and-lstm-in-long-range-context-depen
7
A.Gu, T.Dao, ‘Mamba: Linear-Time Sequence Modeling with Selective State Spaces’ (2024) https://arxiv.org/pdf/2312.00752, B.Peng et al,
‘RWKV: Reinventing RNNs for the Transformer Era’ (2023) https://arxiv.org/pdf/2305.13048
8 Y.Liu et al., ‘Understanding LLMs: A Comprehensive Overview from Training to Inference’ (2024) https://arxiv.org/pdf/2401.02038v2
9
V.Zhukov, ’A Guide to Understanding Word Embeddings in Natural Language Processing (NLP)’ (2023) https://ingestai.io/blog/word-
embeddings-in-nlp
7
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
3. Transformer Architecture:10
Transformer architectures can be categorized into three main types: encoder-only, encoder-decoder,
and decoder-only. While encoder-only architectures were foundational in earlier models, they are
generally not used in the latest generation of LLMs. Most state of the art LLMs today use decoder-only
architectures, while encoder-decoder models are still used in tasks like translation and instruction
tuning.
Encoder:11
The encoder takes the input text and converts it into a contextualized representation by
analyzing relationships between words. Key elements include:
o Token embeddings: Tokens are transformed into numerical vectors that capture their
meaning.
o Positional encodings: Since the transformer processes words in parallel, positional
encodings are added to token embeddings to represent the order of words, preserving
the structure of the input.
o Attention mechanisms: The encoder evaluates the importance of each word relative to
others in the input sequence, capturing dependencies and context. For example, it
helps distinguish between “park” as a verb and “park” as a location based on the
surrounding text.
o Feed-Forward Network: A series of transformations are applied to refine the
contextualized word representations, preparing them for subsequent stages.
Decoder:12
The decoder generates text by predicting one token at a time. It builds upon the encoder’s
output (if used) and the sequence of tokens already generated. Key elements include:
o Input: Combines encoder outputs with tokens generated so far.
10 See footnote 1
11 Geeksforgeels, ‘Architecture and Working of Transformers in Deep Learning’ (2025) https://www.geeksforgeeks.org/architecture-and-
working-of-transformers-in-deep-learning/
12
idem
8
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Figure 3. A comparison of the architectures for the Transformer, GPT and BERT.
Source: B.Smith ‘A Complete Guide to BERT with Code’ (2024)
https://towardsdatascience.com/a-complete-guide-to-bert-with-code-9f87602e4a11
13
The architecture of DeepSeek models contains an innovative attention mechanism called Multi-head Latent Attention (MLA) that compresses
Key/Value vectors offering better compute and memory efficiency.
14
DeepSeek models employ the DeepSeekMoE architecture based on Mixture-of-Experts (MoE) introducing multiple parallel expert networks
(FFNs) instead of a single FFN.
9
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Mixture of Experts (MoE) is a technique used to improve transformer-based LLMs making them more
efficient and scalable. Instead of using the entire model for every input, MoE activates only a few smaller
parts of the model—called "experts"—based on what the input needs. This means the model can be
much larger overall, but only the necessary parts are used at any time, saving computing power without
losing performance.
Figure 4. Illustration of DeepSeek-V3’s basic architecture called DeepSeekMoE based on Mixture-of-Experts (MoE).
Source: ‘DeepSeek-V3 Technical Report’
https://arxiv.org/pdf/2412.19437
15
‘PyTorch Loss.backward() and Optimizer.step(): A Deep Dive for Machine Learning’ (2025) https://iifx.dev/en/articles/315715245
10
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
This method uses human feedback to train a reward model (RM), which helps guide the AI during its
learning process. The reward model acts as a scorekeeper, showing the AI how well it's performing
based on the feedback. Techniques like Proximal Policy Optimization (PPO) are then used to fine-tune
the language model. In simple terms, the language model learns to make better decisions based on the
reward signals it receives. Direct Preference Optimization (DPO)20 is an emerging reinforcement learning
approach that simplifies this process by directly incorporating user preference data into the model's
optimization process.
While RLHF aims to align the model with human preferences across diverse scenarios using human
feedback, another variation of the PPO technique called Group Relative Policy Optimization (GRPO)21
introduced by DeepSeek researchers, takes a different approach. Instead of relying on human
annotations, GRPO uses computer-generated scores to guide the model’s learning process and
reasoning capabilities in an automated manner.
Parameter-Efficient Fine-Tuning (PEFT):22 This technique adapts pre-trained models to new
tasks by training only some of the model's parameters, leaving the majority of the pre-trained
model unchanged. Some PEFT techniques are adapters, LoRA, QLoRA and prompt-tuning.
Retrieval-Augmented Generation (RAG):23,24,25 This method enhances LLMs by integrating
information retrieval capabilities, enabling them to reference specific documents. This
approach allows LLMs to incorporate domain-specific or updated information when responding
to user queries.
Transfer Learning:26,27 With this technique, knowledge learned from a task is re-used in another
model.
16
C.R. Wolfe, ‘Understanding and Using Supervised Fine-Tuning (SFT) for Language Models ‘ (2023)
https://cameronrwolfe.substack.com/p/understanding-and-using-supervised
17
Bergmann, D. ‘What IS fine-tuning?’ (2024) https://www.ibm.com/think/topics/fine-tuning
18
D.Bergman, ‘What is instruction tuning?’ (2024) https://www.ibm.com/think/topics/instruction-tuning
19 S. Chaudhari et al. ‘RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs’ (2024)
https://arxiv.org/abs/2404.08555
20
R. Rafailov, ’ Direct Preference Optimization: Your Language Model is Secretly a Reward Model’ (2024) https://arxiv.org/abs/2305.18290
21
Z.Shao, ‘DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models’ (2024)https://arxiv.org/abs/2402.03300
22
Stryker, C. et al., ‘What is parameter-efficient fine-tuning (PEFT)?’ (2024) https://www.ibm.com/think/topics/parameter-efficient-fine-
tuning
23
AWS, ‘What is RAG (Retrieval-Augmented Generation)’? (2025) https://aws.amazon.com/what-is/retrieval-augmented-generation/
24
Wikipedia, ‘Retrieval Augmented Generation’ (2025) https://en.wikipedia.org/wiki/Retrieval-augmented_generation
25 IBM, ‘Retrieval Augmented Generation’ (2025) https://www.ibm.com/architectures/hybrid/genai-rag?mhsrc=ibmsearch_a&mhq=RAG
26 V.Chaba, ‘Understanding the Differences: Fine-Tuning vs. Transfer Learning ‘ (2023) https://dev.to/luxacademy/understanding-the-
differences-fine-tuning-vs-transfer-learning-370
27
Wikipedia, ‘Transfer Learning’ (2025) https://en.wikipedia.org/wiki/Transfer_learning
11
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Feedback loops:28 Real-world user feedback helps refine the model’s behavior, allowing it to
adapt to new contexts or correct inaccuracies. Feedback can be collected through user
behaviour, for instance inferring whether the user engages with or ignores a response.
Feedback can also be collected when users directly provide feedback on the model's output,
such as a thumbs-up/thumbs-down rating, qualitative comments, or error corrections. The LLM
is then refined based on this feedback.
The three key stages described outline how a traditional text-only LLM is developed. Multimodal LLMs
follow a similar process but to handle multiple data modalities, they incorporate specialized
components such as modality-specific encoders, connectors and cross-modal fusion mechanisms to
integrate the different data representations, along with a shared decoder to generate coherent outputs
across modalities. Their development also involves pre-training and fine-tuning stages; however, some
architectures build multimodal LLMs by fine-tuning an already pre-trained text-only LLM rather than
training one from scratch.
28
Nebuly AI, ‘LLM Feedback Loop’ (2024) https://www.nebuly.com/blog/llm-feedback-loop
29
Wikipedia, ‘Softmax Function’ (2025) https://en.wikipedia.org/wiki/Softmax_function
12
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
In practice, LLMs are often part of a system and can be accessed directly via APIs, are embedded within
SaaS platforms, deployed as off-the-shelf foundational models fine-tuned for specific use cases, or
integrated into on-premise solutions. It is important to note that while LLMs are essential components
of AI systems, they do not constitute AI systems on their own. For an LLM to become part of an AI
system, additional components such as a user interface, must be integrated to enable it to function as
a complete system30. Throughout this document, we will refer to such complete systems as LLM-based
systems or simply LLM systems to emphasize their broader context and functionality. This distinction is
crucial when assessing the risks associated with these systems, as an LLM system inherently carries more
risks due to its additional components and integrations compared to a standalone LLM.
Each stage of an LLM’s development lifecycle could introduce potential privacy risks, as the model
interacts with large datasets that might contain personal data and it generates outputs based on that
data. Some of the key privacy concerns may occur during:
The collection of data: The training, testing and validation set could contain identifiable
personal data, sensitive data or special category of data.
Inference: Generated outputs could inadvertently reveal private information or contain
misinformation.
RAG process: We might use knowledge bases containing sensitive data or identifiable personal
data without implementing proper safeguards.
Feedback loops: User interactions might be stored without adequate safeguards.
30
Recital 97 AI Act
31
J.Loucks ‘Autonomous generative AI agents: Under development’ (2024)
https://www2.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2025/autonomous-generative-
ai-agents-still-under-development.html
32 C. Gadelho, ‘Building AI and LLM Agents from the Ground Up: A Step-by-Step Guide’ (2024) https://www.tensorops.ai/post/building-ai-and-
llm-agents-from-the-ground-up-a-step-by-step-guide
33
OpenAI's Operator (2025) https://openai.com/index/introducing-operator/
34
Anthropic, ‘Building effective agents’ (2024) https://www.anthropic.com/research/building-effective-agents
13
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Agents, in contrast, are designed to function dynamically. They allow LLMs to autonomously
direct their processes and determine how to use tools and resources to achieve objectives.
1. Perception module
This module handles the agent’s ability to process inputs from the environment and format them into a
structure that the LLM can understand. It converts raw inputs (e.g., text, voice, or data streams) into
embeddings or structured formats that can be processed by the reasoning module.
2. Reasoning module
The reasoning module enables the agent to interpret input data, analyze its context, and decompose
complex tasks into smaller, manageable subtasks. It leverages the LLM’s ability to understand and
process natural language to make decisions. The reasoning mechanism enables the agent to analyze
user inputs to determine the best course of action and leverage the appropriate tool or resource to
achieve the desired outcome.
3. Planning module
The planning module determines how the agent will execute the subtasks identified by the reasoning
module. It organizes and sequences actions to achieve a defined goal.
5. Action module
This module is responsible for executing the plan and interacting with the external environment. It
carries out the tasks identified and planned by earlier modules. The agent must have access to a defined
set of tools, such as APIs, databases, or external systems, which it can use to accomplish the specific
tasks. For example, an AI assistant might use a calendar API for scheduling or a booking service for travel
reservations.
35
idem
14
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Figure 6. Source: Z.Deng et al. AI ‘Agents Under Threat: A Survey of Key Security Challenges and Future Pathways’ (2024)
https://www.researchgate.net/figure/General-workflow-of-AI-agent-Typically-an-AI-agent-consists-of-three-
components_fig1_381190070
36
Cabalar, R., ‘What are small language models?’ (2024) https://www.ibm.com/think/topics/small-language-models
37 D. Biswas, ICAART, ‘Stateful Monitoring and Responsible Deployment of AI Agents’, (2025)
38 Windland, V. et al. ’What is LLM orchestration’ (2024) https://www.ibm.com/think/topics/llm-orchestration
39
D. Vellante et al., ‘From LLMs to SLMs to SAMs, how agents are redefining AI’ (2024) https://siliconangle.com/2024/09/28/llms-slms-sams-
agents-redefining-ai
15
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
determines the most appropriate model—LLM or SLM—for a given task, routes inputs accordingly, and
combines their outputs into a unified response.
Privacy Concerns40
The growing adoption of AI agents powered by LLMs, brings the promise of revolutionizing the way
humans work by automating tasks and improving productivity. However, these systems also introduce
significant privacy risks that need to be carefully managed:
To perform their tasks effectively, AI agents often require access to a wide range of user data,
such as:
o Internet activity: Browsing history, online searches, and frequently visited websites.
o Personal applications: Emails, calendars, and messaging apps for scheduling or
communication tasks.
o Third-party systems: Financial accounts, customer management platforms, or other
organizational systems.
This level of access significantly increases the risk of unauthorized data exposure, particularly if the
agent's systems are compromised.
AI agents are designed to make decisions autonomously, which can lead to errors or choices
that users may disagree with.
Like other AI systems, AI agents are susceptible to biases originating from their training data,
algorithms and usage context.
Privacy trade-offs for user convenience:41 As AI agents grow more capable, users will need to consider
how much personal data they are willing to share in exchange for convenience. For example, an agent
might save time by managing travel bookings or negotiating purchases but requires access to sensitive
information such as payment details or login credentials42. Balancing these trade-offs requires clear
communication about data usage policies and robust consent mechanisms.
Accountability for Agent decisions:43 AI agents operate in complex environments and may encounter
unforeseen challenges. When an agent makes an error, or its actions cause harm, determining
accountability can be difficult. Organizations must ensure transparency in how decisions are made and
provide mechanisms for users to intervene when errors occur.
40
B.O’Neill, ‘What is an AI agent? A computer scientist explains the next wave of artificial intelligence tools’ (2024)
https://theconversation.com/what-is-an-ai-agent-a-computer-scientist-explains-the-next-wave-of-artificial-intelligence-tools-242586
41
Z.Zhang et al. ‘"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational
Agents’ (2024) https://arxiv.org/abs/2309.11653
42 Login credentials are the unique information used to access systems, accounts, or services, typically consisting of a username and password,
but they can also include additional methods like two-factor authentication, biometric data, or security PINs for added protection.
43
J. Zeiser, ‘Owning Decisions: AI Decision-Support and the Attributability-Gap’ (2024). https://doi.org/10.1007/s11948-024-00485-1
16
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Several European companies and collaborations are contributing to the LLM landscape:
Mistral AI,47a Paris-based startup established in 2023 by former Google DeepMind and Meta AI
scientists offers both open source and proprietary AI models.
Aleph Alpha48is based in Heidelberg, Germany, and it specializes in developing LLMs designed to
provide transparency regarding the sources used for generating results. Their models are intended
for use by enterprises and governmental agencies, trained in multiple European languages.
Silo AI's Poro49, through its generative AI arm SiloGen, has developed Poro, a family of multilingual
open source LLMs. This initiative aims to strengthen European digital sovereignty and democratize
access to LLMs for all European languages.
TrustLLM50is a coordinated project by Linköping University that focuses on developing trustworthy
and factual LLM technology for Europe, emphasizing accessibility and reliability.
OpenEuroLLM51 is an open source family of performant, multilingual, large language foundation
models for commercial, industrial and public services.
44 ChatGPT (https://chatgpt.com/)
45
Gemini (https://gemini.google.com/)
46
Claude (https://claude.ai/)
47
Mistral (https://mistral.ai/)
48
Aleph Alpha (https://aleph-alpha.com/)
49
Silo AI, ‘Poro - a family of open models that bring European languages to the frontier’ (2023) https://www.silo.ai/blog/poro-a-family-of-
open-models-that-bring-european-languages-to-the-frontier
50 TrustLLM ( https://trustllm.eu/)
51 OpenEuroLLM (https://openeurollm.eu/)
52
Hugging Face, ‘Transformers’ (n.d) https://huggingface.co/docs/transformers/v4.17.0/en/index
53
Deepseek (https://www.deepseek.com/
17
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Deepset's Haystack54 is an open source framework designed to build search systems and question-
answering applications powered by Large Language Models (LLMs) and other natural language
processing (NLP) techniques.
OLMo 32B55 is the first fully open model (all data, code, weights, and details are freely available).
Meta's LLaMA56 models focus on research and practical applications in NLP.
BLOOM57 was developed by BigScience as a multilingual open source model capable of generating
text in over 50 languages, with a focus on accessibility and inclusivity.
BERT58 was created by Google to understand the context of text through bidirectional language
representation, excelling in tasks like question answering and sentiment analysis.
Falcon59 was developed by the Technology Innovation Institute as a high-performance model
optimized for text generation and understanding, with significant efficiency improvements over
similar models.
Qwen60 is a large language model family built by Alibaba Cloud.
LangChain61 is an open source framework for building applications powered by large language
models.
54
Haystack (https://haystack.deepset.ai/
55
Ai2, ‘OLMo 2 32B: First fully open model to outperform GPT 3.5 and GPT 4o mini’ (2025) https://allenai.org/blog/olmo2-32B
56 Llama (https://www.llama.com/ )
57Hugging Face, ‘Introducing The World’s Largest Open Multilingual Language Model: BLOOM’ (2025)
https://bigscience.huggingface.co/blog/bloom
58
Hugging Face, ‘BERT’ (n.d) https://huggingface.co/docs/transformers/model_doc/bert
59
TTI, ‘Introducing the Technology Innovation Institute’s Falcon 3’ (n.d) https://falconllm.tii.ae/
60
Hugging Face, ‘’Qwen’ (n.d) https://huggingface.co/Qwen
61
LangChain, ‘Introduction’ (n.d) https://python.langchain.com/
62
Microsoft, ‘Azure OpenAI Service’ (2025) https://azure.microsoft.com/en-us/services/cognitive-services/openai-service/
63 AWS, ‘Bedrock’ (n.d)https://aws.amazon.com/bedrock
64 Vertex AI Platform, ‘Innovate faster with enterprise-ready AI, enhanced by Gemini models’ (n.d) https://cloud.google.com/vertex-ai
65
IBM, ‘IBM Watson to watsonx’ (n.d) https://www.ibm.com/watson
66
Cohere (https://cohere.com/)
18
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Applications of LLMs
LLMs are employed across various applications67, enhancing both user experience and operational
efficiency. This list represents some of the most prominent applications of LLMs, but it is by no means
exhaustive. The versatility of LLMs continues to unlock new use cases across industries, demonstrating
their transformative potential in various domains.
Chatbots and AI Assistants:68 LLMs power virtual assistants like Siri, Alexa, and Google
Assistant, understand and process natural language, interpret user intent, and generate
responses.
Content generation:69 LLMs assist in creating articles, reports, and marketing materials by
generating human-like text, thereby streamlining content creation processes.
Language translation: 70 Advanced LLMs facilitate real-time translation services.
Sentiment analysis: 71 Businesses use LLMs to analyze customer feedback and social media
content, gaining insights into public sentiment and informing strategic decisions.
Code generation and debugging:72 Developers leverage LLMs to generate code snippets and
identify errors, enhancing software development efficiency.
Educational support tools:73 LLMs play a key role in personalized learning by generating
educational content, explanations, and answering student questions.
Legal document processing:74 LLMs help professionals in the legal field by reviewing and
summarizing legal texts, extracting important information, and offering insights.
Customer support:75 Automating responses to customer inquiries and escalating complex cases
to human agents.
Autonomous vehicles:76 Driving cars with real-time decision-making capabilities.
67 N. Sashidharan, ‘Three Pillars of LLM: Architecture, Use Cases, and Examples ‘ (2024) https://www.extentia.com/post/pillars-of-llm-
architecture-use-cases-and-examples
68
Google Assistant (https://assistant.google.com/)
69
Jasper AI (https://www.jasper.ai/)
70
Deepl (https://www.deepl.com/en/translator )
71
SurveySparrow (https://surveysparrow.com/features/cognivue/)
72
GitHub Copilot (https://github.com/features/copilot)
73 Khanmigo (https://www.khanmigo.ai/ )
74 Luminance (https://www.luminance.com/)
75
Salesforce (https://www.salesforce.com/eu/)
76
Tesla Autopilot ( https://www.tesla.com/autopilot)
19
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Some of the most common LLM performance evaluation criteria are Answer Relevancy, Correctness,
Semantic Similarity, Fluency, Hallucination, Factual Consistency, Contextual Relevancy, Toxicity, Bias
and Task-Specific Metrics.
The following metrics77 are commonly used, each offering different insights:
Accuracy78measures how often an output aligns with the correct or expected results. In tasks like
text classification or question answering, accuracy is calculated as the ratio of correct predictions to
the total number of predictions. However, for generative tasks such as text generation, traditional
accuracy metrics may not fully capture performance due to the open-ended nature of possible
correct responses. In such cases, metrics like BLEU (Bilingual Evaluation Understudy) and ROUGE
(Recall-Oriented Understudy for Gisting Evaluation) are employed to assess the quality of generated
text by comparing it to reference texts.
Precision quantifies the ratio of correctly predicted positive outcomes to the total number of
positive predictions made by the model. In the context of LLMs, a high precision score indicates the
model is accurate when making predictions. However, it does not account for relevant instances the
model fails to predict (false negatives), so it is commonly combined with recall for a more
comprehensive evaluation.
Recall, also referred to as sensitivity or the true positive rate, measures the proportion of actual
positive instances that the model successfully identifies. A high recall score reflects the model’s
effectiveness in capturing relevant information but does not address irrelevant predictions (false
positives). For this reason, recall is typically evaluated alongside precision to provide a balanced
view.
F1 Score offers a balanced metric by combining precision and recall into their harmonic mean. A
high F1 score indicates that the model achieves a strong balance between precision and recall,
making it a valuable metric when both false positives and false negatives are critical. The F1 score
ranges from 0 to 1, with 1 representing perfect performance on both metrics.
Specificity79 measures the proportion of true negatives correctly identified by a model.
AUC (Area Under the Curve) and AUROC80(Area Under the Receiver Operating Characteristic Curve)
quantify a model's ability to distinguish between classes. It evaluates the trade-off between
sensitivity (true positive rate) and 1-specificity (false positive rate) across various thresholds. A
higher AUC value indicates better performance in classification tasks.
AUPRC81(Area Under the Precision-Recall Curve), measures a model's performance in imbalanced
datasets, focusing on the trade-off between precision and recall. A high AUPRC indicates that the
model performs well in identifying positive instances, even when they are rare.
Cross Entropy82is a measure of uncertainty or randomness in a system's predictions. It measures
the difference between two probability distributions: the true labels (actual data distribution) and
the predicted probabilities from the model (output). Lower entropy means higher confidence in
predictions, while higher entropy indicates uncertainty.
77
A. Chaudhary, ‘Understanding LLM Evaluation and Benchmarks: A Complete Guide’ (2024)
https://www.turing.com/resources/understanding-llm-evaluation-and-benchmarks
78
S. Karzhev, ‘LLM Evaluation: Metrics, Methodologies, Best Practices’ (2024) https://www.datacamp.com/blog/llm-evaluation
79
Wikipedia, ‘Sensitivity and Specificity’ (2025) https://en.wikipedia.org/wiki/Sensitivity_and_specificity
80
E.Becker and S.Soatto, ‘Cycles of Thought: Measuring LLM Confidence through Stable Explanations’ (2024)
https://arxiv.org/pdf/2406.03441v1
81 J. Czakon, ‘F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which Evaluation Metric Should You Choose?’ (2024) https://neptune.ai/blog/f1-
score-accuracy-roc-auc-pr-auc
82
C.Xu, ‘ Understanding the Role of Cross-Entropy Loss in Fairly Evaluating Large Language Model-based Recommendation’ (2024)
https://arxiv.org/pdf/2402.06216v2
20
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Perplexity83derives from cross entropy and evaluates how well a language model predicts a sample,
serving as an indicator of its ability to handle uncertainty. A lower perplexity score means better
performance, indicating that the model is more confident in its predictions. Some studies suggest
that perplexity has proven unreliable84 to evaluate LLMs due to their long-context capabilities. It is
also difficult to use perplexity as a benchmark between models since its scores depend on factors
like tokenization method, dataset, preprocessing steps, vocabulary size, and context length. 85
Calibration86refers to the alignment between a model's predicted probabilities and the actual
probability of those predictions being correct. A well-calibrated model provides confidence scores
that accurately reflect the true probabilities of outcomes. Proper calibration is vital in applications
where understanding the certainty of predictions is important, such as in medical diagnoses or legal
document analysis.
MoverScore87is a modern metric developed to assess the semantic similarity between two texts.
Other metrics used for assessing the performance and usability of LLM-based systems, especially in real-
time or high-demand applications are:88
Completed requests per minute: Measures how many requests the LLM can process and return
responses for in one minute. It reflects the system's efficiency in handling multiple queries.
Time to first token (TTFT): The time taken from when a request is submitted to when the first token
of the response is generated.
Inter-token Latency (ITL): The time delay between generating consecutive tokens in the response.
This metric evaluates the speed and fluidity of text generation.
End to end Latency /ETEL): The total time taken from when a request is made to when the entire
response is completed. It encompasses all processing stages, including input handling, model
inference, and output generation.
In addition to these metrics, there are comprehensive evaluation frameworks or benchmarks89 such as
GLUE (General Language Understanding Evaluation)90, MMLU (Massive Multitask Language
Understanding)91, HELM (Holistic Evaluation of Language Models)92, DeepEval93 or OpenAI Evals94.
document).
90
Gluebenchmark (https://gluebenchmark.com/)
91
Papers with code, ‘MMLU (Massive Multitask Language Understanding)’ (n.d) https://paperswithcode.com/dataset/mmlu
92
Center for Research on Foundation Models, ‘A reproducible and transparent framework for evaluating foundation models’ (n.d)
https://crfm.stanford.edu/helm/
93
GitHub, ‘The LLM Evaluation framework’ (n.d) https://github.com/confident-ai/deepeval
94
GitHub, ‘Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks’ (n.d)
https://github.com/openai/evals
95 Wikipedia, ‘BLEU’ (2025) https://en.wikipedia.org/wiki/BLEU
96
Wikipedia, ‘Rouge’ (2025) https://en.wikipedia.org/wiki/ROUGE_(metric)
97
GitHub, ‘BLEURT is a metric for Natural Language Generation based on transfer learning’ (n.d) https://github.com/google-research/bleurt
21
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Representations from Transformers) are widely used for evaluating text generation, summarization, and
translation.
It is important to recognize that quantitative metrics alone are not sufficient. While these metrics are
highly valuable in identifying risks, especially when integrated into automated evaluation pipelines, they
primarily serve as early warning signals, prompting further investigation when thresholds are exceeded.
Many critical risks, including misuse potential, ethical concerns, and long-term impact, cannot be
effectively captured through those numerical measurements alone.
To ensure a more holistic evaluation, organizations should complement quantitative indicators with
expert judgment, scenario-based testing, and qualitative assessments.
Open source frameworks like Inspect98, support an integrated approach by enabling model-graded
evaluations, prompt engineering, session tracking, and extensible scoring techniques. These tools help
operationalize both metric-based and qualitative evaluations, offering better observability and insight
into LLM behavior in real-world settings.
98
AISI, ‘An open-source framework for large language model evaluations’ (n.d) https://inspect.aisi.org.uk/
99
P. Bhavsar ‘Mastering Agents: Metrics for Evaluating AI Agents’ (2024) https://www.galileo.ai/blog/metrics-for-evaluating-ai-agents
100
https://smythos.com/ai-agents/impact/ai-agent-performance-measurement/
101 AISERA, ‘An Introduction to Agent Evaluation’ (n.d) https://aisera.com/blog/ai-agent-evaluation/
102 Smyth OS, ‘Conversational Agents and Context Awareness: How AI Understands and Adapts to User Needs’ (n.d)
https://smythos.com/artificial-intelligence/conversational-agents/conversational-agents-and-context-awareness/
103
Papers with code, ‘ Dialogue State Tracking’, (n.d) https://paperswithcode.com/task/dialogue-state-tracking/codeless?page=2
22
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Evaluating AI agents with traditional LLM benchmarks presents challenges, as they often fail to capture
real-world dynamics, multi-step reasoning, tool use, and adaptability. Effective assessment requires
new benchmarks that measure long-term planning, interaction with external tools, and real-time
decision-making. Below are some of the most recognized benchmarks currently used:
SWE-bench:105Software Engineering Benchmark dataset, created to systematically evaluate the
capabilities of an LLM in resolving software issues.
AgentBench: 106,107 It is designed for evaluating and training visual foundation agents based on
LMMs.
MLAgentBench:108 To evaluate if agents driven by LLMs perform machine learning experimentation
effectively.
BFCL (Berkeley Function-Calling Leaderboard):109 To evaluate the ability of different LLMs to call
functions (also referred to as tools).
τ-bench:110 A benchmark for tool-agent-user interaction in real-world domains.
Planbench:111 To evaluate LLMs on planning and reasoning.
2. Model limitations
Understanding context:114 Despite advanced architectures, LLMs can struggle with nuanced
contexts or multi-turn conversations where earlier parts of the dialogue must inform later
responses.
104
N. Bekmanis, ‘Artificial Intelligence Conversational Agents: A Measure of Satisfaction in Use’ (2023)
https://essay.utwente.nl/94906/1/Bekmanis_MA_BMS.pdf
105
Swebench (https://www.swebench.com/)
106 Github, ‘A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)’, (n.d) https://github.com/THUDM/AgentBench
107 Papers with code, ’Agentench’ (n.d) https://paperswithcode.com/dataset/agentbench
108
Q.Huang et al. ‘MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation’ (2024)
https://arxiv.org/abs/2310.03302
109
Hugging Face Dataset (https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard)
110
GitHub, ‘Code and Data’ (n.d) https://github.com/sierra-research/tau-bench
111
GitHub, ‘An extensible benchmark for evaluating large language models on planning’ (n.d) https://github.com/karthikv792/LLMs-Planning
112
I. O. Gallegos et al. ‘Bias and Fairness in Large Language Models: A Survey’ (2024)
https://direct.mit.edu/coli/article/50/3/1097/121961/Bias-and-Fairness-in-Large-Language-Models-A
113 https://www.ox.ac.uk/news/2023-11-20-large-language-models-pose-risk-science-false-answers-says-oxford-study
114
J. Browning, ‘Getting it right: the limits of fine-tuning large language models’ (2024) https://link.springer.com/article/10.1007/s10676-024-
09779-1
23
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Handling ambiguities:115 Ambiguous input can lead to incorrect or nonsensical outputs if the
model cannot infer the intended meaning.
6. Limitations in knowledge
Knowledge cutoff:122 LLMs are trained on data up to a specific point in time. They may lack
awareness of recent developments or emerging knowledge.
Factual errors:123LLMs can "hallucinate" information, generating plausible but factually
incorrect responses due to the probabilistic nature of their predictions.
7. Lack of robustness
Adversarial inputs:124LLMs may fail when presented with deliberately manipulated or
adversarial inputs designed to exploit their weaknesses.
Noise and variability:125Spelling errors, slang, or non-standard language can lead to
misinterpretations and lower accuracy.
115
E.Jones and J. Steinhardt, ‘Capturing Failures of Large Language Models via Human Cognitive Biases’ (2022)
https://arxiv.org/abs/2202.12299
116
G.B.Mohan et al. ’ An analysis of large language models: their impact and potential application’ (2024)
https://link.springer.com/article/10.1007/s10115-024-02120-8
117
H.Naveed et al ‘A Comprehensive Overview of Large Language Models’ (2024) https://arxiv.org/abs/2307.06435
118
P.Jindal ‘Evaluating Large Language Models: A Comprehensive Guide’ (2024) https://www.labellerr.com/blog/evaluating-large-language-
models
119 idem
120 J.Browning ‘Getting it right: the limits of fine-tuning large language models’ (2024) https://link.springer.com/article/10.1007/s10676-024-
09779-1
121
H.Naveed et al. ‘A Comprehensive Overview of Large Language Models’ (2024) https://arxiv.org/abs/2307.06435
122
University of Oxford, ‘Large Language Models pose risk to science with false answers, says Oxford study’ (https://www.ox.ac.uk/news/2023-
11-20-large-language-models-pose-risk-science-false-answers-says-oxford-study
123
Ho, D.E., ‘Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive’(2024) https://hai.stanford.edu/news/hallucinating-
law-legal-mistakes-large-language-models-are-pervasive
124 E.Jones and J.Steinhardt, ‘Capturing Failures of Large Language Models via Human Cognitive Biases’ (2022)
https://arxiv.org/abs/2202.12299
125
G.B.Mohan ‘An analysis of large language models: their impact and potential applications’ (2024)
https://link.springer.com/article/10.1007/s10115-024-02120-8
24
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
8. Inadequate calibration
Overconfidence:126 Poorly calibrated models may assign high confidence scores to incorrect
predictions, misleading users. Failing to properly convey uncertainty in predictions can erode
trust in the model.
In this document, we use this AI lifecycle as a reference framework, recognizing that each organization
may have its own adapted version based on its specific needs. While the core stages of the lifecycle are
generally similar across organizations, the exact phases may vary.
Each one of the phases of the lifecycle involves unique privacy risks that require tailored mitigation
strategies. Implementing Privacy by Design into each phase helps to address risks proactively rather
than retroactively fixing them.
126 L.Li et al. ‘Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models’ (2024)
https://arxiv.org/abs/2402.12563
127
ISO/IEC 22989 (Artificial Intelligence – Concepts and Terminology)
128
ISO/IEC 5338:2023 Information technology — Artificial intelligence — AI system life cycle processes
25
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
1. Inception and Design: In this phase, decisions are made regarding data requirements, collection
methods, and processing strategies. The selection of data sources may introduce risks if
sensitive or personal data is included without adequate safeguards.
2. Data Preparation and Preprocessing: Raw data is collected, cleaned, in some cases
anonymized129, and prepared for training or fine-tuning. Datasets are often sourced from
diverse origins, including web-crawled data, public repositories, proprietary data, or datasets
obtained through partnerships and collaborations.
Privacy risks:
o Training data may inadvertently include personal details, confidential
documents, or other sensitive information.
o Inadequate anonymization or handling of identifiable data can lead to breaches
or unintended inferences during later stages.
o Biases present in the datasets can affect the model's predictions, resulting in
unfair or discriminatory outcomes.
o Errors or gaps in training data can adversely impact the model's performance,
reducing its effectiveness and reliability.
o The collection and use of training data may violate privacy rights, lack proper
consent, or infringe on copyrights and other legal obligations.
3. Development, Model Training: Prepared datasets are used to train the model, which involves
large-scale processing. The model may inadvertently memorize sensitive data, leading to
potential privacy violations if such data is exposed in outputs.
4. Verification & Validation:130 The model is evaluated using test datasets, often including real-
world scenarios. Testing data may inadvertently expose sensitive user information, particularly
if real-world datasets are used without anonymization.
5. Deployment: The model interacts with live data inputs from users, often in real-time
applications that could integrate with other systems. Live data streams might include highly
sensitive information, requiring strict controls on collection, transmission, and storage.
6. Operation and Monitoring: Continuous data flows into the system for monitoring, feedback,
and performance optimization. Logs from monitoring systems may retain personal data such as
user interactions, creating risks of data leaks or misuse.
7. Re-evaluation, Maintenance and Updates: Additional data may be collected for retraining or
updating the model to improve accuracy or address new requirements. Using live user data for
updates without proper consent or safeguards can violate privacy principles.
8. Retirement: Data associated with the model and its operations is archived or deleted. Failure
to properly erase personal data during decommissioning can lead to long-term privacy
vulnerabilities.
Throughout the AI system lifecycle, it is important to consider how different types of personal data may
be involved at each phase. Depending on the stage, personal data can be collected, processed, exposed,
129
Important to consider the EDPB opinion 28/2024 and section 3.2 On the circumstances under which AI models could be considered
anonymous and the related demonstration: ‘…, the EDPB considers that, for an AI model to be considered anonymous, using reasonable means,
both (i) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to
train the model; as well as (ii) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant for any
data subject.’
130
Testing, Evaluation, Validation, and Verification (TEVV) is an ongoing process that occurs throughout the AI lifecycle to ensure that a system
meets its intended requirements, performs reliably, and aligns with safety and compliance standards.
26
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
or transformed in different ways. Recognizing this variability is essential for implementing effective
privacy and data protection measures.
Figure 8. The illustration shows how different types of personal data can arise across various phases of the AI
lifecycle.
131
AI Action Summit, ‘International AI Safety Report on the Safety of Advanced AI’ , p - 167, (2025)
https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf
27
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
2. LLM ‘off-the-shelf’: In this service model the deployer can customize weights and fine tune the model.
This happens sometimes through platforms like Microsoft Azure and AWS where a deployer can select
a model and develop their own solution with it. It is also commonly used with open weight models, such
as LLaMA or BLOOM. While an LLM as a Service typically involves API-based interaction without model
ownership, the LLM ‘off-the-Shelf’ service emphasizes more developer and deployer control. The
distinction lies in this level of control and access provided, for instance, in Hugging Face models can be
downloaded locally.
3. Self-developed LLM: In this model, organizations develop and deploy LLMs on their own
infrastructure, maintaining full control over data and model interaction. While this option may offer
more privacy, this service model requires of significant computational resources and expertise.
Each of the three service models features a distinct data flow. While there are similarities across models,
each phase—from user input to output generation—presents unique risks that can impact user privacy
and data protection. In this section, we will first examine the data flow in an LLM as a Service solution,
followed by an analysis of the key differences in data flow when using an LLM ‘off-the-shelf’ model and
a self-developed LLM system.
*Note that in this section, the terms 'provider'136 and 'deployer'137 are used as defined in the AI Act,
where the provider refers to the entity developing and offering the AI system, and the deployer refers
to the entity implementing and operating the system for end-users.
132
Open AI, ‘The most powerful platform for building AI products, (2025) https://openai.com/api/
133 Microsoft, ‘Azure OpenAI Service’ (2025) https://azure.microsoft.com/en-us/services/cognitive-services/openai-service/
134 Wikipedia, ‘API’ (2025)https://en.wikipedia.org/wiki/API
135
S.Pagezy, ‘Use Hugging Face models with Amazon Bedrock’ (2024) https://huggingface.co/blog/bedrock-marketplace
136
‘provider’ means a natural or legal person, public authority, agency or other body that develops an AI system or a general-purpose AI model
or that has an AI system or a general-purpose AI model developed and places it on the market or puts the AI system into service under its own
name or trademark, whether for payment or free of charge; (Article 3 (3) AI Act)
137
‘deployer’ means a natural or legal person, public authority, agency or other body using an AI system under its authority except where the
AI system is used in the course of a personal non-professional activity; (Article 3 (4) AI Act)
28
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
in the self-developed LLM system example. It is important to note that each use case will have its own
specific data flow depending on its unique requirements and context and the examples provided in this
section are intended to be generic representations.
In an LLM as a Service scenario we could find these general data flow phases:
User input:
The process starts with the user submitting input, such as a query or command. This could be entered
through a web-based interface, mobile application, or other tools provided by the LLM provider.
Provider interface & API:
The input is sent through an interface or application managed by the provider (e.g., a webpage, app or
a chatbot window embedded on a website). This interface ensures the input is formatted appropriately
and securely transmitted to the LLM infrastructure.
LLM processing at providers’ infrastructure:
The API receives the input and routes it to the LLM model hosted on the provider's infrastructure.
The LLM processes the input using its trained parameters (weights) to generate a relevant response.
This may involve steps like tokenization, context understanding, reasoning, and text generation. The
model generates a response.
* Logging: The provider may log the user input (query) along with the generated response to analyze
the interaction and identify system errors or gaps in response quality.
The data could be also included in a training dataset to improve the model’s ability to handle similar
queries in the future. In this case, anonymization and filtering techniques are often applied.
Processed output:
The generated output is returned via the provider's interface to the user. The response is typically in a
format ready for display or integration, such as text, suggestions, or actionable data.
29
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
138 G. Nagli ‘Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History’ (2025)
https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepseek-database-leak
139
T.S. Dutta ‘New Jailbreak Techniques Expose DeepSeek LLM Vulnerabilities, Enabling Malicious Exploits’ (2025)
https://cybersecuritynews.com/new-jailbreak-techniques-expose-deepseek-llm-vulnerabilities/
140
S.Schulhoff ‘Prompt Injection vs. Jailbreaking: What's the Difference?’ (2024) https://learnprompting.org/blog/injection_jailbreaking
141
https://www.nightfall.ai/ai-security-101/data-leakage-prevention-dlp-for-llms
142
Some of the tools used are Google Cloud DLP, Microsoft Presidio, OpenAI Moderation API, Hugging Face Fine-Tuned NER Models and spaCy
(links available in section 10)
143 P.A. Grassi et al., (2017) NIST Special Publication 800-63-3 Digital Identity Guidelines
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-63-3.pdf
144
ENISA, ‘Basic security practices regarding passwords
and online identities’ (2014) https://enisa.europa.eu/sites/default/files/all_files/ENISA%20guidelines%20for%20passwords.pdf
30
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
145
Kosinski, M., ‘How to prevent prompt injection attacks’ (2024) https://www.ibm.com/think/insights/prevent-prompt-injection
146
A.Peng et al. ‘Rapid Response: Mitigating LLM Jailbreaks with a Few Examples’ (2024) https://arxiv.org/abs/2411.07494
147
B.Peng et al. ‘Jailbreaking and Mitigation of Vulnerabilities in Large Language Models’ (2024) https://arxiv.org/abs/2410.15236
148
S.Cheng et al. ‘StruQ: Defending Against Prompt Injection with Structured Queries’ (2024) https://arxiv.org/abs/2402.06363
149
Open AI Platform, ‘Safety best practices’ (n.d) https://platform.openai.com/docs/guides/safety-best-practices#constrain-user-input-and-
limit-output-tokens
150 Trust Community, NIST password guidelines 2025: 15 rules to follow’ (2024) https://community.trustcloud.ai/article/nist-password-
guidelines-2025-15-rules-to-follow/
151
Wikipedia, ‘Password Manager’ (2025) https://en.wikipedia.org/wiki/Password_manager
31
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
152
LLM Engine, (n.d) https://llm-engine.scale.com/guides/rate_limits/
153
OWASP, ‘OWASP Top Ten’ (2025) https://owasp.org/www-project-top-ten/
154
The data stored could be sensitive data such as credit card numbers, or special category of data such as health data (article 9 GDPR).
155
Aubert, P. et al., ‘Data Poisoning: a threat to LLM’s Integrity and Security’ (2024) https://www.riskinsight-wavestone.com/en/2024/10/data-
poisoning-a-threat-to-llms-integrity-and-security/
32
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Lack of data retention policies: The provider could store the data indefinitely without
having retention policies in place.
Processed Risks:
Output Inaccurate or sensitive responses: The model may generate outputs that reveal
unintended sensitive information or provide inaccurate or misleading information
(hallucinations)158, leading to harm or misinformation.
Re-identification risks: Outputs could inadvertently reveal information about the user’s
query or context that can be linked back to them.
Output misuse159: Users or third parties may misuse the generated output.
156
OWASP, ‘LLM10:2023 - Training Data Poisoning’ (2023) https://owasp.org/www-project-top-10-for-large-language-model-
applications/Archive/0_1_vulns/Training_Data_Poisoning.html
157
Center for Internet Security, ’ The 18 CIS Critical Security Controls’ (2025) https://www.cisecurity.org/controls/cis-controls-list
158
Wikipedia, ‘Hallucination Artificial Intelligence’ (2025) https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)
159
OWASP, ‘LLM05:2025 Improper Output Handling’ (2025) https://genai.owasp.org/llmrisk/llm052025-improper-output-handling/
33
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
- For critical applications, ensure generated outputs are reviewed by humans before
implementation or dissemination.
- Educate end-users on ethical and appropriate use of outputs, including avoiding
overreliance on the model for critical or high-stakes decisions without verification.
- Securely store outputs and restrict access to authorized personnel or systems only.
However, several key differences and limitations set these models apart:
Roles and responsibilities:
Organizations developing an LLM system using the ‘off-the-shelf’ model may be considered providers 160,
particularly when they intend to place the system on the market for use by others (deployers of their
system and end-users). This introduces an additional layer of responsibility for data handling, security,
and compliance with privacy regulations. The organization may also be developing the AI system for its
own internal use.
Hosting and processing:
In a LLM ‘off-the-shelf’ based system, the provider hosts the model on their infrastructure or a third-
party cloud environment of their choice. This contrasts with the LLM as a Service model, where hosting
and processing are entirely managed by the original model provider. The new provider is now
responsible for all aspects of system integration, maintenance, and security.
160
According to Article 25 of the AI Act, a deployer of a high risk AI system becomes a provider when they substantially modify an existing AI
system, including by fine-tuning or adapting a pre-trained model for new applications. In such cases, the deployer assumes the responsibilities
of a provider under the AI Act.
34
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
A similar approach is cache-augmented generation (CAG) 162 which can reduce latency, lower compute
costs, and ensure consistency in responses across repeated interactions but that is less practical for
large datasets that are often updated.
The figure below illustrates how RAG163 works: the user's query is first enhanced with relevant
information retrieved from an external database, and this enriched input is then sent to the language
model to generate a more accurate and grounded response.
Insecure logging or caching: User queries and retrieved documents may be stored insecurely,
increasing the risk of unauthorized access or data leaks.
161
EDPB, ‘Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, Adopted
on 17 December 2024, (2024) https://www.edpb.europa.eu/our-work-tools/our-documents/opinion-board-art-64/opinion-282024-certain-
data-protection-aspects_en
162 Sharma, R., ‘Cache RAG: Enhancing speed and efficiency in AI systems’ (2025) https://developer.ibm.com/articles/awb-cache-rag-efficiency-
speed-ai/
163
Theja, R., ‘Evaluate RAG with LlamaIndex’ (2023) https://cookbook.openai.com/examples/evaluation/evaluate_rag_with_llamaindex
35
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Third-party data handling: If the retrieval system uses external APIs or services, user queries
may be sent to third parties, where they can be logged, tracked, or stored without user consent.
Exposure of sensitive data: The model may retrieve personal or confidential information if this
is stored in the knowledge base.
164
Atlan, ‘Data Curation in Machine Learning: Ultimate Guide 2024’ (2023) https://atlan.com/data-curation-in-machine-learning/
36
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Dataset
Model
Collection
Training & Deployment
and
Fine-tuning
Preparation
165
ENISA, ‘Pseudonymisation techniques and best practices. Recommendations on shaping technology according
to data protection and privacy provisions’ (2019)
https://www.enisa.europa.eu/sites/default/files/publications/Guidelines%20on%20shaping%20technology%20according%20to%20GDPR%2
0provisions.pdf
166 Marwala, T., ‘Algorithm Bias — Synthetic Data Should Be Option of Last Resort When Training AI Systems’ (2023)
https://unu.edu/article/algorithm-bias-synthetic-data-should-be-option-last-resort-when-training-ai-systems
167
Van Breugel, B. et al., ‘Synthetic Data, Real Errors: How (Not) to Publish and Use Synthetic Data’ (2023)
https://proceedings.mlr.press/v202/van-breugel23a/van-breugel23a.pdf
168
Vongthongsri, K., ‘Using LLMs for Synthetic Data Generation: The Definitive Guide’ (2025) https://www.confident-ai.com/blog/the-
definitive-guide-to-synthetic-data-generation-using-llms
169
Desfontaines, D., ‘The fundamental trilemma of synthetic data generation’ (n.d) https://www.tmlt.io/resources/fundamental-trilemma-
synthetic-data-generation
37
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
- Regularly audit datasets for bias and sensitive content, removing any problematic
entries.
- Implement robust data validation and monitoring to detect and prevent malicious
or corrupted data. Use trusted data sources, apply automated checks for
anomalies, and cross-validate data from multiple sources.
Fine-Tuning Risks:
Exposure of proprietary or sensitive data: Fine-tuning data may include sensitive or
proprietary information, risking leakage.
Third-party risks: If external platforms are used for fine-tuning, sensitive data may be
exposed to additional risks.
Deployment Risks:
Unauthorized access: Weak access controls could allow unauthorized parties to
interact with the model or access underlying systems.
Unsecure hosting: Hosting the model on an unsecured server or cloud environment
could expose sensitive data.
170
Wikipedia, ‘Private access management’ (2025) https://en.wikipedia.org/wiki/Privileged_access_management
171
Wikipedia, ‘Role-based access control’ (2025) https://en.wikipedia.org/wiki/Role-based_access_control
38
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
User input See previous table on risks in the data flow of LLM as a Service, where this phase is
detailed. The applicable mitigation measures primarily concern providers, but also
extend to deployers in scenarios where developers are deploying and using their self-
developed AI systems.
Provider interface Idem
& API
LLM processing at Idem
Providers’
infrastructure
Processed Output Idem
172
Anthropic, ‘Introducing the Model Context Protocol’ (2024) https://www.anthropic.com/news/model-context-protocol
39
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
- The application processes the query and returns a response, such as a list of available
flights.
- The agent receives and processes the response for integration into the overall
workflow.
Interaction with application 2 (e.g., hotel booking system):
The agent engages with the second external application to complete another part of the task. For
example, it might request hotel options based on the destination and travel dates.
o Actions:
- Data (e.g., travel dates) is transmitted to the application.
- The application provides a response, such as available hotels, which is processed by the
agent.
Aggregation of responses:
The AI agent integrates the responses from both applications to generate a cohesive result. For instance,
it compiles the flight and hotel options into a single output for the user.
o Actions:
- Responses are validated and formatted for clarity and relevance.
- Potential errors or conflicts (e.g., overlapping schedules) are resolved.
Output generation:
The agent delivers the aggregated result to the user in a user-friendly format, such as a summary of
booking options or actionable recommendations.
o Actions:
- Output is displayed via the user interface or transmitted to another system for further
action.
- If necessary, the agent provides follow-up prompts to refine the user’s preferences or
choices.
Logging and continuous improvement:
Interaction logs may be stored temporarily for debugging, system improvements, or retraining
purposes, depending on the organization’s policies.
o Actions:
- Logs are analyzed to optimize the agent’s performance and enhance user experience.
40
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
173
OWASP, ‘Agentic AI – Threats and Mitigations’ (2025) https://genaisecurityproject.com/resource/agentic-ai-threats-and-mitigations/
41
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
42
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
178
See footnote 176
179
idem
180
See footnote 175
43
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
- Limit log retention periods and ensure compliance with privacy regulations.
The addition of filters introduces complexity into the system’s architecture: Filters may add
latency/processing time, impacting response times in real-time systems. They need to be secure, as
vulnerabilities could expose sensitive data or allow malicious inputs to bypass scrutiny. And must be
monitored and regularly updated to adapt to new risks, changing regulations, or evolving system
requirements.
Roles in LLMs Service Models According to the AI Act and the GDPR
The roles of provider and deployer under the AI Act, as well as controller and processor under the GDPR,
may differ based on the different service model. Below is an explanation of how these roles may apply
and the rationale behind their assignment, categorized by each service model. Note that the
qualification of organizations as controller or processor should be assessed based on the circumstances
of each case, and the explanation provided here is intended for reference purposes only and does not
imply it will always apply in the same way.
181
Inan, H. et al., ‘Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations’ (2023)
https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/
44
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
AI Act Roles
Provider: The organization that develops and offers the LLM as a service. Providers are
responsible for ensuring compliance with the AI Act, including risk management, transparency,
and technical robustness (e.g., OpenAI providing GPT models via APIs).
Deployer: The organization using the LLM (e.g., a business using the provided interface for any
particular task).
New provider: An organization integrating the API of an LLM as a Service model into their
commercial AI system (e.g., a chatbot) could also be considered a provider under the AI Act if
their system qualifies as high-risk and falls within the scope of Article 25 of the AI Act.
GDPR Roles
Deployer as controller: The deployer using the LLM as a Service typically acts as the data
controller, as they determine the purposes and means of data processing (e.g., collecting
customer queries to improve services or using an LLM tool for summarization purposes).
Provider as controller: When providers collect or retain data for their own purposes (e.g., model
fine-tuning or feature improvement), they assume the role of controller too. This is the case in
most LLM as a Service solutions where providers have ownership of the model and the training
data. In this scenario, a joint controllership might be the more suitable option.
Processor: The provider acts as a processor when handling data strictly according to the
deployer's instructions for specific tasks, like generating responses. This might be difficult in this
service model due to the providers model’s ownership.
In an LLM as a Service model scenario, we often talk about the concept of shared responsibility, where
both the provider and the deployer play distinct but complementary roles in ensuring privacy, security,
and compliance. The provider is responsible for the infrastructure, model training, and maintenance,
while the deployer must ensure secure usage, proper integration, and adherence to applicable
regulations within their specific deployment context. This division of responsibilities requires clear
agreements and robust collaboration to effectively manage risks.
45
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
AI Act Roles
Provider: The organization that develops, puts in the market or into service the off-the-shelf
LLM model. Providers are responsible for ensuring that the model adheres to the AI Act’s
requirements182. In case of LLMs released under free and open source licenses, they should be
considered to ensure high levels of transparency and openness if their parameters, including
the weights, the information on the model architecture, and the information on model usage
are made publicly available183
o If the platform provider develops, trains, or significantly fine-tunes an LLM and makes
it available to deployers, they would act as providers under the AI Act.
o The platform could also just have the role of infrastructure enabler and not being
considered then a provider but a distributor.
Deployer: The organization using the off-the-shelf model to build or enhance its own services
takes on the role of deployer. However, in cases of high risk AI systems, the deployer may also
assume the role of provider if they significantly modify or fine-tune the model or make it
available to others as part of their own services. This dual role is addressed under Article 25 of
the AI Act.
GDPR Roles
Deployer as Controller: The deployer typically acts as the controller, as they determine the
purpose and means of processing personal data during their use of the LLM.
Provider as controller: The original model provider may act as a controller in limited scenarios
where they process data for their own purposes. If the platform provider logs, analyzes, or
retains user or deployer data for purposes like improving platform services, debugging, or
monitoring system performance, they could be taking on the role of controller for this specific
data processing.
Processor: This role could be carrying out cloud-based tasks explicitly instructed by the
deployer. For example, during data inference tasks, data might be processed according to the
deployer’s instructions. In this case, a platform providing a model could act as a processor under
the GDPR.
The provider remains accountable for the foundational model’s compliance and functionality. The
deployer is responsible for how the model is implemented, customized, and operated within their
specific context, especially in scenarios where data is processed locally, or cloud tasks are guided by the
deployer. This dual-layered responsibility emphasizes the need for clear contractual agreements and
robust governance mechanisms.
182 Note based on recital 104 AI Act: Providers of general-purpose AI models released under a free and open source license, with publicly
available parameters (including weights, architecture details, and usage information), should be subject to exceptions regarding transparency-
related requirements under the AI Act. However, exceptions should not apply when such models present a systemic risk. In such cases,
transparency and an open source license alone should not suffice to exempt the provider from compliance with the regulation's obligations.
Furthermore, the release of open source models does not inherently guarantee substantial disclosure about the datasets used for training or
fine-tuning, nor does it ensure compliance with copyright law. Therefore, providers should still be required to produce a summary of the
content used for model training and implement a policy to comply with Union copyright law, including identifying and respecting reservations
of rights as outlined in Article 4(3) of Directive (EU) 2019/790.
183
Recital 102 AI Act
46
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
3. Self-developed LLMs
All operations, from model development, infrastructure, input collection to model processing, are
performed under the responsibility of the provider that is often also deploying the model for own use.
AI Act Roles
Provider: The entity developing the LLM.
Deployer: The organization deploying the solution and taking on most operational
responsibilities, including monitoring, risk management, and transparency.
In this specific service model, the organization developing the LLM system could be the same
organization putting the system into own use. In that scenario the same organization would be
considered a provider and deployer under the AI Act.
GDPR Roles
Provider as Controller: The LLM system developer, as they control and execute all data
processing activities within their local infrastructure during development.
Deployer as Controller: The deployer, as they determine the purpose and means of processing
personal data during their use of the LLM.
Processor: Any third party processing data on behalf of the controller might take this role.
The controller’s full control over infrastructure and data makes them responsible for compliance with
GDPR and AI Act requirements.
The processor’s role is limited to any third party tool or component that the controller could be using in
the process.
4. Agentic AI Systems
Agentic AI systems introduce unique dynamics to data flows and role allocation under the GDPR and AI
Act due to their autonomous and dynamic behavior.
AI Act Roles
Provider: The entity developing and supplying the LLM or core agentic architecture.
Deployer: The organization implementing the agentic AI system for its own or third-party use.
In high risk AI systems, if the deployer fine-tunes the agent, integrates it with specific systems,
or significantly modifies its architecture, they may also assume the role of a provider under
Article 25 of the AI Act, responsible for compliance of the modified system.
47
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
In this service model, the deployer is often both a provider and a deployer, depending on the level of
customization, fine-tuning, or downstream deployment of the agentic AI system.
GDPR Roles
Deployer as Controller: The deployer typically assumes the role of the controller, as they
determine the purpose and means of processing personal data. This includes inputs, outputs,
memory management, and interactions with external systems.
Processor: When the deployer uses third-party tools, external APIs, or cloud services as part of
the agentic AI’s operations, these third-party providers could act as processors. For example, if
an external API or service facilitates real-time data retrieval or enhances functionality, it takes
on a processing role under the deployer’s instruction. In some cases, third-parties could act as
joint-controllers.
Responsibility Sharing:
The deployer bears significant responsibility for managing the AI agent’s outputs and interactions.
However, providers supplying foundational LLMs, or modules could also share responsibility for pre-
deployment compliance.
This table shows an overview of the possible roles per service model, always subject to an assessment
of the circumstances at hand:
Model Deployer as Controller Provider as Controller Processors
LLMs as a When the deployer uses the LLM for When the provider performs fine- Model Provider: For processing
Service application-specific purposes, tuning, training, or analytics beyond data under deployer’s
defining the data processing goals deployer instructions (e.g., retaining instructions, such as handling
and methods (e.g., handling user data for retraining or monitoring input/output during inference
queries). purposes). tasks.
LLM ‘Off-the- For determining the use of the LLM When the original model provider Platform: For cloud-based
Self’ system, controlling data during retains or reuses data for its purposes processing tasks performed under
preprocessing, output handling, and (e.g., debugging or performance deployer’s instructions (e.g.,
customization of the LLM for specific monitoring). hosting the model for inference).
workflows.
Self-developed Fully applicable when developer is Fully applicable, as the organization Not applicable, as no third-party
LLM also deployer, as the organization both develops and controls the model provider is involved in
directly defines data processing model. processing. There might be other
goals. third parties acting as processors.
Agentic AI The deployer acts as the primary When the foundational model Model and tool provider / other
controller, managing data inputs, provider or module supplier retains third parties: When external APIs,
task assignments, memory storage, data from interactions for their own tools, or platforms are used for
and external system interactions, purposes, such as improving specific agent functions under the
while overseeing the agent's reasoning components or tools. When deployer’s instructions.
autonomy. provider performs training beyond
deployer instructions.
48
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Risk Factors
To help identify risks associated to the use of LLMs we can make use of a variety of risk factors.
Risk factors are conditions associated with a higher probability of undesirable outcomes. They can help
to identify, assess, and prioritize potential risks. For instance, processing sensitive data and large
volumes of data are two risk factors with a high level of risk. Acknowledging them in your own use case,
can help you identify related potential risks and their severity.
The risk factors shown below are the result of analysing the contents of legal instruments such as the
GDPR184, the EUDPR185, the EU Charter186 and other applicable guidelines related to privacy and data
protection.187The following risk factors can help us identify data protection and privacy high level risks
in LLM-based systems:
High level Risk / Important concerns Examples of applicability
Sensitive & impactful purpose of the processing Deploying an LLM to determine creditworthiness or loan
Using a LLMS to decide on or prevent the exercise of approvals without human oversight, or to automate
fundamental rights of individuals, or about their access to a decisions about hiring, promotions, or job terminations
service, the execution or performance of a contract, or access without adequate safeguards could negatively impact
to financial services is a concern, especially if these decisions will individuals.
be automated without human intervention. Wrong decisions
could have an adverse impact on individuals.
Processing sensitive data Using an LLM-based system to analyze patient records,
When an LLM is processing sensitive data such as special diagnoses, or treatment plans or data related to criminal
categories of data, personal data related to convictions and convictions, court records, or investigative reports.
criminal offences, financial data, behavioral data, unique
identifiers, location data, etc. This is a reason of concern since
processing inappropriately this personal data could negatively
impact individuals.
Large scale processing An LLM deployed in a large e-commerce platform
Processing high volumes of personal data is a reason of concern, processing vast amounts of user data, or an LLM used in a
especially if these personal data are sensitive. The higher the social media platform.
volume the bigger the impact in case of a data breach or any
other situation that put the individuals at risk.
Processing data of vulnerable individuals This could be the case when LLM systems are used in the
This is a concern because vulnerable individuals often require health sector, at schools, social services organizations,
special protection. Processing their personal data without government institutions, employers, etc. For instance, an
proper safeguards can lead to violations of their fundamental LLM-based platform used in schools to assess student
rights. Some examples of vulnerable individuals are children, performance and provide personalized learning
elderly people, people with mental illness, disabled, patients, recommendations processes data about children.
people at risk of social exclusion, asylum seekers, persons who
access social services, employees, etc.
184
General Data Protection Regulation (2016/679)
185
European Union Data Protection Regulation (Reg. 2018/1725)
186
Charter of Fundamental Rights of the European Union (2012/C 326/02)
187
AEPD, ‘Risk Management and Impact Assessment in Processing of Personal Data’ p-79 (2021) https://www.aepd.es/guides/risk-
management-and-impact-assessment-in-processing-personal-data.pdf
49
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Low data quality LLMs rely heavily on the quality of both the input data
The low data quality of the input data and/or the training data provided by users and the data used for training the model.
is a concern bringing possible risks of inaccuracies in the Any inaccuracies, biases, or incompleteness in the data can
generated output what could cause wrong identification of have far-reaching consequences, as LLMs generate outputs
characters and have other adverse impacts depending on the based on patterns they detect in their training and input
use case. data. The degree of risk posed by low data quality depends
heavily on the application. In less critical use cases, such as
content generation, inaccuracies may be less impactful.
However, in high-stakes scenarios, such as healthcare,
finance, or public policy, even minor inaccuracies can have
significant negative consequences.
Insufficient security measures This could be the case if there are not sufficient safeguards
The lack of sufficient safeguards could be the cause of a data implemented to protect the input data and the results of
breach. Data could also be transferred to states or organizations the processing. This could be applicable to any use case.
in other countries without an adequate level of protection. LLMs offered as SaaS solutions involve in some cases data
being sent for processing to servers in countries without
adequate data protection laws, increasing exposure to
privacy risks.
A hazard refers to a potential source of harm, while hazard exposure describes the conditions or extent
to which individuals or systems are exposed to that harm in a hazardous situation. Safety represents
the measures implemented to minimize or mitigate harm, ensuring the system operates as intended
without causing undue risk. Threats are external factors that may exploit vulnerabilities within the LLM
based system, which are weaknesses that could be exploited to compromise functionality, security, or
data protection. The AI Act emphasizes the protection of fundamental rights, including privacy, to
ensure that AI systems do not adversely impact individuals.
When trying to identify the risks of an LLM based system, is important to consider all these components
of risks189 that could have an impact on privacy and data protection. Privacy risks often stem from
hazards, or from vulnerabilities within the system that could be exploited by external or internal threats.
188
These terms are explained here in an accessible manner to aid understanding, but they are not official definitions. The European harmonized
standard on risk management, currently being developed by CEN/CENELEC at the request of the European Commission, will contain
standardized definitions and provide formalized guidance on these terms.
189
Novelli, C et al., ‘AI Risk Assessment: A Scenario-Based, Proportional Methodology for the AI Act’ (2024) https://doi.org/10.1007/s44206-
024-00095-1
50
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
A hazard exposure in this context could refer to how individuals’ personal data is exposed to these risks
through the use of the LLM based system, for example, during input querying.
Understanding these interrelated concepts facilitates the risk management process of AI systems that
need to comply with the GDPR and the AI Act having as end goal the protection of individuals.
190
The AI Act requires in Article 9 (2)(a) for high-risk AI systems risk management systems ‘the identification and analysis of the known and the
reasonably foreseeable risks that the high-risk AI system can pose to health, safety or fundamental rights when the high-risk AI system is used
in accordance with its intended purpose;
191
NIST, ‘Artificial Intelligence Risk Management Framework (AI RMF 1.0)’ (2023) https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
192 Threat Modeling Manifiesto, (n.d) https://www.threatmodelingmanifesto.org/
193 Meta, Frontier AI Framework (2025) https://ai.meta.com/static-resource/meta-frontier-ai-
framework/?utm_source=newsroom&utm_medium=web&utm_content=Frontier_AI_Framework_PDF&utm_campaign=Our_Approach_to_F
rontier_AI_blog
51
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
194
Ofcom, ‘Protecting people from illegal harms online - Annex 5: Service Risk Assessment Guidance (2024)
https://www.ofcom.org.uk/siteassets/resources/documents/consultations/category-1-10-weeks/270826-consultation-protecting-people-
from-illegal-content-online/associated-documents/annex-5-draft-service-risk-assessment-guidance?v=330403
195
Martineau, K. ‘What is red teaming for generative AI?’ (2024) https://research.ibm.com/blog/what-is-red-teaming-gen-AI
196
Recital 172 AI Act: “Persons acting as whistleblowers on the infringements of this Regulation should be protected under the Union law.
Directive (EU) 2019/1937 of the European Parliament and of the Council (54) should therefore apply to the reporting of infringements of this
Regulation and the protection of persons reporting such infringements.”
197
Ofcom, ‘Protecting people from illegal harms online - Annex 5: Service Risk Assessment Guidance (2024)
https://www.ofcom.org.uk/siteassets/resources/documents/consultations/category-1-10-weeks/270826-consultation-protecting-people-
from-illegal-content-online/associated-documents/annex-5-draft-service-risk-assessment-guidance?v=330403
52
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Providers and developers of LLMs must implement risk management as an iterative process to identify
and address risks, recognizing that these risks can emerge at various phases of the development
lifecycle, as discussed in previous sections.
The overview provided by the table below can serve as a practical starting point for identifying and
analyzing privacy and data protection risks throughout the lifecycle of LLM based systems. The table
presents a consolidated summary of privacy risks, complementing the details already provided in
Section 3 under Data Flow and Associated Privacy Risks in LLM Systems.
53
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Risk applicability
Data Protection and Risk description GDPR potential Impact Examples Service Model Provider Deployer
Privacy Risks
1. Insufficient protection Safeguards for the Infringement of: Sensitive data disclosure in user inputs or during
of personal data what protection of personal Art. 32 Security of processing, Art. training, inference and output. Unauthorized access, LLM as a Service
eventually can be the data are not implemented 5(1)(f) Integrity and confidentiality insufficient encryption during data transmission, API LLM ‘off-the-shelf’
cause of a data breach. or are insufficient. and Art. 9 Processing of special misuse, interface vulnerabilities, inadequate Self-developed LLM
categories of personal data anonymization or filtering techniques, third party Agentic LLM
exposure.
2. Misclassifying training Controllers may Infringement of: An LLM trained on improperly anonymized user logs
data as anonymous by incorrectly assume Articles 5(1)(a) (Lawfulness, reveals identifiable user information through model LLM as a Service198
controllers when it training data is Fairness, and Transparency), 5(1)(b) inference attacks. LLM ‘off-the-shelf’199
contains identifiable anonymous, failing to (Purpose Limitation), 25 (Data A deployer discovers that the third-party LLM they Self-developed LLM
information. implement necessary Protection by Design and Default) are using has been trained on non-anonymized Agentic LLM
safeguards for personal personal data, and the vendor fails to implement
data protection. appropriate safeguards, exposing the deployer to
compliance risks.
3. Unlawful processing of Personal data is included Infringement of: An e-commerce platform uses customer purchase
personal data in training in training datasets Articles 5(1)(a) (Lawfulness, histories to train an LLM without informing LLM as a Service200
sets. without proper legal Fairness, and Transparency) customers or obtaining their consent. LLM ‘off-the-shelf’201
basis, safeguards, or user Articles 6(1) (Lawfulness of Self-developed LLM
consent. Processing), 7 (Consent), 5(1)(c) Agentic LLM
(Data Minimization)
4. Unlawful processing of Training datasets include Infringement of: Medical records scraped from unsecured online
special categories of sensitive data, such as Articles 9(1) and 9(2) (Special sources are used to train a healthcare chatbot LLM as a Service202
personal data and data health or criminal Categories of Data), Article 10 without applying GDPR compliant safeguards. LLM ‘off-the-shelf’203
relating to criminal records, without meeting (Criminal Convictions and Self-developed LLM
convictions and offences GDPR exceptions for Offences). Agentic LLM
in training data. lawful processing.
5. Possible adverse The output of the system Infringement of: A system providing output that is not accurate or
impact on data subjects could have an adverse Art. 5(1)(d) Accuracy, Art. 5(1)(a) contain bias and does not provide with mechanisms LLM as a Service
that could negatively impact on the individual. Fairness, Art. 22 Automated to amend errors. The output of an LLM could be used LLM ‘off-the-shelf’
individual decision-making, to make automatic decisions which produce legal Self-developed LLM
198 This risk primarily applies to the provider; however, the deployer shares responsibility by ensuring they engage with lawful vendors. The deployer's role includes conducting due diligence to verify that the provider complies
with legal obligations and operates within the bounds of applicable regulations.
199
Idem
200
Idem
201
Idem
202
Idem
203
Idem
54
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
impact fundamental including profiling, Art. 25 Data effects or similarly significant effects on data Agentic LLM
rights. protection by design and by default subjects.
6. Not providing human Automated decisions that Infringement of: A chatbot automates loan approvals based on user
intervention for a significantly impact Articles 22(1) and 22(3) (Automated provided data, denying applications without LLM as a Service
processing that can have individuals are made Decision-Making), Article 12 involving a human reviewer. LLM ‘off-the-shelf’
a legal or important effect without human review, (Transparent Communication). Self-developed LLM
on the data subject. violating GDPR Agentic LLM
requirements for human
oversight, or are based
on inappropriate
ground204.
7. Not granting data Data subjects’ rights Infringement of: Data subjects’ requests to rectify or to erase personal
subjects their rights. cannot be completely or Art. 12 – 14: Information to be data cannot be completed. Users are not aware of LLM as a Service
partially granted. provided when personal data is how their data will be used, retained, or shared by LLM ‘off-the-shelf’
collected the provider. Self-developed LLM
Art. 16 and Art. 17: Right to Agentic LLM
rectification and right to erasure
Article 18 Right to restriction of
processing and Article 21 Right to
object
8. Unlawful repurpose of Personal data is used for a Infringement of: This could be the case if the provider uses the input
personal data. different purpose. Art. 5(1)(b) Purpose limitation, Art. and/or output data for training the LLM without this LLM as a Service
5(1)(a) Lawfulness, fairness and being formally agreed on beforehand. LLM ‘off-the-shelf’
transparency, Self-developed LLM
Article 28(3)(a)205 and Art. 29 Agentic LLM
Processing under the authority of
the controller or processor
9. Unlawful unlimited Input and/or output data Infringement of: The system could be unnecessarily storing input data LLM as a Service
storage of personal data. is being stored longer Art. 5(1)(e) Storage limitation and that is not directly relevant to the LLM process. In LLM ‘off-the-shelf’
than necessary. Art. 25 Data protection by design some cases, the output could be stored by the Self-developed LLM
and by default deployer longer than necessary. Providers can also Agentic LLM
Under the exceptions outlined in Article 22(2) of the GDPR, automated individual decision-making is permitted only if it is based on contractual necessity, explicit consent, or if authorized by EU or Member State law.
204
“processes the personal data only on documented instructions from the controller, including with regard to transfers of personal data to a third country or an international organisation, unless required to do so by Union or
205
Member State law to which the processor is subject; in such a case, the processor shall inform the controller of that legal requirement before processing, unless that law prohibits such information on important grounds of
public interest;”
55
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
206
Garante per la Protezione dei dati personali (DPDP), ‘Intelligenza artificiale: il Garante privacy blocca DeepSeek’ (2025) https://www.garanteprivacy.it/home/docweb/-/docweb-
display/docweb/10097450?mkt_tok=MTM4LUVaTS0wNDIAAAGYXIH0PW4qTzz-TKclqJPRoyU5yZoUVox1JLxNIcVP7RTnC_bvlu_rRyXg8hy6RdOqFw9BgFYU8wXP1XmPVVBTU7DCNt1660jK9umFkCSnLY4e#english
56
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
When assessing the risks associated with LLMs, it is crucial to consider broader issues linked to GDPR
principles such as lawfulness, fairness, transparency, and accountability. In addition to privacy concerns,
also issues related to copyright, overreliance and manipulation must be addressed.
Copyright209
LLMs trained on web-scraped or publicly available data often include copyrighted materials, raising
concerns about intellectual property violations. Outputs generated by such models may unintentionally
replicate protected content, creating legal risks for both providers and deployers. These issues highlight
the importance of ensuring that data used to train LLMs is collected and processed lawfully and in
accordance with copyright laws.
207
EDPB, ‘Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, Adopted
on 17 December 2024, (2024) https://www.edpb.europa.eu/our-work-tools/our-documents/opinion-board-art-64/opinion-282024-certain-
data-protection-aspects_en
208
Lareo, X. ‘Large Language Models’, EDPS, (n.d) https://www.edps.europa.eu/data-protection/technology-monitoring/techsonar/large-
language-models-llm_en
209
European Innovation Council and SMEs Executive Agency, ‘Artificial intelligence and copyright: use of generative AI tools to develop new
content’ (2024) https://intellectual-property-helpdesk.ec.europa.eu/news-events/news/artificial-intelligence-and-copyright-use-generative-
ai-tools-develop-new-content-2024-07-16-0_en
210
Jacobi, O., ‘The Risks of Overreliance on Large Language Models (LLMs)’ (2024) https://www.aporia.com/learn/risks-of-overreliance-on-
llms/
57
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Various risk management methodologies are available for classifying and assessing risks. This document
does not aim to prescribe or define a specific methodology, as the choice should be determined by each
organization. However, for the purposes of this document, we will reference international standards
previously highlighted in the WP29215 and the AEPD216 guidelines as well as the work being currently
done in European AI standardization.
In general risk management terms, risk can be expressed as:
Risk = Probability x Severity
This equation highlights that risk is determined by the probability of an event occurring, combined with
the potential impact or severity of the resulting harm.
211
Note: In this document, we use the term "probability" instead of "likelihood" to align with terminology found in definitions like the one for
risk in the AI Act. While in risk management, "likelihood" typically indicates a qualitative approach to managing risks, "probability" implies a
quantitative method of risk assessment.
212 European Center for Not-for-Profit Law, ‘Framework for Meaningful Engagement: Human rights impact assessments of AI’ (2023)
https://ecnl.org/publications/framework-meaningful-engagement-human-rights-impact-assessments-ai
213
O’Neil, C. ‘Algorithmic Stakeholders: An Ethical Matrix for AI’ (2020) https://blog.dataiku.com/algorithmic-stakeholders-an-ethical-matrix-
for-ai
214
Article 29 Data Protection Working Party, ‘Guidelines on Data Protection Impact Assessment (DPIA) and determining whether processing is
“likely to result in a high risk for the purposes of Regulation 2016/679’ (2017) https://ec.europa.eu/newsroom/article29/items/611236/en
215
ISO 31000:2009, Risk management — Principles and guidelines, International Organization for Standardization (ISO); ISO/IEC 29134,
Information technology – Security techniques – Privacy impact assessment – Guidelines, International Organization for Standardization (ISO).
216
ISO 31010:2019, Risk management — Risk Assessment Techniques, International Organization for Standardization (ISO)
58
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Risk is defined in the GDPR (Recital 75) as the potential harm to the rights and freedoms of natural
persons, of varying probability and severity, arising from personal data processing. Similarly, the AI Act
(Article 3) defines risk as ‘the combination of the probability of an occurrence of harm and the severity
of that harm;’.
To evaluate the level of data protection and privacy risks when procuring, developing, or using LLMs, it
is essential to estimate both the probability and severity of the identified risks materializing.
Probability determination must be tailored to the specific risks and use cases under assessment. While
this general matrix provides a structured approach, applying more detailed criteria can enhance the
accuracy of the probability assessment.
In the table below, there is an example of criteria217 that can guide this process, helping to refine the
evaluation of probability for specific scenarios. Note that some criteria relate to system-level attributes
while other are context-specific.
217
Barberá, I. ‘FRASP, A Structured Framework for Assessing the Severity & Probability of Fundamental Rights Interferences in AI Systems’
(2025)
59
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
PROBABILITY LEVELS
Criteria Description Level 1 (Unlikely) Level 2 Level 3 Level 4
(Low) (High) (Very High)
1. Frequency How often the AI system The system is The system is The system is The system is used
of Use is used, increasing rarely used or has occasionally used frequently used and continuously or in real-
exposure to potential risk infrequent but not in critical integrated into time critical operations
affecting reliability interactions (e.g., operations (e.g., important (e.g., daily).
(expected time before annual or less). monthly). operations (e.g.,
failure) weekly).
2. Exposure to The extent to which the AI The system is not The system The system is used in The system operates in
High-Risk system operates in used in sensitive operates in high-stakes highly sensitive or
Scenarios sensitive or high-stakes or high-stakes moderately environments with critical environments
environments. scenarios. sensitive potential for (e.g., healthcare,
environments significant impact. security).
with minimal
stakes.
3. Historical Past instances of similar No similar risks or Few similar risks Similar risks or Frequent and
Precedents risks or failures in the failures have or failures have failures have significant risks or
same or comparable AI occurred in occurred in occurred frequently failures have occurred
systems. comparable comparable in comparable in comparable systems.
systems. systems. systems.
4. External, uncontrollable External External External conditions External conditions
Environmental conditions affecting conditions are conditions often impact the severely affect the
Factors system performance or stable and do not occasionally system's system's performance,
reliability (e.g., political impact the affect the performance, creating constant risks.
instability, regulatory system's system's creating
gaps, financial performance. performance but vulnerabilities.
constraints). are manageable.
5. System The degree to which the The system is The system is The system has The system lacks
Robustness AI system is resistant to highly robust with moderately some robustness robustness,
failure or unintended multiple robust with some but contains safeguards, or is prone
behaviour. redundancies and redundancies but significant to frequent failures.
safeguards. occasional vulnerabilities or
vulnerabilities. weak safeguards.
6. Data Quality The extent to which the AI Data is highly Data is mostly Data is partially Data is significantly
and Integrity system relies on accurate, accurate, accurate and accurate or inaccurate, biased, or
unbiased, and complete unbiased, and complete but has complete, with incomplete, leading to
data. Modifiable through complete with occasional minor notable biases or high risk.
better dataset curation or minimal risk of biases or errors. inconsistencies.
validation. errors.
7. Human How human operators’ Operators are Operators are Operators are Operators are
Oversight and skills and decision-making highly trained, moderately undertrained or untrained or
Expertise affect system reliability experienced, and trained and inconsistent, leading ineffective, causing
and risk probability. consistently effective, but to regular errors in frequent and severe
Modifiable through effective in occasional errors decision-making. errors.
training or oversight decision-making. occur.
improvements.
To use the criteria for determining the probability of risks you can do the following:
Step 1: Aggregate Scores
Evaluate each criterion and assign it a score based on predefined probability levels 1 to 4.
Add the scores of all factors and divide the total by the number of factors to calculate the
Aggregate Probability Score. This can be done using either:
o Weighted Average: Assign more importance to certain factors by weighting them before
averaging.
o Simple Average: Treat all factors equally and calculate the mean.
60
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Very Limited
Moderate or Minor
Harm In the above case (limited) when all effects are reversible
Note that the HUDERIA219 risk management methodology, developed by the Committee on Artificial
Intelligence of the Council of Europe, also employs a four-level severity matrix. However, it uses slightly
218
AEPD, ‘Risk Management and Impact Assessment in Processing of Personal Data’ p - 77 (2021) https://www.aepd.es/guides/risk-
management-and-impact-assessment-in-processing-personal-data.pdf
219 Council of Europe (CAI), ‘Methodology for the Risk and impact assessment of Artificial Intelligence Systems from the point of view of human
rights, democracy and the rule of law (Huderia Methodology)’ (2024) https://rm.coe.int/cai-2024-16rev2-methodology-for-the-risk-and-
impact-assessment-of-arti/1680b2a09f
61
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
different terminology, as shown in this matrix (italicized): Catastrophic Harm, Critical Harm, Serious
Harm, and Moderate or Minor Harm.
Similar to the assessment of probability, the assessment of severity can also benefit from the use of
different severity criteria220 to reduce subjectivity in the process. The severity criteria are related to a
loss of privacy that is experienced by the data subject but that may have further related consequences
impacting other individuals and/or society.
The table below outlines different severity221 criteria. The calculation of severity can follow the same
steps as those used for determining probability, including aggregating scores and mapping them to
severity levels. However, for severity, certain criteria (numbers 1 to 5, and 7 & 8) act as "stoppers" This
means that the end score will always be the highest one from those criteria no matter what the
aggregation score is. For instance, if any of these criteria are assessed at the highest level (4), the overall
severity score is immediately assigned a level 4. This approach ensures that critical harms, such as those
involving irreversible damage, are appropriately prioritized and flagged for immediate and
comprehensive mitigation measures.
220
Article 29 Working Party (WP 208), ‘Statement on the role of risk-based approach in data protection legal frameworks’, p - 4 (2014)
https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp218_en.pdf “[...] Risks, which are related to
potential negative impact on the data subject’s rights, freedoms and interests, should be determined taking into consideration specific
objective criteria such as the nature of personal data (e.g. sensitive or not), the category of data subject (e.g. minor or not), the number of
data subjects affected, and the purpose of the processing. The severity and the probability of the impacts on rights and freedoms of the data
subject constitute elements to take into consideration to evaluate the risks for individual’s privacy’’.
221
See footnote 217
62
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
SEVERITY LEVELS
Criteria Description Level 1 (Very Limited) Level 2 (Limited) Level 3 (Significant) Level 4 (Very Significant)
Moderate or Minor Harm Serious Harm Critical Harm Catastrophic Harm
Moderate or minor prejudices or Serious prejudices or impairments Critical prejudices or impairments in Catastrophic prejudices or
impairments in the exercise of in the exercise of fundamental the exercise of fundamental rights impairments in the exercise of
fundamental rights and freedoms rights and freedoms that lead to the and freedoms that lead to the fundamental rights and freedoms
that do not lead to any significant, temporary degradation of human significant and enduring that lead to the deprivation of the
enduring, or temporary degradation dignity, autonomy, physical, degradation of human dignity, right to life; irreversible injury to
of human dignity, autonomy, psychological, or moral integrity, or autonomy, physical, psychological, physical, psychological, or moral
physical, psychological, or moral the integrity of communal life, or moral integrity, or the integrity of integrity; deprivation of the welfare
integrity, or the integrity of democratic society, or just legal communal life, democratic society, of entire groups or communities;
communal life, democratic society, order or that harm to the or just legal order. catastrophic harm to democratic
or just legal order. information and communication society, the rule of law, or to the
environment. preconditions of democratic ways
of life and just legal order;
deprivation of individual freedom
and of the right to liberty and
security; harm to the biosphere.
1. Nature of the This criterion evaluates the The fundamental right affected is The fundamental right affected is The fundamental right affected is The fundamental right affected is
fundamental right nature of the fundamental right highly limited in scope and moderately limited, meaning minimally limited, meaning absolute and non-derogable,
and Legal affected—whether it is applicability, meaning it is restrictions are lawful but subject to restrictions are only lawful under meaning no lawful restriction is
limitation absolute or subject to frequently subject to lawful stricter justification requirements exceptional and tightly controlled permitted under any circumstances.
alignment limitations—and assesses the restrictions with minimal and more specific conditions. circumstances. Alternatively, the use case does not
extent to which the AI system's requirements to justify the The use case aligns with legal The use case partially aligns with align with lawful and proportionate
use case aligns with lawful and interference. limitations, but the interference lawful exceptions, but there are limitations, even if the right is not
proportionate restrictions. The use case clearly and fully aligns requires a moderate level of uncertainties about the absolute, causing severe violations of
Absolute rights are non- with permitted legal limitations, justification, such as demonstrating proportionality, necessity, or legal or normative frameworks.
derogable and cannot be and the interference is routine and proportionality and necessity, legitimacy of the interference,
restricted under any widely accepted, without causing otherwise causing possible minor causing possible major violations of * The Charter does not explicitly
circumstances, while other significant violations of legal or violations of legal or normative legal or normative frameworks. identify the rights that are absolute.
rights may be limited only if the normative frameworks. frameworks. Based on the Charter explanations,
interference meets strict legal, the ECHR and the case law of the
proportionality, and necessity European courts, it is submitted that
requirements. This criterion human dignity (Article 1 of the
helps determine the severity of Charter), the prohibition of torture
the impact based on the degree and inhuman or degrading treatment
of misalignment or violation of or punishment (Article 4 of the
the right's protections. Charter), the prohibition of slavery
and forced labour (Article 5(1) and (2)
63
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
64
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
5. Scale of Impact The breadth of the Impact is limited to a small, Impact is limited to specific groups Impact spans multiple groups or Impact is widespread, affecting
(Societal, Group, infringement across societal, localized group or individual. Fewer or a small societal segment. societal domains. Between 1,000 societal, group, and individual levels.
Individual) & group, and individual levels. than 100 individuals affected. Between 100 and 1,000 individuals and 100,000 individuals affected. Over 100,000 individuals affected.
Number of Data This criterion considers the affected.
Subjects Affected scale of the impact based on
the number of individuals
whose data is affected.
6. Contextual and How specific contextual factors Context or domain does not amplify Context or domain moderately Context or domain significantly Context or domain profoundly
Domain or domains intensify the the severity of the interference with amplifies the severity of the amplifies the severity of the amplifies the severity of the
Sensitivity interference's severity. the fundamental right. interference with the fundamental interference with the fundamental interference with the fundamental
Includes circumstantial risks right. right. right.
like socio-political instability
and if children and other
vulnerable groups are affected.
7. Reversibility, The difficulty or feasibility of Harm is fully reversible within a Harm is reversible with moderate Harm is difficult to reverse, requiring Harm is irreversible, with no feasible
recovery, degree reversing harm and the time short period with minimal effort. effort over a reasonable timeframe. significant effort or resources. means of recovery.
of remediability required for recovery. Includes
prohibitive risks where harm is
irreversible.
8. Duration and The length of time and Adverse effects are minimal and do Adverse effects persist briefly but Adverse effects persist for a Adverse effects are permanent or
Persistence of persistence of adverse effects not persist over time. do not result in long-term considerable period and can affect persist indefinitely.
Harm caused by the interference. consequences. multiple groups.
9. Velocity to The speed at which the risk Risk materialises gradually, Risk materialises at a moderate Risk materialises suddenly, leaving Risk materialises rapidly, with no
materialise materialises: gradual, sudden, providing sufficient time for pace, allowing for corrective limited time for intervention. opportunity for intervention.
continuously changing. intervention. measures.
10. Transparency The degree of system System is highly transparent with System lacks some transparency but System lacks transparency and has System is entirely opaque, with no
and mechanisms transparency and mechanisms clear and effective accountability has basic accountability weak accountability mechanisms. mechanisms for accountability.
for Accountability for accountability. mechanisms. mechanisms.
11. Ripple and The extent to which the No cascading effects; the risk is Minimal cascading effects; impacts Notable cascading effects; impacts Severe cascading effects; impacts
Cascading Effects interference triggers additional isolated and contained. are mostly contained. extend across domains. propagate extensively.
harms across systems or
domains.
65
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Assessing probability and severity provides the foundation for determining the overall risk level of the
identified privacy and data protection risks. Using a four-level classification matrix for both probability
and severity, risks can be categorized into final classifications of Very High, High, Medium, or Low.
A matrix, as shown below, serves as a practical tool to obtain these classifications, offering a clear and
structured ranking to prioritize risks and guide appropriate mitigation strategies. This classification is a
critical step in the next risk treatment process because it ensures that resources are directed toward
addressing the most pressing risks effectively.
Probability Very High Medium High Very high Very high
High Low High Very high Very high
Low Low Medium High Very high
Unlikely Low Low Medium Very high
Very limited Limited Significant Very
Significant
Severity
Figure 15. Risk Evaluation Matrix
Best practices in risk management suggest that the mitigation of very high and high level risks should
be prioritized.222 Once these critical risks are identified, the next essential step is to develop and
implement a risk treatment plan.
222
Oliva, L., ’ Successfully managing high-risk, critical-path projects’ (2003) https://www.pmi.org/learning/library/high-risk-critical-path-
projects-7675
223
Marsden, E. ’Risk acceptability and tolerability’ (n.d) https://risk-engineering.org/static/PDF/slides-risk-acceptability.pdf
224
Science Direct, ‘Definition of Residual Risk’ (2019) https://www.sciencedirect.com/topics/engineering/residual-risk
225
Article 5(2), Recital 74, GDPR
66
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Risk treatment involves developing strategies to mitigate identified risks and creating actionable
implementation plans. The choice of an appropriate treatment option should be context-specific,
guided by a feasibility analysis226 such as the following:
o Evaluate the type of risk and the available mitigation measures that can be implemented.
o Compare the potential benefits gained from implementing the mitigation against the costs and
efforts involved and the potential impact.
o Assess the impact on the intended purpose of the LLM system's implementation.
o Consider the reasonable expectations of individuals impacted by the system.
o Perform a trade-off analysis to evaluate the impact of potential mitigations on aspects such as
performance, transparency, and fairness, ensuring that processing remains ethical and
compliant based on the specific use case.
Analyzing these criteria is essential for effective risk mitigation and risk management planning, providing
clarity on whether specific mitigation efforts are justifiable. In all cases, the chosen treatment option
should be clearly justified and thoroughly documented to ensure accountability and compliance.
The most common risk treatment criteria are: Mitigate, Transfer, Avoid and Accept.
For each identified risk one of the criteria options will be selected:
Mitigate – Implement measures to reduce the probability or the severity of the risk.
Transfer – Shift responsibility for the risk to another party (e.g., through insurance or
outsourcing).
Avoid – Eliminate the risk entirely by addressing its root cause.
Accept – Decide to take no action, accepting the risk as is because it falls within acceptable limits
as defined in the risk criteria.
Deciding whether a risk can be mitigated involves assessing its nature, potential impact, and available
mitigation measures such as implementing controls, adopting best practices, modifying processes, or
using tools to reduce the probability or severity of the risk.
Not all risks can be fully mitigated. Some risks may be inherent and cannot be entirely avoided. In such
cases, the objective is to reduce the risk to an acceptable level or implement risk mitigation and control
measures that effectively manage its impact.
It is also important to maintain a dynamic risk register, containing risk records that are durable, easily
accessible, clear, and that are consistently updated to ensure accuracy and relevance 227.
Risks should also have clear ownership assigned, and regular reviews should be conducted to ensure
that risk management practices remain proactive.
226
Centre for Information Policy Leadership, ‘Risk, High Risk, Risk Assessments and Data Protection Impact Assessments under the GDPR. GDPR
Interpretation and Implementation Project’ (2016)
https://www.informationpolicycentre.com/uploads/5/7/1/0/57104281/cipl_gdpr_project_risk_white_paper_21_december_2016.pdf
227
Ofcom, ‘Protecting people from illegal harms online’ (2024) https://www.ofcom.org.uk/siteassets/resources/documents/online-
safety/information-for-industry/illegal-harms/volume-1-governance-and-risks-management.pdf?v=387545
67
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
228
EDPS, ‘TechSonar 2025 Report’ (2025) https://www.edps.europa.eu/data-protection/our-work/publications/reports/2024-11-15-techsonar-report-2025_en
229
This could be done by performing a pentest and/or requesting pentest results to the vendor.
230
AI Action Summit, ‘International AI Safety Report on the Safety of Advanced AI’ , p - 167, (2025)
https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf
231 Feretzakis, G et al., ‘V.S. Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review’(2024). https://doi.org/10.3390/info15110697
232 Examples of Memorization methods: https://blog.kjamistan.com/category/ml-memorization.html
68
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
associated to different LLM security threats233234 such as membership inference235, model inversion236 and poisoning
attacks237.
Also access and change logs are established to document access and changes to digitized records.
Employees and users are trained on security best practices.
Effective RAG systems require careful model alignment to prevent unauthorized access and sensitive data exposure.
Integration with multiple data sources necessitates robust security measures to ensure confidentiality and data
integrity, while adhering to data protection principles like necessity and proportionality. For outsourced RAG models
involving personal data transfer, compliance with GDPR's data transfer rules is critical to maintaining confidentiality
and legal obligations.238
2. Misclassifying training data as Implement robust testing and validation processes to ensure that (i) personal data associated with the training data
anonymous by controllers when it cannot be extracted from the model using reasonable means, and (ii) any outputs generated by the model do not link
contains identifiable information, back to or identify data subjects whose personal data was used during training.
leading to failure to implement
This assessment should be done taking into account ‘all the means reasonably likely to be used’ considering objective
appropriate safeguards for data
protection. (partly relating to risk 3) factors such as:241
o The characteristics of the training data, the AI model, and the training procedure.
Whenever information relating to o The context in which the AI model is released or processed.
identified or identifiable individuals o The availability of additional information that could enable identification.
242
whose personal data was used to train o The costs and time required to access such additional information, if not readily available.
the model may be obtained from an AI o Current technological capabilities and potential future advancements.
model with means reasonably likely to
Implement alternative approaches to anonymization if they provide an equivalent level of protection, ensuring they
be used239, it may be concluded that
such a model is not anonymous.240 align with the state of the art.
Implement structured testing against state of the art attacks such as attribute and membership inference, exfiltration,
regurgitation of training data model inversion, or reconstruction attacks.
Document and retain evidence to demonstrate compliance with these safeguards following accountability obligations
under Article 5(2) GDPR. Documentation should include:
233
OWASP, ‘OWASP Top 10 for LLM Applications 2025’ (2025) https://genai.owasp.org/llm-top-10/
234
Shamsabadi, S.A. et al., ’ Identifying and Mitigating Privacy Risks Stemming from Language Models’ (2024) https://arxiv.org/html/2310.01424v2
235 Shokri et al., ‘Membership Inference Attacks Against Machine Learning Models’ (2017) https://arxiv.org/abs/1610.05820
236 Zhang et al.,’Generative Model-Inversion Attacks Against Deep Neural Networks’, (2020) https://arxiv.org/abs/1911.07135
237
Guo, J. et al., ‘Practical Poisoning Attacks on Neural Networks’, (2020) https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720137.pdf
238
https://www.edps.europa.eu/data-protection/our-work/publications/reports/2024-11-15-techsonar-report-2025_en
239
Membership Inference Attacks and Model Inversion Attacks.
240
EDPB, ‘Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, Adopted on 17 December 2024, (2024) https://www.edpb.europa.eu/our-work-tools/our-
documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en
241
idem
242
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
69
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
3. Unlawful processing of personal data Document all training data sources (e.g., book databases, websites) to ensure accountability under Art. 5(2) GDPR.
in training sets Check training data for statistical distortions or biases and make necessary adjustments.
Exclude training data that includes unauthorized content, such as fake news, hate speech, or conspiracy theories.
Exclude content from publications that may contain personal data posing risks to individuals or groups, such as those
vulnerable to abuse, prejudice, or harm.
Remove unnecessary personal data (e.g., credit card numbers, email addresses, names) from the training dataset. 243
Employ methodological choices that significantly reduce or eliminate identifiability, such as using regularization
methods to enhance model generalization and minimize overfitting. 249
Implement robust privacy-preserving techniques, such as differential privacy.244
When using web scraping as a method to collect data, ensure compliance with Article 6(1)(f) GDPR by conducting a
thorough legal assessment. This includes evaluating:
o (i) the existence of a legitimate interest for data processing. Interest should be lawful, clearly articulated
and real, not speculative.
o (ii) the necessity of the processing, ensuring that personal data collected is adequate, relevant, and limited
to what is necessary for the stated purpose245, and
243
Bavarian State Office for Data Protection Supervision, ‘Data protection compliant Artificial intelligence Checklist with test criteria according to GDPR’
244 EDPB, ‘Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, Adopted on 17 December 2024, (2024) https://www.edpb.europa.eu/our-work-tools/our-
documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en
245
Recital 39 GDPR clarifies that ‘Personal data should be processed only if the purpose of the processing could not reasonably be fulfilled by other means’
249
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
70
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Consideration should also be given to the reasonable expectations of data subjects regarding the use of their data.246
Involve the DPO in the balancing test, where applicable247.
For web scraping, assess whether the exemption under Article 14(5)(b) applies, ensuring all criteria are met to justify
not informing each data subject individually.
Transparency:
Provide public and easily accessible information that goes beyond GDPR requirements under Articles 13 and 14,
including details about collection criteria and datasets used, with special consideration for protecting children and
vulnerable individuals.
Use innovative approaches to inform data subjects, such as media campaigns, email notifications, graphic
visualizations, FAQs, transparency labels, model cards, and voluntary annual transparency reports.248
Implement an opt-out list managed by the controller, enabling data subjects to object to the collection of their data
from specific websites or platforms by providing identifying information before data collection begins.
4. Unlawful processing of special For the lawful processing of special categories of personal data, ensure that an exception under Article 9(2) GDPR
categories of personal data and data applies250. When relying on Article 9(2)(e), confirm that the data subject explicitly and intentionally made the data
relating to criminal convictions and publicly accessible through a clear affirmative action. The mere fact that personal data is publicly accessible does not
offences in training data. 253
imply that the data subject has manifestly made such data public251.
Given the challenges of case-by-case assessment in large-scale web scraping, implement safeguards such as filtering
to exclude data falling under Article 9(1) GDPR both during and immediately after data collection.
246 EDPB, ‘Report of the work undertaken by the ChatGPT Taskforce’ (2024) https://www.edpb.europa.eu/our-work-tools/our-documents/other/report-work-undertaken-chatgpt-taskforce_en
247
EDPB, ‘Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, Adopted on 17 December 2024, (2024) https://www.edpb.europa.eu/our-work-tools/our-
documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en
248
Idem
250
EDPB, Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. Adopted on 17 December 2024 (2024) : “The EDPB recalls the prohibition of Article 9(1) GDPR
regarding the processing of special categories of data and the limited exceptions of Article 9(2) GDPR. In this respect, the Court of Justice of the European Union (“CJEU”) further clarified that ‘where a set of data containing
both sensitive data and non-sensitive data is [...] collected en bloc without it being possible to separate the data items from each other at the time of collection, the processing of that set of data must be regarded as being
prohibited, within the meaning of Article 9(1) of the GDPR, if it contains at least one sensitive data item and none of the derogations in Article 9(2) of that regulation applies’ . Furthermore, the CJEU also emphasised that ‘for
the purposes of the application of the exception laid down in Article 9(2)(e) of the GDPR, it is important to ascertain whether the data subject had intended, explicitly and by a clear affirmative action, to make the personal data
in question accessible to the general public’ . These considerations should be taken into account when processing of personal data in the context of AI models involves special categories of data.”
251
EDPB, ‘Report of the work undertaken by the ChatGPT Taskforce’ (2024) https://www.edpb.europa.eu/our-work-tools/our-documents/other/report-work-undertaken-chatgpt-taskforce_en
253
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
71
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Maintain robust documentation and proof of these measures to comply with accountability requirements under
Articles 5(2) and 24 GDPR.252
review and document all steps taken to comply with the GDPR principles of transparency and accuracy under Articles
5(1)(a) and 5(1)(d).255
For third-party platforms, effective configuration and use of available tools are essential to enhance input handling
and ensure outputs meet accuracy and fairness standards.
Regular audits and oversight mechanisms are critical to addressing risks like data leakage, bias, or unintended
inferences.
LLMs could also be fine-tuned to handle diverse linguistic and contextual variations, reducing inaccuracies in sensitive
applications.
To mitigate the risk of adverse impacts on data subjects and fundamental rights in the context of LLMs, accuracy256
and reliability must be prioritized throughout the system lifecycle.
Ensure that training datasets are diverse and representative of different demographic groups to reduce biases
inherent in the data.
252
Idem
254
Information Commissioner Officer (ICO) ‘Generative AI third call for evidence: accuracy of training data and model outputs’ (2025) https://ico.org.uk/about-the-ico/what-we-do/our-work-on-artificial-intelligence/generative-
ai-third-call-for-evidence/
255
Idem
256
AI Model Code, ‘Evaluating language models for accuracy and bias’ (2024) https://aimodelcode.org/tech-info/llm-eval/
257
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
72
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Conduct regular audits and fairness tests and incorporate human review in sensitive decisions to ensure fairness and
accountability.
Use explainability frameworks to analyze and understand how decisions are made, what helps in identifying potential
sources of bias.
6. Not providing human intervention for Human oversight should be integrated into decision-making processes where the outputs of LLMs could lead to legal
a processing that can have a legal or or significant consequences for individuals258. This includes ensuring that automated decisions are subject to review
important effect on the data subject. by qualified personnel who can assess the fairness, accuracy, and relevance of the outputs.
Clear escalation procedures should be in place for cases where automated outputs appear ambiguous, erroneous, or
potentially harmful.
Developers and deployers must design systems to flag high-risk outputs for mandatory human intervention before
any action is taken259.
Transparency mechanisms should also be implemented260, ensuring data subjects are informed about the use of LLMs,
the capabilities and limitations of the model261, the processing of personal data through the model and their right to
contest decisions or seek human review.
Regular training for staff involved in oversight can further enhance compliance and accountability.
Implement Article 29 Working Party (“WP29”) Guidelines on Automated individual decision-making and Profiling for
the purposes of Regulation 2016/679, as last revised and adopted on 6 February 2018, endorsed by the EDPB on 25
May 2018. See also, CJEU judgment of 7 December 2023, Case C-634/21, SCHUFA Holding and Others (ECLI:EU:C:2023:957).
258
Lumenova, ‘The Strategic Necessity of Human Oversight in AI Systems’ (2024) https://www.lumenova.ai/blog/strategic-necessity-human-oversight-ai-systems/
259
Kuriakose, A.A., ’ The Role of Human Oversight in LLMOps’ (2024) https://www.algomox.com/resources/blog/what_is_the_role_of_human_oversight_in_llmops/
260 Garante per la Protezione dei Dati Personali (GDPD), ‘ChatGPT, il Garante privacy chiude l’istruttoria. OpenAI dovrà realizzare una campagna informativa di sei mesi e pagare una sanzione di 15 milioni di euro’ (2024)
https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/10085432?mkt_tok=MTM4LUVaTS0wNDIAAAGX5pUM0HSpbBgVFc2wv7uGKk23174FM2-
cFJBvVD0FDGJCM_27RuQFPm2uSB80ihorQ2e0YWwgCPRFngJDRE4b7N_pWRz873q84sJ8ZWucdQOh#english
261
EDPB, ‘Report of the work undertaken by the ChatGPT Taskforce’ (2024) https://www.edpb.europa.eu/our-work-tools/our-documents/other/report-work-undertaken-chatgpt-taskforce_en
73
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
7. Not granting data subjects their right The right to object under Article 21 GDPR applies and should be ensured when the legal basis is legitimate interest262.
to object, rectification, and erasure. In such a case, providers should implement mechanisms to grant this right. Some measures to implement when
collecting personal data could be263:
o Introduce a reasonable period between the collection of a training dataset and its use, allowing data
subjects time to exercise their rights.
o Provide an unconditional opt-out mechanism for data subjects before processing begins.
o Permit data subjects to request data erasure, even beyond the specific grounds listed in Article 17(1) GDPR.
o Claim Handling: Enable data subjects to report instances of personal data regurgitation or memorization,
with mechanisms for controllers to assess and apply unlearning techniques to resolve such claims.
Mitigating non-compliance with GDPR concerning data subjects' rights to rectification and erasure involves exploring
machine unlearning techniques264. These approaches aim to remove the influence of data from a trained model upon
request, addressing concerns about data use, low-quality inputs, or outdated information.
269
o Exact unlearning seeks to entirely eliminate the influence of specific data points, often through retraining
or advanced methods that avoid full retraining. Techniques like Sharded, Isolated, Sliced, and Aggregated
(SISA) training divide data into subsets, simplifying data removal while striving to maintain model
robustness. Approximate unlearning attempts to reduce the impact of specific data points by adjusting
model weights or applying correction factors, offering a trade-off between precision and efficiency.
While these methods hold promise, challenges remain, including maintaining model accuracy and avoiding unintended
biases post-unlearning. Certified removal, which provides verifiable guarantees of data removal using mathematical proofs,
offers a rigorous but resource-intensive solution. As unlearning techniques evolve, they play a crucial role in enabling
compliance with GDPR while preserving the integrity and fairness of machine learning models.265
Implement mechanisms to delete personal data, such as names, ensuring their removal(block)266 is comprehensive
and context-agnostic across the dataset. Recognize that this approach might result in the deletion of the name for all
individuals with the same identifier, regardless of the context. To mitigate unintended consequences, use precise
262
Note that according to Art. 21(1) GDPR, “The controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and
freedoms of the data subject or for the establishment, exercise or defence of legal claims.”
263
EDPB, ‘Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, Adopted on 17 December 2024, (2024) https://www.edpb.europa.eu/our-work-tools/our-
documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en
264
Shrishak, K., ‘AI-Complex Algorithms and effective Data Protection Supervision Effective implementation of data subjects’ rights’ Support Pool of Experts Programme EDPB (2024)
https://www.edpb.europa.eu/system/files/2025-01/d2-ai-effective-implementation-of-data-subjects-rights_en.pdf
265 EDPS, ‘TechSonar 2025 Report’ (2025) https://www.edps.europa.eu/data-protection/our-work/publications/reports/2024-11-15-techsonar-report-2025_en
266 Surve, D., ‘Beginner’s Guide to LLMs: Build a Content Moderation Filter and Learn Advanced Prompting with Free Groq API’ (2024) https://deveshsurve.medium.com/beginners-guide-to-llms-build-a-content-moderation-
filter-and-learn-advanced-prompting-with-free-87f3bad7c0af
269
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
74
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
filtering techniques to differentiate between contexts where the name is personally identifiable and generic. To
prevent misuse or reintroduction of deleted data, secure filter scripts or prompts by restricting access to authorized
personnel only, employing encryption, and maintaining version control. Regularly audit these scripts to ensure they
are up to date and free from vulnerabilities.
It is also important to in particular with regard to Article 21 GDPR, to establish mechanisms to comply with the
requests of users that object to the processing of their personal data based on legitimate interest.267
For deletion requests under Art. 17 GDPR, assess whether personal data can be directly identified or derived from the
AI model and implement technical deletion where feasible, such as post-training adjustments.268
8. Unlawful repurpose of personal data Ensure compliance with Article 5(1)(c) GDPR by clearly limiting personal data processing to what is necessary for
specific, well-defined purposes. Avoid overly broad purposes like "developing and improving an AI system." Instead,
specify the type of AI system (e.g., large language model, generative AI for images) and its technically feasible
functionalities and capabilities.270
Article 6(4) GDPR provides, for certain legal bases, criteria that a controller shall take into account to ascertain whether
processing for another purpose is compatible with the purpose for which personal data are initially collected.271
When outsourcing AI training, verify legal guarantees (e.g., contracts, third-country transfer measures) and ensure
training data is not used by service providers for unauthorized purposes.
9. Unlawful unlimited storage of As user, deployer and procurement entity make agreements with the third-party provider about how long the input data
personal data and output data should be stored. This can be part of the service contract, product documentation or data processing
agreement.
If data are being stored on your premises, establish retention rules and /or a mechanism for the deletion of data.
10. Unlawful transfer of personal data As user, deployer and procurement entity, verify with the provider where the data processing is taking place.
Make the necessary safeguard diligences and when necessary, perform a Data Transfer Impact Assessment.
Make the necessary contractual agreements.
Consider this risk when making a selection among different vendors.
267
EDPB, ‘Report of the work undertaken by the ChatGPT Taskforce’ (2024) https://www.edpb.europa.eu/our-work-tools/our-documents/other/report-work-undertaken-chatgpt-taskforce_en
268
Bavarian State Office for Data Protection Supervision, ‘Data protection compliant Artificial intelligence Checklist with test criteria according to GDPR’
270
CNIL, ‘Artificial Intelligence (AI)’ (2025) https://www.cnil.fr/en/topics/artificial-intelligence-ai
271
EDPB, ‘Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, Adopted on 17 December 2024, (2024) https://www.edpb.europa.eu/our-work-tools/our-
documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en
75
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
11. Breach of the data minimization272 Regularly review and eliminate unnecessary data collection, automating data deletion when no longer needed.
principle Replace identifiable data with anonymized or pseudonymized alternatives immediately after collection.
Apply Privacy by Design principles at every development stage, integrating data minimization measures.
Exclude data collection from websites that object to web scraping (e.g., using robots.txt or ai.txt files).
Limit collection to freely accessible data manifestly made public by the data subjects.
Prevent combining data based on individual identifiers unless explicitly required and justified for AI system
development.273 274
Educate users about providing only essential data in inputs and transparently communicate data use practices.
Evaluate whether processing personal data is strictly necessary for the intended purpose by exploring less intrusive
alternatives, such as the use of synthetic or anonymized data, and ensuring the volume of personal data processed is
proportionate to the objective.
272
Processing personal data to address potential biases and errors is permissible only when it is explicitly aligned with the stated purpose, and the use of such data is necessary because the objective cannot be effectively
achieved using synthetic or anonymized data. Article 10(5) AI Act provides for specific rules for the processing of special categories of personal data in relation to the high-risk AI systems for the purpose of ensuring bias
detection and correction.
273
CNIL, ‘The legal basis of legitimate interests: Focus sheet’ (2024) https://www.cnil.fr/en/legal-basis-legitimate-interests-focus-sheet-measures-implement-case-data-collection-web-scraping
274
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
76
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Once residual risks are identified, organizations must decide whether these risks fall within acceptable
levels as defined by their risk tolerance277 and acceptance criteria. If residual risks are deemed
acceptable, they can be formally acknowledged and documented in the risk register. However, if the
risks exceed acceptable levels, further mitigation measures must be explored and implemented as well
as documented. The process then returns to the risk treatment phase to identify the most appropriate
treatment option for the risk.
Residual risk evaluation also plays a role in the decision to release a system into production. It is
therefore important to assess whether risks remain within defined safety thresholds. Organizations may
decide then to request further testing or additional evaluations, mandate further mitigations, or
approve the model for deployment if the residual risk is acceptable.
275
NIST, ‘Definition of Residual Risk’ (2025) https://csrc.nist.gov/glossary/term/residual_risk
276
See footnote 193
277
ISO 31000:2018 Risk Management
77
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Regular reviews also help refine risk strategies, improve processes, and adapt to changes in legislation,
business operations, or team structures.
Continuous Monitoring
Once risk mitigation measures281 have been implemented, ongoing monitoring is essential to assess
their effectiveness and identify any emerging risks. After deployment, post-market monitoring 282 plays
a critical role in identifying new risks or changes in the operational environment that may impact
privacy. This involves the systematic collection and analysis of logs and other operational data in
compliance with GDPR requirements, ensuring transparency, accountability, and the ongoing protection
of user data.
Currently, LLMs monitoring throughout the lifecycle relies primarily on the following
techniques:283model testing and evaluation, red teaming, field testing, and long-term impact
278
FITT Team, ‘How Oftern Should You Review Your Risk Management Plan’ (2023) https://www.tradeready.ca/explainer/how-often-should-
you-review-your-risk-management-plan/
279
Vn Vroonhoven, J., ‘Risk Management Plans and the new ISO 14971’ BSI, (2020)
https://compliancenavigator.bsigroup.com/en/medicaldeviceblog/risk-management-plans-and-the-new-iso-14971/
280
Wikipedia, ‘Risk Register’ (2025) https://en.wikipedia.org/wiki/Risk_register
281 Art.9 AI Act.
282 Chapter IX, Section 1 Post-market Monitoring, AI Act
283 AI Action Summit, ‘International AI Safety Report on the Safety of Advanced AI’ , p - 184, (2025)
https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf
78
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
assessment. These methods help identify and evaluate emerging risks that may not have been apparent
during initial development.
Model testing and iterative evaluations are used before and after the model is deployed and is part
of an LLM system. While essential, they are insufficient on their own due to the unpredictability of
real-world scenarios and the subjectivity of certain risks. Since LLMs can be applied in numerous
contexts, it is difficult to predict how risks will manifest in practice, and as mentioned in section 2,
performance metrics and benchmarks may not always accurately reflect those real risks.
Methodologies such as red teaming284 can be used to stress-test the model before deployment and
the LLM system before and after it is in production by simulating adversarial attacks or misuse
scenarios285, helping to uncover vulnerabilities that might not have been identified during the
development phase.
Field testing evaluates AI risks in real-world conditions, but its implementation remains challenging
due to the difficulty of accurately replicating real-world scenarios and establishing clear success
metrics. It is important to create a representative test environment and define measurable
performance benchmarks to obtain reliable insights.
Long-term impact assessments evaluate how AI systems evolve over time, aiming to identify
unintended consequences that may emerge with prolonged deployment. Continuous monitoring
and periodic reassessments are essential to detect shifts in model behavior, performance
degradation, or emerging risks that may not have been apparent during initial testing. This
technique is part of a continuous monitoring strategy and can also be part of threat modeling
sessions.
Across all these techniques, defining robust and reliable monitoring metrics is essential. However,
current automated assessments and quantitative metrics often lack reliability and validity, 286 making it
difficult to assess risks effectively. For this reason, qualitative human review also plays a crucial role in
capturing the broader sociotechnical implications of LLMs and their associated risks.
284
Open AI, ‘Advancing red teaming with people and AI’ (2024) https://openai.com/index/advancing-red-teaming-with-people-and-ai/
285
Google Threat Intelligence Group, ‘Adversarial Misuse of Generative AI’, (2025) https://cloud.google.com/blog/topics/threat-
intelligence/adversarial-misuse-generative-ai
286
Koh Ly Wey, T., ‘Current LLM evaluations do not sufficiently measure all we need’ (2025) https://aisingapore.org/ai-governance/current-
llm-evaluations-do-not-sufficiently-measure-all-we-need/
79
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Risks
Figure 17. Evaluation techniques are used to assess the probability of identified risks
287
Databricks, ‘LLMOps’ (2025) https://www.databricks.com/glossary/llmops
288
Ghosh, B., ‘LLMSecOps Elevating Security Beyond MLSecOps’ (2023) https://medium.com/@bijit211987/llmsecops-elevating-security-
beyond-mlsecops-94396768ecc6
289 All Tech is Human x IBM Research, ‘AI Governance Workshop’ (2025)
https://static1.squarespace.com/static/60355084905d134a93c099a8/t/677c492a161e58148fc60706/1736198443181/IBM+Research+x+ATI
H+AI+Governance+Workshop.pdf
80
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
81
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
82
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Scenario: A company specialized in kitchen equipment wants to deploy a chatbot to provide general
information about its products and services to its customers. The chatbot will have access to pre-existing
customer data through integration with the customer management system (e.g., CRM databases). This
will allow the chatbot to recognize users based on identifiers like email or account credentials and
provide personalized responses without requiring users to re-enter their data. This chatbot interface
will be built using as foundation an ‘off-the-shelf’ LLM that will use RAG to acquire the domain specific
knowledge required.
Lifecycle phase we are now: Design & Development
83
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
6. Personalized response generation → The chatbot will use stored user data from the CRM system and
the fine-tuned LLM capabilities to generate tailored recommendations and responses.
7. Data sharing → The chatbot may share minimal, (anonymized) user data with external services (e.g.,
third-party APIs for additional functionality or promotional tools).
8. Feedback collection → Users provide feedback on chatbot interactions (e.g., thumbs-up/down,
comments) to improve the system’s performance. This is process by the system for analytics purposes.
9. Deletion and user rights management → Users can request access to, deletion of, or updates to their
personal data in compliance with GDPR or similar regulations.
To facilitate the risk assessment process, it is also possible to create a data flow diagram290, providing a
graphical representation of the processes, data movements, and interactions within the system.
Possible Architecture
Considering that we are at the design phase of the AI lifecycle, we anticipate that the architecture of
our LLM-based system will include the following key layers:291
User Chatbot
Business Integration CRM Extermal Security
Interface Application LLM Layer
Logic Layer System Services Layer
(UI) Layer Layer
User Interface (UI) Layer: The interface where users interact with the chatbot through text or voice
input. (e.g.; Webpage, mobile app)
Chatbot Application Layer: Manage the flow of conversation and determines chatbot responses
based on user input and context. Directs queries to the Business Logic Layer.
Business Logic Layer: Orchestrates chatbot workflows, such as checking customer profiles or placing
orders. Crucially, it decides whether to call the LLM directly or trigger a retrieval step (RAG) — for
example, by querying the CRM or knowledge base when additional context is needed before
generating a response.
Integration Layer: Contains the API Gateway to manage the transmission between layers. Connects
the chatbot to the LLM, the CRM system and external services and facilitates secure communication
and data exchange between systems. It also handles data transformation, ensures compatibility
between the chatbot and the CRM, and implements authentication and authorization for secure
access to CRM data. For the RAG setup, this layer may also route queries to a retrieval component
or knowledge base before passing enriched inputs to the LLM.
LLM Layer: Performs natural language understanding and generation. Receives either raw user
input or input enriched with retrieved content (from the RAG step). Returns contextually relevant
responses to the Business Logic Layer.
CRM System: Stores customer data, such as contact information, purchase history, preferences, and
support tickets. It also contains CRM APIs that provide endpoints to retrieve, update, or add
customer data and event handlers that trigger actions based on events, such as creating a support
ticket when a customer raises an issue through the chatbot. Supplies customer data to personalize
chatbot responses and stores data generated during interactions.
290
Wikipedia, ‘Data-flow diagram’ (2025) https://en.wikipedia.org/wiki/Data-flow_diagram
291
This architecture is provided as a simplified example and may vary significantly depending on the specific requirements, use case, and
technical constraints of each deployment.
84
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
External Services Layer: It integrates with analytics tools to track user interactions and generate
insights into customer behavior. It also integrates with other services, such as payment gateways,
email services, or marketing tools.
Security Layer: It encrypts data during transmission using protocols like HTTPS and SSL/TLS, restrict
unauthorized access to the chatbot, the LLM and CRM using techniques like OAuth2, implements
security and privacy controls, vulnerability scans, threat monitoring, etc.
Having an overview of the possible architecture at this stage provides a clearer understanding of the
data flows and potential risks associated with deploying the chatbot. This architectural insight sets the
groundwork for identifying privacy and security concerns early in the process.
Risk Analysis – Stakeholder Collaboration
The next step involves gathering a diverse group of stakeholders to collaboratively identify potential
risks. Inviting the right stakeholders is not an exact science, but it is critical to include individuals who
will have decision-making authority, direct involvement in its development, deployment and use, and
could add value to the risk identification process. Key participants could include representatives from
engineering, security, privacy, and UX design teams. If possible, it is highly beneficial to involve
individuals with expertise in ethics and fundamental rights, as well as members from civil society groups,
deployers and end-users’ representatives (customers in our use case). Collecting input from a broader
audience through a client survey can also provide valuable insights into user expectations and concerns.
Stakeholder Analysis
Before starting with the risk identification process, the group should analyze the use case to determine
which stakeholders target group will interact with the chatbot and identify those who should not have
access. Designing barriers where necessary, such as an age verification mechanism, ensures the system
aligns with the intended user base. In this specific use case, the entry point for the interface is restricted
to logged-in and recognized customers, making additional barriers possibly unnecessary. However, a
comprehensive evaluation of all potential risks remains crucial to the system's success.
Stakeholder analysis292 is a process used to identify and understand the roles, interests, and influence
of various stakeholders involved in or affected by a project. Beyond analyzing those directly engaged
with the system, it is equally important to assess which stakeholders could be negatively impacted by
the tool. This includes recognizing if vulnerable groups might be involved or if the tool's impact could
extend to a large number of individuals. Where relevant, it may be valuable to engage affected
communities in subsequent phases of risk identification to better capture context-specific concerns and
impacts. Participatory engagement tools293 like the ethical matrix, mentioned in a previous section, can
help evaluate the potential consequences for different stakeholder groups.
In our use case, we have identified our customers as the only authorized users. Given the nature of our
business, we do not anticipate children accessing our platform. However, we remain mindful of
implementing appropriate security measures to ensure that access is restricted, and unauthorized use
is prevented.
292
Rodgers, A.,’ What is a Stakeholder Impact Analysis?’, Simply Stakeholders (2024) https://simplystakeholders.com/stakeholder-impact-
analysis/,
293
Park, T., Stakeholder Engagement for Responsible AI: Introducing PAI’s Guidelines for Participatory and Inclusive AI’, Partnerships on AI’
(2024) https://partnershiponai.org/stakeholder-engagement-for-responsible-ai-introducing-pais-guidelines-for-participatory-and-inclusive-
ai/
85
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
We have identified several risk factors that require attention, as they indicate a higher probability of
undesirable outcomes. While our system does not fall under the classification of a high-risk system
under the AI Act, there is, from the GDPR perspective, sufficient evidence to justify initiating294 the
process for creating a Data Protection Impact Assessment (DPIA). This risk assessment we are
performing now will serve as a valuable foundation for the DPIA process.
It is important to emphasize when a DPIA is necessary and when a Fundamental Rights Impact
Assessment (FRIA) is required. A DPIA, under Article 35 of the GDPR, is required whenever a data
processing is likely to result in a high risk to the rights and freedoms of natural persons.295
Even when a DPIA is not explicitly required by law, conducting one can be prudent for best practices in
privacy and security. It allows organizations to preemptively address potential risks, assess the impact
of their solutions, and demonstrate accountability. In contrast, a FRIA, as outlined in Article 27 296of the
AI Act, can be mandatory for some deployers of high-risk AI systems (bodies governed by public law,
private entities providing public services, organisations doing creditworthiness evaluations, pricing and
life and health insurance risk assessments. A FRIA evaluates the potential impact of such systems on
fundamental rights like privacy, fairness, and non-discrimination. Deployers of high-risk AI systems must
document:
How the system will be used, including its purpose, duration, and frequency.
The categories of individuals or groups affected by the system.
Specific risks of harm to fundamental rights.
Measures for human oversight and governance.
294 EDPB, ‘Data Protection Guide for Small Business’ (2025) https://www.edpb.europa.eu/sme-data-protection-guide/be-compliant_en
295 Article 29 Data Protection Working Party, ‘Guidelines on Data Protection Impact Assessment (DPIA) and determining whether processing is
"likely to result in a high risk" for the purposes of Regulation 2016/679, WP248 rev.01, endorsed by the EDPB’ (2017)
https://ec.europa.eu/newsroom/article29/items/611236
296
Article 27(1) AI Act: “Prior to deploying a high-risk AI system referred to in Article 6(2), with the exception of high-risk AI systems intended
to be used in the area listed in point 2 of Annex III, deployers that are bodies governed by public law, or are private entities providing public
services, and deployers of high-risk AI systems referred to in points 5 (b) and (c) of Annex III, shall perform an assessment of the impact on
fundamental rights that the use of such system may produce….”
86
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
For this use case, we will integrate these questions into the risk management process as follows:
297
AEPD, ‘Technical Note: An Introduction to LIINE4DU 1.0: A New Privacy & Data Protection Threat Modeling Framework’ (2024)
https://www.aepd.es/guides/technical-note-introduction-to-liine4du-1-0.pdf
298 Slattery P., et al., ‘The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence’
(2024) https://arxiv.org/abs/2408.12622
299
LINDDUN, ‘Privacy Threat Modelling’ (2025) https://linddun.org/
300
AEPD, ‘Technical Note: An Introduction to LIINE4DU 1.0: A New Privacy & Data Protection Threat Modeling Framework’ (2024)
https://www.aepd.es/guides/technical-note-introduction-to-liine4du-1-0.pdf
301
PLOT4AI, ‘Practical Library of Threats 4 Artificial Intelligence’ (2025) https://plot4.ai/
302
Shostack, A., ‘The Four Question Framework for Threat Modeling’ (2024)
https://shostack.org/files/papers/The_Four_Question_Framework.pdf
87
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
From the potential privacy risks outlined in Section 3 for systems based on an LLM ‘off-the-shelf-model,’
we have reviewed all risks across the standard data flow phases and identified that most of these risks
are covered under Risk 1 (Insufficient protection of personal data leading to a data breach) and Risk 3
(Possible adverse impact on data subjects that could negatively impact fundamental rights) from Section
4.
User Input Sensitive data disclosure, unauthorized access, lack of transparency, adversarial attacks
Provider Interface & API Data interception, API misuse, interface vulnerabilities
LLM Processing at Model inference risks, unintended data logging, anonymization failures, unauthorized access
Providers’ Infrastructure to logs, data aggregation risks, third-party exposure, inadequate data retention policies
For example, in the case of Risk 2 (Misclassification of training data as anonymous), we can
already perform tests to detect the presence of personal data in our datasets. These results
would help us assess the probability of the risk occurring given the current dataset conditions.
At this stage of the AI lifecycle (pre-deployment phase), the available evaluations are limited.
However, when risk assessments take place post-development, additional evaluations can be
conducted, providing further quantitative criteria to refine risk assessment and decision-
making.
Probability
We are going to assess the probability of identified risks, categorizing them into one of the four
levels in the probability matrix: Very High, High, Low, or Unlikely. This categorization should be
done by directly assigning a level to each risk based on quantitative and/or qualitative criteria
and through collaborative decision-making with stakeholders. Alternatively, we can also
employ a list of predefined criteria to guide our assessment.
For a more quantitative approach calculating probability, aggregation methods can be applied
to calculate its level. In this use case, we will use the FRASP framework to structure and refine
our probability assessment process.
88
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Severity
Next, we will assess the potential privacy impact and severity of these risks on data subjects, individuals,
and society. Based on this severity assessment, we will assign one of the four levels from the severity
classification matrix: Very Significant, Significant, Limited, or Very Limited.
The calculation of severity will follow the same steps as those used for determining probability.
However, for severity, the highest level obtained among criteria 1 to 5, as well as 7 and 8, will set the
total severity score.
Once the aggregate score is calculated, we will map it to one of the predefined severity levels based on
the following ranges:
In this case the final score is determined by the highest score in criteria 1, 5 and 7 giving as result Level
3 severity for all the risks.
89
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
By applying the classification matrix to the obtained probability and severity scores, we can determine
the corresponding risk classification level. In our case for all the risks the combination Low Probability +
Significant Severity offers a result of High Risk.
Although high level risks always require treatment, it is considered best practice to assess whether
classified risks need treatment evaluating predefined acceptance criteria and acceptable metric
thresholds established by the organization. These criteria can be adjusted per use case and tailored to
pre-deployment and post-deployment phases to ensure a context-aware risk management approach.
In our specific use case, the organization's risk acceptance criteria are as follows:
A risk that can result in a violation of data protection regulations is not acceptable.
A risk of unauthorized access, exposure, or retention of persona data beyond what is strictly
necessary is not acceptable.
Re-identification risk must remain below 1%, verified through privacy-preserving evaluations
and testing.
Membership inference and model inversion attack risks must remain below a 1% success rate
as verified through internal testing and, for sensitive data, independent external audits.
Inaccurate datasets are only acceptable if the error rate does not exceed 5% and all available
data validation and cleaning processes have been applied.
90
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
The chatbot must clearly inform users when their data is being used and provide access to data
usage policies. Transparency risks are not acceptable.
No risk is acceptable if it prevents users from exercising their data rights, unless explicitly
justified under legal exceptions.
User input is validated and Unsecured APIs could Use robust API security
formatted before being sent allow attackers to measures, including access
to the chatbot’s API for intercept or manipulate controls, authentication,
processing. The chatbot user data. and rate limiting.
interacts with a fine-tuned Malicious inputs (e.g., Sanitize user input to
Data
off-the-shelf LLM hosted on injections) could exploit prevent injection attacks.
preprocessing
the cloud and connects to
and API system vulnerabilities. Minimize API logging or
the CRM system both to
interaction Logs might inadvertently ensure logs are anonymized
retrieve or update user
information and to fetch store sensitive user data. and protected by access
relevant content that is Retrieved content may controls.
passed to the LLM as contain sensitive or Restrict retrieval sources to
context (RAG). outdated information, approved, privacy-screened
91
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
92
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
93
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
With the four levels matrix that we use in this example, a Low Probability and Limited Severity result in
a Medium Risk level because, while unlikely, the consequences of a risk, though mitigated, are still non-
negligible. That means the remaining risk after mitigation measures might still be above an acceptable
threshold for your organization.
What can we do to address Residual Risk in this case? Some options that organizations can apply are:
Reduce Probability by strengthening preventive controls (e.g., access measures, anomaly detection)
and enhancing event prevention mechanisms.
Implement extra mitigations measures to reduce severity.
Implement robust monitoring and establish a clear incident response plan to minimize impact if the
risk materializes.
Explore additional mitigations: for instance, use advanced technologies (e.g., differential privacy) or
fail-safe mechanisms to further mitigate risks.
Reevaluate whether the residual risk is within organizational risk tolerance and document
justification for maintaining it.
Discuss options to share or transfer the risk (e.g., insurance, vendor agreements).
94
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Second Use Case: LLM System for Monitoring and Supporting Student Progress
Scenario: A school wants to adopt a third party LLM system to monitor and evaluate students' academic
performance and provide tailored recommendations for improvement. The tool is an LLM-based system
developed with and LLM ‘off-the shelf’ model. This tool would analyze a combination of data, including
test scores, assignment completion rates, attendance records, and teacher feedback, to identify areas
where students may need additional support or resources. For example, if a student struggles with
math, the tool could recommend targeted practice exercises, suggest online tutoring sessions, or notify
parents and teachers about specific challenges. The goal is to create a personalized learning plan that
helps each student achieve their full potential.
This system would deal with sensitive information about minors, including their academic records and
behavioral patterns, which introduces significant privacy and ethical risks.
95
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
If the vendor does not comply with Conduct regular API security reviews and
data protection regulations, it validation.
increases the risk of a data breach. Enforce strict role-based access control (RBAC)
The tool might interact with third- policies.
party services or platforms (e.g., online Implement multi-factor authentication (MFA)
tutoring systems, analytics services, or for all users accessing sensitive data.
cloud-based storage) for functionality, Regularly review and update user access
exposing student data to external permissions.
entities. Conduct vendor due diligence, including Data
Protection Impact Assessments (DPIAs) and
security certifications.
Include specific data protection clauses in
contracts with vendors, ensuring
accountability for compliance.
Require vendors to provide evidence of GDPR-
compliant practices.
Establish robust data-sharing agreements with
third-party platforms, ensuring compliance
with GDPR requirements.
Limit data shared with third parties to
anonymized or pseudonymized datasets.
Monitor third-party systems for adherence to
agreed data protection measures.
2.Misclassifying training Adversaries might exploit the LLM to infer Use differential privacy techniques to minimize
data as anonymous by whether specific student data was used in the risk of data inference.
controllers when it contains training, indicating a misclassification of Conduct structured testing against
identifiable information training data.
membership inference and attribute inference
attacks.
Validate that the LLM provider has
implemented safeguards to prevent such
attacks.
3. Unlawful processing of If personal data (e.g., academic Verify that the LLM provider’s training datasets
personal data in training records) is unlawfully processed in exclude sensitive personal data without proper
sets training datasets by the LLM provider safeguards.
Behavioral and academic data require Require documentation from vendors proving
explicit consent or another valid legal that training data was lawfully collected and
basis to be processed lawfully. processed.
Use models trained on synthetic or
anonymized data when possible.
4. Unlawful processing of If health-related or behavioral data about Ensure explicit consent is obtained from
special categories of children, such as indications of mental parents or guardians before processing
personal data and data health conditions, is processed—such as children’s data.
relating to criminal when identifying special assistance needs
Conduct a DPIA and identify lawful grounds for
convictions and offences in for conditions like dyslexia, ADHD, or
training data. similar. processing.
Provide clear, accessible information to
parents about how data is processed.
Implement stricter safeguards for sensitive
data, including encryption and access controls.
Limit processing to data strictly necessary for
the intended purpose.
96
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
7. Not granting data Failure to obtain proper parental or Create a user-friendly interface explaining how
subjects their rights guardian consent violates GDPR's data is collected, processed, and stored.
requirements for minors. Offer accessible resources to help students
Students and parents may not fully
and parents exercise their GDPR rights,
understand how their data is processed,
limiting their ability to exercise their rights. including erasure, rectification, and access.
8. Unlawful repurpose of It the academic and behavioral data from Ensure data usage aligns with the original
personal data children is used without a compatible purpose of collection and assess any
purpose to the original one. repurposing against GDPR principles.
Document purpose compatibility assessments
for accountability.
303
Models cards and system cards are example of information that can be provided to deployers:
Green, N et al., System Cards, a new resource for understanding how AI systems work (2022) https://ai.meta.com/blog/system-cards-a-new-
resource-for-understanding-how-ai-systems-work/; Hugging Face, ‘Model Cards’ (2024) https://huggingface.co/docs/hub/en/model-cards
97
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
9. Unlawful unlimited Data might be retained longer than Define clear data retention policies and
storage of personal data necessary for its intended purpose, automatically delete data once is no longer
particularly behavioral or health-related needed.
data.
Regularly audit stored data for compliance
with retention limits.
10. Unlawful transfer of If the tool relies on cloud services or Verify that cloud service providers comply with
personal data external platforms hosted in jurisdictions GDPR’s data transfer rules, including adequacy
without adequate data protection decisions or Standard Contractual Clauses
standards.
(SCCs).
Perform Data Transfer Impact Assessments
(DTIAs) when required.
11. Breach of the data Excessive data collection or processing Apply strict data collection filters to gather
minimization principle beyond what is necessary infringe the data only the data necessary for the tool’s purpose.
minimization principle. Anonymize or pseudonymize data where
possible to minimize risk.
It is important to note that the list of risks and mitigations provided is based on generic information and
assumptions. In a real-world scenario, a detailed risk assessment tailored to the specific
implementation, context, and operational environment of the LLM based tool would be necessary. This
includes collaboration with stakeholders, such as the LLM system provider, school administrators,
teachers, parents, and students, to identify unique risks and address them effectively.
98
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Scenario: A personal assistant AI agent is designed to help users manage their travel plans and daily
agendas. The agent can book flights, reserve hotels, schedule meetings, and send reminders based on
user-provided inputs and preferences. For instance, a user might ask the agent to "book a round trip to
Madrid next week and find a hotel near the Prado Museum." To fulfill this request, the agent accesses
the user’s calendar, retrieves personal preferences (e.g., preferred airlines or hotel chains), and
interacts with third-party booking platforms. This system is developed with various ‘off-the shelf’ LLMs
and SLMs.
Lifecycle phase we are now: Operations and Monitoring
2. Misclassifying training data as Not applicable in this use case as the focus (Not directly applicable in this case, as the
anonymous by controllers when is on operational data rather than training system uses pre-trained models, but
it contains identifiable data. (This use case is based on the applicable to providers.)
information Lifecycle Phase: Operations and
Monitoring)
Chen, G et al., ‘Encryption–decryption-based consensus control for multi-agent systems: Handling actuator faults.’, Automatica, Volume
304
99
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
3. Unlawful processing of Not applicable in this use case, as the (Not directly applicable, as no training occurs
personal data in training sets system is already in operation and relies in the operational phase.)
on pre-trained LLMs and SLMs.
4. Unlawful processing of Behavioral and personal preferences data Implement explicit consent mechanisms for
special categories of personal (e.g., specific health conditions inferred processing sensitive data like health-related
data and data relating to from travel patterns) may fall into special information inferred from user interactions
criminal convictions and categories, requiring explicit consent or
(e.g., calendar or travel preferences).
offences in training data. valid legal basis.
Interactions with calendars or other Validate that sensitive data collected (if any)
sensitive tools may inadvertently process is necessary for the intended purpose.
health-related or sensitive data. Use privacy-preserving techniques for
sensitive data handling.
5. Possible adverse impact on Manipulation or overreliance on Monitor agent outputs for manipulative or
data subjects that could suggestions, where the agent biased behavior (e.g., unfair pricing or
negatively impact fundamental prioritizes third-party interests over recommendations).
rights
user preferences. Evaluate recommendations for fairness and
Profiling and unfair treatment, such ensure they do not disproportionately
as price discrimination305 or biased impact vulnerable groups.
recommendations. Implement behavioral consistency checks to
The agent might rely on outdated or identify and address erratic or unfair
inaccurate information from decision-making.
external sources, leading to errors in
bookings or scheduling, which could
inconvenience users.
Users may not fully understand how
the system operates, including how
decisions are made or how their data
is processed and shared.
6. Not providing human Lack of human oversight in automated Require user confirmation for critical
intervention for a processing decisions, such as booking flights or decisions, such as booking or payment.
that can have a legal or scheduling, could lead to significant user Implement fallback mechanisms where
important effect on the data inconvenience or adverse impacts.
human oversight is necessary for high-stakes
subject.
scenarios.
Train users on how to interpret AI outputs
and intervene if necessary.
7. Not granting data subjects Users may struggle to exercise GDPR Provide clear interfaces for users to access,
their rights rights (e.g., access, rectification, deletion) rectify, or delete their data.
due to complex vendor dependencies or Maintain detailed, accessible logs of actions
inadequate user interfaces.
for audit purposes and compliance with
Article 15 GDPR.
305
Zainea, A.A, ‘Automated Decision-Making in Online Platforms: Protection Against Discrimination and Manipulation of Behaviour’ (2024)
https://www.autoriteitpersoonsgegevens.nl/documenten/scriptie-aylin-alexa-zainea-automated-decision-making-in-online-platforms-
protection-against-discrimination-and-manipulation-of-behavior
100
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
8. Unlawful repurpose of Data repurposing risks may arise if travel, Restrict data use to the specific purposes
personal data calendar, or preference data is used for outlined in the terms of service.
purposes other than intended, such as Ensure vendor agreements explicitly prohibit
targeted advertising.
the repurposing of collected data.
Implement consent management systems to
track user preferences and restrict secondary
use.
9. Unlawful unlimited storage of If data retention expands beyond Define clear retention periods for different
personal data necessity, particularly for travel data types (e.g., calendar data, travel
itineraries, calendars, or personal history).
preferences.
Automate data deletion processes once the
data is no longer necessary for the purpose.
Regularly audit storage systems to ensure
compliance with retention policies.
10. Unlawful transfer of Cross-border data sharing risks due to Verify the location of third-party services and
personal data reliance on third-party platforms or ensure compliance with GDPR cross-border
services in jurisdictions without adequate transfer rules.
data protection standards.
Perform Transfer Impact Assessments (TIAs)
for all external vendors.
Use standard contractual clauses and other
safeguards for data-sharing agreements with
third-party providers.
11. Breach of the data Excessive data collection: The system may Limit data collection to what is strictly
minimization principle collect or process more data than necessary for fulfilling user requests (e.g.,
necessary for fulfilling user requests (e.g., exclude unnecessary calendar details).
unnecessary calendar details or
Implement input validation and filters to
preferences).
prevent over-collection of data.
Use anonymization or pseudonymization to
minimize the risk of misuse or exposure of
collected data.
From identifying data flows to classifying risks and implementing mitigations, risk management is a
continuous iterative journey. It requires consistent monitoring, stakeholder collaboration, and
adjustments based on real-world observations and emerging technologies.
Risk management should remain adaptable, incorporating feedback and evolving alongside regulatory
and technological advancements.
As we conclude this report, it is important to reiterate that while the risk management framework
presented in this document provides guidance, every organization must customize its approach to
address the specific nuances of their LLM based use cases.
Privacy and data protection are not static goals but ongoing commitments.
101
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Example: Word Embedding Association Test (WEAT): This test measures how strongly certain
words are associated with particular groups of people, aiming to detect stereotypes in the
model’s word embeddings. For instance, comparing the proximity of words that indicate gender
(such as names or pronouns) with various career-related words can point to gender bias in the
word embeddings, such as 'man’ being represented as closer to 'doctor', and 'woman’ being
embedded closer to 'nurse'. This can predict bias in the model's output as well.
Toxicity Detection
Toxicity evaluation assesses how often LLMs generate harmful, offensive, or inappropriate content. This
includes hate speech, insults, or harassment. What is considered ‘inappropriate’ content can be context-
dependent; for instance, AI systems that interact with children might have a lower threshold for
inappropriate content than adult-only systems.
Example: Toxicity Score: This metric aims to predict the probability of a piece of text being
considered 'toxic'. Usually expressed as a percentage, the closer this score is to 0, the less likely
it is for the text to be toxic. This metric is used in toxicity detection tools such as Perspective
API, aiming to detect and reduce toxicity and harmful content in textual data.
Fairness Metrics
Fairness evaluation focuses on evaluating the extent to which LLMs treat all user groups equitably
without exhibiting or perpetuating systematic biases. This is of course tricky because fairness is an
inherently complex term whose definition is debated and open to interpretation. Therefore, the chosen
metrics are usually geared towards optimizing the definitions/dimensions of fairness that are most
appropriate for each given case.
306
Verma, A., Plain English AI,’NLP evaluation: Intrinsic vs. extrinsic assessment’ Medium (2023) https://ai.plainenglish.io/nlp-evaluation-
intrinsic-vs-extrinsic-assessment-ff1401505631
102
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Example: Demographic Parity: Initially a metric used in classification, demographic parity can
be adapted to a text output. It measures whether the model generates text that represents all
demographic groups equally in terms of frequency, sentiment, and associations. It can answer
questions such as ‘are individuals of different ethnicities represented equally positively in the
generated text?’ or ‘are women as frequently associated with high athletic performance as
men?'.
Benchmarks
Benchmarks are standardized datasets, tasks, and evaluation protocols used to measure and compare
the performance of various AI models, including LLMs. They provide a consistent framework to assess a
model's capabilities, ensuring that performance can be compared across different models, tasks, and
implementations.
Here are some common benchmarks for LLMs:
General Language Understanding Evaluation (GLUE): A collection of tasks designed to evaluate
natural language understanding, including sentiment analysis and sentence similarity. This
benchmark is model-agnostic, meaning that it can be used to assess any system that takes a
text input and generates a text output. Given the considerable recent progress of language
models, the SuperGLUE benchmark has been introduced as a more challenging and nuanced
version of GLUE. It includes more advanced language understanding tasks and a public
leaderboard for state of the art models. https://gluebenchmark.com/
AlpacaEval: An LLM-based automatic evaluation based on the AlpacaFarm evaluation set, which
tests the ability of models to follow general user instructions. https://tatsu-
lab.github.io/alpaca_eval/
HellaSwag: A challenge dataset for evaluating commonsense NLI that is specially hard for state
of the art models, though its questions are trivial for humans (>95% accuracy).
https://rowanzellers.com/hellaswag/
Big-Bench (Beyond the Imitation Game Benchmark): A set of tasks designed to evaluate the
capabilities and limitations of LLMs on diverse and challenging tasks. These tasks are designed
to test abilities beyond what is evaluated by standard benchmarks, assessing abstract
reasoning, problem-solving, or the ability to handle more unconventional or complex prompts.
The higher the BIG-bench score, the better the model performs in complex tasks.
https://github.com/google/BIG-bench
103
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
AIR-BENCH 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
(https://arxiv.org/pdf/2407.17436v2) & https://huggingface.co/datasets/stanford-crfm/air-
bench-2024
LLM Guard (by Protect AI): It is a comprehensive tool designed to fortify the security of Large
Language Models (LLMs). https://llm-guard.com/
Safeguards/Guardrails in LLMs
Safeguards (or guardrails) in LLMs are mechanisms implemented to ensure that the models operate in
a safe, ethical, and reliable manner. They can be applied to various stages of the LLM pipeline (pre-
processing, training, output…), and be focused on addressing different risks. For instance, some
safeguards aim to avoid the generation of unethical, harmful or inappropriate content (so the behavior
of the model), while others focus on preserving the privacy of the owners of the data (or other
stakeholders).
Here are some examples of behavioral guardrails that aim to moderate the LLM's output and mitigate
harm that could be caused by the output without intervention:
Content filters: moderate outputs by blocking or flagging harmful or toxic content
Prompt refusals: prevent responses to dangerous or unethical prompts (like a request for
instructions to a successful robbery)
Bias mitigation: reduce stereotypical or unfair outputs during inference
Human-in-the-Loop approaches: human oversight for high-risk applications, in order to not
leave important decision-making fully in the 'hands’ of an automated system, which cannot truly
comprehend what is at stake.
Post-processing detoxification: filter or rewrite outputs to remove harmful content
Adversarial testing (red teaming): evaluate and stress-test the model's ability to successfully
deal with harmful prompts
104
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
105
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
Methodologies and Tools for the Identification of Data Protection and Privacy Risks
Practical Library of Threats (PLOT4ai) is a threat modeling methodology for the identification of risks
in AI systems. It also contains a library with more than 80 risks specific to AI systems:
https://plot4.ai/
MITRE ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems), is a knowledge
base of adversary tactics, techniques, and case studies for machine learning (ML) systems:
https://atlas.mitre.org/
Assessment List for Trustworthy Artificial Intelligence (ALTAI) is a checklist that guides developers
and deployers of AI systems in implementing trustworthy AI principles: https://digital-
strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-
assessment
Guidance
OECD AI Language Models:
https://www.oecd.org/content/dam/oecd/en/publications/reports/2023/04/ai-language-
models_46d9d9b4/13d38f92-en.pdf
NIST GenAI Security: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-218A.pdf
NIST Artificial Intelligence Risk Management Framework - NIST AI 600-1:
https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
OECD Advancing accountability in AI Governing and managing risks throughout the lifecycle for
trustworthy AI:
https://www.oecd.org/content/dam/oecd/en/publications/reports/2023/02/advancing-
accountability-in-ai_753bf8c8/2448f04b-en.pdf
FRIA methodology for AI design and development:
https://apdcat.gencat.cat/es/documentacio/intelligencia_artificial/index.html
AI Cyber Security Code of Practice (gov.uk): https://www.gov.uk/government/publications/ai-
cyber-security-code-of-practice
Standards
The European Standardisation Body CEN/CENELEC is currently developing different AI harmonized
standards following the AI Act Standardization Request307 from the European Commission.
High-risk AI systems or general-purpose AI models that comply with these forthcoming harmonized
standards are presumed to meet the specific requirements outlined in the AI Act 308. However, this
presumption does not extend to international standards such as ISO/IEC 42001309 and ISO/IEC 23894310.
Nevertheless, these standards provide a robust foundation and offer valuable best practices.
307
European Commission, ‘Implementing decision C(2023)3215 final of 22.5.2023 on a standardisation request to the European Committee
for Standardisation and the European Committee for Electrotechnical Standardisation in support of Union policy on artificial intelligence’
(2023), https://ec.europa.eu/transparency/documents-register/detail?ref=C(2023)3215&lang=en
308
Article 40 AI Act
309
ISO/IEC 42001:2023 Information technology — Artificial intelligence — Management system
310
ISO/IEC 23894:2023 Information technology — Artificial intelligence — Guidance on risk management
106
AI Privacy Risks & Mitigations – Large Language Models (LLMs)
107