5. Privacy Models: Differential Privacy I
Roadmap
• Differential privacy background
• Differentially Private Applications
• New applications: more than privacy
• Conclusion
1. Privacy Preserving Background
Privacy Model
• A set of rules/assumptions to describe/measure the privacy of a dataset.
• Privacy (informal definition): an adversary cannot learn anything new about a particular individual after accessing the dataset
[Figure: public users or entities (Alice, Bob, Cathy) access the original dataset D through a privacy model (PM) enforced by a curator.]
Typical Privacy Model
Weakness of Traditional Privacy Models
• The privacy level is difficult to measure and compare
• The privacy guarantee is hard to prove theoretically
• Susceptible to background-knowledge attacks
Background Attack
[Figure: the attacker queries D on (x1, ..., xn), then differences the query result with background information on (x1, ..., xn-1) to recover xn.]
Differential Privacy
• Whether an individual is in or out of the database should make little difference to the analytical output. [Dwork, C. (2006). Differential Privacy.]
[Figure: neighboring datasets D and D' differ in a single record xn. If the adversary cannot tell the difference between the outputs f(D) = S and f(D') = S, then xn is safe.]
2. Differential Privacy
Privacy Definition: DP
• Definition: a mechanism M satisfies ε-differential privacy if, for all pairs of neighboring datasets D and D' and for all possible outputs S:
$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]$$
• ε is the privacy budget: the ratio between the output probabilities on D and D' is bounded by e^ε
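To make the bound concrete, here is a minimal sketch (assuming Python with NumPy; it borrows the Laplace mechanism introduced later in this deck) that empirically estimates the two output probabilities on a pair of neighboring counts and checks that their ratio stays below e^ε:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1.0                 # privacy budget
f_D, f_Dp = 10, 11        # a count query on neighboring datasets (differ in one record)

# Laplace mechanism for a sensitivity-1 query: M(D) = f(D) + Lap(1/eps)
out_D = f_D + rng.laplace(scale=1 / eps, size=1_000_000)
out_Dp = f_Dp + rng.laplace(scale=1 / eps, size=1_000_000)

# Estimate Pr[M(D) in S] and Pr[M(D') in S] for an example output set S = [10, 12]
p_D = np.mean((out_D >= 10) & (out_D <= 12))
p_Dp = np.mean((out_Dp >= 10) & (out_Dp <= 12))

print(f"ratio {max(p_D, p_Dp) / min(p_D, p_Dp):.3f} <= e^eps {np.exp(eps):.3f}")
```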
How to achieve the definition?
• The curator does not release the dataset; instead, users submit statistical queries and the curator replies with query answers
• Add uncertainty to the output
• True answer is unique, but DP answer is a distribution.
[Figure: the DP answer is a probability distribution centered near the true answer S.]
How to achieve the definition?
• Add uncertainty to the output
• Perturbation: how much is enough?
• Which privacy level? The privacy budget ε
• How much of the difference between query results on neighboring datasets should be hidden? The sensitivity:
$$\Delta f = \max_{D, D'} \lvert f(D) - f(D') \rvert$$
Sensitivity
• Sensitivity is a parameter determining how much perturbation is
required in mechanisms.
$$\Delta f = \max_{D, D'} \lvert f(D) - f(D') \rvert$$
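As a minimal illustration (hypothetical toy data; Python/NumPy), the sensitivity of a counting query can be checked by deleting one record at a time; for counting queries this maximum over neighbors of D also equals the global sensitivity of 1:

```python
import numpy as np

# Hypothetical toy dataset: 1 means the record satisfies the query predicate
D = np.array([1, 0, 1, 1, 0, 1])
f = lambda d: int(d.sum())                     # counting query f(D)

# max |f(D) - f(D')| over all neighbors D' obtained by deleting a single record
sensitivity = max(abs(f(D) - f(np.delete(D, i))) for i in range(len(D)))
print(sensitivity)                             # 1 for a counting query
```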
Principal Differential Privacy Mechanisms
• Laplace Mechanism: add Laplace noise to the query result.
• How many people in this room have blue eyes?
Noise: Laplace(sensitivity/privacy budget)
[Figure: Laplace density centered at 0.]
Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006). Calibrating Noise to Sensitivity in Private Data Analysis. Theory of Cryptography, 265-284.
McSherry, F., & Talwar, K. (2007). Mechanism Design via Differential Privacy. FOCS.
Laplace Mechanism
• Let f(D) be a numeric query on dataset D
• How many people in this room have blue eyes?
• The sensitivity of f: $\Delta f = \max_{D, D'} \lvert f(D) - f(D') \rvert$ (for this counting query, $\Delta f = 1$)
[Figure: M(D) computes the true answer f(D), then adds noise to produce the noisy output S.]
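A minimal sketch of the Laplace mechanism in Python/NumPy (the function name and numbers are illustrative, not from the slides):

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, eps: float,
                      rng=np.random.default_rng()) -> float:
    """Return f(D) + Lap(sensitivity / eps), an eps-DP noisy answer."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / eps)

# A counting query ("how many people in this room have blue eyes?") has sensitivity 1
print(laplace_mechanism(true_answer=12, sensitivity=1.0, eps=0.5))
```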
Laplace Example
Job       Sex     Age  Disease
Engineer  Male    35   Hepatitis
Engineer  Male    38   Hepatitis
Lawyer    Male    38   HIV
Writer    Female  30   Flu
Writer    Female  30   HIV
Dancer    Female  30   HIV

• Query: how many people have HIV?
• DP answer = true answer + noise: M(D) = f(D) + Lap(Δf/ε)
• Sensitivity is 1, because the answer changes by at most 1 if one record is deleted
• If we set the privacy budget ε = 1, the noise is sampled from Lap(1), as in the sketch below
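Applying the mechanism to this table (a self-contained sketch): the true HIV count is 3, the sensitivity is 1, and with ε = 1 the noise is drawn from Lap(1):

```python
import numpy as np

rng = np.random.default_rng()
diseases = ["Hepatitis", "Hepatitis", "HIV", "Flu", "HIV", "HIV"]

true_count = sum(d == "HIV" for d in diseases)                   # f(D) = 3
eps, sensitivity = 1.0, 1.0
noisy_count = true_count + rng.laplace(scale=sensitivity / eps)  # + Lap(1)
print(true_count, noisy_count)                                   # e.g. 3  3.71 (varies per run)
```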
Exponential Mechanism
• The Exponential Mechanism is suitable for a non-numeric output range R
• What is the most common eye color in this room?
• i.e. R = {Brown, Blue, Black, Green}
[Figure: sampling probabilities over the eye colors at different privacy budgets.]
• Output $r \in R$ is sampled with probability proportional to $\exp(\varepsilon\, q(D, r) / (2 \Delta q))$, where q scores each option (see the sketch below)
• Sensitivity of q: $\Delta q = \max_{r} \max_{D, D'} \lvert q(D, r) - q(D', r) \rvert$
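A minimal sketch of the exponential mechanism (Python/NumPy; the scores anticipate the example on the next slide, and Δq = 1 is assumed):

```python
import numpy as np

def exponential_mechanism(scores: dict, eps: float, delta_q: float = 1.0,
                          rng=np.random.default_rng()) -> str:
    """Sample option r with probability proportional to exp(eps * q(D, r) / (2 * delta_q))."""
    options = list(scores)
    weights = np.exp([eps * scores[r] / (2 * delta_q) for r in options])
    return rng.choice(options, p=weights / weights.sum())

# Eye-color scores (counts): the most common color is returned with high probability
print(exponential_mechanism({"Brown": 23, "Blue": 9, "Black": 27, "Green": 0}, eps=1.0))
```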
Exponential Example
• What is the most common eye color in this room?
• i.e. R = {Brown, Blue, Black, Green}
[Figure: sampling probabilities at ε = 1: Brown 11.8%, Blue 0.01%, Black 88%, Green 0.0001%.]
• Sensitivity of q: the impact of changing a single record

Sampling probability for each option:

Option   Score   ε = 0   ε = 0.1   ε = 1
Brown    23      0.25    0.34      0.12
Blue     9       0.25    0.16      10^-4
Black    27      0.25    0.40      0.88
Green    0       0.25    0.10      10^-6
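The table's probabilities can be reproduced (up to rounding) from the exponential-mechanism formula with Δq = 1; a minimal sketch:

```python
import numpy as np

scores = {"Brown": 23, "Blue": 9, "Black": 27, "Green": 0}

for eps in (0.0, 0.1, 1.0):
    w = np.exp([eps * s / 2 for s in scores.values()])   # Delta q = 1
    p = w / w.sum()
    print(eps, dict(zip(scores, p.round(4))))
# eps=0   -> 0.25 for every option
# eps=0.1 -> Brown 0.33, Blue 0.16, Black 0.40, Green 0.10
# eps=1   -> Brown 0.12, Blue ~1e-4, Black 0.88, Green ~1e-6
```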
Local Differential Privacy (LDP)
[Figure: each user (Alice, Bob, Cathy) perturbs their own data with an LDP mechanism before sending it to the curator, who assembles the dataset D.]
From DP to LDP: Formal Definition
Idea of DP: any output should be about as likely regardless of whether or not I am in the dataset
• A randomized algorithm M satisfies ε-differential privacy iff, for any two neighboring datasets D and D' and for any output S of M, $\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S]$
• A randomized algorithm M satisfies ε-local differential privacy iff, for any two inputs x and x' and for any output y of M, $\Pr[M(x) = y] \le e^{\varepsilon} \Pr[M(x') = y]$
[1] Chan, T.-H. H., Shi, E., & Song, D. Optimal Lower Bound for Differentially Private Multi-Party Aggregation.
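The canonical LDP primitive is randomized response (not shown on this slide; a minimal sketch for one private bit, assuming Python/NumPy):

```python
import numpy as np

def randomized_response(bit: int, eps: float, rng=np.random.default_rng()) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), otherwise flip it.
    This satisfies eps-LDP: Pr[y | x] / Pr[y | x'] <= e^eps for any inputs x, x'."""
    keep = rng.random() < np.exp(eps) / (np.exp(eps) + 1)
    return bit if keep else 1 - bit

# Each user perturbs locally before sending anything to the curator
print([randomized_response(b, eps=1.0) for b in (1, 0, 1, 1, 0)])
```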
Apple Differential Privacy
Other adoptions
Advantage of Differential Privacy
• Differential privacy is a promising privacy model that provides a provable privacy guarantee.
• It is also being actively developed across research communities such as data mining and machine learning.
Disadvantage of Differential Privacy
• The mathematical theory is highly successful, but real applications can suffer a large utility loss:
• High noise
• Large sensitivity
• Sparse datasets
• Limited privacy budget
• …