Lecture 09 Differential Privacy
• Consent or anonymize
• Anonymization
Approach for private release
• Anonymization
• removing identifiers (safe harbor)
• k-anonymity and l-diversity
Solution?
Basic Setting
[Figure: a database DB = (x1, x2, x3, …, xn-1, xn) sits behind an interface. Users (government, researchers, marketers, …) send query 1 through query T and receive answer 1 through answer T.]
Table 1:
Risk to Privacy
An attacker knows that Alice is 20 years old and lives in zipcode 15000. To
infer something about Alice’s income, the attacker issues two queries:
• q0: SELECT COUNT(*) FROM T WHERE Age ∈ [20, 20] AND Zipcode ∈ [15k, 15k]
AND Income ∈ [80k, +∞)
• Answer to q0: 1
• q’0: SELECT COUNT(*) FROM T WHERE Age ∈ [20, 20] AND Zipcode ∈ [15k, 15k]
AND Income ∈ (-∞, 80k)
• Answer to q’0: 0
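The two queries above can be replayed as a minimal sketch, assuming an in-memory SQLite table named T with columns Age, Zipcode and Income. The rows are invented for illustration and are not the lecture's Table 1; the interval predicates are written with BETWEEN and comparison operators.

```python
import sqlite3

# Hypothetical rows for table T (Age, Zipcode, Income); invented for illustration.
ROWS = [
    (20, 15000, 95000),   # Alice: the only person aged 20 in zipcode 15000
    (35, 15000, 60000),
    (20, 16000, 40000),
    (52, 17000, 120000),
]

def count(conn, predicate):
    """Run SELECT COUNT(*) FROM T with the given WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM T WHERE {predicate}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Age INTEGER, Zipcode INTEGER, Income INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", ROWS)

# What the attacker already knows about Alice.
who = "Age BETWEEN 20 AND 20 AND Zipcode BETWEEN 15000 AND 15000"

q0 = count(conn, who + " AND Income >= 80000")       # income in [80k, +inf)
q0_prime = count(conn, who + " AND Income < 80000")  # income in (-inf, 80k)

# The two counts sum to 1, so exactly one record matches Alice's age and
# zipcode, and that record falls in the high-income range.
alice_income_at_least_80k = (q0 == 1 and q0_prime == 0)
print(q0, q0_prime, alice_income_at_least_80k)
```

Neither query names Alice, and each returns only an aggregate count, yet together they reveal her income range.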
Conclusion?
• Carefully drafted queries can lead to privacy violations
[Figure: the query interface from Basic Setting. Users (government, researchers, marketers, …) send query 1 through query T to DB = (x1, …, xn) and receive answer 1 through answer T.]
➢ Q(D_{i-me}) = Q(D_i)
➢ Prob(secret(me) | R) = Prob(secret(me))
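To see what the first condition demands, here is a small sketch that checks it for a counting query. The function name, the threshold and the income values are hypothetical; the point is only that an ordinary count already changes when one record is removed.

```python
def count_high_income(db, threshold=80_000):
    """Counting query Q: how many records have income >= threshold."""
    return sum(1 for income in db if income >= threshold)

# Hypothetical database of incomes; index 0 plays the role of "me".
db = [95_000, 60_000, 40_000, 120_000]
me = 0
db_without_me = db[:me] + db[me + 1:]

# The condition asks for the same answer with and without my record.
unchanged = count_high_income(db_without_me) == count_high_income(db)
print(count_high_income(db), count_high_income(db_without_me), unchanged)
```

If the condition had to hold exactly for every person and every database, removing records one at a time could never change the answer, so the query could not depend on the data at all.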
Why can’t we have it?
Problems?
What do we want?
I would feel safe submitting a survey if …
➢ Prob(secret(me) | R) = Prob(secret(me))
More examples
• Example:
• The research findings show a connection between (gender, age) and
whether one listens to Bieber (for example, 90% of boys aged 21 like
Justin Bieber):
• The findings will be almost the same regardless of my participation.
• If I am a 21-year-old boy, then the result will implicate me even if I
do not participate.
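A quick numeric sketch of the first point, assuming a hypothetical survey of 21-year-old boys in which 999 others respond and I would be the 1000th. The counts are chosen only to reproduce the 90% figure.

```python
def share_who_like(responses):
    """Fraction of respondents who answered True (they like Bieber)."""
    return sum(responses) / len(responses)

# Hypothetical responses from 999 other 21-year-old boys.
others = [True] * 899 + [False] * 100
me = True  # my own answer, if I take part

with_me = share_who_like(others + [me])   # 900 of 1000
without_me = share_who_like(others)       # 899 of 999

print(round(with_me, 4), round(without_me, 4))
```

The published share moves by about one hundredth of a percentage point, so whatever an observer infers about a 21-year-old boy from the finding is essentially the same whether or not I took part.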