The document presents AIM, an adaptive and iterative mechanism for generating differentially private synthetic data that preserves statistical properties of sensitive datasets. It outlines the methodology, including the select-measure-generate paradigm and the importance of judiciously selecting marginal queries while considering privacy budgets and workload requirements. The document also discusses theoretical analysis, experimental results, and open problems for future research.

AIM: An Adaptive and Iterative Mechanism for

Differentially Private Synthetic Data

Presented by: Xizixiang Wei

University of Virginia

November 2, 2023

Presented by: Xizixiang Wei (UVA) AIM for Synthetic Data November 2, 2023 1 / 27
Agenda

1 Motivation and Method Overview

2 Concepts and Tools

3 Technical Details on AIM

4 Theoretical Analysis

5 Experiments

6 Open Problems

Motivation

Private synthetic data

Given sensitive data about individuals, construct a synthetic dataset
that preserves important statistical properties of the original dataset,
while offering formal privacy guarantees to the individuals in the dataset.

Pros
Has the same form as the original dataset, making it easy to work with.

Can be used in place of the original dataset for any downstream task.

Problem formulation

Given:
A sensitive dataset D
Privacy parameters (ϵ, δ)
A workload W

Problem: Design an (ϵ, δ)-differentially private mechanism M that
generates a synthetic dataset D̂ = M(D) such that W(D̂) ≈ W(D).

Workload: In this work (and in a series of related works), we focus on the
special (but common) case where the workload consists of a collection of
weighted marginal queries.

The select-measure-generate paradigm

Select a set of marginal queries to measure.

Measure marginals privately using a noise addition mechanism.

Generate synthetic data that best explains the noisy marginals.

Iterative select-measure-generate paradigm

Initialize estimate of data distribution

Repeat

Select marginal query poorly approximated by current estimate

Measure selected marginal using noise-addition mechanism

Update estimate of data distribution from measured info

Generate synthetic data from the estimated data distribution
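
The loop above can be sketched in a few lines of Python. This is a minimal toy sketch, not AIM itself: it keeps the estimate as a full table over a tiny domain and updates it with an MWEM-style multiplicative-weights step as a stand-in for a real estimation algorithm. The function names, the fixed per-round `eps_select` and `sigma`, and treating the record count as public are all assumptions made for illustration.

```python
import itertools
import math
import random


def marginal(table, r):
    """Project a {record: weight} table onto the attribute subset r."""
    out = {}
    for x, w in table.items():
        key = tuple(x[i] for i in r)
        out[key] = out.get(key, 0.0) + w
    return out


def l1_distance(a, b):
    """L1 distance between two marginals stored as dicts."""
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in set(a) | set(b))


def iterative_smg(data, sizes, candidates, rounds, eps_select, sigma, rng):
    domain = list(itertools.product(*(range(n) for n in sizes)))
    total = float(len(data))  # assumption: record count treated as public
    truth = {x: 0.0 for x in domain}
    for x in data:
        truth[x] += 1.0

    # Initialize: uniform estimate over the full domain.
    est = {x: total / len(domain) for x in domain}

    for _ in range(rounds):
        # Select: exponential mechanism on the L1 error (sensitivity 1).
        scores = [l1_distance(marginal(truth, r), marginal(est, r))
                  for r in candidates]
        top = max(scores)  # shift scores for numerical stability
        weights = [math.exp(0.5 * eps_select * (s - top)) for s in scores]
        r = rng.choices(candidates, weights=weights, k=1)[0]

        # Measure: add Gaussian noise to every cell of the selected marginal.
        noisy = {t: v + rng.gauss(0.0, sigma)
                 for t, v in marginal(truth, r).items()}

        # Update: multiplicative-weights step (stand-in for Private-PGM).
        current = marginal(est, r)
        for x in domain:
            t = tuple(x[i] for i in r)
            est[x] *= math.exp((noisy[t] - current[t]) / (2.0 * total))
        scale = total / sum(est.values())
        est = {x: w * scale for x, w in est.items()}

    # Generate: sample synthetic records from the estimated distribution.
    return rng.choices(domain, weights=[est[x] for x in domain], k=len(data))


data = [(0, 0), (0, 1), (1, 1), (1, 1), (1, 2), (0, 0)]
synthetic = iterative_smg(data, sizes=(2, 3),
                          candidates=[(0,), (1,), (0, 1)],
                          rounds=5, eps_select=1.0, sigma=1.0,
                          rng=random.Random(0))
```

The full-table estimate only works for tiny domains; the following slides discuss which choices replace each placeholder step in AIM.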

Iterative select-measure-generate paradigm

Initialize estimate of data distribution
  How to initialize?

Repeat
  How many rounds to run?
  How much budget to spend per round?
  Select marginal query poorly approximated by current estimate
    What set of candidates to select from?
    What quality score function to use?
    What selection mechanism to use?
  Measure selected marginal using noise-addition mechanism
    What noise addition mechanism to use?
    What privacy accounting method to use?
  Update estimate of data distribution from measured info
    What estimation algorithm to use?
    What measured information to incorporate?
Generate synthetic data from the estimated data distribution

Main considerations

Must select marginal queries judiciously:

Budget-aware: should intelligently adapt to the available privacy budget
Workload-aware: should help answer the workload
  Marginal selection that is independent of the workload is necessarily
  sub-optimal for a specific workload.
Data-aware: should exploit knowledge of the domain and data distribution
  Select marginal queries from a set of candidates based on the data.
Efficiency-aware: should enable tractable post-processing
  Mechanisms that build on top of Private-PGM must ensure JT-SIZE
  remains sufficiently small for computational tractability.

Comparison with existing methods

Budget-aware: should intelligently adapt to the available privacy budget
Workload-aware: should help answer the workload
Data-aware: should exploit knowledge of the domain and data distribution
Efficiency-aware: should enable tractable post-processing

AIM: method overview

Initialize estimate of data distribution
  [New] Initialization method

Repeat
  [New] Adaptive rounds + budget split (hyper-parameter free)
  Select marginal query poorly approximated by current estimate
    [New] Workload- and efficiency-aware candidate set
    [New] Budget- and data-aware quality score function
  Measure selected marginal using noise-addition mechanism
    [Prior work] Gaussian noise, zCDP accounting
  Update estimate of data distribution from measured info
    [Prior work] Private-PGM, ICML 2019
Generate synthetic data from the estimated data distribution
  [Prior work] Private-PGM

Data

A dataset $D$ is a multiset of $N$ records.

Each record $x \in D$ is a $d$-tuple $(x_1, \dots, x_d)$.
The domain of possible values for $x_i$ is denoted by $\Omega_i$, with size $|\Omega_i| = n_i$.
The full domain of $x$ is $\Omega = \Omega_1 \times \cdots \times \Omega_d$, with size $n \triangleq |\Omega| = \prod_i n_i$.
The set of all possible datasets is denoted by $\mathcal{D}$.

Marginals

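Definition (Marginal)
Let $r \subseteq [d]$ be a subset of attributes, $\Omega_r = \prod_{i \in r} \Omega_i$, $n_r = |\Omega_r|$, and $x_r = (x_i)_{i \in r}$. The marginal on $r$ is a vector $\mu \in \mathbb{R}^{n_r}$, indexed by domain elements $t \in \Omega_r$, such that each entry is a count, i.e.,
$$\mu[t] = \sum_{x \in D} \mathbb{1}[x_r = t].$$
We let $M_r : \mathcal{D} \to \mathbb{R}^{n_r}$ denote the function that computes the marginal on $r$, i.e., $\mu = M_r(D)$.

It is easy to verify that the $\ell_2$ sensitivity of any marginal query $M_r$ is 1 (when neighboring datasets differ by adding or removing one record).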
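
As a small illustration of the definition, the sketch below computes a marginal as a table of counts and checks the sensitivity claim on one pair of neighboring datasets. The helper name `marginal` and the toy records are assumptions for illustration only.

```python
import math
from collections import Counter


def marginal(dataset, r):
    """Count how often each value combination of the attributes in r occurs."""
    return Counter(tuple(x[i] for i in r) for x in dataset)


# Toy dataset with d = 3 binary attributes.
D = [(0, 1, 1), (1, 0, 1), (0, 1, 0), (0, 1, 1)]
mu = marginal(D, (0, 1))

# A neighboring dataset: one record added.
D_neighbor = D + [(1, 1, 0)]
mu_neighbor = marginal(D_neighbor, (0, 1))

# Adding one record changes exactly one cell by 1, so the L2 distance is 1.
cells = set(mu) | set(mu_neighbor)
l2_change = math.sqrt(sum((mu_neighbor[t] - mu[t]) ** 2 for t in cells))
```

Cells that never occur are simply absent from the `Counter` and read as zero, which matches the zero entries of the vector $\mu$.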

Workload

This work focuses on the special (but common) case where the
workload consists of a collection of weighted marginal queries.
Utility measure: workload error

Definition (Workload error)

A workload $W$ consists of a list of marginal queries $r_1, \dots, r_k$ where $r_i \subseteq [d]$, together with associated weights $c_i \geq 0$. The error of a synthetic dataset $\hat{D}$ is defined as:
$$\mathrm{Error}(D, \hat{D}) = \frac{1}{k \cdot |D|} \sum_{i=1}^{k} c_i \left\| M_{r_i}(D) - M_{r_i}(\hat{D}) \right\|_1$$
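
The workload error can be computed directly from the two datasets. The following is a minimal sketch; representing the workload as a list of `(r, c)` pairs and the toy datasets are assumptions for illustration.

```python
from collections import Counter


def marginal(dataset, r):
    """Count how often each value combination of the attributes in r occurs."""
    return Counter(tuple(x[i] for i in r) for x in dataset)


def workload_error(D, D_hat, workload):
    """Weighted L1 marginal error, normalized by k * |D|."""
    k = len(workload)
    total = 0.0
    for r, c in workload:
        a, b = marginal(D, r), marginal(D_hat, r)
        total += c * sum(abs(a[t] - b[t]) for t in set(a) | set(b))
    return total / (k * len(D))


D = [(0, 0), (0, 1), (1, 1), (1, 1)]
D_hat = [(0, 0), (0, 0), (1, 1), (1, 1)]
workload = [((0,), 1.0), ((0, 1), 1.0)]
error = workload_error(D, D_hat, workload)
```

In this toy case the 1-way marginal on attribute 0 is matched exactly, while the 2-way marginal differs in two cells by one count each, so only the second query contributes to the error.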

Differential Privacy

Definition of (ϵ, δ)-DP, sensitivity and Gaussian Mechanism...

Definition (Exponential Mechanism)

Let $q_r : \mathcal{D} \to \mathbb{R}$ be a quality score function defined for all $r \in \mathcal{R}$, and let $\epsilon \geq 0$ be a real number. Then the exponential mechanism outputs a candidate $r \in \mathcal{R}$ according to the following distribution:
$$\Pr[\mathcal{M}(D) = r] \propto \exp\left( \frac{\epsilon}{2\Delta} \cdot q_r(D) \right),$$
where $\Delta = \max_{r \in \mathcal{R}} \Delta(q_r)$ is the sensitivity.
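
A minimal sketch of sampling from this distribution is shown below. The function names are illustrative, and scores are passed in as a plain dictionary; subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the distribution unchanged.

```python
import math
import random


def selection_probabilities(scores, epsilon, sensitivity):
    """Normalized exponential-mechanism probabilities for each candidate."""
    top = max(scores.values())
    weights = {r: math.exp(epsilon / (2.0 * sensitivity) * (q - top))
               for r, q in scores.items()}
    norm = sum(weights.values())
    return {r: w / norm for r, w in weights.items()}


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample one candidate with probability proportional to exp(eps*q/(2*sens))."""
    probs = selection_probabilities(scores, epsilon, sensitivity)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[r] for r in candidates], k=1)[0]


scores = {"a": 0.0, "b": 2.0}
probs = selection_probabilities(scores, epsilon=1.0, sensitivity=1.0)
choice = exponential_mechanism(scores, epsilon=1.0, sensitivity=1.0,
                               rng=random.Random(0))
```

With these scores the ratio of the two probabilities is $\exp(\epsilon \cdot 2 / 2) = e$, so the better candidate is favored but the worse one can still be chosen.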

zero-Concentrated Differential Privacy (zCDP)

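Definition (zCDP)
A randomized mechanism $\mathcal{M}$ is $\rho$-zCDP if for any two neighboring datasets $D$ and $D'$, and all $\alpha \in (1, \infty)$, we have:

$$D_\alpha(\mathcal{M}(D) \,\|\, \mathcal{M}(D')) \leq \rho \alpha,$$

where $D_\alpha$ is the Rényi divergence of order $\alpha$.

The Gaussian Mechanism satisfies $\frac{1}{2\sigma^2}$-zCDP (for a query with $\ell_2$ sensitivity 1, such as a marginal);
The Exponential Mechanism satisfies $\frac{\epsilon^2}{8}$-zCDP;
Composition of two mechanisms with $\rho_1$-zCDP and $\rho_2$-zCDP satisfies $(\rho_1 + \rho_2)$-zCDP;
If a mechanism $\mathcal{M}$ satisfies $\rho$-zCDP, it also satisfies $(\epsilon, \delta)$-DP for all $\epsilon \geq 0$ and
$$\delta = \min_{\alpha > 1} \frac{\exp((\alpha - 1)(\alpha\rho - \epsilon))}{\alpha - 1} \left(1 - \frac{1}{\alpha}\right)^{\alpha}.$$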
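
The accounting rules above are easy to evaluate numerically. The sketch below adds up the zCDP cost of a sequence of Gaussian and exponential mechanisms and then approximates the minimum over $\alpha$ with a simple grid search; the grid, the function names, and the use of a grid instead of an exact optimizer are assumptions for illustration.

```python
import math


def total_rho(sigmas, epsilons):
    """zCDP cost of Gaussian measurements (sensitivity 1) plus exponential selections."""
    return (sum(1.0 / (2.0 * s ** 2) for s in sigmas)
            + sum(e ** 2 / 8.0 for e in epsilons))


def delta_from_zcdp(rho, eps, alphas=None):
    """Approximate delta for a given (rho, eps) by minimizing over a grid of alpha > 1."""
    if alphas is None:
        alphas = [1.0 + i / 100.0 for i in range(1, 20000)]
    best = 1.0  # delta never needs to exceed 1
    for a in alphas:
        # Work in log space to avoid overflow for large alpha.
        log_delta = ((a - 1.0) * (a * rho - eps)
                     - math.log(a - 1.0)
                     + a * math.log1p(-1.0 / a))
        best = min(best, math.exp(min(log_delta, 0.0)))
    return best


rho = total_rho(sigmas=[1.0], epsilons=[2.0])
delta = delta_from_zcdp(rho=0.5, eps=3.0)
```

Because the grid only covers finitely many values of $\alpha$, the returned value is an upper bound on the true minimum, which keeps the resulting $(\epsilon, \delta)$ statement valid.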

Private-PGM

The heart of Private-PGM is an optimization problem to find a distribution $\hat{p}$ that “best explains” the noisy observations $\tilde{\mu}_i$:
$$\hat{p} := \arg\min_{p \in S} \sum_{i=1}^{k} \frac{1}{\sigma_i} \left\| M_{r_i}(p) - \tilde{\mu}_i \right\|_2^2,$$
where $S = \{p \mid p(x) \geq 0 \text{ and } \sum_{x \in \Omega} p(x) = n\}$ is the set of (scaled) probability distributions over the domain $\Omega$ (here $n$ is the total mass that the distribution is scaled to).
Junction tree size: Private-PGM exposes a callable function
JT-SIZE$(r_1, \dots, r_k)$ that can be invoked to check how large a junction
tree is.
The runtime of distribution estimation is roughly proportional to
JT-SIZE.
If arbitrary marginals are measured, JT-SIZE can grow out of control,
no longer fitting in memory and leading to unacceptable runtime.

Technical Details on AIM

Initialization: line 7
Iteration: line 10
Select: line 14
Measure: line 15
Generate: line 19

Intelligent initialization

Spend a small fraction of the privacy budget to measure 1-way marginals;
Estimate p̂ as an independent model in which all 1-way marginals are
preserved well;
This provides a far better initialization than the default uniform
distribution.
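
The initialization idea can be sketched as follows. This is a simplified illustration, not the Private-PGM estimation used by AIM: it adds Gaussian noise to each 1-way marginal, clips negative counts, normalizes, and multiplies the per-attribute probabilities to obtain an independent model. The function names and the clip-and-normalize post-processing are assumptions.

```python
import itertools
import math
import random


def noisy_one_way_marginals(data, sizes, sigma, rng):
    """Measure every 1-way marginal with Gaussian noise of scale sigma."""
    noisy = []
    for i, n_i in enumerate(sizes):
        counts = [0.0] * n_i
        for x in data:
            counts[x[i]] += 1.0
        noisy.append([c + rng.gauss(0.0, sigma) for c in counts])
    return noisy


def independent_model(noisy):
    """Turn each noisy marginal into a probability vector (clip, then normalize)."""
    probs = []
    for m in noisy:
        clipped = [max(v, 0.0) for v in m]
        s = sum(clipped)
        # Fall back to uniform if noise wiped out every count.
        probs.append([v / s for v in clipped] if s > 0 else [1.0 / len(m)] * len(m))
    return probs


def joint_probability(probs, x):
    """Probability of record x under the independent (product) model."""
    return math.prod(p[v] for p, v in zip(probs, x))


data = [(0, 0), (0, 1), (1, 1), (1, 1)]
sizes = (2, 2)
model = independent_model(
    noisy_one_way_marginals(data, sizes, sigma=1.0, rng=random.Random(0)))
domain = list(itertools.product(*(range(n) for n in sizes)))
total_mass = sum(joint_probability(model, x) for x in domain)
```

Even this crude product model already matches every 1-way marginal up to noise, which is what makes it a better starting point than the uniform distribution.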

New Candidates

Which candidates can be selected? Marginal queries in the downward
closure of the workload W.
The downward closure:
W+ = {r | r ⊆ s, s ∈ W};
Lower-dimensional marginals have priority to be chosen.
The set will only consist of candidates
with JT-SIZE below a prespecified limit.
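
The downward closure can be enumerated with `itertools.combinations`. In the sketch below the empty subset is left out, and the size filter uses the number of cells $n_r$ of each candidate as a crude stand-in for JT-SIZE, which in reality depends jointly on all marginals measured so far. Both choices, and all names, are assumptions for illustration.

```python
import math
from itertools import combinations


def downward_closure(workload):
    """All non-empty attribute subsets of any marginal in the workload."""
    closure = set()
    for s in workload:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            closure.update(combinations(s, k))
    return closure


def filter_by_cells(candidates, sizes, limit):
    """Keep candidates whose marginal has at most `limit` cells (proxy for JT-SIZE)."""
    return {r for r in candidates
            if math.prod(sizes[i] for i in r) <= limit}


workload = [(0, 1, 2), (2, 3)]
sizes = {0: 2, 1: 3, 2: 4, 3: 5}
closure = downward_closure(workload)
allowed = filter_by_cells(closure, sizes, limit=12)
```

Here the 3-way marginal `(0, 1, 2)` and the 2-way marginal `(2, 3)` exceed the limit, but their lower-dimensional subsets remain available as candidates.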

Better Selection Criteria

New quality score function in line 14

$\|M_r(D) - M_r(\hat{p}_{t-1})\|_1 - \sqrt{2/\pi}\,\sigma_t n_r$:
the $\ell_1$ error under the current model minus the expected $\ell_1$ error if it is measured at the current noise level.
Weight $w_r = \sum_{s \in W} c_s |r \cap s|$: captures the degree to which the marginal queries in the workload overlap with $r$.
In general, this puts more weight on marginals with more attributes.

Better Selection Criteria

Trade-off in quality score function

The penalty term $\sqrt{2/\pi}\,\sigma_t n_r$ discourages marginals with more attributes.
Weight $w_r$ favors marginals with more attributes.
However, if the inner expression is negative, then the larger weight will make it more negative, and much less likely to be selected.
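
The trade-off can be seen numerically. The sketch below assumes the score is the weight $w_r$ multiplied by the inner expression, as the last point implies, and compares a 1-way and a 2-way candidate at a high and a low noise level. The error values, domain sizes, and function names are made-up illustrative inputs.

```python
import math


def weight(r, workload):
    """w_r = sum over workload marginals s of c_s * |r ∩ s|."""
    return sum(c * len(set(r) & set(s)) for s, c in workload)


def quality_score(l1_error, sigma, n_r, w_r):
    """Weighted gap between current L1 error and expected L1 noise error."""
    expected_noise_error = math.sqrt(2.0 / math.pi) * sigma * n_r
    return w_r * (l1_error - expected_noise_error)


workload = [((0, 1), 1.0)]
sizes = {0: 4, 1: 10}

# Two candidates with hypothetical current L1 errors under the model.
candidates = {(0,): 50.0, (0, 1): 200.0}


def scores_at(sigma):
    return {r: quality_score(err, sigma,
                             math.prod(sizes[i] for i in r),
                             weight(r, workload))
            for r, err in candidates.items()}


high_noise = scores_at(sigma=10.0)
low_noise = scores_at(sigma=1.0)
```

At the high noise level the 2-way candidate has a negative inner expression, which its larger weight amplifies, so the 1-way marginal scores higher; at the low noise level the ordering flips and the 2-way marginal is preferred.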

Adaptive Rounds and Budget Split

The annealing condition is activated if the difference between $M_{r_t}(\hat{p}_t)$ and $M_{r_t}(\hat{p}_{t-1})$ is small, which indicates that not much information was learned in the previous round.
We initialize $\epsilon_t$ and $\sigma_t$ conservatively.
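
One way such an annealing step could look is sketched below. The slide does not state the threshold or the update rule, so both are assumptions here: the change in the selected marginal is compared against the expected $\ell_1$ noise error $\sqrt{2/\pi}\,\sigma_t n_{r_t}$, and when it falls below that level the noise scale is halved and the selection budget doubled for the next round.

```python
import math


def anneal(l1_change, sigma, eps, n_r):
    """Return (sigma, eps) for the next round.

    l1_change: ||M_r(p_t) - M_r(p_{t-1})||_1 for the marginal just measured.
    The threshold and the halve/double rule are illustrative assumptions.
    """
    threshold = math.sqrt(2.0 / math.pi) * sigma * n_r
    if l1_change <= threshold:
        # Little was learned at this noise level: spend more budget per round.
        return sigma / 2.0, eps * 2.0
    return sigma, eps


after_small_change = anneal(l1_change=1.0, sigma=10.0, eps=0.1, n_r=4)
after_large_change = anneal(l1_change=1000.0, sigma=10.0, eps=0.1, n_r=4)
```

Starting with a large `sigma` and small `eps` and increasing the per-round spend only when progress stalls is what makes a conservative initialization safe.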

Theoretical analysis: uncertainty quantification

Provide a probability bound for $\|M_r(D) - M_r(\hat{D})\|_1$.

Only give guarantees for marginals in the workload W.
Two cases:
The easy case: marginal r has been selected, so we have an unbiased estimate
of $M_r(D)$ from $y_t$.
The hard case: marginal r has not been selected, so there is no unbiased
estimate of $M_r(D)$.

Theoretical analysis: easy case

Unbiased estimates

Probability bound of Gaussian vector

Triangle inequality
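
The three ingredients listed above can be chained as follows. This is a hedged sketch, assuming $y_t = M_r(D) + \xi$ with $\xi \sim \mathcal{N}(0, \sigma_t^2 I)$ is the noisy measurement of the selected marginal; the exact constants of the high-probability bound are not given on the slide and are not reproduced here.

```latex
\| M_r(D) - M_r(\hat{D}) \|_1
  \le \| M_r(D) - y_t \|_1 + \| y_t - M_r(\hat{D}) \|_1
  = \| \xi \|_1 + \| y_t - M_r(\hat{D}) \|_1,
\qquad
\mathbb{E}\,\| \xi \|_1 = \sqrt{2/\pi}\; \sigma_t\, n_r .
```

The second term depends only on released quantities, so it can be computed without further privacy cost; the first term is the $\ell_1$ norm of a Gaussian vector, which concentrates around its mean.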

Theoretical analysis: hard case

The key insight is that marginal queries that were not selected have
relatively low error compared to the marginal queries that were selected.
We can bound the error of the selected queries and relate it to the
non-selected queries by using the guarantees of the exponential mechanism.

Triangle inequality

Experiments

Open Problems

Handling more general workloads.

Handling mixed data types.
Utilizing public data: design synthetic data mechanisms that
incorporate public data.
