AIM
University of Virginia
November 2, 2023
Presented by: Xizixiang Wei (UVA) AIM for Synthetic Data November 2, 2023 1 / 27
Agenda
4 Theoretical Analysis
5 Experiments
6 Open Problems
Motivation
Pros
Has the same form as the original dataset, making it easy to work with.
Can be used in place of the original dataset for any downstream task.
Problem formulation
Given:
A sensitive dataset D
Privacy parameters (ϵ, δ)
A workload W
The select-measure-generate paradigm
Iterative select-measure-generate paradigm
Main considerations
AIM: method overview
Marginals
Definition (Marginal)
Let $r \subseteq [d]$ be a subset of attributes, $\Omega_r = \prod_{i \in r} \Omega_i$, $n_r = |\Omega_r|$, and
$x_r = (x_i)_{i \in r}$. The marginal on $r$ is a vector $\mu \in \mathbb{R}^{n_r}$, indexed by domain
elements $t \in \Omega_r$, such that each entry is a count, i.e.,
$\mu[t] = \sum_{x \in D} \mathbb{1}[x_r = t]$. We let $M_r : \mathcal{D} \to \mathbb{R}^{n_r}$ denote the function that
computes the marginal on $r$, i.e., $\mu = M_r(D)$.
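The definition can be made concrete with a small sketch. It assumes records are tuples of integer-coded attributes and that attribute i takes values 0..domain_sizes[i]-1; the function name `marginal` and the toy dataset are illustrative only and do not come from the slides.

```python
from collections import Counter
from itertools import product


def marginal(dataset, r, domain_sizes):
    """Count, for every cell t in Omega_r, the records x with x_r == t."""
    # Project each record onto the attributes in r and tally the projections.
    counts = Counter(tuple(x[i] for i in r) for x in dataset)
    # Enumerate the full domain Omega_r so that empty cells appear with count 0.
    cells = product(*(range(domain_sizes[i]) for i in r))
    return {t: counts.get(t, 0) for t in cells}


# Toy dataset with d = 3 attributes (hypothetical values).
D = [(0, 1, 2), (0, 1, 0), (1, 0, 2), (0, 1, 2)]
sizes = (2, 2, 3)
mu = marginal(D, r=(0, 1), domain_sizes=sizes)
```

Here `mu` has n_r = 2 * 2 = 4 entries, and its entries sum to the number of records in D.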
Workload
This work focuses on the special (but common) case where the
workload consists of a collection of weighted marginal queries.
Utility measure: workload error
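The slide names the utility measure without stating its formula. The sketch below assumes workload error is the weighted L1 distance between the real and synthetic marginals, averaged over the k workload queries and normalized by the dataset size. The names `marginal_counts` and `workload_error`, and the toy data, are hypothetical.

```python
from collections import Counter


def marginal_counts(dataset, r):
    """Sparse marginal on attribute subset r (missing cells count as 0)."""
    return Counter(tuple(x[i] for i in r) for x in dataset)


def workload_error(real, synth, workload):
    """Assumed form: (1 / (k * |D|)) * sum_i c_i * ||M_ri(D) - M_ri(D_hat)||_1.

    `workload` is a list of (r, weight) pairs.
    """
    total = 0.0
    for r, weight in workload:
        mu = marginal_counts(real, r)
        mu_hat = marginal_counts(synth, r)
        cells = set(mu) | set(mu_hat)
        total += weight * sum(abs(mu[t] - mu_hat[t]) for t in cells)
    return total / (len(workload) * len(real))


real = [(0, 1), (0, 1), (1, 0), (1, 1)]
synth = [(0, 1), (1, 0), (1, 0), (1, 1)]
workload = [((0,), 1.0), ((0, 1), 1.0)]
err = workload_error(real, synth, workload)
```

Under this assumed definition, a synthetic dataset that reproduces every workload marginal exactly has error 0.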
Differential Privacy
zero-Concentrated Differential Privacy (zCDP)
Definition (zCDP)
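A randomized mechanism $\mathcal{M}$ is $\rho$-zCDP if for any two neighboring
datasets $D$ and $D'$, and all $\alpha \in (1, \infty)$, we have:
$D_\alpha(\mathcal{M}(D) \,\|\, \mathcal{M}(D')) \le \rho\alpha$,
where $D_\alpha$ is the Rényi divergence of order $\alpha$.
The Gaussian Mechanism satisfies $\frac{1}{2\sigma^2}$-zCDP;
The Exponential Mechanism satisfies $\frac{\epsilon^2}{8}$-zCDP;
Composition of two mechanisms with $\rho_1$-zCDP and $\rho_2$-zCDP satisfies
$(\rho_1 + \rho_2)$-zCDP;
If a mechanism $\mathcal{M}$ satisfies $\rho$-zCDP, it also satisfies $(\epsilon, \delta)$-DP for all
$\epsilon \ge 0$ and $\delta = \min_{\alpha > 1} \frac{\exp((\alpha - 1)(\alpha\rho - \epsilon))}{\alpha - 1} \left(1 - \frac{1}{\alpha}\right)^{\alpha}$.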
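A minimal sketch of the accounting these properties allow. It assumes the Gaussian mechanism is applied to a query with unit L2 sensitivity, and it approximates the minimization over α with a simple grid search; the function names, the grid size, and the upper limit `alpha_max` are illustrative choices, not part of the talk.

```python
import math


def gaussian_rho(sigma):
    """zCDP cost of the Gaussian mechanism (unit sensitivity assumed)."""
    return 1.0 / (2.0 * sigma ** 2)


def exponential_rho(eps):
    """zCDP cost of the exponential mechanism run with parameter eps."""
    return eps ** 2 / 8.0


def zcdp_to_delta(rho, eps, grid=10000, alpha_max=200.0):
    """Grid-search approximation of the min over alpha > 1 in the conversion."""
    best = 1.0
    for k in range(1, grid + 1):
        alpha = 1.0 + (alpha_max - 1.0) * k / grid
        # Work in log space to avoid overflow for large alpha.
        log_delta = ((alpha - 1.0) * (alpha * rho - eps)
                     - math.log(alpha - 1.0)
                     + alpha * math.log1p(-1.0 / alpha))
        best = min(best, math.exp(min(log_delta, 0.0)))
    return best


# Composition: one Gaussian measurement plus one exponential-mechanism selection.
rho_total = gaussian_rho(1.0) + exponential_rho(1.0)
delta = zcdp_to_delta(rho_total, eps=5.0)
```

Because every α > 1 gives a valid δ, a grid search can only overestimate the minimum, so the value it returns remains a valid (if slightly loose) guarantee.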
Private-PGM
Initialization: line 7
Iteration: line 10
Select: line 14
Measure: line 15
Generate: line 19
Intelligent initialization
New Candidates
Better Selection Criteria
Adaptive Rounds and Budget Split
Theoretical analysis: uncertainty quantification
Theoretical analysis: easy case
Unbiased estimates
Triangle inequality
Theoretical analysis: hard case
The key insight is that the marginal queries that were not selected have
relatively low error compared to the queries that were selected. We can
bound the error of the selected queries directly, then relate it to the
non-selected queries using the guarantees of the exponential mechanism.
Triangle inequality
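For reference, a generic sketch of the exponential mechanism that this argument relies on. It is the textbook form (selection probability proportional to exp(ε · score / (2 · sensitivity))) and not AIM's specific quality score; the candidate names and score values are made up for illustration.

```python
import math
import random


def exp_mech_probs(scores, eps, sensitivity):
    """Selection probabilities proportional to exp(eps * q / (2 * sensitivity))."""
    top = max(scores.values())
    # Subtract the max score before exponentiating for numerical stability.
    weights = {r: math.exp(eps * (q - top) / (2.0 * sensitivity))
               for r, q in scores.items()}
    norm = sum(weights.values())
    return {r: w / norm for r, w in weights.items()}


def exp_mech_select(scores, eps, sensitivity, rng):
    """Sample one candidate according to the exponential mechanism."""
    probs = exp_mech_probs(scores, eps, sensitivity)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


# Hypothetical quality scores for three candidate marginals.
scores = {("age", "income"): 40.0, ("age", "sex"): 12.0, ("sex",): 3.0}
probs = exp_mech_probs(scores, eps=0.5, sensitivity=1.0)
chosen = exp_mech_select(scores, eps=0.5, sensitivity=1.0, rng=random.Random(0))
```

Candidates with much lower scores than the best one are selected with exponentially small probability, which is the property used to relate non-selected queries to the selected one.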
Experiments
Open Problems