Event Data Privacy

1) The document discusses applying differential privacy to event-level databases where individuals can contribute multiple rows of data. 2) A naive approach of treating each row independently can violate privacy, so the concept of "k-neighboring" databases is introduced to bound the distance between datasets. 3) The sensitivity of queries must account for the maximum number of rows an individual can contribute.

Uploaded by

pragathisai0912

Event-level Data and Differential Privacy
By
1. Parthasarathy - 19PD25
2. Sriram Sidhartha - 19PD35
3. Shania Job - 19PD32
4. Pragathi - 19PD17
User-level Databases

A user-level database is a database in which each row corresponds to a unique individual.

For such databases, neighboring databases are databases that differ by a single row.
Event-level Databases
An event-level database is a database in which a single individual can contribute multiple events. Usually, each row corresponds to a single event.

For this type of database, neighboring databases are databases that differ by a single individual, that is, by all of the rows that individual contributed.

In an event-level database, an individual may be represented by a fixed number of rows or by a variable number of rows.
Applying DP to Event-level Databases: A Naive Approach
Example:

Suppose a data scientist wants to release the visit count to four websites. They
have access to the following dataset that contains browser logs of 1234
employees of a company, from August to December 2020. Sample of the data:
In this database, each row represents an event described by the following fields:
● Employee Id
● Date of the event
● Time the event occurred
● The address of the domain visited.

The data scientist needs to compute a differentially private count of visits to each
of the following websites:
● Mail.com
● Bank.com
● Social.com
● games.com

in each of the following months:

August, September, October, November, December
● To proceed with the task, the data scientist naively sets the sensitivity of the COUNT query to 1 and starts querying the database for the count of visits to each website per month.

● The data scientist uses the Laplace mechanism to privatize the counts. Let’s set the budget of the data release to ε = 0.1.

● The data scientist makes the queries and budget calculations the same way they would with a user-level database. There are 4 websites × 5 months = 20 queries, so if the same budget is spent on each query, each query gets ε = 0.1 / 20 = 0.005. The data scientist feels confident this data release will not leak any information about individuals, only overall browsing patterns. The following data is published:
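The naive release can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the true counts are made-up placeholders (the original table is not reproduced here), and `laplace_noise` and `naive_dp_count` are hypothetical helper names, not part of any library.

```python
import random

SITES = ["Mail.com", "Bank.com", "Social.com", "games.com"]
MONTHS = ["August", "September", "October", "November", "December"]

TOTAL_EPSILON = 0.1
# 4 sites x 5 months = 20 queries sharing the budget equally.
EPSILON_PER_QUERY = TOTAL_EPSILON / (len(SITES) * len(MONTHS))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def naive_dp_count(true_count, epsilon, rng):
    """Laplace mechanism with the sensitivity naively fixed at 1.

    This is the flawed setting: it is only valid if each person
    contributes at most one row.
    """
    naive_sensitivity = 1
    return true_count + laplace_noise(naive_sensitivity / epsilon, rng)


rng = random.Random(0)
# Placeholder true counts, for illustration only.
true_counts = {(site, month): 10_000 for site in SITES for month in MONTHS}
release = {
    key: naive_dp_count(count, EPSILON_PER_QUERY, rng)
    for key, count in true_counts.items()
}
```

With a sensitivity of 1 and ε = 0.005 the noise scale is 200, which hides a single row but not the thousands of rows one employee may contribute.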
● When visit counts are released, the data scientist notices a significant drop in
the number of visits from October to November for games.com and
social.com.

● If you knew your co-worker took a leave of absence during this time, then you
could surmise your co-worker’s browsing habits, even though the data
scientist made a release utilizing the Laplace mechanism, i.e. with noise
added.
Privacy Issues When Using the Naive Approach
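According to the definition of ε-differential privacy, the probability of observing any given output of the mechanism is almost the same for every pair of neighboring databases.

When ε = 0.1, the definition of ε-differential privacy ensures that, for neighboring databases x and y and any set of outputs S,

Pr[M(x) ∈ S] ≤ e^0.1 · Pr[M(y) ∈ S] ≈ 1.105 · Pr[M(y) ∈ S]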

Privacy Issues When Using the Naive Approach
In the above example, it is easy to identify the contributions of an individual user
because the distance between neighboring datasets is unbounded when we
have event-level data.

In this example a user contributes multiple rows, so we need to bound the distance between the databases.

This requires a different metric for measuring the distance between two databases, with the differential privacy mechanism calibrated accordingly.
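A standard group-privacy argument shows how large the gap can be: Laplace noise calibrated for a sensitivity of 1 hides a change of one row, so for a user who contributes m rows the guarantee degrades from ε to m·ε. The sketch below is illustrative only; the figure of 4000 rows per user is an assumed value, and `effective_epsilon` is a hypothetical helper name.

```python
import math


def effective_epsilon(epsilon_per_query, rows_per_user):
    """Worst-case per-query guarantee for one user when the noise was
    calibrated for a single row but the user can change rows_per_user rows."""
    return epsilon_per_query * rows_per_user


# Assumed for illustration: one user may contribute up to 4000 rows.
eps = effective_epsilon(0.005, 4000)
# Multiplicative bound on how much output probabilities may differ.
probability_ratio_bound = math.exp(eps)
print(eps, probability_ratio_bound)
```

A bound of this size gives essentially no protection, which is why the co-worker's leave of absence is visible in the released counts.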
Defining “Neighboring”: Event-level Databases

● In general, two datasets are neighboring if the distance between them is 1

● This does not carry over to event-level databases
● A generalized notion of adjacency is required
● For two event-level datasets X and Y, a distance metric M with associated distance D_M(X,Y), and an integer k, we say that X and Y are “k-neighboring” provided that D_M(X,Y) ≤ k
Example - Consider a sample of student marks data
Suppose a student drops out of the class
The difference between the two datasets is 2, so they are 2-neighbors
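The k-neighboring check can be sketched as follows. The marks table here is invented for illustration (the original sample table is not reproduced in this text), and the distance is assumed to be the number of rows that must be added or removed to turn one dataset into the other.

```python
from collections import Counter


def dataset_distance(x, y):
    """Rows to add or remove to turn x into y
    (size of the multiset symmetric difference)."""
    cx, cy = Counter(x), Counter(y)
    return sum(((cx - cy) + (cy - cx)).values())


def are_k_neighboring(x, y, k):
    """True if the two datasets are within distance k of each other."""
    return dataset_distance(x, y) <= k


# Hypothetical rows: (student, subject, marks).
with_student = [
    ("A", "Math", 80), ("A", "Physics", 75),
    ("B", "Math", 60), ("B", "Physics", 90),
]
# Student B drops out, so both of B's rows disappear.
without_student = [row for row in with_student if row[0] != "B"]

distance = dataset_distance(with_student, without_student)
```

Here `distance` is 2, so the two tables are 2-neighboring but not 1-neighboring.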
Defining “Sensitivity”: Event-level Databases

● Multiple rows may correspond to the actions of a single user.

● We need to understand exactly how many rows correspond to each individual.
● The dataset distance is the maximum number of rows that a single user can contribute.
● For two event-level datasets X and Y that are k-neighboring, we say that a function f is d_out-sensitive with respect to some distance metric M_O, provided that d(f(X),f(Y)) ≤ d_out
● In plain English, the most the function's output may change between k-neighboring datasets is d_out
Example
● Given two neighboring databases, x and y, the sensitivity of a COUNT query is given by max_{x,y} ||COUNT(x) − COUNT(y)||_1. Given that the maximum number of website visits an individual may contribute is 4000, the sensitivity of COUNT queries on the browser logs database is 4000.
● We can then use the Laplace mechanism with the correct parameters for the data release.
● M_Lap(x) = Lap(shift=x, scale=Δ/ε) = Lap(shift=x, scale=4000/ε)
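A minimal sketch of the corrected mechanism is given below, using only the standard library. The function name `laplace_mechanism` is hypothetical, and ε = 0.005 is taken from the per-query budget used earlier in the example.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return true_value plus Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two Exp(1) draws follows a Laplace(0, 1) distribution.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_value + noise


MAX_VISITS_PER_USER = 4000   # sensitivity of COUNT in this example
EPSILON = 0.005              # per-query budget from the example

scale = MAX_VISITS_PER_USER / EPSILON
# The standard deviation of a Laplace distribution is sqrt(2) times its scale.
noise_std = math.sqrt(2) * scale

rng = random.Random(0)
noisy_count = laplace_mechanism(10_000, MAX_VISITS_PER_USER, EPSILON, rng)
```

The scale works out to 800,000, far larger than any plausible monthly count, which motivates choosing a much smaller bound k on events per user.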
Result
Making Queries to a Database of Browser Visits to Top 500 Domains

Dataset
Queries
● Which are the top 5 most-visited domains?
● How many visits are there to the top 5 domains?
● How many visits are there for each day of the week?
● Suppose the data scientist wants to find the k that gives the best utility for the counts of visits per user. A non-privacy-preserving way to do this is to look at the distribution of events per user and choose a k that covers 90% or 95% of events.
● However, this process is not differentially private: it reveals the 90th or 95th percentile of the number of events per user.
Generate Histogram
● One way to make the above analysis differentially private is to use part of the privacy budget to generate a differentially private histogram of the number of events per user.
● The data scientist can use a small ε and a very large k. Ideally, the k chosen for this analysis should be much larger than what would be expected as the 95th percentile of events per user.
● For the browsing logs dataset, let’s choose k = 50 visits.
● To make this preliminary analysis, the data scientist will bound each individual in the dataset to 50 visits.
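One possible shape for this preliminary analysis is sketched below. It is not the original code: the bin width of 10, the value of ε, and the function names are assumptions. Each user's event count is clamped to k and placed in exactly one fixed bin, so adding or removing a user changes the histogram by at most 1 and Laplace noise of scale 1/ε per bin suffices.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def bounded_histogram(user_ids, k, bin_width):
    """Exact histogram of events per user, with each user's count clamped to k.

    user_ids holds one entry per event row.
    """
    n_bins = k // bin_width
    hist = [0] * n_bins
    for count in Counter(user_ids).values():
        clamped = min(count, k)
        # The last bin also absorbs users sitting exactly at the bound k.
        hist[min(clamped // bin_width, n_bins - 1)] += 1
    return hist


def dp_histogram(user_ids, k, bin_width, epsilon, rng):
    """Noisy version: each user falls in one bin, so the L1 sensitivity is 1."""
    exact = bounded_histogram(user_ids, k, bin_width)
    return [h + laplace_noise(1.0 / epsilon, rng) for h in exact]


# Tiny made-up log: u1 has 3 events, u2 has 12, u3 has 70 (clamped to 50).
rows = ["u1"] * 3 + ["u2"] * 12 + ["u3"] * 70
exact = bounded_histogram(rows, k=50, bin_width=10)
noisy = dp_histogram(rows, k=50, bin_width=10, epsilon=0.01, rng=random.Random(0))
```

The bin edges are fixed in advance rather than derived from the data, so choosing them does not consume any privacy budget.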
Code
Results
Running query with Laplace mechanism for count:

Mechanism.laplace

     Events     e
0    0 - 10     273449
1    10 - 20    118348
2    20 - 30    43548
3    30 - 40    15783
4    40 - 50    5960
Histogram
Reservoir Sampling
● Looking at quantiles of the non-private data, 92% of users have fewer than 26 events and 98% of users have fewer than 40 events. This shows that histogram analysis, even with a small ε, can give powerful insights when the data distribution is unknown.
● We now take k = 40 for reservoir sampling
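Reservoir sampling keeps a uniform random sample of at most k events per user in a single pass over the log. The sketch below uses the classic "Algorithm R"; the `(user_id, event)` row layout and the function name are assumptions, not the original code.

```python
import random
from collections import defaultdict


def bound_events_per_user(events, k, rng):
    """Keep at most k events per user, chosen uniformly at random.

    events is an iterable of (user_id, event) pairs.
    """
    reservoirs = defaultdict(list)
    seen = defaultdict(int)
    for user_id, event in events:
        seen[user_id] += 1
        reservoir = reservoirs[user_id]
        if len(reservoir) < k:
            # The first k events of a user are always kept.
            reservoir.append(event)
        else:
            # For the i-th event (i > k), replace a random slot
            # with probability k / i.
            j = rng.randrange(seen[user_id])
            if j < k:
                reservoir[j] = event
    return dict(reservoirs)


# Made-up log: u1 has 100 events, u2 has 5.
log = [("u1", n) for n in range(100)] + [("u2", n) for n in range(5)]
bounded = bound_events_per_user(log, k=40, rng=random.Random(0))
```

After this step no user holds more than 40 rows, so 40 can be used as the dataset distance for later queries.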
Reservoir Sampling Code
Count Visit
● Consider that the utility of a count vector is measured by:
○ 1) the order of the domains, and
○ 2) the counts of visits for each domain.
● On the second measure, the visit counts in the differentially private vector are relatively close to those in the non-private vector.
● On the first measure, the ranking order of the domains, the top 2 domains are the same in both rankings, and 4 domains appear in both the differentially private top 5 and the non-private top 5.
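The two utility measures can be computed with a short sketch like the one below. It assumes the log has already been bounded to k events per user, so one user can change the whole vector of per-domain counts by at most k, and it assumes the list of candidate domains is fixed and public. All function names are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_domain_counts(visited_domains, candidate_domains, k, epsilon, rng):
    """Noisy visit count for every candidate domain.

    visited_domains holds one domain per visit, after bounding each user
    to k events; the L1 sensitivity of the count vector is therefore k.
    """
    true_counts = Counter(visited_domains)
    return {
        domain: true_counts.get(domain, 0) + laplace_noise(k / epsilon, rng)
        for domain in candidate_domains
    }


def top_n(counts, n=5):
    """Domains ordered by descending count, truncated to n entries."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [domain for domain, _ in ranked[:n]]


def top_n_overlap(counts_a, counts_b, n=5):
    """Number of domains that appear in the top n of both count vectors."""
    return len(set(top_n(counts_a, n)) & set(top_n(counts_b, n)))
```

Calling `top_n_overlap(dp_counts, true_counts)` on the private and non-private vectors gives the size of the intersection discussed above.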
Count Visit code
Results
Average Visits per user for each day of the Week
● The most reliable way to compute means is by making two separate queries:
○ one query for the numerator (total number of visits per day), and
○ one query for the denominator (total number of unique users per day).
● The calculation of the average becomes a post-processing step of two differentially private functions.
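A sketch of the two-query approach follows. The sensitivities are assumptions derived from bounding each user to k events: a user can change the vector of daily visit counts by at most k in total, and can appear as a unique user on at most min(number of days, k) days. The function names and the split of the budget into `eps_visits` and `eps_users` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_average_visits(visits_per_day, users_per_day, k, eps_visits, eps_users, rng):
    """Average visits per user per day from two separately privatized queries."""
    n_days = len(visits_per_day)
    visits_scale = k / eps_visits             # numerator: L1 sensitivity k
    users_scale = min(n_days, k) / eps_users  # denominator: one count per day
    averages = {}
    for day, visits in visits_per_day.items():
        noisy_visits = visits + laplace_noise(visits_scale, rng)
        noisy_users = users_per_day[day] + laplace_noise(users_scale, rng)
        # Post-processing only: clamp the denominator so noise cannot
        # make it zero or negative.
        averages[day] = noisy_visits / max(noisy_users, 1.0)
    return averages
```

Because the division happens after both noisy values are released, it costs no extra budget; the total spend is `eps_visits + eps_users`.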
Average Visits per user for each day of the Week Code
Results
Summary
1. identify the browser logs data as an event-level dataset
2. recognize that the number of events per user is unbounded
3. estimate, in a privacy-preserving manner, a bound k of events per user
without previous knowledge on the data distribution
4. pre-process the database in order to make it ready for a differential privacy
analysis
5. make differentially private queries to the database taking into consideration
the necessary code changes to account for multiple events per user
6. evaluate the results of different queries
7. post-process the results.
