Privacy Preserving Data Mining

This document summarizes a paper on privacy-preserving data mining. It discusses how two parties can jointly compute a data mining algorithm like decision tree learning on their private databases without revealing unnecessary private information. It presents an efficient protocol for privately computing the ID3 decision tree algorithm based on secret sharing and oblivious transfer. The parties learn only the output tree structure and not each other's private inputs or intermediate values. This allows meaningful data mining without privacy breaches.

Uploaded by Prasanthi Prasu


Privacy Preserving Data Mining

Yehuda Lindell, Benny Pinkas

Presenter: Justin Brickell

Mining Joint Databases

Parties P1 and P2 own databases D1 and D2
f is a data mining algorithm

Compute f(D1 ∪ D2) without revealing unnecessary information

Unnecessary Information
Intuitively, the protocol should function as if a trusted third party computed the output

(Figure: P1 sends D1 and P2 sends D2 to a trusted third party (TTP), which returns f(D1 ∪ D2) to both parties.)

Simulation
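Let msg(P2) be P2's messages. If S1 can simulate msg(P2) to P1 given only P1's input and the protocol output, then msg(P2) must not contain unnecessary information (and vice versa)

$$S_1\big(D_1, f(D_1, D_2)\big) \stackrel{c}{\equiv} \mathrm{msg}(P_2)$$

where $\stackrel{c}{\equiv}$ denotes computational indistinguishability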
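As a toy illustration of the definition (not a protocol from the paper), the Python sketch below models a hypothetical one-message step in which P2 sends its private value v2 masked by its own uniformly random coins modulo a tiny modulus Q (an assumption chosen so every distribution can be enumerated exactly). For every possible v2, the distribution of msg(P2) equals the uniform distribution that S1 produces without ever seeing v2, so this message carries no unnecessary information. Here the simulation is perfect, which is stronger than the computational indistinguishability the definition requires.

```python
from collections import Counter

Q = 11  # tiny modulus (assumed) so every distribution can be enumerated exactly

def real_message(v2, r2):
    """Hypothetical protocol step: P2 masks its private value v2 with its random coins r2."""
    return (v2 - r2) % Q

def simulate(p1_input, output, s):
    """S1 may use only P1's input and the output; here it ignores both and uses its own coins s."""
    return s % Q

# Distribution of msg(P2), taken over P2's coins, for each possible private value v2.
real = {v2: Counter(real_message(v2, r) for r in range(Q)) for v2 in range(Q)}
# Distribution of the simulator's output, taken over the simulator's coins.
simulated = Counter(simulate(None, None, s) for s in range(Q))

print(all(real[v2] == simulated for v2 in range(Q)))
```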

More Simulation Details

The simulator S1 must also produce r1, the internal coin tosses of P1, as part of P1's simulated view

The definition can be extended to allow distinct outputs f1(x,y) and f2(x,y) for the two parties
- Complicates the definition

- Not necessary for data mining applications

The Semi-Honest Model

A malicious adversary can alter his input
- e.g., input the empty database: f(∅ ∪ D2) = f(D2)!

A semi-honest adversary
- adheres to the protocol

- tries to learn extra information from the message transcript

General Secure Two Party Computation

Any algorithm can be made private (in the semi-honest model)
- Yao's Protocol

So, why write this paper?

- Yao's Protocol is inefficient

- This paper privately computes a particular algorithm more efficiently

Yao's Protocol (Basically)

Convert the algorithm to a circuit
P1 hard codes his input into the circuit
P1 transforms each gate so that it takes garbled inputs to garbled outputs

Using 1-out-of-2 oblivious transfer, P1 sends P2 the garbled values of P2's input bits: P2 learns only one of the two values per wire, and P1 does not learn which one

Garbled Wire Values

P1 assigns to each wire i two random values $(W_i^0, W_i^1)$
- Long enough to seed a pseudo-random function F

P1 assigns to each wire i a random permutation over {0,1}, $\pi_i : b_i \to c_i$

$\langle W_i^{b_i}, c_i \rangle$ is the garbled value of wire i

Garbled Gates
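Gate g computes $b_k = g(b_i, b_j)$
Garbled gate is a table $T_g$ computing $\langle W_i^{b_i}, c_i \rangle, \langle W_j^{b_j}, c_j \rangle \to \langle W_k^{b_k}, c_k \rangle$
- $T_g$ has four entries:

- $(c_i, c_j)$: $\langle W_k^{g(b_i, b_j)}, c_k \rangle \oplus F_{W_i^{b_i}}(c_j) \oplus F_{W_j^{b_j}}(c_i)$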
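A minimal Python sketch of a single garbled gate built from the table-entry formula for $T_g$. Assumptions not in the slides: F is instantiated with HMAC-SHA256 from the standard library, wire keys are 16 bytes, and each permutation $\pi_i$ is stored as one random bit (so $c_i = b_i \oplus$ bit). The helper names (`new_wire`, `garble_gate`, `eval_gate`) are hypothetical, and the oblivious transfer step is skipped: the sketch simply hands P2 its garbled input values.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 16  # bytes per wire key (assumed 128-bit security parameter)

def prf(key: bytes, bit: int) -> bytes:
    """F_W(c), instantiated here with HMAC-SHA256 (an assumption), truncated to KEY_LEN + 1 bytes."""
    return hmac.new(key, bytes([bit]), hashlib.sha256).digest()[: KEY_LEN + 1]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def new_wire():
    """Random keys (W^0, W^1) plus a random permutation bit: pi(b) = b XOR perm."""
    return (secrets.token_bytes(KEY_LEN), secrets.token_bytes(KEY_LEN)), secrets.randbits(1)

def garbled_value(wire, b):
    """<W^b, c> with c = pi(b)."""
    keys, perm = wire
    return keys[b], b ^ perm

def garble_gate(g, wi, wj, wk):
    """P1 builds T_g; rows are indexed by (c_i, c_j), which reveals nothing about (b_i, b_j)."""
    table = {}
    for bi in (0, 1):
        for bj in (0, 1):
            Wi, ci = garbled_value(wi, bi)
            Wj, cj = garbled_value(wj, bj)
            Wk, ck = garbled_value(wk, g(bi, bj))
            entry = Wk + bytes([ck])
            table[(ci, cj)] = xor(xor(entry, prf(Wi, cj)), prf(Wj, ci))
    return table

def eval_gate(table, gi, gj):
    """P2 holds one garbled value per input wire and recovers exactly one garbled output value."""
    (Wi, ci), (Wj, cj) = gi, gj
    entry = xor(xor(table[(ci, cj)], prf(Wi, cj)), prf(Wj, ci))
    return entry[:KEY_LEN], entry[KEY_LEN]

# Garble one AND gate and evaluate it on inputs b_i = 1, b_j = 1.
wi, wj, wk = new_wire(), new_wire(), new_wire()
T = garble_gate(lambda x, y: x & y, wi, wj, wk)
out = eval_gate(T, garbled_value(wi, 1), garbled_value(wj, 1))
# The table from garbled output values to output bits that P1 sends for output wires.
output_table = {garbled_value(wk, b): b for b in (0, 1)}
print(output_table[out])
```

Because P2 holds only one key per input wire, the single row it opens yields one garbled output value; the other rows stay masked by PRF outputs under keys it never sees.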

Yao's Protocol
P1 sends
- P2's garbled input bits (1-out-of-2)
- $T_g$ tables
- Table from garbled output values to output bits

P2 can compute output values, but P1's input and intermediate values appear random

Cost of circuit with n inputs and m gates

Communication: m gate tables
- 4m × (length of pseudo-random output)

Computation: n oblivious transfers

- Typically much more expensive than the m pseudo-random function applications

Too expensive for data mining

Classification by Decision Tree Learning

A classic machine learning / data mining problem
Develop rules for when a transaction belongs to a class based on its attribute values
Smaller decision trees are better

ID3 is one particular algorithm

A Database

Outlook   Temp  Humidity  Wind    Play Tennis
Sunny     Hot   High      Weak    No
Sunny     Hot   High      Strong  No
Overcast  Mild  High      Weak    Yes
Rain      Mild  High      Weak    Yes
Rain      Cool  Normal    Weak    Yes
Rain      Cool  Normal    Strong  No
Overcast  Cool  Normal    Strong  Yes
Sunny     Mild  High      Weak    No
Sunny     Cool  Normal    Weak    Yes
Rain      Mild  Normal    Weak    Yes
Sunny     Mild  Normal    Strong  Yes
Overcast  Mild  High      Strong  Yes
Overcast  Hot   Normal    Weak    Yes
Rain      Mild  High      Strong  No

and its Decision Tree

Outlook
- Sunny → Humidity
  - High → No
  - Normal → Yes
- Overcast → Yes
- Rain → Wind
  - Strong → No
  - Weak → Yes

The ID3 Algorithm: Definitions

R: The set of attributes
- Outlook, Temperature, Humidity, Wind

C: the class attribute

- Play Tennis

T: the set of transactions

- The 14 database entries

The ID3 Algorithm

ID3(R,C,T)
If R is empty, return a leaf node with the most common class value in T
If all transactions in T have the same class value c, return the leaf node c
Otherwise,
- Determine the attribute A that best classifies T
- Create a tree node labeled A, recur to compute child trees
- edge a_i goes to tree ID3(R − {A}, C, T(a_i)), where T(a_i) is the set of transactions in T with A = a_i
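The recursion can be sketched in Python on the 14-transaction Play Tennis database, with the last row's class taken as No, as the slide's decision tree implies (Rain, Strong → No). This is a plain, single-party sketch, not the private protocol. "Best classifies" is implemented as the attribute with minimum conditional entropy $H_C(T \mid A)$, which is the same as maximum gain; child edges are created only for attribute values that actually occur in T (an assumption; variants also add leaves for unseen values); ties go to the attribute listed first.

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]
# The 14 transactions from the database slide: attribute values, then the class (Play Tennis).
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Mild", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
T = [dict(zip(ATTRS + ["Play"], row)) for row in ROWS]

def entropy(rows, c):
    """H_C(T) with base-2 logarithms."""
    n = len(rows)
    return -sum(k / n * math.log2(k / n) for k in Counter(r[c] for r in rows).values())

def cond_entropy(rows, a, c):
    """H_C(T|A): entropy of each subset T(a_j), weighted by |T(a_j)| / |T|."""
    n = len(rows)
    sizes = Counter(r[a] for r in rows)
    return sum(cnt / n * entropy([r for r in rows if r[a] == v], c) for v, cnt in sizes.items())

def id3(R, c, rows):
    """Leaves are class labels; inner nodes are (attribute, {value: subtree})."""
    classes = Counter(r[c] for r in rows)
    if not R:
        return classes.most_common(1)[0][0]  # most common class value in T
    if len(classes) == 1:
        return next(iter(classes))  # every transaction has the same class
    A = min(R, key=lambda a: cond_entropy(rows, a, c))  # minimum H_C(T|A) == maximum gain
    rest = [a for a in R if a != A]
    return (A, {v: id3(rest, c, [r for r in rows if r[A] == v]) for v in sorted({r[A] for r in rows})})

print(id3(ATTRS, "Play", T))
```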

The Best Predicting Attribute

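Entropy!

$$H_C(T) = -\sum_{i=1}^{l} \frac{|T(c_i)|}{|T|} \log_2 \frac{|T(c_i)|}{|T|}$$

$$H_C(T \mid A) = \sum_{j=1}^{m} \frac{|T(a_j)|}{|T|} \, H_C\big(T(a_j)\big)$$

where $c_1, \dots, c_l$ are the class values, $a_1, \dots, a_m$ are the values of attribute A, and $T(x)$ is the set of transactions in T with value x

$$\mathrm{Gain}(A) \stackrel{\mathrm{def}}{=} H_C(T) - H_C(T \mid A)$$

Find A with maximum gain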
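The gain formula needs only class counts per attribute value, which is also what the private protocol later works with. A short Python sketch, assuming the [Yes, No] counts are read off the 14-row Play Tennis database (the function and variable names are illustrative):

```python
import math

def entropy(counts):
    """H over a list of class counts, base-2 logs; zero counts contribute nothing."""
    n = sum(counts)
    return -sum(k / n * math.log2(k / n) for k in counts if k > 0)

def gain(table):
    """table maps each value a_j to its class counts [|T(a_j, c_1)|, ..., |T(a_j, c_l)|]."""
    class_totals = [sum(col) for col in zip(*table.values())]  # |T(c_i)|
    n = sum(class_totals)                                      # |T|
    h_t = entropy(class_totals)                                # H_C(T)
    h_t_given_a = sum(sum(c) / n * entropy(c) for c in table.values())  # H_C(T|A)
    return h_t - h_t_given_a

# [Yes, No] counts per attribute value in the Play Tennis database.
outlook = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}
humidity = {"High": [3, 4], "Normal": [6, 1]}
print(gain(outlook), gain(humidity))
```

Outlook has the larger gain of the two, consistent with it sitting at the root of the tree.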

Why can we do better than Yao?

Normally, private protocols must hide intermediate values
In this protocol, the assignment of attributes to nodes is part of the output and may be revealed
- H values are not revealed, just the identity of the attribute with greatest gain

This allows genuine recursion

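How do we do it?
Rather than maximize gain, minimize
- $H'_C(T \mid A) \stackrel{\mathrm{def}}{=} H_C(T \mid A) \cdot |T| \cdot \ln 2$
- Equivalent, since $H_C(T)$ and $|T| \ln 2$ are the same for every attribute A

This has the simple formula

$$H'_C(T \mid A) = -\sum_{j=1}^{m} \sum_{i=1}^{l} |T(a_j, c_i)| \ln |T(a_j, c_i)| + \sum_{j=1}^{m} |T(a_j)| \ln |T(a_j)|$$

Terms have form $(v_1 + v_2) \ln(v_1 + v_2)$, with $v_1 + v_2 = |T(a_j, c_i)|$ or $|T(a_j)|$
- P1 knows v1 (the count over D1), P2 knows v2 (the count over D2)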
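A quick numeric check of the identity $H'_C(T \mid A) = H_C(T \mid A) \cdot |T| \cdot \ln 2$, written as a Python sketch on the Outlook counts from the Play Tennis database (names are illustrative). It evaluates the count-based formula and the direct base-2 definition and compares them.

```python
import math

def xlnx(x):
    """x ln x with the usual convention 0 ln 0 = 0 (empty cells contribute nothing)."""
    return x * math.log(x) if x > 0 else 0.0

def h_prime(table):
    """H'_C(T|A) = -sum_j sum_i |T(a_j,c_i)| ln|T(a_j,c_i)| + sum_j |T(a_j)| ln|T(a_j)|."""
    cells = sum(xlnx(k) for counts in table.values() for k in counts)
    values = sum(xlnx(sum(counts)) for counts in table.values())
    return values - cells

def h_cond(table):
    """H_C(T|A) with base-2 logarithms, computed directly from the definition."""
    n = sum(sum(counts) for counts in table.values())
    total = 0.0
    for counts in table.values():
        nj = sum(counts)
        total += nj / n * -sum(k / nj * math.log2(k / nj) for k in counts if k > 0)
    return total

outlook = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}  # [Yes, No] counts
n = sum(sum(c) for c in outlook.values())                        # |T| = 14
print(h_prime(outlook), h_cond(outlook) * n * math.log(2))
```

The count-based form uses only natural logarithms of integer counts, which is what makes the $(v_1 + v_2)\ln(v_1 + v_2)$ decomposition possible.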

Private x ln x
Input: P1's value v1, P2's value v2
Auxiliary input: a large field F
Output: P1 obtains w1 ∈ F, P2 obtains w2 ∈ F
- w1 + w2 ≈ (v1 + v2) ln(v1 + v2)

- w1 and w2 are uniformly distributed in F
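A minimal Python model of this functionality as stated, i.e. what a trusted party would hand out; it is not the Lindell-Pinkas protocol that realizes it. The field modulus P = 2^61 − 1 and the fixed-point scale 2^20 used to embed real numbers in F are assumptions (the slides only ask for "a large field F"), and the function names are hypothetical. Because w1 is drawn uniformly, each share alone is uniform in F; only the sum carries information.

```python
import math
import secrets

P = 2**61 - 1  # prime modulus for the field F (assumed)
SCALE = 2**20  # fixed-point scaling for real values (assumed)

def encode(r):
    """Embed a real number in F as a rounded fixed-point integer."""
    return round(r * SCALE) % P

def decode(z):
    """Map a field element back to a real number; the upper half of F stands for negatives."""
    z %= P
    if z > P // 2:
        z -= P
    return z / SCALE

def xlnx_functionality(v1, v2):
    """Random w1, w2 in F with w1 + w2 = encode((v1 + v2) ln(v1 + v2)) mod P."""
    x = v1 + v2
    target = encode(x * math.log(x) if x > 0 else 0.0)  # 0 ln 0 = 0 by convention
    w1 = secrets.randbelow(P)   # uniform in F, independent of the inputs
    w2 = (target - w1) % P      # also uniform in F when viewed on its own
    return w1, w2

w1, w2 = xlnx_functionality(3, 7)
print(decode(w1 + w2), 10 * math.log(10))
```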

Private x ln x: some intuition

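Compute shares of x and ln x, then privately multiply
Shares of ln x are actually shares of n and ε, where $x = 2^n(1 + \varepsilon)$
- $-1/2 \le \varepsilon \le 1/2$
- so $\ln x = n \ln 2 + \ln(1 + \varepsilon)$

Uses Taylor expansions: $\ln(1 + \varepsilon) = \varepsilon - \frac{\varepsilon^2}{2} + \frac{\varepsilon^3}{3} - \cdots$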
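A plain (non-private) Python sketch of the decomposition: choose n so that |ε| ≤ 1/2, then approximate ln(1 + ε) with a truncated Taylor series. The slides describe working on shares of n and ε; this sketch works on x in the clear, and the number of series terms (`terms=20`) is an arbitrary illustrative choice.

```python
import math

def split_power_of_two(x):
    """Write integer x >= 1 as 2**n * (1 + eps) with -1/2 <= eps <= 1/2."""
    n = x.bit_length() - 1   # 2**n <= x < 2**(n + 1)
    if x > 1.5 * 2**n:       # round up to the nearer power of two so |eps| stays within 1/2
        n += 1
    return n, x / 2**n - 1

def approx_ln(x, terms=20):
    """ln x = n ln 2 + ln(1 + eps), with ln(1 + eps) from a truncated Taylor series."""
    n, eps = split_power_of_two(x)
    series = sum((-1) ** (i - 1) * eps**i / i for i in range(1, terms + 1))
    return n * math.log(2) + series

for x in (1, 3, 14, 1000):
    print(x, split_power_of_two(x), approx_ln(x), math.log(x))
```

Keeping |ε| ≤ 1/2 is what lets a short series converge quickly: with ε = 1/2 (the worst case, e.g. x = 3), the error after 20 terms is below 0.5^21 / 21.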

Using the x ln x protocol

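For every attribute A, every attribute value $a_j \in A$, and every class $c_i \in C$, the parties obtain
- $w_{A,1}(a_j)$, $w_{A,2}(a_j)$, $w_{A,1}(a_j, c_i)$, $w_{A,2}(a_j, c_i)$
- $w_{A,1}(a_j) + w_{A,2}(a_j) \approx |T(a_j)| \ln |T(a_j)|$
- $w_{A,1}(a_j, c_i) + w_{A,2}(a_j, c_i) \approx |T(a_j, c_i)| \ln |T(a_j, c_i)|$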

Shares of Relative Entropy

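P1 and P2 can locally compute shares $S_{A,1} + S_{A,2} \approx H'_C(T \mid A)$
- $S_{A,b} = \sum_{j} w_{A,b}(a_j) - \sum_{j} \sum_{i} w_{A,b}(a_j, c_i)$ for party $b \in \{1, 2\}$

Now, use the Yao protocol to find the A with minimum Relative Entropy!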
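Combining the shares is purely local arithmetic in F. The Python sketch below assumes a hypothetical horizontal split of the Play Tennis database (rows 1-7 held by P1, rows 8-14 by P2) and re-declares a compact stand-in for the x ln x functionality so it runs on its own, with an assumed modulus P = 2^61 − 1 and fixed-point scale 2^20. Each party adds its shares for the |T(a_j)| terms and subtracts its shares for the |T(a_j, c_i)| terms. The sum is opened here only to check it; in the protocol, $S_{A,1}$ and $S_{A,2}$ would instead be fed into Yao's protocol.

```python
import math
import secrets

P = 2**61 - 1   # field modulus (assumed)
SCALE = 2**20   # fixed-point scale (assumed)

def xlnx(x):
    return x * math.log(x) if x > 0 else 0.0

def ideal_xlnx(v1, v2):
    """Stand-in for the private x ln x protocol: uniform shares of (v1 + v2) ln(v1 + v2)."""
    target = round(xlnx(v1 + v2) * SCALE) % P
    w1 = secrets.randbelow(P)
    return w1, (target - w1) % P

def shares_of_h_prime(counts1, counts2):
    """counts_b[a_j] = class counts party b sees locally for value a_j; returns (S_A1, S_A2)."""
    s1 = s2 = 0
    for aj in counts1:
        w1, w2 = ideal_xlnx(sum(counts1[aj]), sum(counts2[aj]))  # |T(a_j)| term: added
        s1, s2 = (s1 + w1) % P, (s2 + w2) % P
        for k1, k2 in zip(counts1[aj], counts2[aj]):
            w1, w2 = ideal_xlnx(k1, k2)                          # |T(a_j, c_i)| term: subtracted
            s1, s2 = (s1 - w1) % P, (s2 - w2) % P
    return s1, s2

def reconstruct(s1, s2):
    z = (s1 + s2) % P
    return (z - P if z > P // 2 else z) / SCALE

# Hypothetical split of the Outlook [Yes, No] counts: rows 1-7 to P1, rows 8-14 to P2.
p1 = {"Sunny": [0, 2], "Overcast": [2, 0], "Rain": [2, 1]}
p2 = {"Sunny": [2, 1], "Overcast": [2, 0], "Rain": [1, 1]}
s1, s2 = shares_of_h_prime(p1, p2)
print(reconstruct(s1, s2))
```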

A Technical Detail
The logarithms are only approximate
- ID3$_\delta$ algorithm
- Doesn't distinguish relative entropies within $\delta$

Complexity for each node

For |R| attributes, m attribute values, and l class values
- x ln x protocol is invoked O(m · l · |R|) times
- Each requires O(log|T|) oblivious transfers
- And bandwidth O(k · log|T| · |S|) bits
k depends logarithmically on 1/δ

Depends only logarithmically on |T|

Only k·|S| worse than non-private distributed ID3

Conclusion
Private computation of ID3(D1 ∪ D2) is made feasible
Using Yao's protocol directly would be impractical

Questions?
