Privacy Preserving Data Mining
Unnecessary Information
Intuitively, the protocol should function as if a trusted third party computed the output
[Figure: P1 sends D1 and P2 sends D2 to a trusted third party (TTP); each party receives f(D1 ∪ D2).]
Simulation
Let msg(P2) be P2's messages. If S1 can simulate msg(P2) to P1 given only P1's input and the protocol output, then msg(P2) must not contain unnecessary information (and vice versa).
$S_1(D_1, f(D_1, D_2)) \stackrel{C}{\equiv} \mathrm{msg}(P_2)$ (computationally indistinguishable)
A semi-honest adversary
- adheres to protocol
Garbled Gates
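Gate g computes $b_k = g(b_i, b_j)$.
A garbled gate is a table $T_g$ computing $\langle W_i^{b_i}, c_i \rangle, \langle W_j^{b_j}, c_j \rangle \rightarrow \langle W_k^{b_k}, c_k \rangle$
- $T_g$ has four entries, one for each combination of the input bits $(b_i, b_j)$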
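The garbled-gate idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the construction from the slides: SHA-256 stands in for the pseudorandom function, the label c is taken to be the wire bit XORed with a random per-wire bit, and all names (`new_wire`, `garble_gate`, `eval_gate`, `KEY_LEN`) are hypothetical.

```python
import hashlib
import secrets

KEY_LEN = 16  # bytes per wire key (illustrative choice, not from the slides)

def prf(key: bytes, tag: bytes) -> bytes:
    """Hash-based stand-in for a pseudorandom function (assumption)."""
    return hashlib.sha256(key + tag).digest()[:KEY_LEN + 1]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def new_wire():
    """Return (keys, perm): keys[b] plays the role of W^b, and c = b XOR perm."""
    keys = [secrets.token_bytes(KEY_LEN), secrets.token_bytes(KEY_LEN)]
    return keys, secrets.randbits(1)

def garble_gate(g, wire_i, wire_j, wire_k):
    """Build T_g: four entries, indexed by the public labels (c_i, c_j)."""
    (Wi, pi), (Wj, pj), (Wk, pk) = wire_i, wire_j, wire_k
    table = {}
    for bi in (0, 1):
        for bj in (0, 1):
            ci, cj = bi ^ pi, bj ^ pj
            bk = g(bi, bj)
            plain = Wk[bk] + bytes([bk ^ pk])  # the pair (W_k^{b_k}, c_k)
            # Mask with pads derived from both input keys, so only a party
            # holding W_i^{b_i} and W_j^{b_j} can unmask this entry.
            pad = xor(prf(Wi[bi], bytes([cj])), prf(Wj[bj], bytes([ci])))
            table[(ci, cj)] = xor(plain, pad)
    return table

def eval_gate(table, garbled_i, garbled_j):
    """Given one garbled value per input wire, recover (W_k^{b_k}, c_k)."""
    (wi, ci), (wj, cj) = garbled_i, garbled_j
    pad = xor(prf(wi, bytes([cj])), prf(wj, bytes([ci])))
    plain = xor(table[(ci, cj)], pad)
    return plain[:KEY_LEN], plain[KEY_LEN]
```

The evaluator looks up exactly one entry using the labels it holds and learns only the output key and label; the other three entries stay masked. A real circuit would also mix a gate identifier into the PRF input so that pads are not reused across gates.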
Yao's Protocol
P1 sends
- P2's garbled input bits (1-out-of-2)
- Tg tables
- Table from garbled output values to output bits
P2 can compute output values, but P1's input and intermediate values appear random
A Database
Outlook   Temp  Humidity  Wind    Play Tennis
Sunny     Hot   High      Weak    No
Sunny     Hot   High      Strong  No
Overcast  Mild  High      Weak    Yes
Rain      Mild  High      Weak    Yes
Rain      Cool  Normal    Weak    Yes
Rain      Cool  Normal    Strong  No
Overcast  Cool  Normal    Strong  Yes
Sunny     Mild  High      Weak    No
Sunny     Cool  Normal    Weak    Yes
Rain      Mild  Normal    Weak    Yes
Sunny     Mild  Normal    Strong  Yes
Overcast  Mild  High      Strong  Yes
Overcast  Hot   Normal    Weak    Yes
Rain      Mild  High      Strong  No
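Entropy of the class attribute over T, and conditional entropy given attribute A:

$$H_C(T) = \sum_{i=1}^{l} -\frac{|T(c_i)|}{|T|} \log \frac{|T(c_i)|}{|T|}$$

$$H_C(T|A) = \sum_{j=1}^{m} \frac{|T(a_j)|}{|T|} H_C(T(a_j))$$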
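As a non-private reference point, the two entropies can be computed directly on the Play Tennis table. The sketch below assumes base-2 logarithms and the standard ID3 definition of gain as H_C(T) minus H_C(T|A); the function names are illustrative.

```python
from collections import Counter
from math import log2

ATTRS = ("Outlook", "Temp", "Humidity", "Wind")
ROWS = [  # the table from the slide; the last column is the class (Play Tennis)
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def class_entropy(rows):
    """H_C(T): entropy of the class label over the rows."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def conditional_entropy(rows, col):
    """H_C(T|A): weighted class entropy of each partition T(a_j)."""
    n = len(rows)
    parts = {}
    for r in rows:
        parts.setdefault(r[col], []).append(r)
    return sum(len(p) / n * class_entropy(p) for p in parts.values())

def gain(rows, col):
    return class_entropy(rows) - conditional_entropy(rows, col)

# Index of the attribute ID3 would split on at the root.
best = max(range(len(ATTRS)), key=lambda c: gain(ROWS, c))
```

On this table `ATTRS[best]` is `"Outlook"`, the attribute with the largest gain.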
How do we do it?
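Rather than maximize gain, minimize the conditional entropy: $H_C(T)$ is the same for every attribute A, so the A with maximum gain is the A with minimum $H_C(T|A)$. Scaling by the positive constant $|T| \ln 2$ does not change the minimizer:
- $\hat{H}_C(T|A) \stackrel{\mathrm{def}}{=} H_C(T|A) \cdot |T| \cdot \ln 2$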
$$\hat{H}_C(T|A) = -\sum_{j=1}^{m} \sum_{i=1}^{l} |T(a_j,c_i)| \ln(|T(a_j,c_i)|) + \sum_{j=1}^{m} |T(a_j)| \ln(|T(a_j)|)$$
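The rescaled quantity depends only on terms of the form x ln x over counts, which a short check makes concrete. In this sketch `counts[j][i]` stands for |T(a_j, c_i)|; the count tables are tallied from the Play Tennis data as (Yes, No) pairs, 0 ln 0 is taken as 0, and all names are illustrative.

```python
from math import log, log2

def xlnx(x):
    """x ln x, with the usual convention 0 ln 0 = 0."""
    return x * log(x) if x > 0 else 0.0

def h_hat(counts):
    """-sum_j sum_i |T(a_j,c_i)| ln|T(a_j,c_i)| + sum_j |T(a_j)| ln|T(a_j)|."""
    return (-sum(xlnx(n) for row in counts for n in row)
            + sum(xlnx(sum(row)) for row in counts))

def cond_entropy(counts):
    """H_C(T|A) in bits, computed from the same counts for comparison."""
    total = sum(sum(row) for row in counts)
    h = 0.0
    for row in counts:
        nj = sum(row)
        h += nj / total * -sum(n / nj * log2(n / nj) for n in row if n > 0)
    return h

# (Yes, No) counts per attribute value.
COUNTS = {
    "Outlook": [[2, 3], [4, 0], [3, 2]],   # Sunny, Overcast, Rain
    "Humidity": [[3, 4], [6, 1]],          # High, Normal
    "Wind": [[6, 2], [3, 3]],              # Weak, Strong
}

# The attribute with the smallest rescaled conditional entropy.
best = min(COUNTS, key=lambda a: h_hat(COUNTS[a]))
```

For every attribute, `h_hat(counts)` agrees with `cond_entropy(counts) * 14 * log(2)` up to floating-point error, so minimizing one minimizes the other.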
Private x ln x
Input: P1's value $v_1$, P2's value $v_2$
Auxiliary Input: A large field $F$
Output: P1 obtains $w_1 \in F$, P2 obtains $w_2 \in F$
- $w_1 + w_2 \approx (v_1 + v_2) \ln(v_1 + v_2)$
This provides shares of each term $|T(a_j,c_i)| \ln(|T(a_j,c_i)|)$
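The input/output behaviour above can be illustrated with a trusted dealer that hands out additive shares. This is only a sketch of the ideal functionality, not the private protocol itself: the dealer here sees both inputs, which the real protocol avoids. The prime `P`, the fixed-point factor `SCALE`, and the function names are all assumptions made for the example.

```python
import secrets
from math import log

P = 2**61 - 1   # prime modulus standing in for the large field F (assumption)
SCALE = 2**20   # fixed-point scale for encoding real values in F (assumption)

def ideal_xlnx_shares(v1, v2):
    """Trusted-dealer stand-in: return (w1, w2) with w1 + w2 = x ln x in F."""
    x = v1 + v2
    target = round(SCALE * x * log(x)) if x > 0 else 0
    w1 = secrets.randbelow(P)   # uniformly random, reveals nothing on its own
    w2 = (target - w1) % P      # the complementary share
    return w1, w2

def reconstruct(w1, w2):
    """Combine both shares and undo the fixed-point scaling."""
    return ((w1 + w2) % P) / SCALE
```

Each share on its own is a uniformly random field element; only the sum carries the value, and only up to the rounding introduced by the fixed-point encoding.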
Now, use the Yao protocol to find the A with minimum relative entropy!
A Technical Detail
The logarithms are only approximate
- ID3δ algorithm
- Doesn't distinguish relative entropies within δ
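One way to read this detail: when entropies are only known approximately, any attribute whose value lies within some tolerance of the minimum is an acceptable split. The sketch below assumes the tolerance is a parameter, here called `delta`; the function name and the sample values are hypothetical.

```python
def delta_candidates(entropies, delta):
    """Return the attributes whose entropy is within delta of the minimum.

    entropies: dict mapping attribute name to its (approximate) entropy.
    An approximate algorithm may choose any attribute in the returned list.
    """
    lowest = min(entropies.values())
    return sorted(a for a, h in entropies.items() if h <= lowest + delta)

# Hypothetical values: A and B are too close to tell apart at delta = 0.02.
example = delta_candidates({"A": 0.69, "B": 0.70, "C": 0.90}, 0.02)
```

With a tolerance of zero the list collapses to the exact minimizers, which recovers the exact choice.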
Conclusion
Private computation of ID3(D1 ∪ D2) is made feasible
Using Yao's protocol directly would be impractical
Questions?