04 - Chapter 3 - Privacy
INTRODUCTION
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
Data mining has been successfully applied to many domains, such as business intelligence, Web search, scientific discovery, and digital libraries.
The term “data mining” is often treated as a synonym for another term, “knowledge discovery from data” (KDD), which highlights the goal of the mining process.
2. Big data and privacy concerns
https://www.youtube.com/watch?v=uaaC57tcci0
Social Dilemma
Communications technologies and Big Data analysis have facilitated intrusions into privacy by devising and strengthening audio-visual surveillance and “dataveillance”.
Governments have used these technologies for the continuous and massive collection and collation of data from our private spaces.
Big Data phenomena are a constellation of data storage and processing extensions to modern communications technologies. They have given rise to new modes of privacy intrusion that were not anticipated when much more primitive communications and eavesdropping technologies gave rise to the existing privacy laws.
Big data defined
Volume
The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This
can be data of unknown value, such as Twitter data feeds, clickstreams on a web page or a mobile app, or sensor-
enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of
petabytes.
Velocity
Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and require real-time evaluation and action.
Variety
Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a
relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and
semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and
support metadata.
https://www.oracle.com/ae/big-data/what-is-big-data/
https://www.marketsandmarkets.com/
Big data can be used:
- to identify more general trends and correlations
individuals.
It is not the volume, velocity, variety or veracity that worries me, but the uses of the information.
- The uses of the data are not determined before collection.
Risks
Big Data may also pose significant risks for the protection of
personal data and the right to privacy:
a) the sheer scale of data collection, tracking and profiling;
b) the security of data;
c) transparency, which requires that sufficient information be given to individuals;
d) inaccuracy, discrimination, exclusion and economic imbalance;
e) increased possibilities of government surveillance.
THE CHALLENGE OF BIG DATA FOR DATA PROTECTION
Big data also shows the importance of harmonization, or even standardization, in data protection standards. As personal data are universally collected and shared across sectoral and national boundaries, inconsistent data protection laws pose increasing threats to individuals, institutions, and society.
Perhaps the greatest impact of big data is the pressure it brings for a new, thoughtful, informed, multinational debate about the key principles that should undergird data protection. Most data protection laws continue to rely on the 1980 OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data.
Data Mining and Society
How does data mining impact society? What steps can data mining take to preserve the privacy of individuals? Do we use data mining in our daily lives without even knowing that we do? These questions raise the following issues:
Social impacts of data mining: With data mining penetrating our everyday lives, it is important to study the impact of data mining on society. How can we use data mining technology to benefit society? How can we guard against its misuse? The improper disclosure or use of data and the potential violation of individual privacy and data protection rights are areas of concern that need to be addressed.
Privacy-preserving data mining: Data mining will help scientific discovery, business management, economic recovery, and security protection (e.g., the real-time discovery of intruders and cyberattacks). However, it poses the risk of disclosing an individual’s personal information. Studies on privacy-preserving data publishing and data mining are ongoing. The philosophy is to observe data sensitivity and preserve people’s privacy while performing successful data mining.
Invisible data mining: We cannot expect everyone in society to learn and master data mining techniques. More and more systems should have data mining functions built in so that people can perform data mining or use data mining results simply by clicking a mouse, without any knowledge of data mining algorithms. Intelligent search engines and Internet-based stores perform such invisible data mining by incorporating data mining into their components to improve their functionality and performance. This is often done without the user's knowledge. For example, when purchasing items online, users may be unaware that the store is likely collecting data on the buying patterns of its customers, which may be used to recommend other items for purchase in the future.
An individual's privacy may be violated through unauthorized access to personal data.
To deal with the privacy issues in data mining, a subfield of data mining referred to as privacy-preserving data mining (PPDM) has emerged.
The aim of PPDM is to safeguard sensitive information from unsanctioned disclosure while preserving the utility of the data.
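The four types of users in the data mining process:
Data Provider: the user who owns some data that are desired by the data mining task.
Data Collector: the user who collects data from data providers.
Data Miner: the user who performs data mining on the collected data.
Decision Maker: the user who makes decisions based on the data mining results.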
DATA PROVIDER
CONCERN
The major concern of a data provider is whether he can control
the sensitivity of the data he provides to others.
APPROACHES TO PRIVACY PROTECTION
2. TRADE PRIVACY FOR BENEFIT
3. PROVIDE FALSE DATA
DATA COLLECTOR
CONCERN
APPROACHES
3. ATTACK MODEL
PRIVACY MODELS
Figure: example social graph with vertices 1, 2, 3, 4 and six edges labeled a, b, c, d, e, f. Every edge has 2 mutual friends.
So the graph is 6-NMF.
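The conclusion above can be checked mechanically. This minimal sketch assumes that k-NMF means every edge shares its number of mutual friends (NMF) with at least k - 1 other edges. It also assumes the figure shows all four vertices connected pairwise, since that is the only way to draw six edges among four vertices. The function names are illustrative and do not come from any library.

```python
from collections import Counter
from itertools import combinations


def mutual_friend_counts(edges):
    """Map each edge (u, v) to the number of mutual friends of u and v."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # Mutual friends of u and v are the vertices adjacent to both endpoints.
    return {(u, v): len(neighbors[u] & neighbors[v]) for u, v in edges}


def nmf_anonymity_level(edges):
    """Largest k for which the graph is k-NMF (assumed definition above).

    Every edge must share its mutual-friend count with at least k - 1 other
    edges, so k is the size of the smallest group of edges with equal counts.
    """
    counts = mutual_friend_counts(edges)
    group_sizes = Counter(counts.values())
    return min(group_sizes.values())


# Assumed reading of the slide figure: 4 people, every pair connected.
k4_edges = list(combinations([1, 2, 3, 4], 2))
print(mutual_friend_counts(k4_edges))  # every edge has 2 mutual friends
print(nmf_anonymity_level(k4_edges))   # 6 edges share the same count, so 6

# A triangle with one pendant vertex is only 1-NMF: edge (3, 4) is unique.
pendant_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(nmf_anonymity_level(pendant_edges))  # 1
```

The second graph shows why the model matters. An attacker who knows that two people have no mutual friends can single out edge (3, 4).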
DATA MINER
CONCERN
APPROACHES
1. PRIVACY PRESERVING ASSOCIATION RULE MINING
Reconstruction-based approaches
2. PRIVACY PRESERVING CLASSIFICATION
DECISION MAKER
CONCERN
results
how to evaluate the credibility of the received mining results
APPROACHES
Legal measures.
For example, a contract with the data miner can forbid the miner from disclosing the mining results to a third party.
DATA PROVENANCE
The information that helps determine the derivation history of the data, starting from the original source.
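One way to record this derivation history is to store, with each dataset or result, the operation that produced it and links to its inputs. This minimal sketch assumes that simple parent-link model. The class, dataset names, and operations are hypothetical. Following the links back yields both the full history and the original sources, which a decision maker could use when judging how credible a received mining result is.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataItem:
    """A dataset or mining result plus the step that produced it (hypothetical model)."""
    name: str
    operation: str = "original source"
    parents: List["DataItem"] = field(default_factory=list)


def derivation_history(item, depth=0):
    """Return the derivation history as indented lines, newest item first."""
    lines = ["  " * depth + f"{item.name} <- {item.operation}"]
    for parent in item.parents:
        lines.extend(derivation_history(parent, depth + 1))
    return lines


def original_sources(item):
    """Follow parent links back to the items that have no parents."""
    if not item.parents:
        return {item.name}
    return set().union(*(original_sources(p) for p in item.parents))


# Hypothetical pipeline: two sources are merged, anonymized, then mined.
survey = DataItem("customer_survey")
crm = DataItem("crm_export")
merged = DataItem("merged", "join on customer_id", [survey, crm])
anonymized = DataItem("anonymized", "generalize ZIP codes", [merged])
results = DataItem("mining_results", "association rule mining", [anonymized])

print("\n".join(derivation_history(results)))
print(original_sources(results))
```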
WEB INFORMATION CREDIBILITY
Privacy and Security Constraints
Individual Privacy
Nobody should know more about any entity after the data
mining than they did before
Approaches: Data Obfuscation, Value swapping
Organization Privacy
Protect knowledge about a collection of entities
Individual entity values may be known to all parties
Which entities are at which site may be secret
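Value swapping, listed above as an approach to individual privacy, can be sketched in a few lines. This minimal version assumes that records are Python dictionaries and simply permutes one attribute's values across all records. The records and the function name are hypothetical.

```python
import random


def swap_attribute(records, attribute, rng):
    """Value swapping: randomly permute one attribute's values across records.

    Column-level statistics of `attribute` (its values, sum, mean) are
    unchanged, but a value can no longer be trusted to belong to its row.
    """
    values = [record[attribute] for record in records]
    rng.shuffle(values)
    # Copy each record so the caller's original data is not modified.
    return [{**record, attribute: value} for record, value in zip(records, values)]


def mean_salary(rows):
    return sum(row["salary"] for row in rows) / len(rows)


# Hypothetical records; names and salaries are illustrative only.
employees = [
    {"name": "Ana", "dept": "Sales", "salary": 52000},
    {"name": "Ben", "dept": "Sales", "salary": 61000},
    {"name": "Chen", "dept": "IT", "salary": 75000},
    {"name": "Dara", "dept": "IT", "salary": 68000},
]
released = swap_attribute(employees, "salary", random.Random(42))

print(mean_salary(employees) == mean_salary(released))  # True: aggregate preserved
for row in released:
    print(row["name"], row["salary"])  # per-person salary is no longer reliable
```

A full random permutation can leave some values in place, and it also destroys correlations between the swapped attribute and the others (here, salary and department). Practical schemes such as rank swapping limit that damage by swapping only among records with similar values.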
Privacy constraints don’t prevent data mining
Example:
Association Rules
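To show how privacy constraints need not prevent association rule mining, here is a minimal sketch of secure sum. Secure sum is a standard building block of distributed privacy-preserving data mining, although the slides do not name it. With it, several sites learn the global support count of an itemset without any site revealing its local count. The sketch makes several assumptions:
- sites are honest-but-curious and do not collude;
- the modulus is larger than any possible total;
- the counts are hypothetical.

It also reveals the global totals themselves. A fuller protocol might reveal only whether the support exceeds the threshold.

```python
import random


def secure_sum(local_values, modulus, rng):
    """Simulate the ring-based secure sum protocol in a single process.

    Site 0 masks its value with a random number r. Each site adds its own
    value to the running total it receives and passes the result on. Site 0
    finally removes r. Every intermediate message is uniformly distributed
    modulo `modulus`, so (without collusion) no site learns another site's
    value from the message it receives.
    """
    r = rng.randrange(modulus)                 # secret mask known only to site 0
    running = (r + local_values[0]) % modulus  # message from site 0 to site 1
    messages = [running]
    for value in local_values[1:]:
        running = (running + value) % modulus  # each site adds its local count
        messages.append(running)
    total = (running - r) % modulus            # site 0 unmasks the final message
    return total, messages


rng = random.Random(7)
support_counts = [120, 340, 95]   # hypothetical local counts of {bread, milk}
transactions = [1000, 2500, 800]  # hypothetical local database sizes
modulus = 10**6                   # must exceed any possible total

global_count, msgs = secure_sum(support_counts, modulus, rng)
global_size, _ = secure_sum(transactions, modulus, rng)
print(global_count, global_size, global_count / global_size)
```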
Privacy-Preserving Data Mining: Who?
Multinational Corporations
A company would like to mine its data for globally valid
results
But national laws may prevent transborder data sharing
Public use of private data
Data mining enables research studies of large populations
But these populations are reluctant to release personal
information
Outline
Technical Solutions
Individual Privacy:
Protect the “record”
Individually Identifiable Information
Collection Privacy
Collection Privacy Example: Corporate Phone Book
Telephone directory discloses how to contact an individual
- Intended use
Data mining can find more
- Relative sizes of departments
- Use to predict corporate plans?
Possible solution: obfuscation
- Fake entries in phone book
- Doesn't prevent intended use
Key: define intended use
- Not always easy!
Figure labels: "Data Mining"; "Unexpectedly High Number of Energy Traders"
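The phone book example can be made concrete with a small sketch. It assumes the obfuscation pads every department with fake entries up to a common size, which is only one possible way to add fake entries. The directory, names, and extensions are hypothetical. After padding, department counts no longer reveal relative sizes, while looking up a real person still returns the right number.

```python
from collections import Counter


def department_sizes(directory):
    """What a data miner can infer: the number of listed people per department."""
    return Counter(entry["dept"] for entry in directory)


def pad_with_fake_entries(directory, target_size):
    """Obfuscation: add fake entries so every department lists `target_size` people."""
    padded = list(directory)
    for dept, size in department_sizes(directory).items():
        for i in range(target_size - size):
            padded.append({"name": f"{dept} placeholder {i + 1}",
                           "dept": dept,
                           "phone": f"x{9000 + len(padded)}"})
    return padded


def lookup(directory, name):
    """Intended use: find a person's phone number."""
    return [entry["phone"] for entry in directory if entry["name"] == name]


# Hypothetical directory in which one department is unexpectedly large.
directory = [
    {"name": "A. Jones", "dept": "Legal", "phone": "x1001"},
    {"name": "B. Smith", "dept": "Energy Trading", "phone": "x1002"},
    {"name": "C. Lee", "dept": "Energy Trading", "phone": "x1003"},
    {"name": "D. Kim", "dept": "Energy Trading", "phone": "x1004"},
]
print(department_sizes(directory))  # Energy Trading stands out
padded = pad_with_fake_entries(directory, target_size=5)
print(department_sizes(padded))     # every department now lists 5 people
print(lookup(padded, "C. Lee"))     # intended use still works
```

Here the fake names are readable placeholders for clarity. To resist a miner who filters them out, real fake entries would have to be indistinguishable from genuine ones.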
Sources of Constraints
Regulatory requirements
Contractual constraints
Posted privacy policy
Corporate agreements
Secrecy concerns
Secrets whose release could jeopardize plans
Public Relations – “bad press”
European Union Data Protection Directives
Directive 95/46/EC
Adopted by the European Parliament and the Council on 24 October 1995
Goal is to ensure free flow of information
Must preserve privacy needs of member states
Effective October 1998
GDPR - General Data Protection Regulation
Seeks to regulate the use and disclosure of the personal data of all individuals within the EU member states (28 at the time of adoption). Though it entered into force in May 2016, it did not become enforceable until 25 May 2018.
Unlike most privacy regulations in the U.S., the EU defines the term “personal data” broadly—
it includes “any information relating to an identified or identifiable natural person (the ‘data
subject’).”
This means that even the most basic contact information, such as business card details or simply a
name and email address, falls under the GDPR’s protections. Public sources of information, such
as a residential phone listing, are not exempted from the GDPR’s restrictions.
Technology Threats to Data Privacy
Thank you