University Institute of Technology, Bhopal
Narsinghgarh Bypass Road, Near Gandhi Nagar, Bhopal (M.P.)
SESSION: 2019-20
PROJECT REPORT ON
DATA LEAKAGE DETECTION
In the course of doing business, sometimes sensitive data must be handed over to
supposedly trusted third parties. For example, a hospital may give patient records to
researchers who will devise new treatments. Similarly, a company may have partnerships
with other companies that require sharing customer data. Another enterprise may outsource
its data processing, so data must be given to various other companies. Our goal is to detect
when the distributor’s sensitive data has been leaked by agents, and if possible to identify
the agent that leaked the data.
EXISTING SYSTEM:
Perturbation is a very useful technique where the data is modified and made "less
sensitive" before being handed to agents. For example, one can add random noise to
certain attributes, or replace exact values with ranges. However, in some cases it is
important not to alter the original distributor's data.
Traditionally, leakage detection is handled by watermarking, e.g., a unique code is
embedded in each distributed copy. If that copy is later discovered in the hands of an
unauthorized party, the leaker can be identified.
[Figure: creating a watermark]
Watermarks can be very useful in some cases, but again, involve some modification of the
original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is
malicious. For example, a hospital may give patient records to researchers who will devise new
treatments. Similarly, a company may have partnerships with other companies that require
sharing customer data. Another enterprise may outsource its data processing, so data must be
given to various other companies. We call the owner of the data the distributor and the
supposedly trusted third parties the agents.
PROPOSED SYSTEM:
Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if
possible to identify the agent that leaked the data.
Rather than relying on perturbation or watermarks, we develop unobtrusive techniques for
detecting leakage of a set of objects or records.
Unobtrusive Techniques: Unobtrusive techniques are data-collection methods that do not
involve direct elicitation of data from the research subjects. The unobtrusive approach
often seeks unusual data sources.
We develop a model for assessing the "guilt" of agents.
We also present algorithms for distributing objects to agents, in a way that improves our
chances of identifying a leaker.
Finally, we also consider the option of adding "fake" objects to the distributed set. Such
objects do not correspond to real entities but appear realistic to the agents.
In a sense, the fake objects act as a type of watermark for the entire set, without modifying
any individual members. If it turns out an agent was given one or more fake objects that
were leaked, then the distributor can be more confident that the agent was guilty.
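As an illustration, the following C# sketch shows how a distributor might append fake records to an agent's allocation while remembering which fakes went to which agent. All type and member names here are our own illustrative assumptions (written in modern C# for brevity), not code from this project.

using System;
using System.Collections.Generic;

// Illustrative record type; the project's real data objects may differ.
class DataObject
{
    public int Id;
    public string Payload;
    public bool IsFake;   // known only to the distributor, never revealed to agents
}

static class FakeObjectInjector
{
    // Appends `count` realistic-looking fake records to an agent's
    // allocation and logs them, so that a later leak containing one of
    // these records can be traced back to this agent.
    public static List<DataObject> Inject(
        List<DataObject> allocation,
        int count,
        Func<DataObject> makeFake,
        IDictionary<string, List<int>> fakesPerAgent,
        string agentId)
    {
        var result = new List<DataObject>(allocation);
        if (!fakesPerAgent.ContainsKey(agentId))
            fakesPerAgent[agentId] = new List<int>();

        for (int i = 0; i < count; i++)
        {
            DataObject fake = makeFake();        // must look like a real record
            fake.IsFake = true;
            result.Add(fake);
            fakesPerAgent[agentId].Add(fake.Id); // remember what this agent received
        }
        return result;
    }
}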
[Figure: typical block diagram showing the process of data loss in blocking spam]
Problem Setup and Notation:
A distributor owns a set T = {t1, …, tm} of valuable data objects. The distributor wants to share
some of the objects with a set of agents U1, U2, …, Un, but does not wish the objects to be leaked
to other third parties. The objects in T could be of any type and size, e.g., they could be
tuples in a relation, or relations in a database. An agent Ui receives a subset Ri of the objects,
determined either by a sample request or an explicit request (both are sketched in the C# fragment below):
1. Sample request Ri = SAMPLE(T, mi): any subset of mi records from T may be given to Ui.
2. Explicit request Ri = EXPLICIT(T, condi): Ui receives all objects in T that satisfy the condition condi.
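A minimal C# sketch of the two request types, assuming data objects are identified by integer ids; the method bodies are our own simplification of the notation above, not the project's actual code.

using System;
using System.Collections.Generic;
using System.Linq;

static class Requests
{
    static readonly Random Rng = new Random();

    // Explicit request: Ui receives every object in T satisfying condi.
    public static List<int> Explicit(IEnumerable<int> T, Func<int, bool> cond)
    {
        return T.Where(cond).ToList();
    }

    // Sample request: Ui receives some mi objects from T; the distributor
    // is free to choose which ones (here: a uniform random subset).
    public static List<int> Sample(IList<int> T, int m)
    {
        return T.OrderBy(_ => Rng.Next()).Take(m).ToList();
    }
}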
To see how our model parameters interact and to check whether the interactions match our
intuition, in this section we study two simple scenarios: the impact of the probability p,
and the impact of the overlap between Ri and S. In each scenario we have a target that has
obtained all the distributor's objects, i.e., T = S.
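To make these scenarios concrete, the C# sketch below computes an agent's guilt probability under one common form of the agent-guilt model from the data-leakage-detection literature: p is the probability that a leaked object could have been obtained by other means (guessing), and |Vt| is the number of agents that received object t. The formula, names, and parameter shapes are our assumptions, not taken verbatim from this report.

using System.Collections.Generic;

static class GuiltModel
{
    // Pr{Gi | S} = 1 - product over t in (S intersect Ri) of
    //              (1 - (1 - p) / |Vt|)
    // leaked:      ids of leaked objects that agent Ui also holds (S ∩ Ri)
    // holders[t]:  number of agents that were given object t (|Vt|)
    // p:           probability a leaked object was guessed rather than leaked
    public static double Probability(
        IEnumerable<int> leaked,
        IDictionary<int, int> holders,
        double p)
    {
        double notGuilty = 1.0;
        foreach (int t in leaked)
        {
            // (1 - p) / |Vt| is the chance this particular leaked copy
            // originated from agent Ui rather than being guessed or
            // coming from another holder of t.
            notGuilty *= 1.0 - (1.0 - p) / holders[t];
        }
        return 1.0 - notGuilty;   // more overlap or lower p => higher guilt
    }
}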
Algorithms:
The goal of these experiments was, first, to see whether fake objects in the distributed
data sets yield a significant improvement in our chances of detecting a guilty agent, and
second, to evaluate our e-optimal algorithm relative to a random allocation.
With sample data requests, agents are not interested in particular objects. Hence, object
sharing is not explicitly defined by their requests. The distributor is "forced" to allocate
certain objects to multiple agents only if the total number of requested objects exceeds the
number of objects in set T. The more data objects the agents request in total, the more
recipients an object has on average; and the more objects are shared among different agents,
the more difficult it is to detect a guilty agent. A greedy version of such an
overlap-minimizing allocation is sketched below.
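The following C# sketch answers a sample request by handing out the objects that currently have the fewest recipients, which keeps agents' sets dissimilar. This is our own simplification for illustration, not the report's exact allocation algorithm.

using System.Collections.Generic;
using System.Linq;

static class SampleAllocator
{
    // recipients[t] counts how many agents already hold object t.
    public static List<int> Allocate(
        IList<int> T, int m, IDictionary<int, int> recipients)
    {
        // Prefer the least-shared objects so overlap stays small.
        List<int> chosen = T
            .OrderBy(t => recipients.TryGetValue(t, out int c) ? c : 0)
            .Take(m)
            .ToList();

        foreach (int t in chosen)
        {
            recipients.TryGetValue(t, out int c);
            recipients[t] = c + 1;   // object t now has one more recipient
        }
        return chosen;
    }
}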
Hardware Requirements
SYSTEM : Pentium IV, 2.4 GHz
HARD DISK : 40 GB
FLOPPY DRIVE : 1.44 MB
MONITOR : 15" VGA colour
MOUSE : Logitech
RAM : 256 MB
KEYBOARD : 110-key enhanced
Software Requirements
Operating system : Windows XP Professional
Front End : Microsoft Visual Studio .NET 2005
Coding Language : C#
Database : SQL Server 2000
MODULE DESCRIPTION:
1) Login / Registration:
This module authenticates a user and grants the authority to access the other modules of the
project. Here a user obtains this access authority only after registration.
2) DATA TRANSFER:
This module is mainly designed to transfer data from the distributor to agents. The same
module can also be used for illegal data transfer from authorized agents to other agents.
3) GUILT MODEL:
This module is designed using the agent-guilt model. Here a count value (realized as fake
objects) is incremented for every data-transfer occurrence, i.e., whenever an agent
transfers data. Fake objects are stored in the database.
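For illustration, a minimal sketch of how this module might record a fake object in SQL Server when an agent transfers data; the table name FakeObjects and its columns are our assumptions, not the project's actual schema.

using System.Data.SqlClient;

static class GuiltModelStore
{
    // Inserts one fake-object row per observed transfer, so the
    // agent-guilt module can later query which fakes each agent received.
    public static void RecordFakeObject(
        string connectionString, int agentId, int objectId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO FakeObjects (AgentId, ObjectId, TransferredAt) " +
            "VALUES (@agent, @object, GETDATE())", conn))
        {
            cmd.Parameters.AddWithValue("@agent", agentId);
            cmd.Parameters.AddWithValue("@object", objectId);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}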
4) AGENT-GUILT MODEL:
This module is mainly designed for determining guilty agents. It uses the fake objects
(which are stored in the database by the guilt-model module) and determines the guilty agent
along with the corresponding probability. A graph is used to plot the probability
distribution of the data leaked by the guilty agents.
ACTIVITY DIAGRAM:
[Activity diagram: Login → Find guilty agents → Find probability of data leakage]
ARCHITECTURE DIAGRAM:
[Architecture diagram]
CONCLUSION
The likelihood that an agent is responsible for a leak is assessed based on the overlap
of its data with the leaked data and with the data of other agents, and on the
probability that objects can be "guessed" by other means. The algorithms we have
presented implement a variety of data distribution strategies that can improve the
distributor's chances of identifying a leaker. We have shown that distributing objects
judiciously can make a significant difference in identifying guilty agents, especially in
cases where there is large overlap in the data that agents must receive.