
Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal

University Institute of Technology,
Narsingharh Bypass Road, Near Gandhi Nagar, Bhopal (M.P.)

SESSION: 2019-20

PROJECT REPORT ON
DATA LEAKAGE DETECTION

DEPARTMENT OF COMPUTER SCIENCE &


ENGINEERING

Submitted To: Prof. Dr. Shikha Agrawal

Submitted By: Ranjana Singh Maravi
Roll No. 0101CS171086
Semester: 6th
ABSTRACT
A data distributor has given sensitive data to a set of supposedly trusted agents (third
parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or
somebody’s laptop). The distributor must assess the likelihood that the leaked data came
from one or more agents, as opposed to having been independently gathered by other
means. We propose data allocation strategies (across the agents) that improve the probability
of identifying leakages. These methods do not rely on alterations of the released data (e.g.,
watermarks). In some cases we can also inject “realistic but fake” data records to further
improve our chances of detecting leakage and identifying the guilty party.
INTRODUCTION

In the course of doing business, sensitive data must sometimes be handed over to
supposedly trusted third parties. For example, a hospital may give patient records to
researchers who will devise new treatments. Similarly, a company may have partnerships
with other companies that require sharing customer data, or an enterprise may outsource
its data processing, so data must be given to various other companies. Our goal is to detect
when the distributor's sensitive data has been leaked by agents and, if possible, to identify
the agent that leaked the data. Perturbation, by contrast, is a technique in which the data is
modified and made "less sensitive" before being handed to agents.

EXISTING SYSTEM

 We consider applications where the original sensitive data cannot be perturbed.
Perturbation is a very useful technique in which the data is modified and made "less
sensitive" before being handed to agents. For example, one can add random noise to
certain attributes, or one can replace exact values by ranges.

[Figure: graph showing perturbation]

 However, in some cases it is important not to alter the original distributor's data.
 Traditionally, leakage detection is handled by watermarking, e.g., a unique code is
embedded in each distributed copy.

[Figure: creating a watermark]

 If that copy is later discovered in the hands of an unauthorized party, the leaker can be
identified.
 Watermarks can be very useful in some cases, but again involve some modification of the
original data.
 Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious.
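The perturbation techniques mentioned above (adding random noise, replacing exact values by ranges) can be sketched as follows. This is an illustrative Python sketch, not part of the report's C# implementation; the function names are hypothetical:

```python
import random

def perturb_noise(value, scale=1000):
    """Add uniform random noise to a numeric attribute."""
    return value + random.uniform(-scale, scale)

def perturb_range(value, width=10):
    """Replace an exact value by the bucket (range) that contains it."""
    lo = (value // width) * width
    return (lo, lo + width)

# Example: generalizing an exact age of 37 to a 10-year range
print(perturb_range(37))  # -> (30, 40)
```

Both operations make the released data "less sensitive", but they also alter it, which is exactly what the applications considered in this report cannot tolerate.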

Disadvantages of Existing Systems:

 Watermarks can be very useful in some cases, but they involve some modification of the
original data.
 Watermarks can sometimes be destroyed if the data recipient is malicious.
 Many sharing scenarios nevertheless require unmodified data: a hospital may give patient
records to researchers who will devise new treatments; a company may have partnerships with
other companies that require sharing customer data; an enterprise may outsource its data
processing, so data must be given to various other companies. We call the owner of the data
the distributor and the supposedly trusted third parties the agents.
PROPOSED SYSTEM:

 Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if
possible to identify the agent that leaked the data.
 Unlike perturbation, in which the data is modified and made "less sensitive" before being
handed to agents, we develop unobtrusive techniques for detecting leakage of a set of
objects or records.
 Unobtrusive techniques: unobtrusive techniques are data-collection methodologies that do
not involve direct elicitation of data from the research subjects; the unobtrusive approach
often seeks unusual data sources.
 We develop a model for assessing the "guilt" of agents.
 We also present algorithms for distributing objects to agents, in a way that improves our
chances of identifying a leaker.
 Finally, we also consider the option of adding "fake" objects to the distributed set. Such
objects do not correspond to real entities but appear realistic to the agents.
In a sense, the fake objects act as a type of watermark for the entire set, without modifying
any individual members. If it turns out that an agent was given one or more fake objects that
were leaked, then the distributor can be more confident that this agent was guilty.
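As an illustration of how fake objects narrow down the guilty party, here is a minimal Python sketch; the agent names, object names, and data layout are hypothetical, not taken from the report:

```python
# Each agent's share contains fake records given only to that agent.
# If a leaked set contains a fake, suspicion falls on its sole recipient.

agents = {
    "U1": {"t1", "t2", "f1"},   # f1 is a fake object given only to U1
    "U2": {"t2", "t3", "f2"},   # f2 is a fake object given only to U2
}
fakes = {"f1", "f2"}

def suspects_from_fakes(leaked):
    """Return the agents whose shares contain a leaked fake object."""
    leaked_fakes = leaked & fakes
    return {a for a, share in agents.items() if share & leaked_fakes}

print(suspects_from_fakes({"t2", "f2"}))  # -> {'U2'}
```

Note that the real object t2 alone would implicate both U1 and U2; the fake f2 singles out U2.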

[Figure: typical block diagram showing the process of data loss]
Problem Setup and Notation:

 A distributor owns a set T = {t1, …, tm} of valuable data objects. The distributor wants to share
some of the objects with a set of agents U1, U2, …, Un, but does not wish the objects to be leaked
to other third parties. The objects in T could be of any type and size; e.g., they could be
tuples in a relation, or relations in a database. An agent Ui receives a subset of objects,
determined either by a sample request or an explicit request:

1. Sample request
2. Explicit request
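The two request types can be sketched in Python as follows; T and the function names are illustrative assumptions, not the report's notation for code:

```python
import random

T = ["t1", "t2", "t3", "t4"]  # the distributor's set of objects

def explicit_request(condition):
    """Agent Ui asks for all objects satisfying a condition of its choosing;
    the distributor has no freedom in what to send."""
    return [t for t in T if condition(t)]

def sample_request(m):
    """Agent Ui asks for any m objects; the distributor chooses which ones,
    which is the freedom the allocation algorithms later exploit."""
    return random.sample(T, m)

print(explicit_request(lambda t: t in ("t1", "t3")))  # -> ['t1', 't3']
```

The distinction matters because only sample requests leave the distributor room to allocate objects so as to minimize overlap between agents.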

Guilt Model Analysis:

To see how our model parameters interact, and to check whether the interactions match our
intuition, in this section we study two simple scenarios: the impact of the probability p, and
the impact of the overlap between Ri and S. In each scenario we have a target that has
obtained all the distributor's objects, i.e., T = S.
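A common formulation of such a guilt model in the data leakage detection literature estimates the probability that agent Ui is guilty given the leaked set S from the objects in S ∩ Ri, the number of agents holding each object, and the probability p that a target could have guessed an object by other means. The following is a Python sketch of that formulation, not the report's exact implementation:

```python
def guilt_probability(Ri, S, holders, p):
    """Estimate Pr{Gi | S}: the probability that agent i is guilty
    given the leaked set S.

    Ri      -- set of objects given to agent i
    S       -- set of leaked objects
    holders -- dict mapping each object to the number of agents holding it
    p       -- probability a target guessed an object independently
    """
    prob_innocent = 1.0
    for t in S & Ri:
        # Object t fails to implicate agent i only if it was guessed
        # (probability p) or leaked by one of t's other holders.
        prob_innocent *= 1.0 - (1.0 - p) / holders[t]
    return 1.0 - prob_innocent

# T = S = {t1, t2}, each object given only to agent 1, p = 0.5:
print(guilt_probability({"t1", "t2"}, {"t1", "t2"},
                        {"t1": 1, "t2": 1}, 0.5))  # -> 0.75
```

As the sketch shows, guilt rises with the overlap |S ∩ Ri| and falls as p or the number of other holders of each object grows, which matches the two scenarios studied above.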

Algorithms:

1. Evaluation of Explicit Data Request Algorithms

The primary goal of these experiments was to see whether fake objects in the
distributed data sets yield a significant improvement in our chances of detecting a guilty
agent. Secondly, we wanted to evaluate our e-optimal algorithm relative to a
random allocation.

2. Evaluation of Sample Data Request Algorithms

With sample data requests agents are not interested in particular objects. Hence, object
sharing is not explicitly defined by their requests. The distributor is “forced” to allocate
certain objects to multiple agents only if the number of requested objects exceeds the
number of objects in set T. The more data objects the agents request in total, the more
recipients on average an object has; and the more objects are shared among different agents,
the more difficult it is to detect a guilty agent.
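The effect described above, where total requests exceeding |T| force objects to be shared, can be quantified by the average number of recipients per object. This is an illustrative Python sketch; the names are assumptions:

```python
from collections import Counter

def average_recipients(allocation):
    """allocation: dict mapping each agent to the set of objects it received.
    Returns the average number of recipients per allocated object;
    higher values mean more sharing and harder guilt detection."""
    counts = Counter()
    for objects in allocation.values():
        counts.update(objects)
    return sum(counts.values()) / len(counts)

# Three agents requesting 2 objects each from T = {t1, t2, t3}
# (6 requests > 3 objects) forces every object to be shared:
alloc = {"U1": {"t1", "t2"}, "U2": {"t2", "t3"}, "U3": {"t1", "t3"}}
print(average_recipients(alloc))  # -> 2.0
```

The sample-request allocation algorithms aim to keep this average, and in particular the pairwise overlap between agents, as low as the requests allow.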
Hardware Requirements
 SYSTEM : Pentium IV, 2.4 GHz
 HARD DISK : 40 GB
 FLOPPY DRIVE : 1.44 MB
 MONITOR : 15" VGA colour
 MOUSE : Logitech
 RAM : 256 MB
 KEYBOARD : 110 keys, enhanced

Software Requirements
 Operating system :- Windows XP Professional
 Front End :- Microsoft Visual Studio .Net 2005
 Coding Language :- C#
 Database :- SQL SERVER 2000

MODULE DESCRIPTION:

1) Login / Registration:

This module provides a user with the authority to access the other modules of the project.
A user gains this access only after registration.

2) DATA TRANSFER:

This module is mainly designed to transfer data from the distributor to agents. The same
module can also be used to model illegal data transfer from authorized agents to other agents.

3) GUILT MODEL ANALYSIS:

This module is designed using the agent-guilt model. Here a count value (based on the fake
objects) is incremented for every transfer occurrence when an agent transfers data.
Fake objects are stored in the database.
4) AGENT-GUILT MODEL:

This module is mainly designed for determining guilty agents. It uses the fake
objects (stored in the database by the guilt-model module) and determines the guilty agent
along with the associated probability. A graph is used to plot the probability distribution of
the data leaked by guilty agents.

ACTIVITY DIAGRAM:

LOGIN
  → DISTRIBUTE DATA TO AGENTS / VIEW DATA DISTRIBUTED TO AGENTS
  → FIND GUILT AGENTS
  → FIND PROBABILITY OF DATA LEAKAGE
ARCHITECTURE DIAGRAM

[Figure: architecture diagram]

CONCLUSION

 The likelihood that an agent is responsible for a leak is assessed based on the overlap
of its data with the leaked data and with the data of other agents, and on the
probability that objects can be "guessed" by other means. The algorithms we have
presented implement a variety of data distribution strategies that can improve the
distributor's chances of identifying a leaker. We have shown that distributing objects
judiciously can make a significant difference in identifying guilty agents, especially in
cases where there is large overlap in the data that agents must receive.
