Data Leakage Detection Complete Project Report
CHAPTER 1
1. INTRODUCTION
1.1 Introduction to Data Leakage Detection
In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies. We call the owner of the data the distributor and the supposedly trusted third parties the agents. Our goal is to detect when the distributor's sensitive data has been leaked by agents and, if possible, to identify the agent that leaked the data.
CHAPTER 2
LITERATURE SURVEY
A literature survey is the most important step in the software development process. Before developing the tool, it is necessary to determine the time factor, the economy, and the company's strength. Once these are satisfied, the next step is to determine which operating system and language can be used for developing the tool. Once the programmers start building the tool, they need a lot of external support, which can be obtained from senior programmers, from books, or from websites. These considerations were taken into account before building the proposed system.
The guilt detection approach we present is related to the data provenance problem: tracing the lineage of the objects in S essentially amounts to detecting the guilty agents. Solutions to that problem assume some prior knowledge of the way a data view is created out of data sources. Our problem formulation with objects and sets is more general. As far as the data allocation strategies are concerned, our work is mostly relevant to watermarking, which is used as a means of establishing original ownership of distributed objects. Finally, there are also many other works on mechanisms that allow only authorized users to access sensitive data through access control policies. Such approaches prevent data leakage, in some sense, by sharing information only with trusted parties. However, these policies are restrictive and may make it impossible to satisfy agents' requests.
2.5 Algorithm
Allocation for Explicit Data Requests: in this type of request, the agent sends the request with an appropriate condition.
Allocation for Sample Data Requests: in this type of request, the agent's request has no condition. The agent sends the request without a condition and receives data according to his query.
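The two request types can be illustrated with a minimal sketch. It assumes that an explicit request returns every object matching the agent's condition and that a sample request returns a requested number of objects chosen at random; the random choice, the function names and the `customers` records are illustrative assumptions, not part of the project code.

```python
import random


def explicit_request(objects, condition):
    """Explicit request: return every object that satisfies the agent's condition."""
    return [obj for obj in objects if condition(obj)]


def sample_request(objects, m, rng=None):
    """Sample request: no condition, return m objects chosen at random
    (assumption: random choice; fewer objects if m exceeds the pool)."""
    rng = rng or random.Random()
    return rng.sample(objects, min(m, len(objects)))


# Hypothetical distributor data
customers = [
    {"id": 1, "state": "CA"},
    {"id": 2, "state": "NY"},
    {"id": 3, "state": "CA"},
]

ca_only = explicit_request(customers, lambda c: c["state"] == "CA")
any_two = sample_request(customers, 2, random.Random(0))
```

Passing a seeded `random.Random` makes the sample repeatable, which is convenient when testing.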
2.6 Deliverables
1. Client & Distributor Software (Data Detection System) installed at the server along with its database.
2. Database backup.
3. Agent machine installed with the software and connected to the database at the server.
Sl. No | Module | S/W required for coding | Period for coding (approx. weeks)
1. Requirement gathering
2. Analysis
3. Creating Application
4.
5. Final Testing
Cell values recovered for the last two columns: N/A, 1; N/A, 1; N/A, 1.
CHAPTER 3
REQUIREMENT ANALYSIS
3.1 SOFTWARE REQUIREMENTS
The module is written in ASP.NET and C#. It is developed on the Visual Studio platform. Windows is the operating system chosen for the module. The database used in the project is Microsoft SQL Server 2005 or higher.
3.2 HARDWARE REQUIREMENTS
PROCESSOR: Pentium 4 or above.
RAM: 256 MB or more.
Hard disk space: 500 MB to 1 GB.
CHAPTER 4
SOFTWARE ENVIRONMENT
4.1 INTRODUCTION TO VISUAL STUDIO 2008
To get the most out of Visual Studio .NET, you will most likely wish to tailor it to suit your style of working. With the wide variety of configuration options, both familiar and new, you'll want to take the time to examine some of the options you can set. In this document, you will be introduced to many of the different configurations and learn about the various settings in Visual Studio .NET. You will also learn about the different types of windows, including Tool windows, which can be docked to the environment or left free-floating, and Document windows.
4.2 CONFIGURATION
The first time you use Visual Studio .NET, you will be prompted for some configuration information about how you will use the environment most often. Figure 1 shows an example of the My Profile screen. The My Profile Screen allows you to set some overall environment defaults.
Figure 3. The New Project dialog box allows you to create a new Solution of a particular project type.
ISE DEPT, CMRIT
D:\MySamples\LoginTest\LoginTest.sln.
Figure 4. The Solution Explorer gives you a graphical representation of all of the files that make up your project(s).
Figure 5. The Properties window, used to set the parameters of the screen dialog box.
CHAPTER 5
MODULE ANALYSIS
5.1 ARCHITECTURE DIAGRAM
5.4.2 FAKE OBJECT
The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. However, fake objects may impact the correctness of what agents do, so they may not always be allowable. The idea of perturbing data to detect leakage is not new. However, in most cases, individual objects are perturbed, e.g., by adding random noise to sensitive salaries or adding a watermark to an image. In our case, we are perturbing the set of distributor objects by adding fake elements. In some applications, fake objects may cause fewer problems than perturbing real objects.
Our use of fake objects is inspired by the use of trace records in mailing lists. For example, suppose company A sells company B a mailing list to be used once (e.g., to send advertisements). Company A adds trace records that contain addresses owned by company A. Thus, each time company B uses the purchased mailing list, A receives copies of the mailing. These records are a type of fake object that helps identify improper use of data.
In many cases, the distributor may be limited in how many fake objects he can create. For example, objects may contain e-mail addresses, and each fake e-mail address may require the creation of an actual inbox (otherwise, the agent may discover that the object is fake). The inboxes can actually be monitored by the distributor: if e-mail is received from someone other than the agent who was given the address, it is evident that the address was leaked.
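The trace-record idea can be sketched in a few lines. The sketch assumes one fake e-mail address per agent and a monitored inbox that reports each message's recipient and sender; the function names, the `distributor.example` domain and the agent identifiers are hypothetical.

```python
def make_trace_records(agents, domain="distributor.example"):
    """Create one fake e-mail address per agent; returns {fake_address: agent}."""
    return {f"trace-{i}@{domain}": agent for i, agent in enumerate(agents)}


def suspected_leaker(trace_records, recipient, sender):
    """Return the agent whose fake address received mail from a third party.

    Returns None when the recipient is not a trace address, or when the
    sender is the agent that was originally given the address.
    """
    owner = trace_records.get(recipient)
    if owner is None or sender == owner:
        return None
    return owner


# Hypothetical agents; each receives a distinct trace address in its data
records = make_trace_records(["agent_a", "agent_b"])
```

Because every agent gets a different fake address, mail arriving at an address from an unexpected sender points to the agent that held it.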
5.4.3
The distributor's data allocation to agents has one constraint and one objective. The distributor's constraint is to satisfy agents' requests by providing them with the number of objects they request or with all available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any portion of his data. We consider the constraint as strict: the distributor may not deny serving an agent's request and may not provide agents with different perturbed versions of the same objects. We consider fake object distribution as the only possible constraint relaxation.
ALGORITHM
R ← ∅    (agents that can receive fake objects)
for i = 1, ..., n do
    if bi > 0 then
        R ← R ∪ {i}
    Fi ← ∅
while B > 0 do
    i ← SELECTAGENT(R, R1, ..., Rn)
    f ← CREATEFAKEOBJECT(Ri, Fi, condi)
    Ri ← Ri ∪ {f}
    Fi ← Fi ∪ {f}
    bi ← bi − 1
    if bi = 0 then
        R ← R \ {i}
    B ← B − 1
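The fake-object allocation loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: agents are selected at random, `create_fake_object` is a hypothetical stand-in that only produces a unique label, and an extra guard stops the loop when no agent can accept more fake objects (the pseudocode does not include that guard).

```python
import itertools
import random


def create_fake_object(real_set, fake_set, cond, serial):
    """Hypothetical stand-in: a real system would build a plausible object
    that satisfies cond and is not already in real_set or fake_set."""
    return f"fake-{serial}"


def allocate_fake_objects(R, conds, b, B, rng=None):
    """R: one set of objects per agent; conds: per-agent conditions;
    b: per-agent fake-object limits; B: total fake-object budget.
    Returns (R, F), where F[i] holds the fake objects given to agent i."""
    rng = rng or random.Random()
    R = [set(r) for r in R]          # copy so the caller's sets stay untouched
    b = list(b)
    F = [set() for _ in R]           # Fi starts empty
    eligible = {i for i, limit in enumerate(b) if limit > 0}
    serial = itertools.count()
    # Extra guard (assumption): also stop when no agent can take more.
    while B > 0 and eligible:
        i = rng.choice(sorted(eligible))   # SELECTAGENT, random variant
        f = create_fake_object(R[i], F[i], conds[i], next(serial))
        R[i].add(f)
        F[i].add(f)
        b[i] -= 1
        if b[i] == 0:
            eligible.discard(i)
        B -= 1
    return R, F


R, F = allocate_fake_objects(
    R=[{"t1", "t2"}, {"t2"}], conds=[None, None], b=[1, 2], B=3,
    rng=random.Random(1),
)
```

A different selection policy can be substituted for the `rng.choice` line without changing the rest of the loop.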
ALGORITHM
a ← 0|T|    (a[k]: number of agents who have received object tk)
R1 ← ∅, ..., Rn ← ∅
remaining ← m1 + ... + mn
while remaining > 0 do
    for all i = 1, ..., n : |Ri| < mi do
        k ← SELECTOBJECT(i, Ri)    (may also use additional parameters)
        Ri ← Ri ∪ {tk}
        a[k] ← a[k] + 1
        remaining ← remaining − 1
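A minimal Python sketch of the sample-request allocation loop above follows. It assumes that `remaining` starts at the total number of requested objects and that each request size m[i] does not exceed the number of objects in T. The two selection functions are illustrative assumptions: one picks at random, the other picks the object held by the fewest agents so far.

```python
import random


def random_selector(T, rng):
    """SELECTOBJECT variant: any object the agent does not hold yet."""
    def select(i, Ri, a):
        return rng.choice([k for k in range(len(T)) if T[k] not in Ri])
    return select


def least_shared_selector(T):
    """SELECTOBJECT variant (assumption): prefer the object given to the
    fewest agents so far, which keeps the agents' sets from overlapping."""
    def select(i, Ri, a):
        candidates = [k for k in range(len(T)) if T[k] not in Ri]
        return min(candidates, key=lambda k: a[k])
    return select


def allocate_sample_requests(T, m, select_object):
    """T: list of distinct objects; m[i]: number of objects agent i requests."""
    if any(mi > len(T) for mi in m):
        raise ValueError("an agent requests more objects than exist")
    a = [0] * len(T)                 # a[k]: agents that received T[k]
    R = [set() for _ in m]
    remaining = sum(m)
    while remaining > 0:
        for i in range(len(m)):
            if len(R[i]) < m[i]:
                k = select_object(i, R[i], a)   # a is the additional parameter
                R[i].add(T[k])
                a[k] += 1
                remaining -= 1
    return R, a


T = ["t1", "t2", "t3", "t4"]
R_rand, a_rand = allocate_sample_requests(T, [2, 3], random_selector(T, random.Random(0)))
R_low, a_low = allocate_sample_requests(T, [2, 2], least_shared_selector(T))
```

With four objects and two agents requesting two objects each, the least-shared selector hands out disjoint sets, so any leaked object points to a single agent.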
CHAPTER 6
SYSTEM DESIGN
6.1 INTRODUCTION
System design is the process through which requirements are translated into a representation of software. Initially, the representation depicts a holistic view of the software. Subsequent refinement leads to a design representation that is very close to source code. Design is the phase in which user requirements are translated into a finished software product or system. System design refers to the modeling of a process. It is an approach to creating a new system and can be defined as a transition from the user's view to the programmer's view. The system design phase acts as a bridge between the requirements specification and the implementation phase. The design stage involves two sub-stages:
High-Level Design
Low-Level Design
processing (structured design). On a DFD, data items flow from an external data source or an internal data store to an internal data store or an external data sink, via an internal process. Data flow diagrams are an intuitive way of showing how data is processed by a system. At the analysis level, they should be used to model the way in which data is processed in the existing system. The notations used in these models represent functional processing, data stores, and data movements between functions. Data flow models are used to show how data flows through a sequence of processing steps. The data is transformed at each step before moving on to the next stage. These processing steps or transformations are program functions when data flow diagrams are used to explain a software design.
[Data flow diagrams. Process labels: Login, Data Transfer, Logout]
Figure 12. Sequence diagram showing the complete sequence in which the project runs. Steps shown: Login; Login as Distributor; Store data into database; View from database; Find probability of data transfer to agents for data leakage.
Figure 14. Entity Relation Diagram of Data Leakage Detection.
IMPLEMENTATION
Figure 20. Distributor starting the server to send Data to an Agent to a selected path.
Figure 24. Agent forwarding Data (leaking data)
Figure 26. Distributor calculating the Probability of each Agent being Guilty
CHAPTER 8
SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies, and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of test. Each test type addresses a specific testing requirement.
FUNCTIONAL TEST - Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage of identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.
SYSTEM TESTING - System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.
WHITE BOX TESTING - White box testing is testing in which the software tester has knowledge of the inner workings, structure, and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.
BLACK BOX TESTING - Black box testing is testing the software without any knowledge of the inner workings, structure, or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. The software under test is treated as a black box: you cannot see into it. The test provides inputs and responds to outputs without considering how the software works.
Features to be tested:
Verify that the entries are of the correct format.
No duplicate entries should be allowed.
All links should take the user to the correct page.
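The first two features above (correct entry format, no duplicate entries) can be exercised with a small black-box style check. The sketch below is only an illustration: `AgentRegistry`, its e-mail format rule and the `rejects` helper are hypothetical stand-ins and are not taken from the project's ASP.NET code.

```python
import re


class AgentRegistry:
    """Hypothetical stand-in for the store behind a sign-up form."""

    # Deliberately simple format rule: something@something.something
    EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def __init__(self):
        self._emails = set()

    def register(self, email):
        if not self.EMAIL.match(email):
            raise ValueError("invalid format")
        key = email.lower()              # treat addresses case-insensitively
        if key in self._emails:
            raise ValueError("duplicate entry")
        self._emails.add(key)


def rejects(registry, email):
    """Black-box check: True if the registry refuses the entry."""
    try:
        registry.register(email)
    except ValueError:
        return True
    return False


registry = AgentRegistry()
first_attempt_rejected = rejects(registry, "agent1@example.com")     # valid input
bad_format_rejected = rejects(registry, "not-an-email")              # invalid input
duplicate_rejected = rejects(registry, "AGENT1@example.com")         # duplicate
```

The checks only look at whether an entry is accepted or refused, in line with the valid-input and invalid-input classes described under functional testing.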
No. | Test case | Expected result | Result
03 | Distributor Function | | Pass
 | | A sign-up page where the user has to enter personal details | Pass
 | | A page with a set of hosts arrives, from which data can be sent to various agents | Pass
 | | Before sending data from the distributor, the server has to be turned on | Pass
07 | | A page with distributor login and agent login | Pass
 | | A page where the agent enters the agent details | Pass
09 | | A page where the agent can leak data arrives | Pass
10 | | A page with distributor login and agent login | Pass
11 | Distributor Login | Login page with user name and password arrives | Pass
12 | Find Guilty Agents | A page for finding guilty agents arrives | Pass
13 | Find probability of guilty agent | The guilty agent is found | Pass
14 | Show the probability graph | Graph shown | Pass
Software integration testing is the incremental integration testing of two or more integrated software components on a single platform, intended to expose failures caused by interface defects. The task of the integration test is to check that components or software applications (for example, components in a software system or, one step up, software applications at the company level) interact without error.
Test Results - All the test cases mentioned above passed successfully. No defects encountered.
User Acceptance Testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements. Test Results: All the test cases mentioned above passed successfully. No defects encountered.
CHAPTER 9
FUTURE ENHANCEMENT
Our future work includes the investigation of agent guilt models that capture leakage scenarios that are not studied in this project. For example, what is the appropriate model for cases where agents can collude and identify fake tuples? A preliminary discussion of such a model is available in. Another open problem is the extension of our allocation strategies so that they can handle agent requests in an online fashion (the presented strategies assume that there is a fixed set of agents with requests known in advance).
No application ends with a single version; it can always be improved to include new features, and ours is no different. The future enhancements that can be made to Data Leakage Detection are:
Providing support for other file formats.
Creation of a web-based UI for execution of the application.
Improving the detection process based on user requirements.
Provision of a quality or accuracy variance parameter for the user to set.
CHAPTER 10
CONCLUSION
In a perfect world, there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world, we could watermark each object so that we could trace its origins with absolute certainty. However, in many cases, we must indeed work with agents that may not be 100 percent trusted, and we may not be certain whether a leaked object came from an agent or from some other source, since certain data cannot admit watermarks. In spite of these difficulties, we have shown that it is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and the data of other agents, and based on the probability that objects can be guessed by other means. Our model is relatively simple, but we believe that it captures the essential trade-offs. The algorithms we have presented implement a variety of data distribution strategies that can improve the distributor's chances of identifying a leaker. We have shown that distributing objects judiciously can make a significant difference in identifying guilty agents, especially in cases where there is a large overlap in the data that agents must receive.
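The likelihood assessment summarized above can be made concrete with a short sketch. This report does not state the formula, so the one below is an assumption borrowed from the agent guilt model in the literature: each leaked object is guessed by other means with probability p, independently of the others, and otherwise every agent holding that object is equally likely to have leaked it. The names `guilt_probability`, `allocations` and `leaked` are illustrative.

```python
def guilt_probability(agent, leaked, allocations, p):
    """Estimate the probability that `agent` leaked at least one object.

    leaked: set of leaked objects
    allocations: {agent_name: set of objects given to that agent}
    p: assumed probability that an object was guessed by other means
    """
    prob_innocent = 1.0
    for t in leaked & allocations[agent]:
        # Number of agents that were given object t (at least 1: this agent)
        holders = sum(1 for objs in allocations.values() if t in objs)
        # Chance that this agent is NOT the source of t
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent


# Hypothetical allocation: U1 holds both leaked objects, U2 holds only t1
allocations = {"U1": {"t1", "t2"}, "U2": {"t1"}, "U3": {"t9"}}
leaked = {"t1", "t2"}
g1 = guilt_probability("U1", leaked, allocations, p=0.2)
g2 = guilt_probability("U2", leaked, allocations, p=0.2)
g3 = guilt_probability("U3", leaked, allocations, p=0.2)
```

In this example U1 scores higher than U2 because U1 is the only holder of t2, which shows how less overlap between agents sharpens the estimate.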
CHAPTER 11
BIBLIOGRAPHY
Good teachers are worth more than a thousand books; we have them in our department.