
Big Data Analytics in Cyber Security IJERTCONV5IS10032


Special Issue - 2017 International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181
ICCCS - 2017 Conference Proceedings

Big Data Analytics in Cyber Security


Aarushi Arya, Harshit Malhotra
Student, Dept. of Computer Science Engineering, HMR Institute of Technology and Management, Hamidpur, New Delhi, India

Dayanand
Research Scholar, Department of Computer Science and Information Technology, Sam Higginbottom University of Agriculture, Technology and Sciences, Allahabad, Uttar Pradesh, India

Wilson Jeberson
Professor, Department of Computer Science and Information Technology, Sam Higginbottom University of Agriculture, Technology and Sciences, Allahabad, Uttar Pradesh, India
Volume 5, Issue 10. Published by www.ijert.org

Abstract-Big data analytics in security involves the ability to gather massive amounts of digital information and to analyze, visualize and draw insights from it that make it possible to predict and stop cyber attacks. Combined with security technologies, it gives us a stronger cyber defense posture. These analytics allow organizations to recognize patterns of activity that represent network threats. In this paper, we focus on how Big Data can improve information security best practices.

Keywords: Big Data, Cyber Security, Privacy, Database

I. INTRODUCTION
The term Big Data refers to data sets that are so large or complex that traditional data processing application software is unable to deal with them. The major difference between traditional data and big data lies in volume, velocity and variety. Volume means the amount of data that is being generated; velocity refers to the speed with which the data is being generated; and variety means the types of structured and unstructured data.

Nowadays, big data is becoming an important research topic in almost every field, especially cyber security. The main sources of this data are social media sites and smart devices. Data generated at this speed raises concerns about its security, because it also contains important and sensitive information such as bank account numbers, passwords and credit card details. Also, advances in Big Data analytics provide tools to extract and utilize this data, making violations of privacy easier. As a result, along with developing Big Data tools, it is necessary to create safeguards to prevent abuse [2].

II. DEFINING AND ANALYZING BIG DATA
The term big data refers to the massive amount of information that is stored and transmitted in a computer system.

Big Data is differentiated from traditional technology in 3 ways:

1. The amount of data (Volume) - Size: the volume of datasets is a critical factor, that is, how much data is being generated.

2. The rate of data generation and transmission (Velocity) - Complexity: the structure, behaviour and permutations of datasets are a critical factor.

3. The types of structured and unstructured data (Variety) - Technologies: the tools and techniques that are used to process sizable or complex datasets are a crucial factor.

III. TECHNOLOGY MEGA TRENDS
Big data is generating an enormous amount of attention among businesses, the media and even consumers, along with analytics and cloud based technologies. These are all part of the current ecosystem created by technology megatrends.

Big data has become a major theme in the technology media, and it has also made its way into many compliance and internal audit discussions. In EY's Global Forensic Data Analysis Survey 2014, 72% of respondents believed that emerging big data technologies can play a key role in fraud prevention and detection. Yet only about 7% of respondents were aware of any specific big data technologies, and only about 2% were actually using them. Forensic data analysis (FDA) technologies are available to help companies keep pace with rapidly increasing data volumes as well as business complexities.

Big Data is broad and encompasses many trends and new technology developments. The following emerging technologies are helping users cope with and handle Big Data in a cost-effective manner.

1. Column oriented database
Traditional, row oriented databases are excellent for online transaction processing with high update speeds, but they fall short in query performance as data volume grows and as data becomes unstructured. Column oriented databases store data by column rather than by row, which allows high compression and fast analytical queries.

2. Schema less database or NoSQL database
Various database types fit into this category, such as key value stores and document stores, which focus on the storage and retrieval of large volumes of data that is unstructured, semi-structured, or even structured.

3. Map Reduce
This is a programming paradigm that allows for massive job execution scalability against thousands of servers or clusters of servers. Any Map Reduce implementation consists of two tasks:

The "Map" task, where an input dataset is converted into a different set of key/value pairs, or tuples. The "Reduce" task, where several of the outputs of the "Map" task are combined to form a reduced set of tuples.

4. Hadoop
Hadoop is by far the most popular implementation of Map Reduce, and is an entirely open source platform for handling big data. It is flexible enough to work with multiple data sources. It has several different applications, but one of the top use cases is for large volumes of constantly changing data, such as location-based data from weather or traffic sensors.

5. Hive
Hive is a SQL-like bridge that allows conventional BI applications to run queries against a Hadoop cluster. It was developed originally by Facebook, but has been open source for some time now. It is a higher-level abstraction of the Hadoop framework that allows anyone to make queries against data stored in a Hadoop cluster just as if they were manipulating a conventional data store.

6. Pig
Pig was developed by Yahoo. Like Hive, Pig is a bridge that tries to bring Hadoop closer to the realities of developers and business users. Unlike Hive, however, Pig consists of a "Perl-like" language, instead of a SQL-like language, that allows for query execution over data stored on a Hadoop cluster [9].

7. WibiData
WibiData is a combination of web analytics with Hadoop. It is built on top of HBase, which is itself a database layer on top of Hadoop.

8. Sky Tree
Sky Tree is a high performance machine learning and data analytics platform focussed specifically on handling big data. Machine learning is a very important part of big data, since the data volumes make manual exploration impractical.

IV. BIG DATA LIFE CYCLE
The big data life cycle consists of three stages:
1. Creation
2. Processing
3. Output

Creation
Certain types of data could not be captured, or were rarely used effectively, until now (for example, the location of a person at a particular moment in time, or the number of steps a person takes every day). New and advanced technologies such as advanced sensors and specially customized software can now record this type of information for the purpose of analysis. Changes in the way we communicate (e.g., social media vs. telephone vs. text/SMS vs. email vs. letter) have also increased our ability to investigate areas such as consumer sentiment.

Processing
At present, extremely large volumes of data are not captured and processed, for various reasons. Most often, the cost of processing is far greater than the value of the insights companies can derive from the analysis. As a result, large amounts of data are left unprocessed.

However, new technologies have now lowered the cost and the technology barrier for effective data processing, allowing companies of all sizes to unlock the value contained in different data sources. For instance, it is difficult for conventional relational databases to handle unstructured data.

Many organizations are looking to the cloud to provide the storage solution. Cloud computing enables companies to use prebuilt big data solutions, or to quickly build and deploy a powerful array of servers, without the substantial costs involved in owning physical hardware.

Output
Capturing, storing and processing data is neither easy nor cheap, and the result is of no use unless the information is relevant; it must also be readily available when it is needed.

There are three key enablers:
• Mobile: Established mobile networks have allowed for easier distribution of information in real time.
• Visual/interactive: Technologies have brought the ability to review large and complex data sets into the realm of the average business user.
• Human resource: There is a new breed of employees with the knowledge to handle the complexities of big data and with the ability to simplify the output for daily use.

V. BIG DATA ANALYTICS FOR CYBER SECURITY
1. Big Data Analytics Used In Fraud Detection
Techniques used for fraud detection fall into two primary classes: statistical techniques and artificial intelligence.

Examples of statistical data analysis techniques are:

1. Data pre-processing techniques for detection, validation, error correction, and filling up of missing or incorrect data.

2. Calculation of various statistical parameters such as averages, quantiles, performance metrics, probability distributions, and so on.

3. Models and probability distributions of various business activities either in terms of various parameters or probability distributions.

4. Computing user profiles.

5. Time-series analysis of time-dependent data.

6. Clustering and classification to find patterns and associations among groups of data.


7. Matching algorithms to detect anomalies in the behaviour of transactions or users as compared to previously known models and profiles. Techniques are also needed to eliminate false alarms, estimate risks, and predict the future of current transactions or users. Fraud management is a knowledge intensive activity.

The main AI techniques used for fraud management include [AI]:

1. Data mining to classify, cluster, and segment the data and automatically find associations and rules in the data that may signify interesting patterns, including those related to fraud.

2. Expert systems to encode expertise for detecting fraud in the form of rules.

3. Pattern recognition to detect approximate classes, clusters, or patterns of suspicious behaviour either automatically (unsupervised) or to match given inputs.

4. Machine learning techniques to automatically identify characteristics of fraud.

5. Neural networks that can learn suspicious patterns from samples and be used later to detect them.

2. Big Data Analytics Used To Detect Anomaly-Based Intrusion
Anomaly detection algorithms are very simple to set up and function automatically, without human intervention. Some key performance indicators are chosen for an event and then thresholds are set. If a threshold is exceeded, the event is signalled for further investigation. The effectiveness of this method is influenced by the choice of indicators to be monitored, of the analysis period, and of the threshold value settings.

3. Provide Security Intelligence
Big data analytics can reduce the time taken to correlate data for forensic purposes and generate an actionable security response.

VI. CHALLENGES
1. Some organizations may not be data driven. They do not understand the benefits of analytics and are hesitant regarding big data analytics.
2. Organizations may think of big data analytics as a way to create value from data. But it is more about finding the right use case related to the intended business objective.
3. The analytics team and the users must work together in the various phases of the analytics process, from scope definition to data extraction and delivery.
4. The management may not be able to trust the analytics outcome, as it is difficult to understand how data can generate such outcomes.
5. There is a limited number of well trained and experienced data scientists.
6. Security issues of big data.

CONCLUSION
The goal of Big Data analytics for security is to obtain actionable intelligence in real time. Big Data can have a major impact on your current business in three ways. It can help you:

1. Discover hidden insights: For example, if you consider customer survey data when investigating a high service cancellation rate, you may detect a pattern or root cause that wasn't visible before and that you can eliminate to improve retention.

2. Improve decisions, by enriching information for decision makers: For example, if you consider a customer's social media profile, you can get a clearer picture of that customer and their place in the world, and you can use that information to improve your response to service inquiries or to prioritize fraud alerts.

3. Automate business processes: For example, you can look at detailed stock trading information to identify patterns that lead to poorly executed trades and automate the process so that certain steps are taken when that pattern occurs again.

REFERENCES
[1] Cloud Security Alliance, "Big Data Analytics for Security Intelligence."
[2] Bryant, Katz, & Lazowska, 2008.
[3] Vemula Geeta et al., "Big Data Analytics for Detection of Frauds in Matrimonial Websites," International Journal of Computer Science Engineering and Technology (IJCSET), March 2015, Vol. 5, Issue 3, pp. 57-61.
[4] Ana-Ramona Bologa, Razvan Bologa, Alexandra Florea, "Big Data and Specific Analysis Methods for Insurance Fraud Detection," University of Economic Studies, Bucharest, Romania.
[5] Ponemon Institute, "Big Data Cyber security Analytics Research Report," August 2016.
[6] Richard A. Derrig, "Insurance Fraud," The Journal of Risk and Insurance, 2002, Vol. 69, No. 3, pp. 271-287.
[7] Bresfelean, V. P., Bresfelean, M., Ghisoiu, N., & Comes, C. A. 2007. "Data Mining Clustering Techniques in Academia." In ICEIS (2), pp. 407-410.
[8] Bresfelean, V. P., Bresfelean, M., Ghisoiu, N., & Comes, C. A. 2008. "Determining students' academic failure profile founded on data mining methods." In Information Technology Interfaces, IEEE, pp. 317-322.
[9] Data electronically available at http://www.ey.com/Publication/vwLUAssets/EY_Big_data:_changing_the_way_businesses_operate/%24FILE/EY-Insights-on-GRC-Big-data.pdf
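
As an illustration of the Map Reduce paradigm described in Section III, the following minimal Python sketch runs the two tasks in a single process. It is not Hadoop code: the function names (map_task, shuffle, reduce_task, run_job) and the sample log lines are assumptions made for this example, and a real implementation would distribute the same steps across many servers.

```python
from collections import defaultdict


def map_task(record):
    """Map: convert one input record into a list of (key, value) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group the values emitted by the Map task under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single tuple."""
    return key, sum(values)


def run_job(records):
    """Run Map over every record, group by key, then Reduce each group."""
    mapped = [pair for record in records for pair in map_task(record)]
    return dict(reduce_task(key, values) for key, values in shuffle(mapped).items())


# Hypothetical input: three short log lines.
log_lines = ["login failed", "login ok", "login failed"]
counts = run_job(log_lines)
print(counts)  # {'login': 3, 'failed': 2, 'ok': 1}
```

Because each call to map_task depends on a single record, and each call to reduce_task on a single key, both steps could be spread over separate machines, which is the source of the scalability mentioned in Section III.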

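
Several of the statistical techniques listed in Section V (calculation of averages and quantiles, computing user profiles, and matching behaviour against known profiles) can be sketched together as follows. This is a simplified illustration and not the method of any particular fraud detection product: it assumes a profile built from a user's past transaction amounts and uses the common interquartile-range rule, with a multiplier k = 1.5, as the matching criterion. The function names and the sample amounts are hypothetical.

```python
import statistics


def build_profile(amounts):
    """Summarise a user's past transaction amounts (needs at least two values)."""
    q1, _median, q3 = statistics.quantiles(amounts, n=4)  # three quartile cut points
    return {"mean": statistics.mean(amounts), "q1": q1, "q3": q3}


def is_unusual(amount, profile, k=1.5):
    """Flag an amount that lies more than k interquartile ranges outside Q1..Q3."""
    iqr = profile["q3"] - profile["q1"]
    lower = profile["q1"] - k * iqr
    upper = profile["q3"] + k * iqr
    return amount < lower or amount > upper


# Hypothetical transaction history for one user.
history = [20, 25, 22, 30, 27, 24, 26, 23]
profile = build_profile(history)

for amount in (25, 500):
    status = "unusual" if is_unusual(amount, profile) else "normal"
    print(amount, status)
```

A flagged amount is only a candidate for investigation. As Section V notes, further techniques are needed to eliminate false alarms and estimate risk.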

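
The threshold-based anomaly detection described in Section V can be illustrated with a short Python sketch. The indicator names (failed_logins, bytes_out), the threshold values, and the sample events are assumptions chosen for the example; each event is taken to summarise one analysis period for one source.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Indicators for one source, aggregated over one analysis period."""
    source: str
    failed_logins: int
    bytes_out: int


# Hypothetical thresholds, one per monitored indicator.
THRESHOLDS = {"failed_logins": 5, "bytes_out": 1_000_000}


def exceeded_indicators(event, thresholds=THRESHOLDS):
    """Return the names of the indicators whose threshold the event exceeds."""
    return [name for name, limit in thresholds.items() if getattr(event, name) > limit]


def flag_events(events, thresholds=THRESHOLDS):
    """Map each source that needs investigation to the indicators it exceeded."""
    flagged = {}
    for event in events:
        hits = exceeded_indicators(event, thresholds)
        if hits:
            flagged[event.source] = hits
    return flagged


events = [
    Event("10.0.0.1", failed_logins=2, bytes_out=40_000),
    Event("10.0.0.2", failed_logins=9, bytes_out=30_000),
    Event("10.0.0.3", failed_logins=1, bytes_out=5_000_000),
]
flagged = flag_events(events)
print(flagged)  # {'10.0.0.2': ['failed_logins'], '10.0.0.3': ['bytes_out']}
```

The sketch makes the three sensitivities named in Section V visible: which indicators appear in THRESHOLDS, how long a period each Event covers, and the threshold values themselves all change which events are signalled.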