Big Data Analytics in Cyber Security IJERTCONV5IS10032
ISSN: 2278-0181
ICCCS - 2017 Conference Proceedings
II. DEFINING AND ANALYZING BIG DATA
The term big data refers to the massive amount of information that is stored and transmitted in a computer system.

Big Data is differentiated from traditional technology in three ways:
1. The amount of data (Volume) - Size: the volume of the datasets, that is, how much data is being generated, is a critical factor.
2. The rate of data generation and transmission (Velocity).
3. Complexity: the structure, behaviour, and permutations of the datasets are a critical factor.

2. Schema-less Database or NoSQL Database
There are various database types that fit into this category, such as key-value stores and document stores, which focus on the storage and retrieval of large volumes of unstructured, semi-structured, or even structured data.

3. Map Reduce
This is a programming paradigm that allows massive job-execution scalability across thousands of servers or clusters of servers. Any Map Reduce implementation consists of two tasks:
• The "Map" task, where an input dataset is converted into a different set of key/value pairs, or tuples.
• The "Reduce" task, where several of the outputs of the "Map" task are combined to form a reduced set of tuples.
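The two tasks can be illustrated with the classic word-count job. The following is a minimal, single-process Python sketch that assumes a few toy log lines as input. The function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) are hypothetical, and this is not Hadoop's API. In a real cluster the framework distributes the Map tasks, performs the grouping (shuffle) step, and runs the Reduce tasks in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_task(line: str) -> Iterator[Tuple[str, int]]:
    """Map: convert one input record into (key, value) tuples."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (the framework does this on a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single tuple."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain Map -> shuffle -> Reduce over all input lines."""
    intermediate = (pair for line in lines for pair in map_task(line))
    grouped = shuffle(intermediate)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["login failed admin", "login ok alice", "login failed admin"]
    print(run_job(logs))
    # prints {'login': 3, 'failed': 2, 'admin': 2, 'ok': 1, 'alice': 1}
```

Because each Map call looks at only one record and each Reduce call looks at only one key, both steps can be spread across many servers independently, which is the source of the scalability described above.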
4. Hadoop
Hadoop is the best-known and most popular implementation of Map Reduce, being an entirely open-source platform for handling big data. It is flexible enough to work with multiple data sources. It has several different applications, but one of the top use cases is handling large volumes of constantly changing data, such as location-based data from weather or traffic sensors.

5. Hive
Hive is a SQL-like bridge that allows conventional BI applications to run queries against a Hadoop cluster. It was originally developed by Facebook but has been open source for some time now. It is a higher-level abstraction of the Hadoop framework that allows anyone to query data stored in a Hadoop cluster just as if they were manipulating a conventional data store.

6. Pig
Pig was developed by Yahoo. Like Hive, Pig is a bridge that tries to bring Hadoop closer to the realities of developers and business users. Unlike Hive, however, Pig uses a "Perl-like" language, instead of a SQL-like language, for query execution over data stored on a Hadoop cluster [9].

7. WibiData
WibiData combines web analytics with Hadoop. It is built on top of HBase, which is itself a database layer on Hadoop.

8. Sky Tree
Sky Tree is a high-performance machine learning and data analytics platform focused specifically on handling big data. Machine learning is a very important part of big data, since the data volume makes manual exploration impractical.

IV. BIG DATA LIFE CYCLE
The big data life cycle consists of three stages:
1. Creation
2. Processing
3. Output

Creation
Certain types of data either could not be captured or were rarely used effectively until now (for example, the location of a person at any particular moment in time, or the number of steps a person takes every day). New and advanced technologies, such as advanced sensors and specially customized software, can now record this type of information for analysis. Changes in the way we communicate (e.g., social media vs. telephone vs. text/SMS vs. email vs. letter) have also increased our ability to investigate areas such as consumer sentiment.

Processing
In the present-day scenario, we have extremely large volumes of data that have not traditionally been captured and processed, for various reasons. Mostly, the cost of processing is far greater than the value of the insights companies can derive from analysing the data, which is why large amounts of data are left unprocessed.

However, new technologies have now lowered the cost and the technology barrier for effective data processing, allowing companies of all sizes to unlock the value contained in different data sources. This matters because conventional relational databases, for instance, have difficulty handling unstructured data.

Many organizations are looking to the cloud to provide the storage solution. Cloud computing enables companies to use prebuilt big data solutions, or to quickly build and deploy a powerful array of servers, without the substantial costs involved in owning physical hardware.

Output
Capturing, storing, and processing data is neither easy nor cheap, and the data is not useful at all unless the information is relevant; it must also be readily available when it is needed.

There are three key enablers:
• Mobile: established mobile networks have made real-time distribution of information easier.
• Visual/interactive: technologies have brought the ability to review large and complex datasets within the reach of the average business user.
• Human resources: there is a new breed of employees with the knowledge to handle the complexities of big data and the ability to simplify the output for daily use.

V. BIG DATA ANALYTICS FOR CYBER SECURITY
1. Big Data Analytics Used in Fraud Detection
Techniques used for fraud detection fall into two primary classes: statistical techniques and artificial intelligence.

Examples of statistical data analysis techniques are:
1. Data pre-processing techniques for detection, validation, error correction, and filling in missing or incorrect data.
2. Calculation of various statistical parameters such as averages, quantiles, performance metrics, probability distributions, and so on.
3. Models and probability distributions of various business activities, expressed either in terms of various parameters or as probability distributions.
4. Computing user profiles.
5. Time-series analysis of time-dependent data.
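Items 2 and 4 above (statistical parameters and user profiles) can be combined into a simple deviation check. The Python sketch below is an illustration built on assumptions not taken from this paper. Each user's profile is the mean and population standard deviation of that user's past transaction amounts. A transaction is flagged when its z-score exceeds a hypothetical threshold of 3.0. Unknown users are always flagged for review. The function names are also hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# A profile is (mean amount, population standard deviation) of past amounts.
Profile = Tuple[float, float]


def build_profiles(history: Dict[str, List[float]]) -> Dict[str, Profile]:
    """Compute a simple statistical profile per user (each list must be non-empty)."""
    return {user: (mean(amounts), pstdev(amounts)) for user, amounts in history.items()}


def z_score(amount: float, profile: Profile) -> float:
    """Number of standard deviations the amount lies from the user's mean."""
    mu, sigma = profile
    if sigma == 0:
        # No variation in the history: an identical amount is normal,
        # any other amount is treated as infinitely unusual.
        return 0.0 if amount == mu else float("inf")
    return (amount - mu) / sigma


def is_suspicious(user: str, amount: float, profiles: Dict[str, Profile],
                  threshold: float = 3.0) -> bool:
    """Flag a transaction that deviates strongly from the user's profile."""
    if user not in profiles:
        return True  # hypothetical policy: users without a profile are reviewed
    return abs(z_score(amount, profiles[user])) > threshold


if __name__ == "__main__":
    profiles = build_profiles({"alice": [20.0, 25.0, 22.0, 18.0, 30.0]})
    print(is_suspicious("alice", 24.0, profiles))   # False
    print(is_suspicious("alice", 500.0, profiles))  # True
```

A fixed z-score threshold is the simplest possible design. It is sensitive to skewed amounts and to short histories, so in practice it would be tuned or replaced to keep false alarms low.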
Matching algorithms detect anomalies in the behaviour of transactions or users by comparing them with previously known models and profiles. Further techniques are needed to eliminate false alarms, estimate risks, and predict the future behaviour of current transactions or users. Fraud management is a knowledge-intensive activity.

The main AI techniques used for fraud management include [AI]:
1. Data mining to classify, cluster, and segment the data and automatically find associations and rules in the data that may signify interesting patterns, including those related to fraud.
2. Expert systems to encode expertise for detecting fraud in the form of rules.
3. Pattern recognition to detect approximate classes, clusters, or patterns of suspicious behaviour, either automatically (unsupervised) or by matching given inputs.
4. Machine learning techniques to automatically identify characteristics of fraud.
5. Neural networks that can learn suspicious patterns from samples and later be used to detect them.

2. Big Data Analytics Used to Detect Anomaly-Based Intrusion
Anomaly detection algorithms are very simple to set up and function automatically, without human intervention. Some key performance indicators are chosen for an event and then thresholds are set. If a threshold is exceeded, the event is flagged for further investigation. The effectiveness of this method is influenced by the choice of indicators to be monitored, the analysis period, and the threshold value settings.

3. Provide Security Intelligence – Big data analytics can reduce the time taken to correlate data for forensic purposes and to generate an actionable security response.

VI. CHALLENGES
1. Some organizations may not be data driven. They do not understand the benefits of analytics and are hesitant about big data analytics.
2. Organizations may think of big data analytics merely as a way to create value from data, but it is more about finding the right use case for the intended business objective.
3. The analytics team and the users need to work together in the various phases of the analytics process, from scope definition to data extraction and delivery.
4. Management may not trust the analytics outcome, because it is difficult to understand how the data generates such outcomes.
5. There is a limited number of well-trained and experienced data scientists.
6. Security issues of big data.

CONCLUSION
The goal of Big Data analytics for security is to obtain actionable intelligence in real time. Big Data can have a major impact on your current business in three ways. It can help you:
1. Discover hidden insights – For example, if you consider customer survey data when investigating a high service cancellation rate, you may detect a pattern or root cause that was not visible before and that you can eliminate to improve retention.
2. Improve decisions by enriching information for decision makers – For example, if you consider a customer's social media profile, you can get a clearer picture of that customer and their place in the world, and you can use that information to improve your response to service inquiries or to prioritize fraud alerts.
3. Automate business processes – For example, you can look at detailed stock trading information to identify patterns that lead to poorly executed trades and automate the process so that certain steps are taken when that pattern occurs again.

REFERENCES
[1] Cloud Security Alliance, "Big Data Analytics for Security Intelligence."
[2] Bryant, Katz, & Lazowska, 2008.
[3] Vemula Geeta et al., "Big Data Analytics for Detection of Frauds in Matrimonial Websites," International Journal of Computer Science Engineering and Technology (IJCSET), Vol. 5, Issue 3, pp. 57-61, March 2015.
[4] Ana-Ramona Bologa, Razvan Bologa, and Alexandra Florea, "Big Data and Specific Analysis Methods for Insurance Fraud Detection," University of Economic Studies, Bucharest, Romania.
[5] Ponemon Institute, "Big Data Cyber Security Analytics Research Report," August 2016.
[6] Richard A. Derrig, "Insurance Fraud," The Journal of Risk and Insurance, Vol. 69, No. 3, pp. 271-287, 2002.
[7] V. P. Bresfelean, M. Bresfelean, N. Ghisoiu, and C. A. Comes, "Data Mining Clustering Techniques in Academia," in ICEIS (2), pp. 407-410, 2007.
[8] V. P. Bresfelean, M. Bresfelean, N. Ghisoiu, and C. A. Comes, "Determining Students' Academic Failure Profile Founded on Data Mining Methods," in Information Technology Interfaces, IEEE, pp. 317-322, 2008.
[9] Data electronically available at http://www.ey.com/Publication/vwLUAssets/EY_Big_data:_changing_the_way_businesses_operate/%24FILE/EY-Insights-on-GRC-Big-data.pdf