Big-Data Security Management Issues


Marisa Paryasto, Andry Alamsyah, Budi Rahardjo, Kuspriyanto

Abstract—The big data phenomenon arises from the increasing amount of data
collected from various sources, including the internet. Big data possesses
characteristics that make it difficult to manage from a security point of
view. This paper looks at the NIST risk management guidance and determines
whether it is applicable to big data.

Index Terms—Big data, security, management

I. INTRODUCTION

Data analytics is the key to understanding information or knowledge about
certain activities. The process of data analysis includes checking, cleaning,
modelling, and transformation of the data. The information gathered from this
process is then used for suggestions, summaries, and support for decision
making [5] [6].

The big data phenomenon is triggered by the rapid growth of various social
network services. User-generated content is responsible for generating a huge
volume of data to be analyzed for many purposes, from business to security.
Machine-to-machine (M2M) communication and the Internet of Things also
produce a vast amount of data. Data from other fields, e.g. DNA sequencing,
also contribute to big data.

The implications of big data for data analytics are significant. In business
management [8], the data gathered from online conversations between members
of a community can be used as a consideration for the marketing strategy,
supply chain management, customer relationship management, competitive
advantage, and business intelligence of a company. In informatics, a new
learning system approach for artificial intelligence using big data has
already been developed. In astronomy, NASA has also used big data to support
research that maps star formation in the sky.

Big data has certain characteristics that are called the 4V [7]: volume,
variety, velocity, and value. Currently, 2.5 quintillion bytes of data are
produced daily. The format (and content) of the data varies and is
unstructured. The speed of data creation is faster than the speed of
analysis. The use and value of the data also vary. This creates a problem in
the analysis and safeguarding of big data.

One of the biggest problems in big data is security. Some big data
initiatives failed due to unclear security controls. Thus, security is
important in big data implementation.

Security can be seen from three aspects: confidentiality, integrity, and
availability. Confidentiality is responsible for securing access to big data.
Unfortunately, the massive size of the data sources and the mixing of the
sources make it difficult to decide who is allowed to access and analyze the
data [2]. Who can access the derived data? Can a third party access or sell
the combined data?

In terms of integrity, there is no single method to guarantee the integrity
of data in a variety of unstructured formats. A conventional uniform message
digest may not work. As for availability, redundancy is difficult due to the
size and distributed nature of big data.

NIST Special Publication 800-30 (2012) [4] is a guidance document for
conducting risk assessments of information security. This guidance provides
senior leaders/executives in organisations with the information needed to
determine appropriate courses of action in response to identified risk. The
objective of this paper is to map big data characteristics onto the steps
outlined in the NIST document (see Figure 1). The final goal is to see
whether the guidance is applicable to big data. Our roadmap is outlined in
Figure 2.

Fig. 1. Generic Risk Model NIST

II. BIG DATA

A. Definitions

Big data is essentially different from traditional relational databases in
terms of requirements and architecture. Big data is often measured by the 4V
(volume, variety, velocity, and value).

Referring to [1], some of the fundamental differences in big data
architecture are listed below.

1) Distributed architecture. Big data architecture is highly distributed, on
the scale of thousands of data and processing nodes. Big data architecture is
generally highly resilient and fault tolerant because the data is
horizontally partitioned, replicated, and distributed among the multiple data
nodes available.
2) Real-time, stream and continuous computations. Data are produced in real
time and in streaming fashion. Computations on these data must be done
continuously (and ideally also in real time).
3) Ad-hoc queries. Since the content and value of the data vary, the queries
on the data also vary and are ad hoc. The queries are done on the fly.
4) Parallel and powerful programming language. The computations performed in
big data are much more complex, highly parallel, and computationally
intensive than the ones done with traditional programming or database
languages.
5) Move the code. Due to the size of the data, it is easier to move the code
than to move the data. This makes security control more difficult.
6) Non-relational data. The data stored in big data is mostly non-relational.
The main advantage of non-relational data is that it can accommodate a large
volume and variety of data.
7) Auto-tiering. In big data, it is extremely difficult to know precisely
where the data is located among the available data nodes because the hottest
data blocks
are tiered into higher-performance media while the coldest data is sent to
lower-cost, high-capacity drives.
8) Variety of input data sources. Big data requires collecting data from many
sources such as logs, end-point devices, social media, etc. It is more
difficult to determine who has access to what.

B. Vulnerabilities

The following is a list of vulnerabilities found in big data.

1) Insecure computation. An insecure program can access sensitive data
(personal profile, age, credit cards, etc.), can corrupt the data, leading to
incorrect results, and can perform a denial of service against the big data
solution, leading to financial loss.
2) End-point input validation/filtering. There are two fundamental challenges
in the data collection process: input validation and data filtering. The
amount of data collected in big data makes it difficult to validate and
filter data on the fly.
3) Granular access control. Existing big data solutions are designed for
performance and scalability, with almost no security in mind.
4) Insecure data storage and communication. This includes data storage at
various distributed data nodes, auto-tiering, real-time analytics and
continuous computation, secure communication (among nodes, middleware, and
end users), and transactional logs of big data.
5) Privacy-preserving data mining and analytics. There are many concerns
pertaining to monetizing and sharing big data analytics in terms of invasion
of privacy, invasive marketing, and unintentional disclosure of sensitive
information.

TABLE I
BIG DATA VULNERABILITY CLASSES

C. Security

Big data has complexities that most people and companies are unprepared to
deal with. These complexities include security and governance of data in
general. Information governance is the capability to create an information
resource that can be trusted by employees, partners, and customers, as well
as government organizations [3].

Big data comes from many data sources that might have different security and
governance policies. A well-defined security strategy has to be applied to
information management of any kind. The combination of security and
governance strategies needs collaboration and coordination to share
responsibilities across the organizations/parties involved, to make sure that
accountability is enforced for the data being used.

A common security solution for data is to encrypt it. However, different
kinds of data require different forms of security protection. Applying the
same kind of encryption to everything (choosing the strongest one) may result
in high cost and complicated procedures.

Some data-safeguarding techniques suggested in [3] are:

1) Data anonymization. The process of removing all data that can be uniquely
tied to an individual.
2) Tokenization. Protecting sensitive data by replacing it with tokens or
alias values that are meaningless to unauthorized people. Data scrubbing is
another term commonly used.
3) Cloud database controls. Setting up access controls to protect the
database. However, this approach is very new.

Fig. 2. Big Data Security Roadmap

III. METHODOLOGY

Our main question is whether the risk management framework offered by NIST
SP 800-30 can be used for big data. The approach that we use is to map big
data characteristics
into the steps outlined in the NIST document. In each step, NIST suggests a
methodology to obtain the data.

There are three ways big data can affect the NIST framework: (1) no change,
(2) the methodology is the same but the data is larger, and (3) the
methodology must be changed. Using these, we map the content as shown in
Table II.

Looking at the table, we can see that big data has an effect on the
methodology, but not in a way that requires a new methodology. At most, we
have to deal with larger data. Thus, the NIST SP 800-30 framework is still
viable for big data.

TABLE II
RISK ASSESSMENT ACTIVITIES NIST AND EQUIVALENT RISK SECURITY IN BIG DATA

IV. CONCLUSION

The NIST risk assessment framework described in NIST SP 800-30 [4] can be
used for big data. The methodology for obtaining the data for risk assessment
is still the same, although we may have to deal with larger data.

REFERENCES

[1] Jitendra Chauchan. Top 5 big data vulnerability classes, 2013 July.
[2] K. Davis and D. GordonPatterson. Ethics of Big Data. O’Reilly, 2012.
[3] Judith Hurwitz, Alan Nugent, Fern Halper, and Marcia Kaufman. Big Data
for Dummies. 2013.
[4] Computer Security Division, Information Technology Laboratory. Guide for
conducting risk assessments. Technical report, National Institute of
Standards and Technology, 2012.
[5] A. McAfee and E. Brynjolfsson. Big data: The management revolution.
Harvard Business Review, October 2012.
[6] S. Sagiroglu and D. Sinanc. Big data: A review. In International
Conference on Collaboration Technology and System, 2013.
[7] A. Sathi. Big Data Analytics: Disruptive Technologies for Changing the
Game. MC Press, 2012.
[8] D. Zage, K. Glass, and R. Colbaugh. Improving supply chain security using
big data. In International Conference on Intelligence and Security
Informatics. IEEE, 2013.

Budi Rahardjo is a researcher and lecturer at Bandung Institute of
Technology.

Kuspriyanto is a professor and lecturer at Bandung Institute of Technology.

Marisa Paryasto is a researcher at Bandung Institute of Technology and a
lecturer at Telkom University.

Andry Alamsyah is a lecturer at Telkom University.
