WORLDCOMP’19
GRID, CLOUD, & CLUSTER COMPUTING
GCC’19
U.S. $49.95
ISBN 9781601324993
Publication of the 2019 World Congress in Computer Science,
Computer Engineering, & Applied Computing (CSCE’19)
July 29 - August 01, 2019 | Las Vegas, Nevada, USA
https://americancse.org/events/csce2019
Copyright © 2019 CSREA Press
Copying without a fee is permitted provided that the copies are not made or distributed for direct
commercial advantage, and credit to source is given. Abstracting is permitted with credit to the source.
Please contact the publisher for other copying, reprint, or republication permission.
It gives us great pleasure to introduce this collection of papers to be presented at the 2019 International
Conference on Grid, Cloud, and Cluster Computing (GCC’19), July 29 – August 1, 2019, at Luxor Hotel (a
property of MGM Resorts International), Las Vegas, USA. The preliminary edition of this book (available
in July 2019 for distribution on site at the conference) includes only a small subset of the accepted research
articles. The final edition (available in August 2019) will include all accepted research articles. This is due
to deadline extension requests received from most authors who wished to continue enhancing the write-up
of their papers (by incorporating the referees’ suggestions). The final edition of the proceedings will be
made available at https://americancse.org/events/csce2019/proceedings.
An important mission of the World Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE (the federated congress with which this conference is affiliated) includes "Providing a unique platform for a diverse community of constituents composed of scholars, researchers, developers, educators, and practitioners. The Congress makes concerted effort to reach out to participants affiliated with diverse entities (such as: universities, institutions, corporations, government agencies, and research centers/labs) from all over the world. The congress also attempts to connect participants from institutions that have teaching as their main mission with those who are affiliated with institutions that have research as their main mission. The congress uses a quota system to achieve its institution and geography diversity objectives." By any definition of diversity, this congress is among the most diverse scientific meetings in the USA. We are proud to report that this federated congress has authors and participants from 67 different nations, representing a variety of personal and scientific experiences that arise from differences in culture and values. As can be seen below, the program committee of this conference, as well as the program committees of all other tracks of the federated congress, is as diverse as the authors and participants.
The program committee would like to thank all those who submitted papers for consideration. About 70% of the submissions were from outside the United States. Each submitted paper was peer-reviewed by two experts in the field for originality, significance, clarity, impact, and soundness. In cases of contradictory recommendations, a member of the conference program committee was charged with making the final decision; often, this involved seeking help from additional referees. In addition, papers whose authors included a member of the conference program committee were evaluated using the double-blind review process. One exception to the above evaluation process was for papers that were submitted directly to chairs/organizers of pre-approved sessions/workshops; in these cases, the chairs/organizers were responsible for the evaluation of such submissions. The overall acceptance rate for regular papers was 18%; 20% of the remaining papers were accepted as poster papers (at the time of this writing, we had not yet received the acceptance rates for a couple of individual tracks).
We are very grateful to the many colleagues who offered their services in organizing the conference. In particular, we would like to thank the members of the Program Committee of GCC’19, members of the congress Steering Committee, and members of the committees of federated congress tracks that have topics within the scope of GCC. Many individuals listed below will be asked after the conference to provide their expertise and services for selecting papers for publication (extended versions) in journal special issues as well as for publication in a set of research books (to be prepared for publishers including Springer, Elsevier, BMC journals, and others).
Prof. Emeritus Nizar Al-Holou (Congress Steering Committee); Professor and Chair, Electrical
and Computer Engineering Department; Vice Chair, IEEE/SEM-Computer Chapter; University of
Detroit Mercy, Detroit, Michigan, USA
Prof. Hamid R. Arabnia (Congress Steering Committee); Graduate Program Director (PhD, MS,
MAMS); The University of Georgia, USA; Editor-in-Chief, Journal of Supercomputing (Springer);
Editor-in-Chief, Transactions of Computational Science & Computational Intelligence (Springer);
Fellow, Center of Excellence in Terrorism, Resilience, Intelligence & Organized Crime Research
(CENTRIC).
Prof. Dr. Juan-Vicente Capella-Hernandez; Universitat Politecnica de Valencia (UPV),
Department of Computer Engineering (DISCA), Valencia, Spain
Prof. Emeritus Kevin Daimi (Congress Steering Committee); Director, Computer Science and
Software Engineering Programs, Department of Mathematics, Computer Science and Software
Engineering, University of Detroit Mercy, Detroit, Michigan, USA
Prof. Leonidas Deligiannidis (Congress Steering Committee); Department of Computer
Information Systems, Wentworth Institute of Technology, Boston, Massachusetts, USA; Visiting
Professor, MIT, USA
Prof. Mary Mehrnoosh Eshaghian-Wilner (Congress Steering Committee); Professor of
Engineering Practice, University of Southern California, California, USA; Adjunct Professor,
Electrical Engineering, University of California Los Angeles, Los Angeles (UCLA), California,
USA
Prof. Louie Lolong Lacatan; Chairperson, Computer Engineering Department, College of
Engineering, Adamson University, Manila, Philippines; Senior Member, International Association
of Computer Science and Information Technology (IACSIT), Singapore; Member, International
Association of Online Engineering (IAOE), Austria
Prof. Hyo Jong Lee; Director, Center for Advanced Image and Information Technology, Division
of Computer Science and Engineering, Chonbuk National University, South Korea
Dr. Ali Mostafaeipour; Industrial Engineering Department, Yazd University, Yazd, Iran
Dr. Houssem Eddine Nouri; Informatics Applied in Management, Institut Superieur de Gestion de
Tunis, University of Tunis, Tunisia
Prof. Dr., Eng. Robert Ehimen Okonigene (Congress Steering Committee); Department of
Electrical & Electronics Engineering, Faculty of Engineering and Technology, Ambrose Alli
University, Edo State, Nigeria
Ashu M. G. Solo (Publicity), Fellow of British Computer Society, Principal/R&D Engineer,
Maverick Technologies America Inc.
Prof. Fernando G. Tinetti (Congress Steering Committee); School of Computer Science,
Universidad Nacional de La Plata, La Plata, Argentina; also at Comision Investigaciones
Cientificas de la Prov. de Bs. As., Argentina
Prof. Layne T. Watson (Congress Steering Committee); Fellow of IEEE; Fellow of The National
Institute of Aerospace; Professor of Computer Science, Mathematics, and Aerospace and Ocean
Engineering, Virginia Polytechnic Institute & State University, Blacksburg, Virginia, USA
Prof. Jane You (Congress Steering Committee); Associate Head, Department of Computing, The
Hong Kong Polytechnic University, Kowloon, Hong Kong
Dr. Farhana H. Zulkernine; Coordinator of the Cognitive Science Program, School of Computing,
Queen's University, Kingston, ON, Canada
We would like to extend our appreciation to the referees and the members of the program committees of individual sessions, tracks, and workshops; their names do not appear in this document but are listed on the web sites of the individual tracks.
We express our gratitude to the keynote, invited, individual conference/track, and tutorial speakers; the list of speakers appears on the conference web site. We would also like to thank the following: UCMSS (Universal Conference Management Systems & Support, California, USA) for managing all aspects of the conference; Dr. Tim Field of APC for coordinating and managing the printing of the proceedings; and the staff of the Luxor Hotel (Convention department) in Las Vegas for the professional service they provided. Last
but not least, we would like to thank the Co-Editors of GCC’19: Prof. Hamid R. Arabnia, Prof. Leonidas
Deligiannidis, and Prof. Fernando G. Tinetti.
The Design and Implementation of Astronomical Data Analysis System on HPC Cloud
Jaegyoon Hahm, Ju-Won Park, Hyeyoung Cho, Min-Su Shin, Chang Hee Ree
SESSION
HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
Chair(s)
TBA
1 Introduction

Recently, in the field of science and technology, more and more data are generated through advanced data-capturing sources [1], and researchers are increasingly using cutting-edge data analysis techniques such as big data analysis and machine learning. Astronomy is a typical field that collects and analyzes large amounts of data through various observation tools, such as astronomical telescopes, and its data growth rate will increase rapidly in the near future. As a notable example, the Large Synoptic Survey Telescope (LSST) will start to produce large volumes of data, up to 20 TB per day, from observations of a large area of the sky once it reaches full operations in 2023; the total database for ten years is expected to be 60 PB for the raw data and 15 PB for the catalog database [2]. As another big data project, the Square Kilometer Array (SKA), which will be constructed as the world's largest radio telescope by 2024, is also projected to generate and archive 130-300 PB per year [3].

In this era of data deluge, there is a growing demand for utilizing cloud computing for data-intensive sciences.

2 Related Works

There have been several examples of cloud applications for astronomical research. The Gemini Observatory has been building a new archive using EC2, EBS, S3, and Glacier from the Amazon Web Services (AWS) cloud to replace the existing Gemini Science Archive (GSA) [4]. In addition, Williams et al. (2018) conducted studies to reduce the Panchromatic Hubble Andromeda Treasury (PHAT) photometric data set using Amazon EC2 [5].

Unlike these cases of using public clouds, there are also studies that build a private cloud environment to perform astronomical research. AstroCloud [6] is a distributed cloud platform that integrates many data management and processing tasks for the Chinese Virtual Observatory (China-VO). In addition, Hahm et al. (2012) developed a platform for constructing virtual machine-based Condor clusters for analyzing astronomical time-series data in a private cloud [7]. The purpose of that study was to confirm the possibility of constructing a cluster-type analysis platform to perform mass astronomical data analysis in a cloud environment.
Fig. 5. Data Analytics System Architecture

Fig. 6. HEAT Template for Analysis Platform with Redis & Dask

A combination of a task scheduler and a data I/O environment can be created and configured automatically in an orchestration environment through OpenStack Heat or Kubernetes in our self-contained cloud. Figure 6 shows the structure of one of the Heat templates used in this experiment. The template is structured with parameters and resources; the resources comprise a scheduler and workers, and the required software is installed and configured on the scheduler and each worker after boot-up.
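Since the paper's actual template survives only as the image in Figure 6, the following is a minimal sketch of the parameters-plus-resources layout it describes, written as a Python dict mirroring Heat's YAML structure; the resource names, image, flavor, and boot scripts are illustrative assumptions, not the authors' values.

    # Sketch of a Heat-style template: parameters, one scheduler, a group of workers.
    import yaml

    template = {
        "heat_template_version": "2016-10-14",
        "parameters": {
            "worker_count": {"type": "number", "default": 4},
            "image": {"type": "string", "default": "analysis-base"},
            "flavor": {"type": "string", "default": "m1.large"},
        },
        "resources": {
            "scheduler": {
                "type": "OS::Nova::Server",
                "properties": {
                    "image": {"get_param": "image"},
                    "flavor": {"get_param": "flavor"},
                    # user_data installs/configures the scheduler after boot-up
                    "user_data": "#!/bin/bash\npip install dask distributed redis\n",
                },
            },
            "workers": {
                "type": "OS::Heat::ResourceGroup",
                "properties": {
                    "count": {"get_param": "worker_count"},
                    "resource_def": {
                        "type": "OS::Nova::Server",
                        "properties": {
                            "image": {"get_param": "image"},
                            "flavor": {"get_param": "flavor"},
                            # user_data installs/configures each worker after boot-up
                            "user_data": "#!/bin/bash\npip install dask distributed\n",
                        },
                    },
                },
            },
        },
    }

    print(yaml.safe_dump(template, sort_keys=False))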
5 Conclusion and Future Work

Through experiments, we have successfully analyzed brightness and color data for about 5,300 galaxies in a parallel distributed processing environment consisting of Dask or Celery with Redis. Figure 7 shows an example galaxy from the GAMA data with a result of the MAGPHYS analysis in the cloud. With the OpenStack-based cloud, we confirmed that the research environment, especially a data analysis system with tools like a task scheduler and an in-memory DB, can be automatically configured and well utilized. In addition, we confirmed the availability of an elastic service environment through the cloud to meet the demand for large-scale data analysis with volatility.

Fig. 7. An example result of the MAGPHYS analysis on the cloud
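As a hedged illustration of the client-side workflow such an environment supports (the scheduler address, the analysis function, and the catalog size used here are stand-ins; the paper's actual per-galaxy analysis is the MAGPHYS fit, not this code), a Dask client can submit one task per galaxy to the provisioned scheduler:

    from dask.distributed import Client

    def analyze_galaxy(galaxy_id):
        # placeholder for the per-galaxy brightness/color (MAGPHYS-style) analysis
        return {"galaxy": galaxy_id, "fit": "ok"}

    client = Client("tcp://scheduler:8786")            # the orchestrated scheduler node
    futures = client.map(analyze_galaxy, range(5300))  # ~5,300 GAMA galaxies
    results = client.gather(futures)                   # collect when workers finish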
In this study, we have identified some useful aspects of the cloud for data-driven research. First, we confirmed that it is easy to build an independent execution environment that provides the necessary software stack for research through the cloud. Also, in a cloud environment, researchers can easily reuse the same research environment and share research experience by reusing virtual machines or container images deployed by the research community.

In the next step, we will configure an environment for real-time processing of in-memory cache data. For practical real-time data processing, it is necessary to construct an optimal environment for data I/O as well as memory-based stream data processing, and various experiments need to be performed through the cloud. Based on the experience of building an astronomical big data processing environment in this study, we will provide a more flexible and higher-performance cloud service and let researchers utilize it in various fields of data-centric research.

6 References

[1] T. Hey, S. Tansley and K. Tolle, The Fourth Paradigm: Data-intensive Scientific Discovery, Microsoft Research, 2009.

[2] LSST Corporation. About LSST: Data Management. [Online]. Available from: https://www.lsst.org/about/dm/ 2019.03.10

[3] P. Diamond, SKA Community Briefing. [Online]. Available from: https://www.skatelescope.org/ska-community-briefing-18jan2017/ 2019.03.10

[4] P. Hirest and R. Cardenes, “The new Gemini Observatory archive: a fast and low cost observatory data archive running in the cloud”, Proc. SPIE 9913, Software and Cyberinfrastructure for Astronomy IV, 99131E (8 August 2016); doi: 10.1117/12.2231833

[5] B. F. Williams, K. Olsen, R. Khan, D. Pirone and K. Rosema, “Reducing and analyzing the PHAT survey with the cloud”, The Astrophysical Journal Supplement Series, Volume 236, Number 1, 2018.

[6] C. Cui et al., “AstroCloud: a distributed cloud computing and application platform for astronomy”, Proc. WCSN2016

[7] J. Hahm et al., “Astronomical time series data analysis leveraging science cloud”, Proc. Embedded and Multimedia Computing Technology and Service, pp. 493-500, 2012

[8] S. P. Driver et al., “Galaxy And Mass Assembly (GAMA): Panchromatic Data Release (far-UV-far-IR) and the low-z energy budget”, MNRAS 455, 3911-3942, 2016.

[9] OpenStack Foundation. OpenStack Overview. [Online]. Available from: https://www.openstack.org/software/ 2019.03.10

[10] Red Hat Inc. Ceph Introduction. [Online]. Available from: https://ceph.com/ceph-storage/ 2019.03.10

[11] The Kubernetes Authors. What is Kubernetes?. [Online]. Available from: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/ 2019.03.10

[12] Dask Core Developers, Why Dask?. [Online]. Available from: https://docs.dask.org/en/latest/why.html 2019.03.10

[13] A. Solem, Celery - Distributed Task Queue. [Online]. Available from: http://docs.celeryproject.org/en/latest/index.html 2019.03.10

[14] S. Sanfilippo, Introduction to Redis. [Online]. Available from: https://redis.io/topics/introduction 2019.03.10
SESSION
HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
Chair(s)
TBA
Lan Yang
Computer Science Department
California State Polytechnic University, Pomona
Pomona, CA 91768, USA
Abstract - The MapReduce programming model and the Hadoop software framework are keys to big data processing on high performance computing (HPC) clusters. The Hadoop Distributed File System (HDFS) is designed to stream large data sets at high bandwidth. However, Hadoop suffers from a set of drawbacks, particularly issues with small files as well as dynamic datasets. In this research we target big data applications working with many on-demand datasets of varying sizes. We propose a speculation model that prefetches anticipated datasets for upcoming tasks in support of efficient big data processing on HPC clusters.

Keywords: Prefetching, Speculation, Hadoop, MapReduce, High performance computing cluster.

1 Introduction

Along with the emerging technology of cloud computing, Google proposed the MapReduce programming model [1], which allows for massive scalability of unstructured data across hundreds or thousands of high performance computing nodes. Hadoop is an open source software framework that performs distributed processing of huge data sets across a cluster of commodity servers simultaneously [2]. Now distributed as Apache Hadoop [3], it underpins the big data solutions offered by many cloud services such as AWS, Cloudera, HortonWorks, and IBM InfoSphere Insights. The Hadoop Distributed File System (HDFS) [2], inspired by the Google File System (GFS) [4], is a reliable filesystem of Hadoop designed for storing very large files on a cluster of commodity hardware. To process big data in Apache Hadoop, the client submits data and a program to Hadoop; HDFS stores the data while MapReduce processes it.

While Hadoop is a powerful tool for processing massive data, it suffers from a set of drawbacks, including issues with small files, no real-time data processing, and support for batch processing only [5]. Apache Spark [6] partially solved Hadoop's real-time and batch processing problems by introducing in-memory processing [7]. As a member of the Hadoop ecosystem, Spark does not have its own distributed filesystem, though it can use HDFS. Hadoop does not suit small data because HDFS, with its high-capacity design, lacks the ability to efficiently support random reading of small files. Small files are the major problem in HDFS.

In this research, we study a special type of iterative MapReduce task working on HDFS with input datasets coming from many small files dynamically, i.e., on demand. We propose a data prefetching speculation model aiming at improving the performance and flexibility of big data processing on Hadoop HDFS for that special type of MapReduce task.

2 Background

2.1 Description of a special type of MapReduce tasks

In today's big data world, the MapReduce programming model and the Hadoop software framework remain popular tools for big data processing. Based on a number of big data applications performed on Hadoop, we observed the following:

(1) An HDFS file splits into chunks, typically of 64-128 MB in size. To benefit from Hadoop's parallel processing ability, an HDFS file must be large enough to be divided into multiple chunks. Therefore, a file is considered small if it is significantly smaller than the HDFS chunk size.

(2) While many big data applications use large data files that can be pushed to the HDFS input directory prior to task execution, some applications use many small datasets distributed across a wide range.
(3) With the increasing demand of big data processing, more and more applications now require multiple rounds (or iterations) of processing, with each round requiring new datasets determined by the outcome of the previous computation. For example, in a data processing application for a legal system, the first-round MapReduce computation uses prestored case documents, while the second round might require access to certain assets or utilities datasets based on the case outcomes resulting from the first-round analysis. The assets or utilities datasets could consist of hundreds to thousands of files ranging from 1 KB to 10 MB, with only dozens of files relevant depending on the outcome of the first round. It would be very inefficient and inflexible if we had to divide these two rounds into separate client requests. Also, if we could overlap computation and data access time by speculating and prefetching data, we could reduce the overall processing time significantly (see the sketch below). Here we refer to big data applications with one or more of the above characteristics (i.e., requiring iterative or multiple passes of MapReduce computation, using many small files to form an HDFS chunk, and dynamic datasets that depend on the outcome of previous rounds of computation) as a special type of MapReduce tasks.
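The following is a minimal sketch of that overlap (the file names, prediction step, and round logic are hypothetical stand-ins, not the paper's implementation): a background thread stages the speculated next-round files while the current round computes.

    from concurrent.futures import ThreadPoolExecutor

    def fetch(paths):
        # Stand-in for staging many small remote files into the HDFS input directory.
        return {p: b"..." for p in paths}

    def predict_next(round_no):
        # Stand-in for the speculation model (Section 3.1) naming the next round's inputs.
        return [f"round{round_no + 1}_file{i}.dat" for i in range(3)]

    def run_round(round_no, data):
        # Stand-in for one MapReduce round over the staged data.
        return len(data)

    data = fetch([f"round0_file{i}.dat" for i in range(3)])
    with ThreadPoolExecutor(max_workers=1) as pool:
        for r in range(3):
            prefetch = pool.submit(fetch, predict_next(r))  # speculative fetch in background
            run_round(r, data)                              # computation overlaps the fetch
            data = prefetch.result()                        # next round's inputs are staged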
2.2 Observation: execution time and HDFS chunks

We conducted several dozen big data applications using Hadoop on a high-performance computing cluster. Table 1 summarizes the MapReduce performance of three relatively large big data analytics tasks.

Table 1: Performance data for some big data applications (*requires multi-phase analysis)

2.3 Computation time vs. data fetch time

In this research, we first tested and analyzed data access times for datasets ranging from 1 KB to 16 MB on an HPC cluster consisting of 2 DL360 management nodes, 20 DL160 compute nodes, 3.3 TB RAM, 40 Gbit InfiniBand, and a 10 Gbit external Ethernet connection, with overall system throughput of 36.6 TFLOPS in double precision mode and 149.6 TFLOPS. The Slurm job scheduler [8] is the primary software we use for our testing. The performance data shown in Figure 1 serve as our basis for deriving the performance of our speculation algorithms.

Figure 1: Data Access Performance Base
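To make the payoff concrete (a sketch under assumed timings, not the measurements behind Figure 1): with prefetching, each round costs roughly the maximum of its computation time and the next round's fetch time, instead of their sum.

    def total_time(compute, fetch, prefetch=True):
        # compute[i], fetch[i]: seconds for round i's computation and its input fetch.
        if not prefetch:
            return sum(c + f for c, f in zip(compute, fetch))
        t = fetch[0]  # the first round's data must still be fetched up front
        for i, c in enumerate(compute):
            nxt = fetch[i + 1] if i + 1 < len(fetch) else 0.0
            t += max(c, nxt)  # the next fetch overlaps this round's computation
        return t

    print(total_time([30, 30], [10, 12], prefetch=False))  # 82 seconds, serial
    print(total_time([30, 30], [10, 12]))                  # 70 seconds, overlapped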
3 Speculation and Prefetching Models

3.1 Speculation model

We establish a connection graph (CG) to represent relations of commonly used tasks, with tasks as nodes and edges as links between tasks. For example, a birthday party planning task is linked to restaurant reservation tasks as well as entertainment or recreation tasks, and an address change task is linked with moving or furniture shopping tasks. The links in the CG are prioritized; for example, for the birthday task, the restaurant task is initially set with higher priority than the movie ticketing task. The priorities are in the 0.0 to 1.0 range and are dynamically updated based on the outcome of our prediction. For example, based on the connections in the CG and the priorities of the links, we predict that the top two tasks following the birthday task are, in order, the restaurant task and the movie task. If for a particular application it turns out the movie task is the correct choice, we increase its priority by a small fraction, say 0.1, capped at a 1.0 maximum.
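A minimal sketch of this connection graph and its update rule follows (the task names and initial priorities are illustrative; the 0.1 increment and 1.0 cap are from the text above):

    # Tasks as nodes; link priorities in [0.0, 1.0] drive the prediction.
    connection_graph = {
        "birthday_party": {"restaurant": 0.8, "movie": 0.6, "recreation": 0.5},
        "address_change": {"moving": 0.9, "furniture_shopping": 0.7},
    }

    def predict_top(task, k=2):
        # Predict the k most likely follow-on tasks by link priority.
        links = connection_graph[task]
        return sorted(links, key=links.get, reverse=True)[:k]

    def reinforce(task, actual, step=0.1):
        # Raise the priority of the link that proved correct, capped at 1.0.
        links = connection_graph[task]
        links[actual] = min(1.0, links[actual] + step)

    print(predict_top("birthday_party"))                 # ['restaurant', 'movie']
    reinforce("birthday_party", "movie")                 # movie was the correct choice
    print(connection_graph["birthday_party"]["movie"])   # 0.7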
6 References
[1] Jeffrey Dean and Sanjay Ghemawat, MapReduce:
Simplified Data Processing on Large Clusters, Google
Research,
https://research.google.com/archive/mapreduce-
osdi04.pdf
[2] Konstantin Shvachko, Hairong Kuang, Sanjay
Radia, Robert Chansler, The Hadoop Distributed File
System, 2010 IEEE 26th Symposium on Mass Storage
Systems and Technologies (MSST)
[3] Apache Hadoop https://hadoop.apache.org/
[4] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak
Leung, The Google File System,
https://static.googleusercontent.com/media/research.goo
gle.com/en//archive/gfs-sosp2003.pdf
[5] DATAFLAIR Team, 13 Big Limitations of Hadoop
& Solution To Hadoop Drawbacks, https://data-
flair.training/blogs/13-limitations-of-hadoop/, March 7,
2019.
[6] Apache Spark https://spark.apache.org/
[7] Matei Zaharia, Mosharaf Chowdhury, Michael
Franklin, Scott Shenker, Ion Stoica, Spark: Cluster
Computing with Working Sets, Proceedings of the 2nd
USENIX conference on Hot topics in cloud computing,
2010.
[8] Slurm job scheduler, https://slurm.schedmd.com/
[9] Seung Woo Son, Mahmut Kandemir, Mustafa
Karakoy, Dhruva Chakrabarti, A compiler-directed data
prefetching scheme for chip multiprocessors,
Proceedings of the 14th ACM SIGPLAN symposium on
Principles and practice of parallel programming (PPoPP
'09)
[10] Ricardo Bianchini, Beng-Hong Lim, Evaluating
the Performance of Multithreading and Prefetching in
Multiprocessors, https://doi.org/10.1006/jpdc.1996.0109
SESSION
LATE BREAKING PAPER: CLOUD MIGRATION
Chair(s)
TBA
Abstract - We identified that as private enterprises continue to gravitate toward the cloud to benefit from cost savings, they may be unprepared to confront four major issues inherent to cloud architecture. Mitigating risks will require that migrating organizations properly recognize and understand: the critical misalignment between service model selection and consumer expectations within the cloud architecture, the cloud-borne vulnerabilities and cloud-specific threats that together create technological challenges, the causal relationship between customer misconfigurations and cloud spills, and the complexity of implementing security controls. Collectively, the four substantive issues cause risk management to manifest itself in more complicated permutations in the cloud. To address these vexing cybersecurity risks, this paper introduces the unifying concept of transformational migration and recommends decoding the cloud service model selection, employing cryptographic erase for applicable use cases, consulting the broadest cloud security control catalogs in addressing cloud-negative controls, managing supply-chain risk through cloud service providers, and adopting a reconfigured Development Security Operations (DevSecOps) workforce.

Keywords: Cloud, Misconfigurations, Risk, Migration, Service model

1 Introduction

During the five-year period from 2014-18, the largest cloud service provider, Amazon Web Services (AWS), a proxy of the accelerating technological migration, experienced revenue growth at a compound annual growth rate of 47.9% [1]. See Figure 1.

Figure 1. Quarterly revenue of AWS from Q1 to Q4 (in USD millions). Source: [1].

This growth in revenue directly corresponds to a growing trend of data departing on-premises architectures for cloud destinations. Cost may be a primary causal factor for this uptick in cloud migration: cloud service providers charge fixed unitized fees for the work/cycle performed by each instance of utilization. The tradeoff for these cost savings, however, is potentially magnified insecurity.

For example, on June 1, 2017, the Washington Post reported that a large federal contractor for the Department of Defense (DoD) accidentally leaked government passwords on an AWS server related to a work assignment for the National Geospatial-Intelligence Agency [2]. Regrettably, this is not an isolated episode but the third recently documented instance of data mishandling by the well-established government contracting firm. The report went on to describe a prevalence of government agencies pivoting to the cloud, with industry leaders substantiating that this is, in fact, indicative of a more universal shift toward cloud-centric computing [2].

As private enterprises rush to the cloud to reap the benefits of financial savings and increased services, they will confront four major issues inherent to cloud architecture. This paper posits that the velocity of cloud adoption, multiplied by the immaturity of the available cloud workforce pool, warrants a rigorous investigation into the sufficiency of risk management capabilities and preparedness. Managing or mitigating risks will require that migrating organizations properly recognize and understand:

the critical misalignment between service model selection and consumer expectations within the cloud architecture,
the cloud-borne vulnerabilities and cloud-specific threats that together create technological challenges,
the causal relationship between customer misconfigurations and cloud spills, and
the complexity of implementing security controls.
The service model selection determines the level of involvement an organization must have in application development within the cloud service provider's environment. The service model selection in itself also determines a particular set of consumer challenges balanced against greater autonomy in managing cloud-specific settings, configurations, and controls. Thankfully, the selection of the cloud service model can highlight the cloud layers for which the consumer is responsible and, therefore, which security controls to implement. The security controls may essentially be the same; however, the implementer of the controls may (and probably will) differ by cloud service model. This suggests that any meaningful discussion about cloud security will not refer to the ubiquitous cloud but will instead reference a specific selected architecture instantiation, reflecting committed organizational choices.

Once organizations have selected the appropriate service model, they must then address the technological challenges inherent in the cloud with respect to their service model selection. The confounding aspect of interoperability is that the cloud integrates multiple sophisticated technologies, cloud service providers, servicing counterparties, logical layers, hardware, and endpoint devices. A cloud service provider's trustworthiness is compromised if any of the multiple parties or technological interchanges is compromised. At the National Institute of Standards and Technology (NIST), the NIST Cloud Computing Forensic Science Working Group (NCC FSWG) shares an example of the enmeshed relationships a forensic investigator may have to unwind: “A cloud Provider that provides an email application (SaaS [software as a service]) may depend on a third-party provider to host log files (i.e., PaaS [platform as a service]), which in turn may rely on a partner who provides the infrastructure to store log files (IaaS [infrastructure as a service])” [12]. Therefore, technological capabilities and limitations dictate the realities that cloud service providers must integrate. The remote delivery of cloud services and the cloud service provider's capacity as an intermediary give rise to organizational boundary challenges. The multi-geographical operations of cloud service providers create additional legal challenges, as consumers might fall under regulations in multiple jurisdictions if they do not limit the location of servers to only organizationally acceptable jurisdictions.

The cloud's remote delivery presents obstacles to data retrieval that are foreign to on-premises systems. Unlike in on-premises systems, cloud storage is neither local nor persistent; physically attached data storage is only temporary after the abstraction that enables pooling and dynamic customer provisioning. The process of abstraction decouples the physical resources through a process called virtualization, which enables resource pooling. Furthermore, storage is designated as a cloud service provider responsibility, as depicted in Figure 2. The NCC FSWG characterized the separation of a virtual machine from local persistent storage: “Thus, the operational security model of the application, which assumes a secure local log file store, is now broken when moved into a cloud environment” [12]. The consequence of this break in the event of a cyber-incident is the inefficiency of locating stored media, which includes artifacts, log files, and other evidentiary traces [12]. In on-premises systems, the operating systems dependably and centrally manage the consistent generation and storage of valuable evidence traces, and the information is well documented. The NCC FSWG also observed that in the cloud, “user based login and controls are typically in the application rather than in the operating system” [12]. Cloud technologies decouple user identification credentials from a corresponding physical workstation [12]. These idiosyncrasies of cloud architecture also create inefficiencies in data retrieval.

Figure 2. Cloud Security Responsibility Matrix (On-premises Application). Source: [8].

Not only do organizations need to consider the primary and secondary consequences of diverging from traditional operating security models, but they must also recognize that the cloud exposes them to new vulnerabilities and threats. Several cloud vulnerabilities are distinct and completely cloud-specific. Before designating a vulnerability as cloud-native, it needs to meet a set of criteria, a litmus test that decides whether a vulnerability should be classified as cloud-specific. Determining whether a vulnerability is cloud-native is helpful in discussions with reluctant managers about the relative risk of the cloud. Published by the Institute of Electrical and Electronics Engineers (IEEE), “Understanding Cloud Computing Vulnerabilities” provides a rubric that helps determine if vulnerabilities are cloud-specific [13]. According to the rubric, a vulnerability is cloud-specific if it:

is intrinsic to or prevalent in a core cloud computing technology,
has its root cause in one of NIST's essential cloud characteristics,
is caused when cloud innovations make tried and tested security controls difficult or impossible to implement, or
is prevalent in established state-of-the-art cloud offerings. [13]

The first bullet refers to web applications, virtualization, and cryptography as the core cloud technologies [13]. The second bullet alludes to the five essential characteristics attributed to NIST: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service [13]. The third bullet identifies instances when on-premises system security practices do not transfer to the cloud, for example, the “cloud-negative controls” identified by [9] and elaborated upon in section 3.1, which covers the implementation of security controls. The fourth bullet describes the cloud as pushing present technological boundaries: if a vulnerability is identified in an advanced cloud offering, one that has not been previously identified, then it must be a cloud-specific vulnerability. Although there is some merit to the argument, the IEEE paper erroneously includes weak authentication implementations, which are not technically exclusive to the cloud [13]. Due to the flaw in this interpretation, this fourth indicator can only be seen as partially attributed, or a hybrid cloud-specific vulnerability.
Considering the vulnerabilities borne of cloud architectures, it is important to determine which cloud-specific threats could exploit those vulnerabilities. All organizations must update their threat model to include cloud-generated threats. For example, bad actors are presently exploiting cloud services in their attacks by remaining anonymous inexpensively, decentralizing their operations by using multiple cloud service providers, and provisioning superior computing power for a fraction of the cost with pay-as-you-go pricing. In 2016, the Cloud Security Alliance released the “Treacherous 12: Cloud Computing Top Threats in 2016,” which it compiled by surveying cloud industry experts. The Treacherous 12 ranks a dozen security concerns in order of severity (Table 1).

Table 1. Treacherous 12 Threats Summary. Adapted from [14].

Rank | Threat in Conventional Architectures   | Threat in the Cloud
1    | Data breaches                          | Data breaches
2    | Weak access management                 | Weak access management
3    | -                                      | Insecure APIs
4    | System and application vulnerabilities | System and application vulnerabilities
5    | Account hijacking                      | Account hijacking
6    | Malicious insiders                     | Malicious insiders
7    | Advanced Persistent Threats            | Advanced Persistent Threats
8    | Data loss                              | Data loss
9    | Insufficient due diligence             | Insufficient due diligence
10   | -                                      | Nefarious use of cloud services
11   | Denial of service                      | Denial of service
12   | -                                      | Shared technology vulnerabilities

Note that from this analysis, of the 12 greatest estimated threats that experts say emanate from the cloud, only three point to truly cloud-specific vulnerabilities. Insecure application programming interfaces (APIs) (no. 3), nefarious use of cloud services (no. 10), and shared technology vulnerabilities (no. 12) are the cloud-specific threats that merit additional in-depth defense security measures. While not cloud-specific, weak access management, account hijacking, malicious insiders, and insufficient due diligence are the next tier of cloud threats to address.

3 Risk Management Considerations

To securely operate in the cloud, risk management considerations must protect an organization's assets against a range of undesirable events and associated consequences. Cloud spills are a compelling example of such events. A data spill is any event involving the unauthorized transfer of confidential data from an accredited information system to one that is not accredited [8]. A cloud spill is a type of data spill, specifically originating from a cloud environment. As early as 2013, the government had investigated data spillage specific to the cloud, documented in a Department of Homeland Security (DHS) presentation on February 14, 2013, “Spillage and Cloud Computing.” Clearly, all migrating organizations, but especially agencies involved in national security matters, must effectively reduce cloud spills; however, they still have not found a solution to this problem.

Instead of reacting to the aftereffects of cloud spills, migrating organizations need to determine how to anticipate and mitigate. An informed service model selection can facilitate better prioritization of the pertinent cloud services, logical layers, and underlying data structures. The initial benefit of focusing on service model selection is that doing so raises awareness of additional cloud security challenges, enabling the consumer to abate these issues through a combination of policy changes or contracts for additional security services. Data security considerations will directly address the cloud's information structures, which is the data either stored or processed by computing processes. Application security considerations will directly address the cloud's application structures, which comprise the application services used in building applications or the resultant cloud-deployed application itself [6]. Infrastructure security considerations will directly address the cloud's scalable and elastic infrastructure, which comprises the enormity of the cloud service provider's pooled core computing, networking, and storage resources. Configuration, management, and administrative security considerations will directly impact the cloud's metastructures, which enable the cohesive functioning of communication interoperability protocols between the various layer interfaces; critical configuration and management settings are embedded in metastructure signals [6].

The merit of this lower-level understanding is a firmer comprehension of how standard cloud communication functions at different layers within the cloud's shared-responsibilities model. Accordingly, security practitioners map their organizational responsibilities to their service model selections (a sketch of such a mapping follows). This approach maximizes information security signal-to-noise ratios by isolating only the actionable logical layers. Migrating organizations can begin by replacing applications with software as a service to abandon legacy code, followed by rebuilding cloud native or refactoring backward-compatible application code with platform as a service, and finally by re-hosting (lift and shift) applications that will not benefit from either current or future cloud capabilities to infrastructure as a service [15]. Software-as-a-service solutions lack customer customization and can lead to vendor lock-in by making the porting of data more challenging. A vendor lock-in mitigant is service-oriented architecture development, which produces applications that are treated like “services,” as in “anything as a service.” Once the application can be treated as a service, it should be able to port or “plug” into any cloud service provider seamlessly and temper the fears of having to make large-scale changes to existing code bases for interoperability with the proprietary requirements of the new cloud service provider. Service-oriented architecture is easily reconfigurable.
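As a hedged illustration of that mapping (the layer lists follow the widely cited shared-responsibility pattern, not the paper's Figure 2, which survives only as an image; the names here are illustrative), the consumer-owned layers per service model can be expressed as a simple lookup:

    # Layer assignments are an assumption based on the common shared-responsibility
    # model, not a reproduction of the paper's responsibility matrix.
    CONSUMER_LAYERS = {
        "IaaS": ["data", "application", "runtime", "middleware", "guest OS"],
        "PaaS": ["data", "application"],
        "SaaS": ["data", "access configuration"],
    }

    def controls_to_implement(service_model):
        # Layers for which the consumer selects and implements security controls.
        return CONSUMER_LAYERS[service_model]

    print(controls_to_implement("PaaS"))  # ['data', 'application']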
The prevailing methods for either mitigating or responding to cloud data spills are insufficient in terms of consumer autonomy and cloud confidentiality. In regard to autonomy, cloud service providers have invented the concept of bring your own key (BYOK), which bolsters a false sense of security regarding consumer encrypted data. BYOK solutions imply that the consumer's key is the sole key involved in encrypting and decrypting the customer's data, which is not the case [16]. In fact, the consumer key is an unnecessary input for the cloud service provider to access the consumer's data (e.g., when responding to subpoena requests). In practice, the cloud service provider first uses its own key to encrypt the data, and the customer key is used second, to encrypt the cloud service provider's key. The DoD has recognized this deficiency of the BYOK construct and has secured an alternate remediation: cryptographic erase.
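To make the double-encryption concrete, here is a minimal sketch (the Fernet library choice and the key names are illustrative assumptions, not any provider's actual mechanics) of why the customer key is not the sole key protecting the data:

    from cryptography.fernet import Fernet

    provider_key = Fernet.generate_key()   # generated and held by the provider
    customer_key = Fernet.generate_key()   # the "bring your own" key

    # The provider's key encrypts the data; the customer key only wraps the provider key.
    ciphertext = Fernet(provider_key).encrypt(b"consumer data at rest")
    wrapped_provider_key = Fernet(customer_key).encrypt(provider_key)

    # The provider still holds provider_key, so it can decrypt without the
    # customer key ever being presented.
    print(Fernet(provider_key).decrypt(ciphertext))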
Cryptographic erase is credited by the DoD as “high-assurance data destruction … [in which] media sanitization is performed by sanitizing the cryptographic keys used to encrypt the data, as opposed to sanitizing the storage locations on media containing the encrypted data itself” [8]. Sanitization is the process of making data unrecoverable. Cryptographic erase achieves the goal of data destruction indirectly, by way of key erasure. Cryptographic erase also accommodates “partial sanitization,” in which a subset of the data is sanitized, but this requires the use of unique keys for each subset [8]. Cryptographic erase paired with deleting files is more expedient than physically sanitizing a cloud service provider environment. However, cryptographic erase is only effective for encrypted data. Therefore, the DoD explicitly tasks its components and agencies with ensuring that all DoD data at rest is encrypted. This acknowledges that any data in an unencrypted state is data at risk. Furthermore, the DoD must have exclusive control of both the encryption keys and key management; this facilitates the DoD's ability to remediate unilaterally, with high-assurance data destruction, without any cloud service provider cooperation [8].
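A minimal sketch of the idea under assumed tooling (Python's cryptography package; the subset names are hypothetical): destroying a subset's key sanitizes that subset without touching the stored ciphertext.

    from cryptography.fernet import Fernet

    # A unique key per data subset enables the "partial sanitization" described above.
    keys = {"case_files": Fernet.generate_key(), "hr_records": Fernet.generate_key()}
    stored = {name: Fernet(k).encrypt(b"sensitive payload") for name, k in keys.items()}

    # Sanitize only the hr_records subset by erasing its key; the ciphertext
    # remains on media but is computationally unrecoverable.
    del keys["hr_records"]

    # The other subset is still readable with its surviving key.
    print(Fernet(keys["case_files"]).decrypt(stored["case_files"]))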
services and enforce standardized cybersecurity requirements
However, cryptographic erase is not a panacea. This government-wide. Cloud requirements for the DoD exceed
technology is an effective tool to resolve data spills due to requirements for other federal government agencies; for that
human error, but it would likely prove ineffective against data reason, the DoD issued the Cloud Computing Security
spills initiated by malicious code. Cryptographic erase would Requirements Guide [8], which describes FedRAMP+.
be unable to contain a running process while data is still in use. FedRAMP+ adds DoD-specific security controls to fulfill the
Additionally, cryptographic erase is only effective in DoD’s mission requirements. FedRAMP+ is the cloud-
infrastructure as a service—and some platform as a service— computing customized approach to NIST 800-53 security
cloud deployments when the consumer determines exactly how controls. These controls “were selected primarily because they
the data is stored. Although the DoD has been able to resort to address issues such as the Advanced Persistent Threat (APT)
cryptographic erase as a reactionary measure, private and/or Insider Threat, and because the DoD … must categorize
enterprise consumers now aware of the BYOK misnomer its systems in accordance with CNSSI 1253, beginning with its
should focus their attention on prevention. Customer baselines, and then tailoring as needed” [8]. CNSSI 1253 is the
misconfiguration prevention begins when the consumer Committee on National Security Systems Instruction No. 1253
directly maps security controls to the logical layers for which Security Categorization and Control Selection for National
they are explicitly responsible as a result of their service model Security Systems [18]. A comparison of security controls
selection. indicates that 32 CNSSI 1253 controls were added to the NIST
3.1 Implementation of Security Controls

When consumers transition from on-premises systems, they will find gaps within their existing security policies and how they interplay with the contracted terms and conditions of an executed service-level agreement. Wide variability exists among cloud service providers with respect to defined terms and related metrics [17], so consumers should focus on the definitions used in each agreement. The goals of each specific cloud project, service model, and cloud service provider platform are the critical inputs in determining the additional countermeasures the project should integrate. Organizations should generate their requirements, map architecture, and conclude by diagnosing and then prioritizing the remaining security gaps of the cloud service provider [6]. The 2016 SANS paper indicates that it is imperative for any organization's security architect to have the ability to discern how on-premises networks differ from virtualized architecture. The SANS paper categorizes security controls into cloud-positive, cloud-negative, and cloud-neutral controls [9]; the three tiers correspond to the ease of application within the cloud. The SANS recommendation, based upon this awareness, allows the security architect to direct greater attention to the cloud-negative controls. Cloud-negative controls emerge when implementation is more difficult or cost-prohibitive in the cloud [9]. The paper specifically identifies logging, boundary defense, and incident response management as cloud-negative controls.

NIST 800-53 is heralded as an exhaustive set of security controls. However, the first revision of NIST 800-53, published in December 2006, predates widespread cloud adoption and was better suited for on-premises environments. In response, FedRAMP (the Federal Risk and Authorization Management Program) is a 2011 federal policy that details the minimally required security authorization procedures with which an agency must comply when engaging with a cloud service provider for contracted cloud services. FedRAMP was specifically drafted to direct federal cloud computing acquisitions, and its goal was to accelerate adoption of cloud services and enforce standardized cybersecurity requirements government-wide. Cloud requirements for the DoD exceed requirements for other federal government agencies; for that reason, the DoD issued the Cloud Computing Security Requirements Guide [8], which describes FedRAMP+. FedRAMP+ adds DoD-specific security controls to fulfill the DoD's mission requirements and is the cloud-computing customized approach to NIST 800-53 security controls. These controls “were selected primarily because they address issues such as the Advanced Persistent Threat (APT) and/or Insider Threat, and because the DoD … must categorize its systems in accordance with CNSSI 1253, beginning with its baselines, and then tailoring as needed” [8]. CNSSI 1253 is the Committee on National Security Systems Instruction No. 1253, Security Categorization and Control Selection for National Security Systems [18]. A comparison of security controls indicates that 32 CNSSI 1253 controls were added to the NIST SP 800-53 moderate baseline and 88 NIST 800-53 moderate controls were subtracted from the CNSSI 1253 moderate baseline [8]. Non-DoD entities seeking security controls that surpass federal government agency standards may also refer to CNSSI 1253 for more granular control options. Additionally, the Cloud Controls Matrix, published by the CSA, is a rational catalog to begin with because it maps its controls side-by-side with many other control catalogs for easy comparison.

4 Transformational Migration

Ultimately, there is a viable solution for the challenges that migrating organizations face when transitioning to a robust, secure cloud environment. However, the solution will require those organizations to reorganize people and processes to minimize the existing gaps between how traditional applications operate and how cloud computing applications are configured. It will also require organizations to incorporate broad uses of encryption, digital forensic incident-response processes tailored to cloud architectures, practicable workarounds that address cloud-negative security controls, and continuous mandatory cloud training. Transformational migration accounts for these requirements by better aligning processes with how the cloud actually functions.

Transformational migration mandates the collocation of relevant data sets through secure application programming interface calls. Additionally, it supports extending the perimeter from the network boundary to include the boundary of specific chunks of data. Extending the perimeter enables the migrating organization to leverage metadata tagging to administer stricter enforcement of file authorizations and legal compliance. Transformational migration mandates security through the complete data security lifecycle: creating, storing, processing, sharing, archiving, and destructing [6].

“Deliver Uncompromised” is a new strategy to address cybersecurity lapses that extend to DoD contractors [19]. Deliver Uncompromised encourages adding security assessment attainment levels in the awarding of contracts, along with traditional cost and performance considerations. The new supply-chain risk management strategy holds that the cloud can contribute to protecting the DoD supply chain by specifically encouraging its contractors “to shift information systems and applications to qualified, secure cloud service providers” [19]. This strategy can also be applied to non-DoD supply chains.

5 Conclusion

Transformational migration is a strategy to prevail over the well-worn pattern of human misunderstandings largely driving cloud misconfigurations, which eventually become cloud data spills that require a digital forensic incident response. A better understanding of how the service model relates to the intent of the application can reduce the risk of customer misconfigurations, which produces a more robust cybersecurity risk posture. Migrating organizations will also require transitioning application professionals to a new dynamic: a transformational workforce with the dexterity to remediate issues at multiple cloud logical layers. The DevSecOps model, comprising both newly hired and retrained workforce members, is an integrated team of problem solvers with diverse experiences from the application development, engineering, and security disciplines. The DevSecOps teams are tasked with developing and continuously tuning applications by addressing security at multiple layers and for the complete data life cycle. The DevSecOps model is endorsed by the Defense Innovation Board for its comprehensive resolution of existing misalignments between information security professionals and cloud technologies [3]. Using the recommendations of transformational migration as a guide, DevSecOps teams will be able to more effectively implement critical risk management controls while avoiding detrimental misconfigurations when migrating to the cloud.

The research presented in this paper is part of Michael Atadika's thesis conducted at and published for public release by the Naval Postgraduate School [20].

6 References

[1] Statista [Internet]. [date unknown]. Amazon Web Services: quarterly revenue 2014-2018. Hamburg (Germany): Statista; [cited 2019 Feb 22]. Available from: https://www.statista.com/statistics/250520/forecast-of-amazon-web-services-revenue

[2] Gregg, A [Internet]. 2017, Jun. 1. Booz Allen Hamilton employee left sensitive passwords unprotected online. Washington (DC): Washington Post; [cited 2018 Mar 2]. Available from: https://www.washingtonpost.com/business/capitalbusiness/government-contractor-left-sensitive-passwords-unprotected-online/2017/06/01/916777c6-46f8-11e7-bcde-624ad94170ab_story.html?utm_term=.6cad14ff8b95

[3] [DoD] Department of Defense [Internet]. [updated 2018 Oct 2; cited 2019 Mar 14]. Defense innovation board do’s and don’ts for software. Washington (DC): Department of Defense. Available from: https://media.defense.gov/2018/Oct/09/2002049593/-1/-1/0/DIB_DOS_DONTS_SOFTWARE_2018.10.05.PD

[4] Bommadevara N., Del Miglio A., Jansen S [Internet]. 2018. Cloud adoption to accelerate IT modernization. New York (NY): McKinsey Digital; [cited 2018 May 18]. Available from: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/cloud-adoption-to-accelerate-it-modernization

[5] Odell, L., Wagner, R., & Weir, T [Internet]. 2015. Department of Defense use of commercial cloud computing capabilities and services. Alexandria (VA): Institute for Defense Analyses; [cited 2018 Aug 23]. Available from: http://www.dtic.mil/dtic/tr/fulltext/u2/1002758.pdf

[6] [CSA] Cloud Security Alliance [Internet]. 2017. Security guidance: For critical areas of focus in cloud computing v4.0. Seattle (WA): Cloud Security Alliance; [cited 2018 Apr 10]. Available from: https://cloudsecurityalliance.org/guidance/#_overview

[7] van Eijk, P H J [Internet]. 2018. Cloud migration strategies and their impact on security and governance. Seattle (WA): Cloud Security Alliance; [cited 2019 Mar 14]. Available from: https://blog.cloudsecurityalliance.org/2018/06/29/cloud-migration-strategies-impact-on-security-governance/
[8] [DISA] Defense Information Systems Agency [Internet]. 2017, Mar 6. Department of Defense Cloud Computing Security Requirements Guide, version 1, release 3. Washington (DC): Department of Defense; [cited 2018 Apr 10]. Available from: https://www.complianceweek.com/sites/default/files/department_of_defense_cloud_computing_security_requirements_guide.pdf

[9] SANS Institute [Internet]. 2016. Implementing the critical security controls in the cloud. North Bethesda (MD): SANS Institute; [cited 2017 Oct 20]. Available from: https://www.sans.org/reading-room/whitepapers/critical/implementing-critical-security-controls-cloud-36725

[10] Clarke, G [Internet]. 2015, Apr. 13. Self preservation is AWS security’s biggest worry, says gros fromage. London (UK): The Register; [cited 2017 Oct 9]. Available from: https://www.theregister.co.uk/2015/04/13/aws_security_sleepless_nights/

[11] [CIA] Central Intelligence Agency [Internet]. 2014, Dec. 17. CIA creates a cloud: An interview with CIA’s chief information officer, Doug Wolfe, on cloud computing at the agency. Washington (DC): Central Intelligence Agency; [cited 2018 Mar 8]. Available from: https://www.cia.gov/news-information/featured-story-archive/2014-featured-story-archive/cia-creates-a-cloud.html

[12] [NCC FSWG] NIST Cloud Computing Forensic Science Working Group (NCC FSWG) [Internet]. 2014. NIST cloud computing forensic science challenges, Draft NISTIR 8006. Gaithersburg (MD): NIST; [cited 2018 May 7]. Available from: https://csrc.nist.gov/publications/detail/nistir/8006/draft

[17] [CIO & CAO] Chief Information Officer Council & Chief Acquisition Officers Council [Internet]. 2012. Creating effective cloud computing contracts for the federal government: Best practices for acquiring IT as a service. Washington (DC): Chief Information Officer Council & Chief Acquisition Officers Council; [cited 2018 Dec 07]. Available from: https://www.cio.gov/2012/02/24/cloud-computing-update-best-practices-for-acquiring-it-as-a-service/

[18] [CNSS] Committee on National Security Systems [Internet]. 2014. Security categorization and control selection for national security systems, CNSSI No. 1253. Washington (DC): Department of Defense; [cited 2018 May 21]. Available from: http://www.dss.mil/documents/CNSSI_No1253.pdf

[19] Nakashima, E., Sonne, P [Internet]. 2018, Aug 13. Pentagon is rethinking its multibillion-dollar relationship with U.S. defense contractors to boost supply chain security. Washington (DC): Washington Post; [cited 2018 Aug 13]. Available from: https://www.washingtonpost.com/world/national-security/the-pentagon-is-rethinking-its-multibillion-dollar-relationship-with-us-defense-contractors-to-stress-supply-chain-security/2018/08/12/31d63a06-9a79-11e8-b60b-1c897f17e185_story.html?utm_term=.60664aebdfb8

[20] Atadika, M. Applying U.S. military cybersecurity policies to cloud architectures [master’s thesis]. Monterey (CA): Naval Postgraduate School. 2018. 102p.
Atadika, Michael
Burke, Karen
Cho, Hyeyoung
Hahm, Jaegyoon
Park, Ju-Won
Ree, Chang Hee
Rowe, Neil
Shin, Min-Su
Yang, Lan