Chapter4-Cloud Storage
CLOUD COMPUTING COURSE
When you buy a server or storage array these days, you often have the choice between different
kinds of hard drives:
HDDs (hard disk drives): an HDD is an important, widely used storage device that uses a
mechanical arm with a read/write head to move to the right location on a storage platter and
read or write information there.
HDD Types:
HDDs are distinguished by interface type. SATA and SAS hard drives are the most commonly used today.
1. PATA (Parallel Advanced Technology Attachment): transfers data in parallel mode
and is no longer used.
2. SATA (Serial Advanced Technology Attachment): replaced PATA interfaces and
transfers data in serial mode. SATA is faster than PATA, and SATA drives offer
very high capacity.
SATA HDD
3. SAS
Serial Attached SCSI (SAS), where SCSI stands for Small Computer System Interface. Like
SATA, SAS uses serial transfer to achieve higher performance. Compared with a SATA hard
drive, a SAS hard drive has a higher read/write speed, a higher price, and a smaller
capacity.
SAS HDD
In terms of reliability, SAS disks are an order of magnitude better than either NL-SAS or SATA
disks. Reliability here is measured by the bit error rate (BER), that is, how often bit errors
occur on the media.
4. NL-SAS
Near Line SAS (NL-SAS) is a type of hard drive that sits between SATA and SAS. An NL-SAS
hard drive combines a SAS interface with SATA-class platters (head, media, and
rotational speed).
NL-SAS HDD
NL-SAS combines the capacity and price of a SATA hard drive with the interface of a SAS hard
drive. Its media reliability (BER) remains that of SATA-class platters.
SSDs (Solid-State Drives):
- SSDs offer up to 100 times greater throughput than traditional hard drives, which means
faster boot-ups and better overall system performance.
- Unlike SATA and SAS hard drives, SSDs have no moving parts, making them the most
reliable and efficient option.
- The lack of moving parts also means a reduced risk of failure and greater
power efficiency compared to traditional hard drives.
- SSDs, as storage media with higher read/write speeds than traditional HDDs, have
received widespread attention.
- Traditional HDDs are disk drives and data is stored in disk sectors. The common storage
medium of an SSD is flash memory.
- Because of the performance, speed, and reliability benefits, solid-state drives are
generally more expensive than SAS and SATA hard drives.
- SSDs are ideal for high-frequency, latency-sensitive transactional data, such as database,
CRM, or bank transactions.
- SSDs are one of the most essential storage devices in the ultra-thin laptop and tablet
fields.
- In addition, SSDs have distinctive features such as shock resistance, small size, no
noise, and low cooling requirements.
- An SSD consists of a flash controller and memory chips. The flash controller coordinates
the data read/write process, and the memory chips store the data. The most common type
of SSD uses flash memory chips as the storage medium; the other type uses dynamic
random access memory (DRAM) chips.
Flash-based SSDs
The most common SSDs use flash memory chips as the storage
medium. Flash memory chips can be built into various
electronic products, such as SSDs, memory cards, and USB drives.
These devices are small and easy to use.
Disadvantage of SSD
Centralized Storage
Centralized storage means that all storage resources are centrally deployed and provisioned.
With centralized storage, all physical disks are centrally deployed in disk enclosures and are used
to provide storage services externally through the controller. Centralized storage typically refers
to disk arrays.
A disk array combines multiple physical disks into a single logical unit. Each disk array consists of
one controller enclosure and multiple disk enclosures. This architecture delivers an intelligent
storage space featuring high availability, high performance, and large capacity.
Disk Array
Examples of DAS (direct-attached storage) include hard drives, solid-state drives, optical
disc drives, and storage on external drives. DAS was the most widely used storage system
before SAN was introduced.
DAS Types:
- Standalone
- JBOD (Spanned)
- RAID
centralized data center for access by different hosts and application servers.
- An NAS server contains storage devices, such as disk arrays, CD/DVD drives,
Simply speaking, a NAS is a device that is connected to the network with file storage
- The advantage of NAS is that it can deliver file storage services in a fast and
cost-effective manner using existing resources in the data center. Current solutions
are compatible with UNIX, Linux, and Windows OSs, and can be easily
connected to users' TCP/IP networks. The following shows the NAS system.
CIFS and NFS are file sharing protocols that let a client system access the file system
on remote computing devices such as servers or personal computers.
1. NFS, the “Network File System”, is used mainly by Unix and Linux operating systems
to share files and folders. It makes file access transparent between servers and end-user
machines such as desktops and laptops. NFS uses a client-server model that lets users
view, read, and write files on a remote computer system.
2. CIFS is the abbreviation for “Common Internet File System”, used by Windows operating
systems for file sharing. CIFS also uses the client-server model: a client sends a request
to a server program to access a file, and the server performs the requested action and
returns a response. CIFS is an open standard version of the Server Message Block (SMB) protocol.
Data has become the most valuable asset as the Internet and e-commerce grow at an explosive
rate. Storing, protecting, and managing data effectively is a big challenge for IT technicians.
The storage area network (SAN) was developed to provide a new model for storing data in the
data center.
SAN definition: a SAN is a specialized, high-speed network that provides block-level network
access to storage.
SAN components: SANs are typically composed of hosts, switches, and storage devices that are
interconnected using a variety of technologies, topologies, and protocols.
SAN devices appear to servers as locally attached drives, eliminating traditional network bottlenecks.
A SAN is block-based storage that uses a high-speed architecture to connect servers to their
logical disk units, each identified by a logical unit number (LUN).
A LUN is a range of blocks provisioned from a pool of shared storage and presented to the server as a
logical disk.
The server partitions and formats those blocks, typically with a file system, so that it can store data on
the LUN just as it would on local disk storage.
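The idea of a LUN as a range of blocks carved out of a shared pool can be sketched in a few lines of Python. This is only an illustrative model, not how any particular array works: the names `StoragePool`, `Lun`, and `pool_block`, and the simple "next free block" allocation rule, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    lun_id: int        # identifier presented to the server
    first_block: int   # first pool block that belongs to this LUN
    block_count: int   # number of contiguous blocks in the LUN


class StoragePool:
    """Hypothetical pool that hands out contiguous block ranges as LUNs."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.next_free = 0
        self.luns = []

    def provision(self, block_count):
        # Refuse the request if the pool does not have enough blocks left.
        if self.next_free + block_count > self.total_blocks:
            raise ValueError("not enough free blocks in the pool")
        lun = Lun(len(self.luns), self.next_free, block_count)
        self.next_free += block_count
        self.luns.append(lun)
        return lun


def pool_block(lun, logical_block):
    """Translate a block address inside the LUN to a block address in the pool."""
    if not 0 <= logical_block < lun.block_count:
        raise IndexError("block outside the LUN")
    return lun.first_block + logical_block
```

The server only ever sees logical block addresses starting at 0, which is why it can partition and format the LUN like a local disk.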
SAN Types
SAN solutions come in several types. The two most common are:
FC SAN (Fibre Channel SAN): Storage and servers are connected via a high-speed network of
interconnected Fibre Channel switches. This is used for mission-critical applications where
uninterrupted data access is required.
FC SAN Network
IP SAN (Internet Small Computer System Interface, iSCSI): This infrastructure offers
the flexibility of a low-cost IP network. It uses the iSCSI protocol, a transmission standard
that runs over TCP/IP, to transfer block data over an Ethernet network. IP SAN allows different
servers to access pools of shared block storage devices using storage protocols.
IP SAN network
IP SAN FC SAN
Item         | NAS                                          | SAN
Main target  | Both are methods of managing storage centrally and sharing that storage with multiple hosts (servers).
Network      | Ethernet-based                               | Ethernet and Fibre Channel
Focus        | Ease of use, manageability, scalability, and lower total cost of ownership (TCO) | High performance and low latency
Partitioning | NAS storage controllers partition the storage and then own the file system | The server partitions and formats SAN storage blocks with its file system
Protocols    | CIFS, NFS                                    | FCP, iSCSI
RAID
RAID (Redundant Array of Inexpensive Disks or Drives, or Redundant Array of
Independent Disks): A data storage virtualization technology that combines multiple physical
disk drive components into one or more logical units for the purposes of data reliability,
performance improvement, or both.
Benefits of RAID
RAID lets us combine drives, increase speed, and improve reliability, so we can store more
data and store it more safely.
RAID lets us bundle multiple hard drives so that the operating system (Windows, Mac OS X,
Linux, etc.) sees them as one single big drive.
Example
RAID is an umbrella name for different ways of combining hard disks so that they appear
to the computer as one single hard disk.
Disks are combined with different goals in mind, hence the different variations. Goals can
be faster access to data, more storage space, higher reliability, or a combination of these.
These variations are named “RAID” followed by a number, for example RAID 0, RAID 1, and so
on, each suited to a particular purpose.
RAID 0 – Striping
RAID 1 – Mirroring
RAID 0 – Striping
In RAID 0, data is written in parallel to two or more disks. Think of it as writing block 1 to
disk 1, block 2 to disk 2, block 3 to disk 3, and so on.
Positives:
- The great advantage is speed, as data is written or read in parallel.
- Another advantage is that the full capacity of all drives is available for storage.
Negatives:
- The big downside is the lack of reliability: if one disk crashes, all data is gone, because
every striped file loses the blocks that were on that disk (half of them in a two-disk array).
RAID 0 is not fault-tolerant.
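The round-robin placement described above can be illustrated with a short Python sketch. It is a toy model under simple assumptions: each "disk" is a Python list, a failed disk is represented by `None`, and the function names `stripe` and `read_back` are made up for this example.

```python
def stripe(blocks, disk_count):
    """Distribute blocks round-robin over disk_count disks (RAID 0 layout)."""
    disks = [[] for _ in range(disk_count)]
    for index, block in enumerate(blocks):
        disks[index % disk_count].append(block)
    return disks


def read_back(disks):
    """Reassemble the original block order; fails if any disk is missing."""
    if any(disk is None for disk in disks):
        raise OSError("RAID 0 has no redundancy: one failed disk loses the array")
    total = sum(len(disk) for disk in disks)
    # Block i lives on disk (i mod n), at position (i div n) on that disk.
    return [disks[i % len(disks)][i // len(disks)] for i in range(total)]


disks = stripe(["b1", "b2", "b3", "b4", "b5", "b6"], 3)
# disks is now [["b1", "b4"], ["b2", "b5"], ["b3", "b6"]]
```

Setting any entry of `disks` to `None` makes `read_back` fail, which mirrors the statement that RAID 0 is not fault-tolerant.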
RAID 0 – Striping
Ideal use
RAID 0 is ideal for non-critical storage of data that have to be read/written at a high speed, such
as on an image retouching or video editing station.
RAID 1 – Mirroring
A RAID 1 array mirrors the data of one drive to another drive. When your computer writes
data to the array, the controller actually writes the data twice: to disk 1 and to disk 2.
If a drive fails, the controller uses the surviving drive (either the data drive or the mirror
drive) for data recovery and continues operation.
Positives:
- Increased reliability, since data is always saved in duplicate on two different drives. If one
drive dies, the other can still provide the data.
- If a drive fails, data does not have to be rebuilt; it just has to be copied to the
replacement drive.
- RAID 1 is a very simple technology.
Negatives:
- The main disadvantage is that the effective storage capacity is only half of the total drive
capacity, because all data is written twice.
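A minimal sketch of mirroring, assuming a two-disk array where each disk is modelled as a dictionary from block number to data. The class name `Raid1` and its methods are hypothetical and exist only to illustrate the write-twice and copy-to-replacement behaviour described above.

```python
class Raid1:
    """Toy two-disk mirror; each 'disk' is a dict from block number to data."""

    def __init__(self):
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block_no, data):
        # The controller writes every block once per healthy disk.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block_no] = data

    def read(self, block_no):
        # Serve the read from the first disk that is still healthy.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block_no]
        raise OSError("both mirrors have failed")

    def fail(self, i):
        # Simulate a drive failure: its contents are gone.
        self.failed[i] = True
        self.disks[i] = {}

    def replace(self, i):
        # Recovery is a plain copy from the surviving disk; no parity is involved.
        self.disks[i] = dict(self.disks[1 - i])
        self.failed[i] = False
```

The two dictionaries always hold the same contents, which also makes the capacity cost visible: two drives store one drive's worth of data.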
RAID 1 – Mirroring
Ideal use
RAID 1 is ideal for mission-critical storage, for instance accounting systems.
RAID 5 – Striping with Parity
- Data blocks are striped across the drives, and for each stripe a parity checksum of the
block data is written to one drive.
- The parity data is not written to a fixed drive; it is spread across all drives, as the
drawing below shows.
- Using the parity data, the computer can recalculate the data of one of the other data
blocks, should that data no longer be available.
- That means a RAID 5 array can withstand a single drive failure without losing data.
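The parity mechanism can be sketched with XOR, which is the usual way single parity is computed. The sketch below is an illustration under assumptions: blocks are equal-length byte strings, the parity position rotates with the stripe number using a simple modulo rule chosen for this example, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks, stripe_no):
    """Return one stripe as a list, with the parity block at a rotating position."""
    parity = xor_blocks(data_blocks)
    drive_count = len(data_blocks) + 1
    parity_drive = stripe_no % drive_count   # assumed rotation rule
    stripe = list(data_blocks)
    stripe.insert(parity_drive, parity)
    return stripe


def rebuild(stripe, failed_drive):
    """Recompute the block on failed_drive from the blocks on the other drives."""
    survivors = [b for i, b in enumerate(stripe) if i != failed_drive]
    return xor_blocks(survivors)
```

Because the XOR of all blocks in a stripe (data plus parity) is zero, XOR-ing the surviving blocks yields the missing one, whether it was a data block or the parity block. With two missing blocks the equation has two unknowns, which is why single parity tolerates only one drive failure.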
Space efficiency: usable capacity = sum of the capacity of all drives – (sum of the capacity
of all drives / number of drives). For example, four 4 TB drives give 16 TB – 4 TB = 12 TB.
Advantages:
Good performance: Read data transactions are very fast while write data transactions
are somewhat slower (due to the parity that has to be calculated).
Very good data reliability: If a drive fails, you still have access to all data, even while the
failed drive is being replaced and the storage controller rebuilds the data on the new
drive.
Disadvantage:
This is complex technology. If one of the disks in an array using 4TB disks fails and is replaced,
restoring the data (the rebuild time) may take a day or longer, depending on the load on the array
and the speed of the controller. If another disk goes bad during that time, data are lost forever.
Ideal use
RAID 5 is a good all-round system that combines efficient storage with excellent security and
decent performance. It is ideal for file and application servers that have a limited number of data
drives.
RAID 6 – Striping and Double Rotating Parity
RAID 6 can be seen as an extended RAID 5. It maintains two parity calculations per stripe,
written to two different drives, and like RAID 5 the parity is distributed evenly across all
drives. This allows the array to recover even from two failing drives.
Minimum number of required drives: 4 drives.
Space efficiency: usable capacity = sum of the capacity of all drives – (2 x sum of the
capacity of all drives / number of drives). For example, four 4 TB drives give
16 TB – 8 TB = 8 TB.
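The RAID 5 and RAID 6 capacity formulas differ only in the number of drives' worth of parity, so they can be written as one small function. This is a sketch that assumes all drives have the same size (with mixed sizes, real arrays are typically limited by the smallest drive); the function names are made up for the example, and the three-drive minimum for RAID 5 is general knowledge rather than something stated in this chapter.

```python
def usable_capacity(drive_sizes_tb, parity_drives):
    """Total capacity minus parity_drives * (total / number of drives)."""
    n = len(drive_sizes_tb)
    total = sum(drive_sizes_tb)
    return total - parity_drives * total / n


def raid5_capacity(drive_sizes_tb):
    if len(drive_sizes_tb) < 3:      # commonly cited minimum for RAID 5
        raise ValueError("RAID 5 needs at least 3 drives")
    return usable_capacity(drive_sizes_tb, parity_drives=1)


def raid6_capacity(drive_sizes_tb):
    if len(drive_sizes_tb) < 4:      # minimum stated in the text
        raise ValueError("RAID 6 needs at least 4 drives")
    return usable_capacity(drive_sizes_tb, parity_drives=2)


print(raid5_capacity([4, 4, 4, 4]))  # 12.0
print(raid6_capacity([4, 4, 4, 4]))  # 8.0
```

The gap between the two results is the price RAID 6 pays for surviving a second drive failure.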
Advantages
- Good speed performance: Like with RAID 5, read data transactions are very fast.
- Higher reliability: If two drives fail, you still have access to all data, even while the failed
drives are being replaced. Therefore, RAID 6 is more secure than RAID 5.
Disadvantages
- Write data transactions are slower than RAID 5 due to the additional parity data that have
to be calculated.
- The remaining capacity will be lower than RAID 5.
Ideal use
RAID 6 is a good all-round system that combines efficient storage with excellent
security and decent performance. It is preferable to RAID 5 in file and application servers that
use many large drives for data storage.
Below is an example of RAID 10 (RAID 1 + 0) … Data is striped in parallel (RAID 0) and then
mirrored (RAID 1).
It is possible to combine the advantages (and disadvantages) of RAID 0 and RAID 1 in one single
system. This is a nested or hybrid RAID configuration. It provides security by mirroring all data on
secondary drives while using striping across each set of drives to speed up data transfers.
Other variations are RAID 0+1, RAID 50 (RAID 5 + RAID 0), RAID 100 (RAID 1 + RAID 0 + RAID
0), RAID 53 (RAID 5 + RAID 3) and RAID 60 (RAID 6 + RAID 0).
Positives
Custom RAID: you can build a configuration that fits your own needs.
Negatives
Not all RAID controllers support nested levels, and the resulting storage capacity is much
lower than with other configurations.
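For RAID 10 specifically, the combination of striping and mirroring can be sketched as striping across mirrored pairs. The layout below is a toy model: each pair is a tuple of two lists, and the function name `raid10_layout` is hypothetical.

```python
def raid10_layout(blocks, pair_count):
    """Stripe blocks across mirrored pairs; each pair stores two identical copies."""
    pairs = [([], []) for _ in range(pair_count)]
    for index, block in enumerate(blocks):
        primary, mirror = pairs[index % pair_count]
        primary.append(block)   # striping: the pair is chosen round-robin
        mirror.append(block)    # mirroring: the same block goes to the partner drive
    return pairs


pairs = raid10_layout(["b1", "b2", "b3", "b4"], pair_count=2)
# pairs[0] holds b1 and b3 twice, pairs[1] holds b2 and b4 twice
```

Four drives hold only four blocks' worth of unique data here, which illustrates why the usable capacity is half of the raw capacity.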
Distributed storage means storing data on a multitude of standard servers, using the servers'
local storage rather than a dedicated storage device. The servers behave as one storage system
although the data is distributed among them.
Centralized Storage
Distributed Storage
Replication Mechanism
Replication in cloud computing means storing the same data in several different locations,
usually by synchronizing these data sources. Replication is done partly for backup and partly
to reduce response times, especially for read requests.
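Both purposes of replication, backup and faster reads, can be shown with a small Python sketch. It assumes synchronous replication of a key-value store to every location and a caller-supplied latency table; the class `ReplicatedStore`, its methods, and the location names are hypothetical and not taken from any real cloud service.

```python
class ReplicatedStore:
    """Toy key-value store that keeps a full copy of every key at each location."""

    def __init__(self, locations):
        self.replicas = {name: {} for name in locations}

    def write(self, key, value):
        # Synchronous replication: update every location before returning.
        for replica in self.replicas.values():
            replica[key] = value

    def read(self, key, latency_ms):
        # Serve the read from the available location with the lowest latency.
        nearest = min(self.replicas, key=lambda name: latency_ms[name])
        return nearest, self.replicas[nearest][key]

    def lose_location(self, name):
        # Simulate the loss of one location; the other copies remain usable.
        del self.replicas[name]


store = ReplicatedStore(["eu", "us"])
store.write("report", "v1")
latency = {"eu": 20, "us": 120}
print(store.read("report", latency))   # ('eu', 'v1')
store.lose_location("eu")
print(store.read("report", latency))   # ('us', 'v1')
```

The first read shows the response-time benefit, the second the backup benefit: the data is still available after one location is lost.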
Bigtable is a distributed storage system designed by Google. As its name suggests, it is
intended to handle data at large scale.
Virtualized Storage
Storage Virtualization
Storage virtualization is the process of grouping the physical storage from multiple network
storage devices so that it looks like a single storage device.
The process involves abstracting and hiding the internal functions of a storage device from
the host application, host servers, or the general network, so that storage can be managed
independently of applications and the network.
The management of storage and data is becoming difficult and time consuming. Therefore, the
storage virtualization helps to:
We know that physical disks reside at the bottom of the storage system, either centralized or
distributed. After RAID or replication is implemented, physical volumes are created on top of
these physical disks.
In most cases, the physical volumes are not directly mounted to upper-layer applications, for
example, OSs or virtualization systems.
Therefore, typically, multiple physical volumes are combined into a volume group, and then the
volume group is virtualized into multiple logical volumes (LVs). The upper-layer applications use
the spaces of the LVs.
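The layering of physical volumes, volume group, and logical volumes can be modelled in a few lines. This is a simplified sketch that tracks sizes only (in GB) and ignores extents and on-disk layout; the class `VolumeGroup` and the volume names are assumptions made for the example.

```python
class VolumeGroup:
    """Toy volume group: pools physical volumes and carves out logical volumes."""

    def __init__(self, physical_volumes_gb):
        self.physical_volumes_gb = dict(physical_volumes_gb)
        self.logical_volumes_gb = {}

    @property
    def size_gb(self):
        # The group's capacity is the sum of its physical volumes.
        return sum(self.physical_volumes_gb.values())

    @property
    def free_gb(self):
        return self.size_gb - sum(self.logical_volumes_gb.values())

    def create_lv(self, name, size_gb):
        if size_gb > self.free_gb:
            raise ValueError("volume group has too little free space")
        self.logical_volumes_gb[name] = size_gb


vg = VolumeGroup({"pv1": 500, "pv2": 500})
vg.create_lv("data", 800)   # larger than any single physical volume
print(vg.free_gb)           # 200
```

The 800 GB logical volume spans both physical volumes, which is the practical reason for inserting the volume group layer between disks and applications.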
2. Virtualized Storage
Cloud Storage
Cloud storage Definition
Cloud storage is a cloud computing model in which data is stored on remote servers and accessed
over the internet (the "cloud"). It is maintained, operated, and managed by a cloud storage
service provider on storage servers that are built on virtualization techniques.
Cloud storage works through data center virtualization, providing end users and applications with
a virtual storage architecture that is scalable according to application requirements.
The cloud storage system stores multiple copies of data on multiple servers, at multiple
locations. If one system fails, the system only needs to change the pointer to another location
where the object is stored.
Vendors use different virtual file systems. For example, VMware uses the Virtual Machine File
System (VMFS), and Huawei uses the Virtual Image Management System (VIMS). Both are
high-performance cluster file systems that deliver a capacity exceeding the limit of a single
system and allow multiple compute nodes to access an integrated clustered storage pool.
The minimum storage unit used by virtualization programs is the logical unit, identified by its
logical unit number (LUN). LUNs correspond to volumes: each LUN represents an individual
storage volume and acts as an identifier for a disk volume in the storage system.
Storage resource pool: a pool that groups the physical storage from different network storage
devices and makes it appear as a single storage unit managed from a single console.
Capacity | Once created, gets its full storage capacity from the storage pool. | Gets storage capacity on demand; higher storage capacity utilization.
VM Disks
A VM consists of configuration files and disk files. Each VM disk corresponds to a disk file
where user data is stored.
If virtualized storage is used, all disk files are stored in the shared directory of the file system.
If non-virtualized storage is used, each disk file corresponds to a LUN.
From the perspective of users and OSs, either files or LUNs are the same as common hard
drives, which are displayed as hard drives among the hardware resources of the system.
When creating a VM, the administrator needs to create disks for the VM to store data. The disk
information corresponds to several lines in the configuration file.
VM disk files have their own fixed formats. Common VM disk formats are shown in the
table:
Common VM disk formats
VM Disk File Format Supported Vendor, Product, or Platform
RAW All vendors
VMDK VMware
For example, Huawei Rainbow can convert third-party or open-source VM disks to the VHD
format.
2. Storage devices: In Huawei FusionCompute, these storage units are called storage
devices. They include LUNs, file systems, storage pools, and local disks.
4. Virtual disks: After datastores are associated with hosts, virtual disks can be created
for VMs.
1. Before using datastores, you need to manually add storage resources. If the storage
resources are IP SAN, FusionStorage, or NAS storage, you need to add storage ports for
hosts in the cluster and use the ports to communicate with the service ports of centralized
storage controller or the management IP address of FusionStorage Manager. If the
storage resources are provided by FC SAN, you do not need to add storage ports.
2. After adding storage resources, you need to scan for these storage devices on the
FusionCompute portal to add them as datastores.
3. Datastores can be virtualized or non-virtualized.
4. You can use LUNs as datastores and connect them to VMs from the SAN without
creating virtual disks. This process is called raw device mapping (RDM). This
technology applies to scenarios requiring large disk space, for example, database server
construction. RDM can be used only for VMs that run certain OSs.
5. After adding datastores, you can create virtual disks for VMs.
Based on sharing type, VM disks are classified as non-shared disks and shared disks.
− Shared: A shared disk can be used by multiple VMs. If multiple VMs that use a shared
disk write data to the disk at the same time, data may be lost. Therefore, you need to
use application software to control disk access permissions.
Based on the configuration mode, VM disks can be classified as common disks, thin
provisioning disks, and thick provisioning lazy zeroed disks.
− Common: The system allocates disk space based on the disk capacity.
The performance of the disks in this mode is better than that in the other two
modes, but the creation duration may be longer than that required in the other
modes.
− Thin provisioning: In this mode, the system initially allocates only part of the configured
disk capacity, and allocates the remaining capacity as the disk's storage usage grows,
until the configured disk capacity is fully allocated. In this
mode, datastores can be overcommitted. It is recommended that the datastore
overcommit rate not exceed 50%. For example, if the total capacity is 100 GB,
the allocated capacity should be less than or equal to 150 GB. If the allocated
capacity is greater than the actual capacity, the disk is in thin provisioning mode.
− Thick provisioning lazy zeroed: The system allocates disk space based on
the disk capacity. In this mode, the disk creation speed is faster than that in the
Common mode, and the I/O performance is between the Common and Thin
provisioning modes. This configuration mode supports only virtualized local disks
or virtualized SAN storage.
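The overcommit recommendation for thin provisioning is simple arithmetic and can be checked with a small helper. This is a sketch, not a FusionCompute API: the function names `overcommit_rate` and `can_allocate` and the `max_rate` parameter are hypothetical, and the rate is taken to mean the share of allocated capacity that exceeds the physical capacity, which matches the 100 GB / 150 GB example in the text.

```python
def overcommit_rate(datastore_gb, allocated_gb):
    """Fraction by which allocated capacity exceeds physical capacity (0 if none)."""
    return max(0.0, (allocated_gb - datastore_gb) / datastore_gb)


def can_allocate(datastore_gb, allocated_gb, new_disk_gb, max_rate=0.5):
    """True if adding a thin disk keeps the datastore within the recommended rate."""
    return overcommit_rate(datastore_gb, allocated_gb + new_disk_gb) <= max_rate


# 100 GB datastore with 120 GB of thin disks already configured
print(can_allocate(100, 120, 30))   # True: 150 GB allocated, rate 0.5
print(can_allocate(100, 120, 40))   # False: 160 GB allocated, rate 0.6
```

A check like this only guards the configured sizes; the datastore can still run out of space if the VMs actually fill more than the physical capacity.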
After a snapshot is taken for a VM, if disks on the VM are detached from the VM
and not attached to any other VM, the disks will be attached to the VM after the
VM is restored using the snapshot.
However, data on the disks will not be restored. If a disk is deleted after a
snapshot is created for the VM, the disk will not be attached to the VM after the
VM is restored using the snapshot.
Some disk types cannot be changed once they are set and some can be
changed. For example, disk modes can be converted.