CCV Notes Unit 3

Cloud storage is a model for storing data on the internet via a cloud provider, offering scalability, cost-effectiveness, and accessibility without the need for managing physical infrastructure. It supports various use cases such as analytics, backup, and compliance, while ensuring security and durability of data. Key types include object, file, and block storage, each suited for different data management needs.

What is cloud storage?

Cloud storage is a cloud computing model that enables storing data and files on the internet
through a cloud computing provider that you access either through the public internet or a
dedicated private network connection. The provider securely stores, manages, and maintains
the storage servers, infrastructure, and network to ensure you have access to the data when
you need it at virtually unlimited scale, and with elastic capacity. Cloud storage removes the
need to buy and manage your own data storage infrastructure, giving you agility, scalability,
and durability, with anytime, anywhere data access.

Why is cloud storage important?

Cloud storage delivers cost-effective, scalable storage. You no longer need to worry about
running out of capacity, maintaining storage area networks (SANs), replacing failed devices,
adding infrastructure to scale up with demand, or operating underutilized hardware when
demand decreases. Cloud storage is elastic, meaning you scale up and down with demand
and pay only for what you use. It is a way for organizations to save data securely online so
that it can be accessed anytime from any location by those with permission.

Whether you are a small business or a large enterprise, cloud storage can deliver the agility,
cost savings, security, and simplicity you need to focus on your core business growth. For small businesses, you no longer have to devote valuable resources to managing storage yourself, and cloud storage gives you the ability to scale as the business grows.

Cost effectiveness

With cloud storage, there is no hardware to purchase, no storage to provision, and no extra
capital being used for business spikes. You can add or remove storage capacity on demand,
quickly change performance and retention characteristics, and only pay for storage that you
use. As data becomes infrequently accessed, you can even automatically move it to lower-cost storage, creating further savings. By moving storage workloads
from on premises to the cloud, you can reduce total cost of ownership by removing
overprovisioning and the cost of maintaining storage infrastructure.

Increased agility

With cloud storage, resources are only a click away. You reduce the time to make those
resources available to your organization from weeks to just minutes. This results in a dramatic
increase in agility for your organization. Your staff is largely freed from the tasks of
procurement, installation, administration, and maintenance. And because cloud storage
integrates with a wide range of analytics tools, your staff can now extract more insights from
your data to fuel innovation.

Faster deployment

When development teams are ready to begin, infrastructure should never slow them down.
Cloud storage services allow IT to quickly deliver the exact amount of storage needed,
whenever and wherever it's needed. Your developers can focus on solving complex
application problems instead of having to manage storage systems.

Efficient data management

By using cloud storage lifecycle management policies, you can perform powerful information
management tasks including automated tiering or locking down data in support of compliance
requirements. You can also use cloud storage to create multi-region or global storage for your
distributed teams by using tools such as replication. You can organize and manage your data
in ways that support specific use cases, create cost efficiencies, enforce security, and meet
compliance requirements.
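To make the lifecycle idea concrete, the sketch below uses the AWS SDK for Python (boto3) to attach a lifecycle policy that tiers objects to lower-cost storage classes and eventually expires them. The bucket name, prefix, and day thresholds are illustrative assumptions rather than values from these notes.

    # Hedged sketch: automated tiering via an S3 lifecycle policy (boto3).
    # Bucket name, prefix, and day counts below are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-logs-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-then-expire",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "logs/"},
                    # Move to infrequent access after 30 days, archive after 90
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},
                    ],
                    # Delete after one year
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )

Once a rule like this is in place, tiering and deletion happen automatically; no application code has to move the data.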

Virtually unlimited scalability

Cloud storage delivers virtually unlimited storage capacity, allowing you to scale up as much
and as quickly as you need. This removes the constraints of on-premises storage capacity. You
can efficiently scale cloud storage up and down as required for analytics, data lakes, backups,
or cloud native applications. Users can access storage from anywhere, at any time, without
worrying about complex storage allocation processes, or waiting for new hardware.

Business continuity

Cloud storage providers store your data in highly secure data centers, protecting your data
and ensuring business continuity. Cloud storage services are designed to handle concurrent
device failure by quickly detecting and repairing any lost redundancy. You can further protect your data by using versioning and replication tools to recover more easily from both unintended user actions and application failures.
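As one small, hedged example of such protection, the sketch below enables versioning on a bucket with boto3 so that overwrites and deletes can be rolled back; the bucket name and prefix are assumptions for illustration.

    # Hedged sketch: enable object versioning for easier recovery (boto3).
    # The bucket name and prefix are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_versioning(
        Bucket="example-critical-data",
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Every later overwrite or delete of a key now creates a new version;
    # older versions can be listed (and restored) if a user or application error occurs.
    versions = s3.list_object_versions(Bucket="example-critical-data", Prefix="reports/")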

How does cloud storage work?

Cloud storage is delivered by a cloud services provider that owns and operates data storage
capacity by maintaining large data centers in multiple locations around the world. Cloud
storage providers manage capacity, security, and durability to make data accessible to your
applications over the internet in a pay-as-you-go model. Typically, you connect to the storage
cloud either through the internet or through a dedicated private connection, using a web
portal, website, or a mobile app. When customers purchase cloud storage from a service
provider, they turn over most aspects of the data storage to the vendor, including capacity,
security, data availability, storage servers and computing resources, and network data
delivery. Your applications access cloud storage through traditional storage protocols or
directly using an application programming interface (API). The cloud storage provider might
also offer services designed to help collect, manage, secure, and analyze data at a massive
scale.
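As a hedged illustration of API access, the short boto3 sketch below uploads a local file to object storage and downloads it again; the bucket and key names are assumptions, not part of the original notes.

    # Hedged sketch: accessing cloud storage directly through its API (Amazon S3 via boto3).
    # Bucket and object key names are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    # Upload a local file as an object
    s3.upload_file("backup.tar.gz", "example-bucket", "backups/backup.tar.gz")

    # Later, download it again from anywhere with network access and permission
    s3.download_file("example-bucket", "backups/backup.tar.gz", "restored-backup.tar.gz")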

What are the types of cloud storage?

There are three main cloud storage types: object storage, file storage, and block storage.
Each offers its own advantages and has its own use cases.
Object storage

Organizations must store a massive and growing amount of unstructured data, such as
photos, videos, machine learning (ML), sensor data, audio files, and other types of web
content, and finding scalable, efficient, and affordable ways to store them can be a challenge.
Object storage is a data storage architecture for large stores of unstructured data. Object storage keeps data in the format in which it arrives and makes it possible to customize metadata in ways that make the data easier to access and analyze. Instead of being organized in files or folder hierarchies, objects are kept in secure buckets that deliver virtually unlimited scalability. It is also less costly to store large data volumes.

File storage

File-based storage, or file storage, is widely used among applications and stores data in a hierarchical folder and file format. This type of storage is typically delivered as a network-attached storage (NAS) server, using common file-level protocols such as Server Message Block (SMB) in Windows instances and Network File System (NFS) in Linux.

Block storage

Enterprise applications like databases or enterprise resource planning (ERP) systems often
require dedicated, low-latency storage for each host. This is analogous to direct-attached
storage (DAS) or a storage area network (SAN). In this case, you can use a cloud storage service
that stores data in the form of blocks. Each block has its own unique identifier for quick
storage and retrieval.
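As a hedged sketch of how block storage is typically provisioned in the cloud, the example below creates an Amazon EBS volume with boto3 and attaches it to an instance; the Availability Zone, size, instance ID, and device name are illustrative assumptions.

    # Hedged sketch: provisioning and attaching a block storage volume (Amazon EBS via boto3).
    # The AZ, size, instance ID, and device name are hypothetical.
    import boto3

    ec2 = boto3.client("ec2")

    volume = ec2.create_volume(
        AvailabilityZone="us-east-1a",   # must match the instance's AZ
        Size=100,                        # size in GiB
        VolumeType="gp3",
    )

    # Wait until the volume is available, then attach it to a host
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
    ec2.attach_volume(
        VolumeId=volume["VolumeId"],
        InstanceId="i-0123456789abcdef0",
        Device="/dev/sdf",
    )

The attached volume then appears to the operating system as a raw block device that can be formatted and mounted like a local disk.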

What cloud storage requirements should you consider?


Ensuring your company’s critical data is safe, secure, and available when needed is essential.
There are several fundamental requirements when considering storing data in the cloud.

Durability and availability

Cloud storage simplifies and enhances traditional data center practices around data durability
and availability. With cloud storage, data is redundantly stored on multiple devices across one
or more data centers.

Security

With cloud storage, you control where your data is stored, who can access it, and what
resources your organization is consuming at any given moment. Ideally, all data is encrypted,
both at rest and in transit. Permissions and access controls should work just as well in the
cloud as they do for on-premises storage.
What are cloud storage use cases?
Cloud storage has several use cases in application management, data management, and
business continuity. Let’s consider some examples below.

Analytics and data lakes

Traditional on-premises storage solutions can be inconsistent in their cost, performance, and
scalability — especially over time. Analytics demand large-scale, affordable, highly available,
and secure storage pools that are commonly referred to as data lakes.

Data lakes built on object storage keep information in its native form and include rich
metadata that allows selective extraction and use for analysis. Cloud-based data lakes can sit
at the center of multiple kinds of data warehousing and processing, as well as big data and
analytical engines, to help you accomplish your next project in less time and with more
targeted relevance.

Backup and disaster recovery

Backup and disaster recovery are critical for data protection and accessibility, but keeping up
with increasing capacity requirements can be a constant challenge. Cloud storage brings low
cost, high durability, and extreme scale to data backup and recovery solutions. Embedded
data management policies can automatically migrate data to lower-cost storage based on
frequency or timing settings, and archival vaults can be created to help comply with legal or
regulatory requirements. These benefits allow for tremendous scale possibilities within
industries such as financial services, healthcare and life sciences, and media and
entertainment that produce high volumes of unstructured data with long-term retention
needs.

Software test and development

Software test and development environments often require separate, independent, and
duplicate storage environments to be built out, managed, and decommissioned. In addition
to the time required, the up-front capital costs required can be extensive.

Many of the largest and most valuable companies in the world create applications in record
time by using the flexibility, performance, and low cost of cloud storage. Even the simplest
static websites can be improved at low cost. IT professionals and developers are turning to
pay-as-you-go storage options that remove management and scale headaches.

Cloud data migration

The availability, durability, and low cost of cloud storage can be very compelling. On the other
hand, IT personnel working with storage, backup, networking, security, and compliance
administrators might have concerns about the realities of transferring large amounts of data
to the cloud. For some, getting data into the cloud can be a challenge. Hybrid, edge, and data
movement services meet you where you are in the physical world to help ease your data
transfer to the cloud.
Compliance

Storing sensitive data in the cloud can raise concerns about regulation and compliance,
especially if this data is currently stored in compliant storage systems. Cloud data compliance
controls are designed to ensure that you can deploy and enforce comprehensive compliance
controls on your data, helping you satisfy compliance requirements for virtually every
regulatory agency around the globe. Often through a shared responsibility model, cloud
vendors allow customers to manage risk effectively and efficiently in the IT environment, and
provide assurance of effective risk management through compliance with established, widely
recognized frameworks and programs.

Cloud-native application storage

Cloud-native applications use technologies like containerization and serverless to meet customer expectations in a fast-paced and flexible manner. These applications are typically made of small, loosely coupled, independent components called microservices that communicate internally by sharing data or state. Cloud storage services provide data management for such applications and provide solutions to ongoing data storage challenges in the cloud environment.

Archive

Enterprises today face significant challenges with exponential data growth. Machine learning
(ML) and analytics give data more uses than ever before. Regulatory compliance requires long
retention periods. Customers need to replace on-premises tape and disk archive
infrastructure with solutions that provide enhanced data durability, immediate retrieval
times, better security and compliance, and greater data accessibility for advanced analytics
and business intelligence.

Hybrid cloud storage

Many organizations want to take advantage of the benefits of cloud storage, but have
applications running on premises that require low-latency access to their data, or need rapid
data transfer to the cloud. Hybrid cloud storage architectures connect your on-premises
applications and systems to cloud storage to help you reduce costs, minimize management
burden, and innovate with your data.

Database storage

Because block storage has high performance and is readily updatable, many organizations use
it for transactional databases. With its limited metadata, block storage is able to deliver the
ultra-low latency required for high-performance workloads and latency sensitive applications
like databases.

Block storage allows developers to set up a robust, scalable, and highly efficient transactional
database. As each block is a self-contained unit, the database performs optimally, even when
the stored data grows.
ML and IoT

With cloud storage, you can process, store, and analyze data close to your applications and
then copy data to the cloud for further analysis. With cloud storage, you can store data
efficiently and cost-effectively while supporting ML, artificial intelligence (AI), and advanced
analytics to gain insights and innovate for your business.

Is cloud storage secure?

Security is our number one priority at AWS. AWS pioneered cloud computing in 2006, creating
cloud infrastructure that allows you to securely build and innovate faster. With AWS, you
control where your data is stored, who can access it, and what resources your organization is
consuming at any given moment. Fine-grained identity and access controls combined with continual monitoring for near real-time security information ensure that the right resources have the right access, wherever your information is stored. On AWS, you will gain the control
and confidence you need to securely run your business with the most flexible and secure
cloud computing environment available. As a result, the most highly regulated organizations
in the world trust AWS, every day.

A Distributed File System (DFS)


A Distributed File System (DFS), as the name suggests, is a file system that is distributed across multiple file servers or multiple locations. It allows programs to access and store remote files exactly as they do local ones, letting users access files from any network or computer.
The main purpose of a Distributed File System (DFS) is to allow users of physically distributed systems to share their data and resources through a common file system. A collection of workstations and mainframes connected by a Local Area Network (LAN) is a typical configuration for a Distributed File System. A DFS is executed as a part of the operating system. In DFS, a namespace is created, and this process is transparent to the clients.
DFS has two components:
Location Transparency
Location transparency is achieved through the namespace component.
Redundancy
Redundancy is achieved through the file replication component.
In the case of failure and heavy load, these components together improve data availability by allowing data shared from different locations to be logically grouped under one folder, which is known as the "DFS root".
It is not necessary to use both components of DFS together; it is possible to use the namespace component without the file replication component, and it is equally possible to use the file replication component between servers without the namespace component.
File system replication:
Early iterations of DFS made use of Microsoft's File Replication Service (FRS), which allowed for straightforward file replication between servers. FRS recognizes new or updated files and distributes the most recent version of the whole file to all servers.
Features of DFS:
Transparency:
Structure transparency
There is no need for the client to know the number or locations of the file servers and storage devices. Multiple file servers should be provided for performance, adaptability, and dependability.
Access transparency
Both local and remote files should be accessible in the same manner. The file system should automatically locate the accessed file and deliver it to the client.
Naming transparency
The name of a file should give no hint about its location. Once a name is given to a file, it should not change when the file is transferred from one node to another.
Replication transparency
If a file is copied onto multiple nodes, the copies and their locations should be hidden from clients.
User mobility:
The system automatically brings the user's home directory to the node where the user logs in.
Performance:
Performance is measured as the average amount of time needed to satisfy client requests. This time covers CPU time + time taken to access secondary storage + network access time. It is advisable that the performance of a Distributed File System be comparable to that of a centralized file system.
Simplicity and ease of use:
The user interface of the file system should be simple, and the number of commands should be small.
High availability:
A Distributed File System should be able to continue operating in the face of partial failures such as a link failure, a node failure, or a storage drive crash.
A highly available and adaptable distributed file system should have different and independent file servers controlling different and independent storage devices.
Scalability:
Since growing the network by adding new machines or joining two networks together is routine, the distributed system will inevitably grow over time. As a result, a good distributed file system should be built to scale quickly as the number of nodes and users in the system grows. Service should not be substantially disrupted as the number of nodes and users grows.
High reliability:
The likelihood of data loss should be minimized as much as feasible in a suitable distributed file system. That is, users should not feel forced to make backup copies of their files because of the system's unreliability. Rather, the file system should create backup copies of key files that can be used if the originals are lost. Many file systems employ stable storage as a high-reliability strategy.
Data integrity:
Multiple users frequently share a file system. The integrity of data saved in a shared file must be guaranteed by the file system. That is, concurrent access requests from many users competing for access to the same file must be correctly synchronized using a concurrency control method. Atomic transactions are a high-level concurrency management mechanism for data integrity that file systems frequently offer to users.
Security:
A distributed file system should be secure so that its users can trust that their data will be kept private. To safeguard the information contained in the file system from unwanted and unauthorized access, security mechanisms must be implemented.
Heterogeneity:
Heterogeneity in distributed systems is unavoidable as a result of their huge scale. Users of heterogeneous distributed systems have the option of using multiple computer platforms for different purposes.
History:
The server component of the Distributed File System was initially introduced as an add-on feature. It was added to Windows NT 4.0 Server and was known as "DFS 4.1". It was later included as a standard component in all editions of Windows 2000 Server. Client-side support was included in Windows NT 4.0 and in later versions of Windows.
Properties:
File transparency: users can access files without knowing where they are physically stored on the network.
Load balancing: the file system can distribute file access requests across multiple computers to improve performance and reliability.
Data replication: the file system can store copies of files on multiple computers to ensure that the files are available even if one of the computers fails.
Security: the file system can enforce access control policies to ensure that only authorized users can access files.
Scalability: the file system can support a large number of users and a large number of files.
Concurrent access: multiple users can access and modify the same file at the same time.
Fault tolerance: the file system can continue to operate even if one or more of its components fail.
Data integrity: the file system can ensure that the data stored in the files is accurate and has not been corrupted.
File migration: the file system can move files from one location to another without interrupting access to the files.
Data consistency: changes made to a file by one user are immediately visible to all other users.
Support for different file types: the file system can support a wide range of file types, including text files, image files, and video files.

Applications:
NFS
NFS stands for Network File System. It is a client-server architecture that allows a computer user to view, store, and update files remotely. The NFS protocol is one of several distributed file system standards for Network-Attached Storage (NAS).
CIFS
CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is, CIFS is an implementation of the SMB protocol designed by Microsoft.
SMB
SMB stands for Server Message Block. It is a file-sharing protocol invented by IBM. The SMB protocol was created to allow computers to perform read and write operations on files on a remote host over a Local Area Network (LAN). The directories on the remote host that can be accessed via SMB are called "shares".
Hadoop
Hadoop is a group of open-source software services. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. The core of Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, which is the MapReduce programming model.
NetWare
NetWare is a discontinued computer network operating system developed by Novell, Inc. It primarily used cooperative multitasking to run different services on a personal computer, using the IPX network protocol.
Working of DFS:
There are two ways in which DFS can be implemented:
Standalone DFS namespace
It allows only for DFS roots that exist on the local computer and do not use Active Directory. A standalone DFS can only be accessed on the computer on which it is created. It does not provide any fault tolerance and cannot be linked to any other DFS. Standalone DFS roots are rarely encountered because of their limited advantages.
Domain-based DFS namespace
It stores the configuration of DFS in Active Directory, making the DFS namespace root accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>
Advantages:
DFS allows multiple users to access or store the data.
It allows data to be shared remotely.
It improves file availability, access time, and network efficiency.
It improves the capacity to change the size of the data and the ability to exchange the data.
A Distributed File System provides transparency of data even if a server or disk fails.
Disadvantages:
In a Distributed File System, nodes and connections need to be secured, so security is at stake.
There is a possibility of loss of messages and data in the network while moving from one node to another.
Database connections in a Distributed File System are complicated.
Handling the database is also harder in a Distributed File System than in a single-user system.
There is a chance of overloading if all nodes try to send data at once.
Introduction to Hadoop Distributed File System (HDFS)

With growing data velocity, the data size easily outgrows the storage limit of a single machine. A solution is to store the data across a network of machines. Such filesystems are called distributed filesystems. Since data is stored across a network, all the complications of a network come in.
This is where Hadoop comes in. It provides one of the most reliable filesystems. HDFS (Hadoop Distributed File System) is a unique design that provides storage for extremely large files with a streaming data access pattern, and it runs on commodity hardware. Let's elaborate on these terms:
Extremely large files: Here we are talking about data in the range of petabytes (1000 TB).
Streaming data access pattern: HDFS is designed on the principle of write-once, read-many-times. Once data is written, large portions of the dataset can be processed any number of times.
Commodity hardware: Hardware that is inexpensive and easily available in the market. This is one of the features that specially distinguishes HDFS from other file systems.
Nodes: Master and slave nodes typically form the HDFS cluster.
NameNode (MasterNode):
Manages all the slave nodes and assigns work to them.
It executes filesystem namespace operations like opening, closing, and renaming files and directories.
It should be deployed on reliable, high-configuration hardware, not on commodity hardware.
DataNode (SlaveNode):
Actual worker nodes, which do the actual work like reading, writing, and processing.
They also perform creation, deletion, and replication upon instruction from the master.
They can be deployed on commodity hardware.
HDFS daemons: Daemons are the processes running in the background.
Namenodes:
Run on the master node.
Store metadata (data about data) like file paths, the number of blocks, block IDs, etc.
Require a high amount of RAM.
Store metadata in RAM for fast retrieval, i.e., to reduce seek time, though a persistent copy of it is kept on disk.
DataNodes:
Run on slave nodes.
Require large storage capacity, as the data is actually stored here.
Data storage in HDFS: Now let's see how the data is stored in a distributed manner.
Let's assume that a 100 TB file is inserted. The master node (namenode) first divides the file into blocks (for illustration, ten blocks of 10 TB each; the actual default block size is 128 MB in Hadoop 2.x and above). These blocks are then stored across different datanodes (slave nodes). The datanodes replicate the blocks among themselves, and the information about which blocks they contain is sent to the master. The default replication factor is 3, meaning that for each block 3 replicas are created (including the original). In hdfs-site.xml we can increase or decrease the replication factor, i.e., we can edit this configuration there, as shown in the sketch below.
Note: The MasterNode has a record of everything; it knows the location and info of each and every data node and the blocks they contain, i.e., nothing is done without the permission of the masternode.
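As a small illustration, the hdfs-site.xml sketch below shows where the replication factor and block size are configured; the values shown are the common defaults and are used here as assumptions, not settings from any particular cluster.

    <!-- hdfs-site.xml: illustrative values only -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>          <!-- number of replicas kept for each block -->
      </property>
      <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>  <!-- block size in bytes (128 MB) -->
      </property>
    </configuration>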
Why divide the file into blocks?
Let's assume we don't divide; it is very difficult to store a 100 TB file on a single machine. Even if we store it, each read and write operation on that whole file is going to take a very long seek time. But if we have multiple blocks of size 128 MB, it becomes easy to perform various read and write operations on them compared to doing it on the whole file at once. So we divide the file to get faster data access, i.e., to reduce seek time.
Why replicate the blocks in data nodes while storing?
Let's assume we don't replicate, and a given block is present only on datanode D1. If data node D1 crashes, we will lose the block, which will make the overall data inconsistent and faulty. So we replicate the blocks to achieve fault tolerance.

Features of HDFS:
 Distributed data storage.
 Blocks reduce seek time.
 The data is highly available, as the same block is present on multiple data nodes.
 Even if multiple data nodes are down, we can still do our work, thus making it highly reliable.
 High fault tolerance.

Limitations of HDFS:
Low latency data access: Applications that require low-latency access to data, i.e., in the range of milliseconds, will not work well with HDFS, because HDFS is designed keeping in mind that we need high throughput of data even at the cost of latency.
Small file problem: Having lots of small files results in lots of seeks and lots of movement from one datanode to another to retrieve each small file; this whole process is a very inefficient data access pattern.

Cephfs
What is Ceph Storage?
Red Hat Ceph is open-source software that aims to facilitate highly scalable object, block, and file-based storage under one comprehensive system. As a powerful storage solution, Ceph uses its own Ceph file system (CephFS) and is designed to be self-managed and self-healing. It is equipped to deal with outages on its own and constantly works towards reducing administration costs.
Another highlight of Ceph storage is that it is quite fault-tolerant, which it achieves by easily replicating data. This means that there are no bottlenecks in the process while Ceph is operating.
Since its launch, there have been more than 15 Ceph releases, with Red Hat recently announcing a major update as Red Hat Storage 4, which brings an array of improvements in monitoring, scalability, management, and security, thus making it easier for enterprises to get started with Ceph.
Features of Ceph are:
– High scalability
– Open-source
– High reliability through distributed data storage
– Robust data security through redundant storage
– Advantage of continuous memory allocation
– Convenient software-based increase in availability via an integrated algorithm for locating
data

Understanding the Working of Ceph Block Storage

Ceph primarily uses a Ceph block device, a virtual disk that can be attached to virtual machines or bare-metal Linux-based servers.
One of the key components in Ceph is RADOS (Reliable Autonomic Distributed Object Store), which offers powerful block storage capabilities such as replication and snapshots that can be integrated with OpenStack Block Storage.
Apart from this, Ceph also provides a robust, POSIX-compliant (Portable Operating System Interface) Ceph file system for storing data in its storage clusters. The advantage is that the file system uses the same clustered system as Ceph block storage and object storage to store a massive amount of data.
Ceph Storage Architecture
Ceph needs several computers connected to one another in what is known as a cluster. Each of these connected computers within that network is known as a node.
Below are some of the tasks which must be distributed among the nodes within the network:
Monitor nodes (ceph-mon): These cluster monitors mainly monitor the status of individual nodes in the cluster, especially the object storage devices, managers, and metadata servers. To ensure maximum reliability, it is recommended to have at least three monitor nodes.
Object Storage Devices (ceph-osd): Ceph OSDs are background applications for actual data management and are responsible for storage, duplication, and data restoration. For a cluster, it is recommended to have at least three OSDs.
Managers (ceph-mgr): They work in tandem with Ceph monitors to manage the status of system load, storage usage, and the capacity of the nodes.
Metadata servers (ceph-mds): They store metadata such as file names, storage paths, and timestamps of files stored in CephFS, for performance reasons.
The crux of Ceph data storage is an algorithm called CRUSH (Controlled Replication Under Scalable Hashing), which uses the CRUSH Map (an allocation table) to find an OSD with the requested file. CRUSH chooses the ideal storage location based on fixed criteria, after which the files are duplicated and then saved on physically separate media. The administrator of the network can set the relevant criteria.
The base of the Ceph data storage architecture is known as RADOS, a completely reliable, distributed object store composed of self-mapping, intelligent storage nodes.
Some of the ways to access stored data in Ceph include:
radosgw: With this gateway, data can be read or written using the HTTP internet protocol.
librados: Native access to stored data is possible using the librados software libraries, through APIs in programming and scripting languages including Python, Java, C/C++ and PHP (see the sketch after this list).
RADOS Block Device: Data access here requires integration using a virtualization system such as QEMU/KVM, or block storage through a kernel module.
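As a hedged sketch of the librados route, the Python example below writes and reads one object through the python-rados binding; the configuration file path and pool name are assumptions for illustration.

    # Hedged sketch: native object access via librados' Python binding (python-rados).
    # The conffile path and pool name are hypothetical.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()

    ioctx = cluster.open_ioctx("mypool")          # open an I/O context on a pool
    ioctx.write_full("greeting", b"hello ceph")   # store an object
    data = ioctx.read("greeting")                 # read it back

    ioctx.close()
    cluster.shutdown()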
Ceph Storage Performance
Ceph brings various benefits to OpenStack-based private clouds. To understand Ceph storage performance better, here is a look at a few of them.
High availability & enhanced performance
The erasure coding feature of Ceph improves data availability manyfold simply by adding resiliency and durability. At times, write speeds can be almost twice as high as with the previous back end.
Robust security
Active Directory integration, encryption features, LDAP, etc. are some of the highlights in place with Ceph which can limit unwanted access to the system.
Seamless adoption
Making a shift to software-defined storage platforms can sometimes be complicated. Ceph overcomes the issue by allowing block and object storage in the same cluster, without you having to worry about administering separate storage services using other APIs.
Cost-effectiveness
Ceph runs on commodity hardware, making it a cost-effective solution that does not need any expensive or extra hardware.
Use Cases of Ceph Block and Ceph Object Storage
Use cases of Ceph Block
Deploy elastic block storage with an on-premise cloud
Storage for smoothly running VM disk volumes
Storage for SharePoint, Skype and other collaboration applications
Primary storage for MySQL and other similar SQL database apps
Dev/Test systems
Storage for IT management apps

Use cases of Ceph Object Storage

VM disk volume snapshots
Video/audio/image repository services
ISO image store and repository service
Backup and archive
Deploy Dropbox-like services within the enterprise
Deploy Amazon S3-like object store services with an on-premise cloud

Advantages and Disadvantages of Ceph Storage


Advantages

 Despite its limited development history, Ceph is free and is an established storage method.
 The application has been extensively and well documented by the manufacturer.
 A great deal of helpful information is available online for Ceph regarding its setup and maintenance.
 The scalability and integrated redundancy of Ceph storage ensure data security and flexibility within the network.
 The CRUSH algorithm of Ceph ensures high availability.

Disadvantages

 To be able to fully use all of Ceph's functionalities, a comprehensive network is required due to the variety of components provided.
 The setup of Ceph storage is relatively time-consuming, and sometimes the user cannot be entirely sure where the data is physically being stored.
 It requires additional engineering oversight to implement and manage.

Why is Ceph Storage Not Enough for Modern Workloads?


While there is no denying that Ceph storage is highly scalable and a one-size-fits-all solution, it has some inherent architectural limitations, mainly because it was not created for today's fast storage media: NAND flash and NVMe flash.
Ceph storage is not suitable for modern workloads for the following reasons:
1. Enterprises working with the public cloud, running their own private cloud, or moving to modern applications require low latency and consistent response times. While BlueStore (a back-end object store for Ceph OSDs) helps to improve average and tail latency to an extent, it cannot fully take advantage of the benefits of NVMe flash.
2. Modern workloads typically deploy local flash (local NVMe flash) on bare metal to get the best possible performance, and Ceph is not equipped to realize the optimized performance of this new media. In fact, Ceph in a Kubernetes environment, where local flash is recommended, can be an order of magnitude slower than local flash.
3. Ceph has comparatively poor flash utilization (15-25%). In case of a failure of Ceph or the host, the rebuild time for shared storage can be very slow because of the massive traffic going over the network for a long period.
DBaaS
There are two cloud database environment models – the traditional cloud model and
Database-as-a-Service (DBaaS).
In the traditional cloud model, the content database runs on the company’s infrastructure
and any oversight falls into the hands of the IT manager and team.
DBaaS runs on the service vendor’s infrastructure and they are responsible for any hitches or
glitches should they occur. The user can fully focus on operations, development, and business
goals.
Advantages of Working with Cloud Databases
Less Dependence on Hardware – With the cloud service provider covering the maintenance
and infrastructure aspects, companies can now invest less in hardware and resources, as well
as IT expenditure. There are also fewer complications and conflicts that often hinder
development.
Enhanced Scalability – Working with a DBaaS allows for seamless and smooth scalability during peak times or ahead of big releases with tight deadlines. This is a huge benefit for growing companies that may not possess the budget and resources for on-premise infrastructure.
Value for Money – Not worrying about operational costs or costly upgrades is only the tip of the iceberg when it comes to cloud databases. Most DBaaS solutions are available in multiple configurations today, which makes it easier for companies to pay for only what they use.
Enjoy the Latest Technology – Companies no longer need to worry about shelling out money on new technologies, because keeping the infrastructure up to date is the headache (and sole responsibility) of the cloud vendor. Companies also don't need to hire dedicated staff for training and onboarding purposes.
Security – Just like the previous advantages, all top vendors today take care of the security aspect and invest in the best available solutions to keep the databases safe. No solution is bullet-proof, but it is proving to be a safer way to protect sensitive data and information with less margin for error.
Top 7 Cloud Databases
1.Amazon Web Services (AWS)
Amazon has become the market leader in the DBaaS space. It offers supplementary data-management services such as Redshift, a data warehouse, and Data Pipeline, a data-integration service for easier data management. Amazon's current offerings include:
Amazon RDS – Amazon's Relational Database Service runs on Oracle, SQL Server, or MySQL server instances.
Amazon SimpleDB – This is primarily a schema-less database that is meant to handle smaller workloads.
Amazon DynamoDB – This falls under NoSQL databases (SSD-backed), capable of automatically replicating workloads across three availability zones (see the sketch after this entry).
Strengths: Lots of Features, Easy to Use, Good Support and Documentation
Weaknesses: Not Too Customizable, Downtimes as per Amazon’s Schedule
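As a small, hedged illustration of working against a managed NoSQL offering such as DynamoDB, the boto3 sketch below writes and reads one item; the table name and attributes are assumptions, and the table is presumed to already exist.

    # Hedged sketch: basic reads and writes against a managed NoSQL table (DynamoDB via boto3).
    # The table name and item attributes are hypothetical; the table must already exist.
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("example-users")

    table.put_item(Item={"user_id": "u-1001", "name": "Ada", "plan": "pro"})

    response = table.get_item(Key={"user_id": "u-1001"})
    print(response.get("Item"))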
2.Oracle Database
Oracle Database provides companies with enterprise-scale database technology stored in the
cloud. Despite its first offering being quite comprehensive, the Generation 2 offering has
consistently higher performance with extensive governance and security controls.
Data migration is also covered with a dedicated solution and tight customer support in case
any technical issues or questions arise.
Strengths: Intuitive Interface, Easy to Use, Solid Customer Support
Weaknesses: No Free Version, No Mobile Access, Pricey for Small Companies
3. Microsoft Azure
In a nutshell, Azure is a cloud computing platform for VM creation, building and running web-
based applications, smart client applications, and XML web services. It currently boasts the
biggest and strongest global infrastructure, with 55 regions, more than any other cloud
provider.
Strengths: Comprehensive Solution, Good Security, Strong Ecosystem
Weaknesses: Iffy Customer Service, Not User Friendly
4.Google Cloud Platform
Surprisingly, Google is still playing catch-up with the big players in the market. But its solutions are being adopted by more and more businesses of different sizes, thanks to its no-nonsense approach and comprehensive documentation, which reduces stress on developers, IT professionals, and other stakeholders.
The broad open-source compatibility also has its fair share of benefits, allowing you to scale while doing more with analytics and integrations.
Strengths: Comprehensive Documentation, Good for Small and Big Businesses
Weaknesses: Not Yet at the Level of the Big Three (AWS, Oracle, Azure)
5.IBM DB2
This is a relational database that delivers advanced data management and analytics
capabilities for transactional and warehousing workloads. IBM DB2 is designed to deliver high
performance, actionable insights, data availability and reliability, and it is supported across
Linux, Unix, and Windows.
However, it has fewer regional options, which may impact performance and compliance
requirements depending on your development project/s.
Strengths: Well Designed Product, Easy Migration Process
Weaknesses: Average Customer Service, Pricey, Mediocre Functionality
6.MongoDB Atlas
MongoDB Atlas is a popular open-source NoSQL database that offers powerful scaling,
sharding, and automation capabilities. Another advantage is that most developers using this
can speed through continuous delivery models without any database administrator (DBA)
hand holding.
On the negative side, some applications require SQL databases to function, which
automatically eliminates MongoDB Atlas from consideration.
Strengths: Strong Support Community, Quick Installation, Flexibility
Weaknesses: NoSQL Only, Can be Challenging for New/Inexperienced Devs
7.OpenStack
Another interesting open-source rival for Google is OpenStack. These databases come as managed or hosted cloud databases; the Rackspace offering is highly customizable and its architecture is easy to understand and implement. Many reviews have complimented the scaling capabilities of this solution.
The OpenStack community collaborates around a six-month, time-based release cycle with frequent development milestones.
Strengths: Good Value for Money, Easy to Use
Weaknesses: Cumbersome Interface, Some Stability Issues

What to look for when selecting a cloud database

There are many vendors and options available to organizations looking for a cloud database
solution for their enterprise. You will want to select a model that works best for your specific
business needs. The following are some key features to look for from any cloud database:

Performance

 Online and independent scaling of compute and storage, patching, and upgrade—
with uninterrupted data availability to applications—will ensure that your
database’s capacity meets your enterprise’s needs as they fluctuate, without
interrupting operations. Automated and online performance optimization, such as
auto-indexing, is a must. You’ll also want scale-out clustering for both read and
write to ensure that your mission-critical, real-time workloads run seamlessly.
Security

 Robust security features are paramount. Any database model you select should be
able to perform data encryption at rest and in flight and provide automated
security updates. It’s also essential to ensure a strict separation of duties so
operations cannot access customer data. External attack detection and prevention
driven by machine learning provides an additional layer of real-time security.
Lastly, for your most business-critical applications, you will want a dedicated cloud
infrastructure that includes hardware isolation from other tenants.

 Other qualities to look for include lower high-availability costs, and industry-
leading flashback technologies to help provide protection from user errors. Finally,
your database should have broad compatibility with third-party applications.

Migrate Your Database from On-Premises to the Cloud

Migrating a database to the cloud might sound like a daunting task, but it does not have to
be. Advance planning is the key. It is also important to remember that not all migration
methods apply to every scenario.

There are several factors to consider when choosing a migration method—including data
types, host operating systems, and database versioning. Here are a few things to think about
and prepare for as you approach the migration of your databases to the cloud.

 Is the target cloud database software compatible with what you are running on-
premises?
 Is the version compatible?
Some cloud providers do not offer database services that are compatible with on-
premises versions. Also, if your target cloud database only supports a higher
version of the software you are using, you must plan for an upgrade.

 What is the size and scale of your database, and does the target cloud support this
configuration?
Some cloud providers only offer smaller database configurations in terms of storage size
and number of cores. You will want to make sure in advance that your provider has the
capacity to meet your needs.
 Do you run adjacent scripts on the database servers themselves? If so, you would
need to contract for infrastructure as a service (IaaS) or automated services—and
these might not be available through your cloud provider.
 Do you need to migrate with little or no downtime to your existing
application? Leading cloud database providers, like Amazon, Microsoft, and
Oracle, are making database selection and migration easier than ever. Depending
on the circumstances, migrating to the cloud can take place in a matter of minutes.
Make migrating to a cloud database seamless

Oracle’s automated tools allow you to seamlessly move your on-premises database to Oracle
Cloud with virtually no downtime at all, because Oracle Cloud uses the same standards,
products, and skills you currently use on-premises.

Object Storage
What is object storage?
Object storage is a technology that stores and manages data in an unstructured format called
objects. Modern organizations create and analyze large volumes of unstructured data such as
photos, videos, email, web pages, sensor data, and audio files. Cloud object storage systems
distribute this data across multiple physical devices but allow users to access the content
efficiently from a single, virtual storage repository. Object storage solutions are ideal for
building cloud native applications that require scale and flexibility, and can also be used to
import existing data stores for analytics, backup, or archive.

Metadata is critical to object storage technology. With object storage, objects are kept in a
single bucket and are not files inside of folders. Instead, object storage combines the pieces
of data that make up a file, adds all the user-created metadata to that file, and attaches a
custom identifier. This creates a flat structure, called a bucket, as opposed to hierarchical or
tiered storage. This lets you retrieve and analyze any object in the bucket, no matter the file
type, based on its function and characteristics.
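To make the metadata idea concrete, the hedged boto3 sketch below stores an object with custom metadata in a flat bucket and reads that metadata back; the bucket, key, local file, and metadata values are assumptions for illustration.

    # Hedged sketch: storing an object with user-defined metadata and reading it back (boto3).
    # Bucket, key, local file, and metadata values are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    with open("cat-001.jpg", "rb") as f:
        s3.put_object(
            Bucket="example-media",
            Key="photos/cat-001.jpg",
            Body=f,
            Metadata={"camera": "dslr-01", "location": "lab-3"},  # custom metadata
        )

    head = s3.head_object(Bucket="example-media", Key="photos/cat-001.jpg")
    print(head["Metadata"])   # the user-defined metadata attached above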

Object storage is the ideal storage for data lakes because it delivers an architecture for large
amounts of data, with each piece of data stored as an object, and the object metadata
provides a unique identifier for easier access. This architecture removes the scaling limitations
of traditional storage, and is why object storage is the storage of the cloud.

The major benefits of object storage are the virtually unlimited scalability and the lower cost
of storing large volumes of data for use cases such as data lakes, cloud native applications,
analytics, log files, and machine learning (ML). Object storage also delivers greater data
durability and resiliency because it stores objects on multiple devices, across multiple
systems, and even across multiple data centers and regions. This allows for virtually unlimited
scale and also improves resilience and availability of the data.

Why is object storage important?


As businesses grow, they're managing rapidly expanding but isolated pools of data from many
sources that are used by any number of applications and business processes and end users.
Today, much of this data is unstructured and ends up in multiple different formats and storage
media, and does not easily fit into a central repository. This adds complexity, and slows down
innovation because data is not accessible to be used for analysis, machine learning (ML), or
new cloud native applications. Object storage helps break down these silos by providing
massively scalable, cost-effective storage to store any type of data in its native format. Object
storage removes the complexity, capacity constraints, and cost barriers that plague
traditional storage systems because object storage delivers unlimited scalability at low per-
gigabyte prices.
You can manage unstructured data in one place with a user-friendly application interface.
You can use policies to optimize data storage costs and automatically switch your storage tier
when necessary. Cloud object storage makes it easier to perform analysis and gain insights,
allowing for faster decision-making.

While objects can be stored on premises, object storage is built for the cloud and delivers
virtually unlimited scalability, high durability, and cost-effectiveness. With cloud object
storage, data is readily accessible from anywhere.

What are the use cases for object storage?


Customers use object storage for a wide variety of solutions. Here are common use cases.

Analytics

You can collect and store virtually unlimited data of any type in cloud object storage and
perform big data analytics to gain valuable insights about your operations, customers, and
the market you serve.

Data lake

A data lake uses cloud object storage as its foundation because it has virtually unlimited
scalability and high durability. You can seamlessly and nondisruptively increase storage from
gigabytes to petabytes of content, paying only for what you use. It has scalable performance,
ease-of-use features, native encryption, and access control capabilities.

Cloud-native application data

Cloud-native applications use technologies like containerization and serverless to meet customer expectations in a fast-paced and flexible manner. These applications are typically made of small, loosely coupled, independent components called microservices that communicate internally by sharing data or state. Cloud storage services provide data management for such applications and provide solutions to ongoing data storage challenges in the cloud environment. Object storage allows you to add any amount of content and access it from anywhere, so you can deploy applications faster and reach more customers.

Data archiving

Cloud object storage is excellent for long-term data retention. You can use it to replace on-
premises tape and disk archive infrastructure with solutions that provide enhanced data
durability, immediate retrieval times, better security and compliance, and greater data
accessibility for advanced analytics and business intelligence. You can also cost-effectively
archive large amounts of rich media content and retain mandated, regulatory data for
extended periods of time.

Rich media

Accelerate applications and reduce the cost of storing rich media files such as videos, digital
images, and music. With object storage you can create cost-effective, globally replicated
architecture to deliver media to distributed users by using storage classes and replication
features.
Backup and recovery

You can configure object storage systems to replicate content so that if a physical device fails,
duplicate object storage devices become available. This ensures that your systems and
applications continue to run without interruption. You can also replicate data across multiple
datacenters and geographical regions.

ML

In machine learning (ML), you “teach” a computer to make predictions or inferences. You use
algorithms to train models and then integrate the model into your application to generate
inferences in real time and at scale. Machine learning requires object storage because of the
scale and cost efficiency, as a production model typically learns from millions to billions of
example data items and produces inferences in as little as 20 milliseconds.

How does cloud object storage compare to other types of storage?


There are three types of cloud storage: object, file, and block. Each is ideal for specific use
cases and storage requirements.

File storage

Many applications need shared file access. This has been traditionally served by network-
attached storage (NAS) services. Common file level protocols consist of Server Message Block
(SMB) used with Windows servers and Network File Systems (NFS) found in Linux instances.
File storage is suited for unstructured data, large content repositories, media stores, home
directories and other file-based data.

Comparing object storage and file storage

The primary differences between object and file storage are data structure and scalability.
File storage is organized into a hierarchy of directories and folders. File storage also follows
strict file protocols, such as SMB, NFS, or Lustre. Object storage uses a flat structure with
metadata and a unique identifier for each object that makes it easier to find among potentially
billions of other objects.

With these differences in structure, file storage and object storage have different capacity to
scale. Object storage offers near-infinite scaling, to petabytes and billions of objects. Because
of the inherent hierarchy and pathing, file storage hits scaling constraints.

Block storage

Enterprise applications like databases or ERP systems often require dedicated, low-latency
storage for each host. This is analogous to direct-attached storage (DAS) or a storage area
network (SAN). Block-based cloud storage solutions are provisioned with each virtual server
and offer the ultra-low latency required for high-performance workloads.
Comparing object storage and block storage

Object storage is best used for large amounts of unstructured data, especially when durability,
unlimited storage, scalability, and complex metadata management are relevant factors for
overall performance.

Block storage provides low latency and high-performance values in various use cases. Its
features are primarily useful for structured database storage, VM file system volumes, and
high volumes of read and write loads.

How can AWS help with your cloud object storage needs?
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-
leading scalability, data availability, security, and performance. Customers of all sizes and
industries can use Amazon S3 to store and protect any amount of data for a range of use
cases, such as data lakes, websites, mobile applications, backup and restore, archive,
enterprise applications, IoT devices, and big data analytics. Amazon S3 provides management
features so that you can optimize, organize, and configure access to your data to meet your
specific business, organizational, and compliance requirements. The following are some
examples of Amazon S3 benefits.

Durability, availability, and scalability

Amazon S3 was built from the ground up to deliver 99.999999999% (11 9s) of data durability.
With Amazon S3, your objects are redundantly stored on multiple devices across a minimum
of three Availability Zones (AZs) in an Amazon S3 Region. Amazon S3 is designed to sustain
concurrent device failures by quickly detecting and repairing any lost redundancy, and it also
regularly verifies the integrity of your data using checksums.

Security and compliance

Amazon S3 protects your data with security, compliance, and audit capabilities. Amazon S3 is
secure by default. Upon creation, only you have access to Amazon S3 buckets that you create,
and you have complete control over who has access to your data. Amazon S3 supports user
authentication to control access to data. You can use access control mechanisms such as
bucket policies to selectively grant permissions to users and groups of users. Additionally, S3
maintains compliance programs, such as PCI DSS, HIPAA/HITECH, FedRAMP, SEC Rule 17a-4,
EU Data Protection Directive, and FISMA, to help you meet regulatory requirements. AWS
also supports numerous auditing capabilities to monitor access requests to your Amazon S3
resources.
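As a hedged example of one such access control mechanism, the sketch below attaches a bucket policy that grants a single IAM role read-only access; the account ID, role name, and bucket name are assumptions for illustration.

    # Hedged sketch: granting selective, read-only access with a bucket policy (boto3).
    # Account ID, role name, and bucket name are hypothetical.
    import json
    import boto3

    s3 = boto3.client("s3")

    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlyForAnalyticsRole",
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::111122223333:role/analytics-read"},
                "Action": ["s3:GetObject"],
                "Resource": "arn:aws:s3:::example-data-bucket/*",
            }
        ],
    }

    s3.put_bucket_policy(Bucket="example-data-bucket", Policy=json.dumps(policy))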

Flexible management

AWS offers the most flexible set of storage management and administration capabilities.
Storage administrators can classify, report, and visualize data usage trends to reduce costs
and improve service levels. Objects can be tagged with unique, customizable metadata so you
can see and control storage consumption, cost, and security separately for each workload.
The S3 Inventory tool delivers scheduled reports about objects and their metadata for
maintenance, compliance, or analytics operations. Amazon S3 can also analyze object access
patterns to build lifecycle policies that automate tiering, deletion, and retention. Finally, since
Amazon S3 works with AWS Lambda, customers can log activities, define alerts, and invoke
workflows, all without managing any additional infrastructure.

Cost-effective storage classes

Amazon S3 offers a range of storage classes that you can choose from based on data access,
resiliency, and cost requirements of your workloads. Amazon S3 storage classes are purpose-
built to provide the lowest cost storage for different access patterns. You pay only for what
you use. The rate you’re charged depends on the size of your objects, how long you stored
the objects during the month, and your chosen storage class. Find the best Amazon S3 storage
class for your workload.

Efficient analytics

Amazon S3 is the only cloud storage platform that lets customers run sophisticated analytics
on their data without requiring them to extract and move the data to a separate analytics
database. Customers with knowledge of SQL can use Amazon Athena to analyze vast amounts
of unstructured data in Amazon S3 on-demand. With Amazon Redshift Spectrum, customers
can run sophisticated analytics against exabytes of data in Amazon S3 and run queries that
span both the data you have in Amazon S3 and in your Amazon Redshift data warehouses.
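As a hedged illustration of querying data in place, the boto3 sketch below starts an Athena query over data already sitting in S3; the database, table, and output location are assumptions, and a real client would poll until the query finishes.

    # Hedged sketch: running SQL over data in S3 with Amazon Athena (boto3).
    # Database, table, and output location are hypothetical.
    import boto3

    athena = boto3.client("athena")

    run = athena.start_query_execution(
        QueryString="SELECT status, COUNT(*) FROM web_logs GROUP BY status",
        QueryExecutionContext={"Database": "example_logs_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )

    # Check the state once the query has had time to run (a real client would poll)
    query_id = run["QueryExecutionId"]
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]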
