Unit 1

The document discusses cloud data management systems and provides an overview of the course syllabus which covers topics such as introduction to distributed file systems and cloud computing, cloud data management techniques and applications, cloud database platforms, and cloud data security and privacy. The syllabus also lists some textbooks and references.

Uploaded by

Giridhar Boyina
CSE – 416

CLOUD DATA MANAGEMENT SYSTEMS

Dr. Sambit Kumar Mishra


Syllabus

UNIT I
Introduction to Distributed File Systems and Cloud: Introduction to Distributed
File Systems, Cloud Computing, Cloud Data Management and its Goals &
Challenges, Models of Cloud Data Management, Cloud Data Management Basics,
Cloud Data Storage, Reasons to Use Cloud Data Management.
UNIT II
Cloud Data Management & its Applications: Large data processing using
MapReduce; big data technologies and tools; data modelling, storage, indexing, and
query processing for big data; key-value storage systems, columnar databases,
NoSQL systems; big data applications. Multi-tenant database systems: multitenancy,
scalability, consistency, database elasticity in the cloud.
UNIT III
Azure database service platform: Understanding the Service, Designing SQL
Database, Migrating an Existing Database, Using SQL Database, Scaling SQL
Database, Governing SQL Database. MySQL and PostgreSQL.
UNIT IV
Cloud Data Management Techniques: Hybrid cloud features, migrate databases to
Azure IaaS, Run SQL Server on Microsoft Azure Virtual Machines, Considerations on
High Availability and Disaster Recovery Options with SQL Server on Hybrid Cloud
and Azure IaaS, Working with NoSQL Alternatives.

UNIT V
Cloud Data Security and Privacy: Aspects of Data Security, Defining Organizational
Cloud Security Responsibilities, Assessing Risk in the Cloud, Existing Security Tools,
Building a Security Strategy.
TEXTBOOKS
1. Faithe Wempen. Cloud Data Management For Dummies®, Druva Special Edition. John Wiley & Sons, Inc.,
2017.
2. Lawrence Miller. Cloud Security & Compliance For Dummies®, Palo Alto Networks® Special Edition. John
Wiley & Sons, Inc., 2019.
3. Divyakant Agrawal, Sudipto Das, Amr El Abbadi. Data Management in the Cloud: Challenges and
Opportunities. 2013.
4. Francesco Diaz, Roberto Freato. Cloud Data Design, Orchestration, and Management Using Microsoft Azure.
Apress, Springer, 2018.

REFERENCES
5. Andrew S. Tanenbaum, Maarten Van Steen. Distributed Systems: Principles and Paradigms, 2nd edition.
Pearson.
6. Francesco Diaz, Roberto Freato. Cloud Data Design, Orchestration, and Management Using Microsoft Azure.
Apress, Springer, 2018.
7. Lee Chao. Cloud Database Development and Management. CRC Press, Taylor and Francis Group, 2014.
8. Liang Zhao, Sherif Sakr, Anna Liu, Athman Bouguettaya. Cloud Data Management. Springer,
2014.
Unit - 1
Introduction to Distributed File Systems and Cloud
Storage Models

• Various types of storage models:
• Manual file system (paper ledgers)
• Centralized file system
• Distributed file system (DFS)
• Decentralized file system
Distributed File System Requirements

• The first requirements were access transparency and location transparency.
• Location transparency: Components can be accessed without knowing where
they are physically located.
• Access transparency: Local and remote components are accessed in the
same way.
• Later, performance, scalability, concurrency control, fault
tolerance, and security requirements emerged and were met in
later phases of DFS development.
• Concurrent file updates are protected (record locking).
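The record-locking idea above can be illustrated with a small in-memory sketch. This is not how any particular DFS implements locking; the `RecordStore` class and `run_concurrent_updates` helper are hypothetical names used only to show one lock per record protecting a read-modify-write cycle.

```python
import threading


class RecordStore:
    """Toy 'file' of integer records with one lock per record (hypothetical API)."""

    def __init__(self, num_records):
        self._records = [0] * num_records
        # One lock per record: updates to different records do not block each other.
        self._locks = [threading.Lock() for _ in range(num_records)]

    def increment(self, index):
        # Hold the record's lock for the whole read-modify-write cycle.
        with self._locks[index]:
            current = self._records[index]
            self._records[index] = current + 1

    def read(self, index):
        with self._locks[index]:
            return self._records[index]


def run_concurrent_updates(store, index, workers, per_worker):
    """Start several threads that all update the same record, then return its value."""

    def work():
        for _ in range(per_worker):
            store.increment(index)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.read(index)
```

With the lock held across the whole update, no increment is lost, however the threads interleave.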
Distributed file system (DFS)

• Distributed file systems allow multiple processes to share data over
long periods of time in a secure and reliable way.
• A distributed file system is considered a paradigm for general-
purpose distributed systems.
• A DFS enables programs to store and access remote files/storage
exactly as they do local ones.
• A distributed file system is typically implemented using the client/server model.
Simple Distributed File System

[Figure: two clients and a server with a cache. A client sends Read (RPC) and the server returns the data; a client sends Write (RPC) and the server returns an ACK.]
• Remote Disk: Reads and writes forwarded to server
• Advantage: Server provides completely consistent view of file system to multiple clients
• Problems? Performance!
• Going over network is slower than going to local memory
• Lots of network traffic/not well pipelined
• Server can be a bottleneck
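The remote-disk model above can be sketched with plain Python objects standing in for the network. This is a minimal simulation, not a real RPC stack: `FileServer`, `RemoteDiskClient`, and the request counter are hypothetical names, and a method call stands in for each network round trip.

```python
class FileServer:
    """Holds the single authoritative copy of every file."""

    def __init__(self):
        self._files = {}
        self.requests_handled = 0  # stands in for network round trips

    def handle_read(self, path):
        self.requests_handled += 1
        return self._files.get(path, b"")

    def handle_write(self, path, data):
        self.requests_handled += 1
        self._files[path] = data
        return "ACK"


class RemoteDiskClient:
    """Client with no local cache: every read and write is forwarded to the server."""

    def __init__(self, server):
        self._server = server

    def read(self, path):
        return self._server.handle_read(path)

    def write(self, path, data):
        return self._server.handle_write(path, data)


server = FileServer()
alice = RemoteDiskClient(server)
bob = RemoteDiskClient(server)

ack = alice.write("/notes.txt", b"v1")
seen_by_bob = bob.read("/notes.txt")  # bob sees alice's write immediately

# Re-reading unchanged data still costs one server request per call.
for _ in range(3):
    bob.read("/notes.txt")
```

Because all state lives on the server, both clients always see the same data, but every repeated read adds to `server.requests_handled`, which is the performance and bottleneck problem listed above.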
DISTRIBUTED FILE SYSTEM ARCHITECTURE

Architecture

• Client-Server Architectures
• Cluster-Based Distributed File Systems
• Symmetric Architectures
Distributed Computing Vs Cloud Computing
Distributed computing
• Distributed computing uses a distributed system to solve a single large
problem by dividing it into tasks and assigning them to individual
computers in the system.
• Put simply, distributed computing is a technique that lets multiple
computers communicate and work together to solve a single problem.
• Distributed computing completes computational tasks faster than a
single computer, which would take much longer.
• The goal of distributed computing is to distribute a single task among
multiple computers and to solve it quickly by maintaining coordination
between them.
Cloud computing
• Cloud computing uses network-hosted servers for tasks such as storing,
processing, and managing data.
• Put simply, cloud computing is a technique that delivers hosted
services over the internet to its users/customers.
• Cloud computing provides services such as hardware, software, and
networking resources over the internet.
• The goal of cloud computing is to provide on-demand computing services
over the internet on a pay-per-use model.
Cloud Computing

Cloud Computing is a service model that provides on-demand services
to the user with minimal management effort; the services are
regulated by Quality of Service (QoS) and Service Level Agreements
(SLAs).
• Cloud is well known for its pay-as-you-go model (like renting a property
rather than owning one).
• Cloud Computing is known for its 5-4-3 model, from the National Institute of
Standards and Technology (NIST).
• 5 Essential Characteristics
• 4 Deployment Models
• 3 Service Models
5-4-3 Model
Essential Characteristics

1. On-Demand Self-Service
Customers can self-provision computing resources such as server time, storage,
network, and applications as needed, without human interaction with the cloud
service provider.
2. Broad Network Access
Computing resources are available over the network and can be accessed using
heterogeneous client platforms like mobiles, laptops, desktops, PDAs, etc.
3. Resource Pooling
Computing resources such as storage, processing, network, etc., are pooled to serve
multiple clients. For this, cloud computing adopts a multitenant model where the
computing resources of service providers are dynamically assigned to customers
on demand.

The customer is generally not aware of the exact physical location of these resources.
However, at a higher level of abstraction (for example, country or data center), the
location of resources can be specified.
Essential Characteristics

4. Rapid Elasticity
Computing resources often appear limitless to a cloud customer because
they can be rapidly and elastically provisioned. Resources can be provisioned
and released at increasingly large scale to meet customer demand.

Computing resources can be purchased at any time and in any quantity,
depending on the customer's demand.
5. Measured Service
Resource usage by clients can be monitored and controlled by metering at a
level of abstraction appropriate to the type of service.

Resource usage can be reported through this metering capability, providing
transparency for both the provider and the customer.
Deployment Models

1. Private Cloud
• A cloud environment deployed for the exclusive use of a single organization is
a private cloud. An organization can have multiple cloud users belonging to
different business units.
• Private cloud infrastructure can be either on or off premises, depending on the
organization's needs. The organization may own and manage the private cloud
itself, assign this responsibility to a third party (a cloud provider), or
use a combination of both.
2. Public Cloud
• A cloud infrastructure deployed for use by the general public is a
public cloud. The public cloud model is deployed by cloud vendors, government
organizations, or both.
• The public cloud is typically deployed at the cloud vendor's premises.
Deployment Models

3. Community Cloud
• A cloud infrastructure shared by multiple organizations that form a
community and share common interests is a community cloud. Community
Cloud is owned, managed, and operated by organizations or cloud vendors,
i.e., third parties.
• A community cloud may be hosted on the premises of the community's
organizations or on the cloud provider's premises.
4. Hybrid Cloud
• A cloud infrastructure composed of two or more distinct cloud models
(private, public, or community) is a hybrid cloud.
• While these distinct cloud structures remain unique entities, they can be
bound together by specialized technology enabling data and application
portability.
Service Models

1. Software as a Service (SaaS)
• Here, the cloud service provider offers applications running on cloud infrastructure over the
Internet on a subscription basis. The service provider supplies the servers, storage, networks,
virtualization, operating systems, runtime environments, and software.
• Users can access cloud applications on or off premises. The customer can expand or reduce the offered
services based on demand. The customer need not worry about maintenance and updates, as
these are the service provider's responsibility.
• Examples of SaaS are Dropbox, Microsoft OneDrive, and Slack.
2. Platform as a Service (PaaS)
• Here, the cloud service provider gives consumers the infrastructure and a runtime
environment for web-based development and deployment of software or applications.
• The PaaS customer is not required to manage or control the cloud infrastructure, although they have
full control over the deployed software.
• Popular PaaS services include Google App Engine, Microsoft Azure (formerly Windows Azure), and Heroku.
3. Infrastructure as a Service (IaaS)
• Here, the cloud service provider provides server, storage, network services to its end users through
virtualization. The consumer can access these virtualized computing resources over the Internet.
• The IaaS customer is not required to manage or control the cloud infrastructure, although the customer
has control over the run time environment, middleware, operating system, and deployed applications.
• Most popular IaaS services: Google Compute Engine, Rackspace, and Amazon Web Services (AWS).
Everything-as-a-service (XaaS)

• Also known as Anything-as-a-Service, XaaS gives users and companies
the flexibility to customize their computing environments and
craft the experiences they want, all on demand.
• XaaS is evolving from technology-as-a-service to business-as-a-service.
Data as a service (DaaS)

• Data as a service (DaaS) is a data management strategy that uses the
cloud to deliver data storage, integration, processing, and/or
analytics services via a network connection.

• DaaS is similar to software as a service, or SaaS, a cloud computing
strategy that involves delivering applications to end-users over the
network, rather than having them run applications locally on their
devices. Just as SaaS removes the need to install and manage
software locally, DaaS outsources most data storage, integration, and
processing operations to the cloud.
Principles to scale up cloud computing
1. Federation
• Cloud resources appear unlimited to customers, but each cloud has a limited capacity.
If customer demand continues to grow, a single cloud will eventually exceed its capacity;
forming a federation of service providers enables collaboration and resource sharing.
• A federated cloud must allow virtual applications to be deployed on federated sites.
Virtual applications should not be location-dependent and should be able to migrate easily
between sites.
2. Freedom
• Cloud computing services should provide end-users complete freedom that allows the
user to use cloud services without depending on a specific CSP.
• Even the CSP should be able to manage and control the computing service without sharing
internal details with customers or partners.
3. Isolation
• A CSP provides its computing resources to multiple end-users. Before moving
their computing to the cloud, end-users must be assured that their data
will be isolated in the cloud and cannot be accessed by other members sharing the cloud.
Principles to scale up cloud computing

4. Elasticity
• Cloud computing resources should be elastic, which means that the user
should be free to attach and release computing resources on their demand.
5. Business Orientation
• Companies must be assured of the QoS that providers offer before moving
mission-critical applications to the cloud.
• The CSP should develop a mechanism to understand the exact business
requirement of the customer and customize the service parameters as per
the customer's requirement.
6. Trust
• Trust is the most important factor that drives any customer to move their
computing to the cloud. For the cloud to be successful, trust must be
maintained to create a federation between the cloud customer, the cloud
vendor, and the various cloud providers.
Advantages of Cloud Computing
Disadvantages of Cloud Computing

1. Internet Connectivity
2. Vendor lock-in
3. Limited Control
4. Security
Cloud Infrastructure
Virtualization

• Virtualization is the "creation of a virtual (rather than actual) version of
something, such as a server, a desktop, a storage device, an operating
system or network resources".
• It is the ability to run multiple operating systems on a single physical system
and share the underlying hardware resources.
• Virtualization lets several service requests run on a single
physical resource concurrently.
• The virtual machine is the basic unit for executing a service request.
• Types of Virtualization:
1. Hardware Virtualization.
2. Operating system Virtualization.
3. Server Virtualization.
4. Storage Virtualization.
Hardware Virtualization

• Hardware virtualization is when the virtual machine software, or virtual
machine manager (VMM), is installed directly on the hardware system.
• The main job of the hypervisor is to control and monitor the
processor, memory, and other hardware resources.
• After virtualizing the hardware system, we can install different
operating systems on it and run different applications on those OSs.

• Usage: Hardware virtualization is mainly done for server
platforms, because controlling virtual machines is much easier than
controlling a physical server.
Operating System Virtualization

• Operating system virtualization is when the virtual machine software, or
virtual machine manager (VMM), is installed on the host operating system
instead of directly on the hardware system.

• Usage: Operating system virtualization is mainly used for testing
applications on different OS platforms.
Server Virtualization

• Server virtualization is when the virtual machine software, or virtual
machine manager (VMM), is installed directly on the server system.

• Usage: Server virtualization is done so that a single physical server
can be divided into multiple servers on demand and to
balance the load.
Storage Virtualization

• Storage virtualization is the process of grouping the physical storage
from multiple network storage devices so that it looks like a single
storage device.
• Storage virtualization is also implemented by using software
applications.

• Usage: Storage virtualization is mainly done for back-up and recovery
purposes.
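As a rough illustration of grouping several devices so they look like one, the sketch below concatenates the blocks of multiple devices into a single logical address space. `PhysicalDevice`, `StoragePool`, and the device names are hypothetical; real storage virtualization layers add striping, redundancy, and much more.

```python
class PhysicalDevice:
    """A storage device modelled as a fixed list of blocks."""

    def __init__(self, name, num_blocks):
        self.name = name
        self.blocks = [b""] * num_blocks


class StoragePool:
    """Presents several devices as one logical device by concatenating their blocks."""

    def __init__(self, devices):
        self._devices = list(devices)

    @property
    def capacity(self):
        return sum(len(device.blocks) for device in self._devices)

    def _locate(self, logical_block):
        # Walk the devices in order, subtracting each device's size until the block fits.
        if logical_block < 0:
            raise IndexError("logical block out of range")
        offset = logical_block
        for device in self._devices:
            if offset < len(device.blocks):
                return device, offset
            offset -= len(device.blocks)
        raise IndexError("logical block out of range")

    def write(self, logical_block, data):
        device, offset = self._locate(logical_block)
        device.blocks[offset] = data

    def read(self, logical_block):
        device, offset = self._locate(logical_block)
        return device.blocks[offset]

    def where(self, logical_block):
        """Return (device name, physical block) for a logical block."""
        device, offset = self._locate(logical_block)
        return device.name, offset


pool = StoragePool([PhysicalDevice("nas-a", 4), PhysicalDevice("nas-b", 6)])
pool.write(5, b"backup-chunk")
```

Callers address blocks 0 to 9 without knowing that block 5 physically lives on the second device.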
VMM/Hypervisor

• Enables the creation and management of VMs.
• Manages allocation of system resources for VMs.
• Allows several operating systems to run concurrently on a single
hardware platform.
• Some examples of VMMs are:
• Xen
• VMware
• UML
• Denali
Internet of Things vs. Cloud Computing
Multi-Cloud Vs Hybrid Cloud

• Hybrid clouds always include a private cloud and are typically
managed as one entity.
• Multi-clouds always include more than one public cloud service,
which often perform different functions. Multi-clouds do not have to
include a private cloud component, but they can, in which case they
can be both multi-cloud and hybrid cloud.
Cloud data management

• Cloud data management is the practice of storing an organization’s
data at an offsite data center that is typically owned and overseen by
a vendor who specializes in public cloud infrastructure, such as AWS
or Microsoft Azure.
• Cloud data management is a way to manage data across cloud
platforms.
• Managing data in the cloud provides an automated backup strategy
and ease of access from any location.
Cloud Data Management Vs Traditional Data
Management
Traditional data management tools work well for on-premises
workloads but tend to struggle when cloud-based workloads are
involved.
Other key tenets of cloud data management platforms include:
• They can support data across various cloud ecosystems (multi-cloud).
• They are API-driven and delivered as microservices.
• They use modern constructs like containers and serverless for faster
and scalable deployment.
• They are simple to install and set up.
• They are easy to manage, with automatic upgrades and patch
management.
• They are priced based on service utilization.
Key Capabilities for Data Management in the
Cloud
• Some important capabilities to consider when creating your cloud
data strategy:
1. Cloud Integration
2. Cloud Data Quality and Governance
3. Cloud Data Privacy and Security
4. Cloud Master Data Management
5. Cloud Metadata Management and Data Cataloging
6. AI-Driven Enhanced Intelligence
Cloud Integration

• The cloud can drive innovation, uncover efficiencies, and help redefine business
processes. But you can only achieve these benefits when your cloud infrastructure
allows you to integrate, synchronize, and relate all data, applications, and processes—
on-premises or in any part of your multi-cloud environment.

• At a more granular level, businesses may be looking to design, run, and automate
business processes that span applications. They might want to integrate applications
in real time using APIs and messaging, or run extract, transform, load (ETL) batch
integration jobs to keep application data synchronized.

• For these situations, organizations need intelligent data and application integration
and API management tools, as well as a broad set of connectivity capabilities—all of
which form the core components of a modern integration platform as a service
(iPaaS).
iPaaS is a hosted service offering in which a third-party provider delivers infrastructure
and middleware to manage, develop and integrate data and applications.
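A batch integration job of the kind mentioned above can be sketched in a few lines. This is only an illustration of the extract, transform, load pattern using in-memory data; the `crm_rows` source, the `billing_db` target, and the cleaning rule are all assumptions, not part of any iPaaS product.

```python
def extract(source):
    """Copy rows out of the source system so later steps cannot modify it."""
    return [dict(row) for row in source]


def transform(rows):
    """Normalize e-mail addresses and drop rows that have none."""
    cleaned = []
    for row in rows:
        email = row.get("email", "").strip().lower()
        if not email:
            continue
        cleaned.append({"id": row["id"], "email": email})
    return cleaned


def load(rows, target):
    """Upsert rows into the target, keyed by id; return how many were written."""
    for row in rows:
        target[row["id"]] = row
    return len(rows)


# Hypothetical source (a CRM export) and target (a billing application's table).
crm_rows = [
    {"id": 1, "email": " Ada@Example.com "},
    {"id": 2, "email": ""},
    {"id": 3, "email": "lin@example.com"},
]
billing_db = {3: {"id": 3, "email": "old@example.com"}}

loaded = load(transform(extract(crm_rows)), billing_db)
```

After the run, the target holds the cleaned addresses for ids 1 and 3, and the row without an address is skipped.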
Cloud Data Quality and Governance

• As companies put data at the heart of their business processes, the
most successful organizations recognize the role of high-quality,
trusted data in their digital transformation initiatives. In addition,
data regulations have become increasingly complex and dynamic.

• To move their initiatives forward, organizations must ensure that
people across the enterprise are able to easily locate, access,
understand, and use data. An automated, cloud-based data quality
and governance process provides the clean, high-quality data that
business and IT users need to quickly realize business value.
Cloud Data Privacy and Security

• With the rise of cloud, data is becoming more exposed to the possibility of
abuse and attacks beyond the traditional firewall.
• Privacy assurance helps you to use safe data, accelerate and unblock cloud
workload migration, and deliver innovative products and services that
build on customer trust.
• Integrated cloud data privacy and protection tools can help you:
• Automate discovery and classification of sensitive data.
• Map identities for clear ownership and support data access rules.
• Operationalize privacy policies.
• Model and analyze data risk exposure across data stores and locations.
• An integrated approach to cloud data privacy based on metadata-driven
intelligence and automation helps you take quick action by providing data
use transparency, protecting personal information with data masking, and
monitoring for the effectiveness of controls in place for audit reporting.
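Two of the ideas above, classifying sensitive data and protecting it with data masking, can be sketched with regular expressions. The patterns below are deliberately simplified assumptions (real tools use far more robust detection), and `classify` and `mask` are hypothetical helper names.

```python
import re

# Simplified patterns, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def classify(text):
    """Return the set of sensitive data types detected in the text."""
    labels = set()
    if EMAIL_RE.search(text):
        labels.add("email")
    if CARD_RE.search(text):
        labels.add("card_number")
    return labels


def mask(text):
    """Mask e-mail local parts and all but the last four card digits."""

    def mask_email(match):
        local, _, domain = match.group(0).partition("@")
        return local[0] + "***@" + domain

    def mask_card(match):
        digits = re.sub(r"\D", "", match.group(0))
        return "*" * (len(digits) - 4) + digits[-4:]

    return CARD_RE.sub(mask_card, EMAIL_RE.sub(mask_email, text))


record = "Contact jane.doe@example.com, card 4111 1111 1111 1111"
labels = classify(record)
masked = mask(record)
```

Classification tells you which stores hold sensitive values; masking lets the same record be shared for testing or analytics without exposing them.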
Cloud Master Data Management

• With all the data being generated across business lines, you need a
complete, 360-degree view of any domain and any relationship in the
cloud. Furthermore, there is a push for intelligent data stewardship
and improved search and visualization of data, as well as improved
verification and enrichment.
• Cloud master data management (MDM) capabilities synchronize the
most critical data across various systems in your organization into a
single, validated record, enabling AI and analytics teams to derive
deep insights from that data to power your business.
• A modern cloud-based MDM has to apply AI and ML to automate
data stewardship processes as well as provide actionable insights to
business users.
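The "single, validated record" idea can be illustrated with a small merge function. The survivorship rule used here (for each field, keep the newest non-empty value) is an assumption chosen for simplicity, as are the record layout and the `build_golden_record` name; the sketch also assumes the records have already been matched to the same customer.

```python
def build_golden_record(records):
    """Merge records for one entity; each field keeps its newest non-empty value."""
    golden = {}
    # Process oldest first so newer values overwrite older ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if field in ("source", "updated"):
                continue  # bookkeeping fields are not part of the master record
            if value not in (None, ""):
                golden[field] = value
    return golden


# Hypothetical records for the same customer from three systems.
customer_records = [
    {"source": "crm", "updated": "2023-01-10", "name": "Ada Lovelace",
     "email": "ada@old.example", "phone": ""},
    {"source": "erp", "updated": "2023-06-01", "name": "Ada Lovelace",
     "email": "ada@example.com", "phone": None},
    {"source": "support", "updated": "2022-11-05", "phone": "555-0100", "email": ""},
]

golden = build_golden_record(customer_records)
```

The merged record takes the phone number from the oldest system, because no newer system has one, and the e-mail address from the most recently updated system.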
Cloud Metadata Management and Data
Cataloging
• All business transformations depend on good, trusted data. But as
the data landscape grows more complex, diverse, and distributed,
across many different departments, applications, data warehouses,
and data lakes (some on-premises, others in the cloud), it becomes
difficult to know exactly what data you have, where it resides, and
how best to manage it.
• By leveraging a combination of technical, business, operational, and
usage metadata, intelligent data catalogs help build a robust data
foundation to support cloud modernization, data governance, and
other business priorities.
• A comprehensive enterprise data catalog solution uses machine
learning-based data discovery to scan and catalog data assets across
the enterprise.
AI-Driven Enhanced Intelligence

Modern cloud data management platforms provide AI capabilities that
allow you to:
• Automatically discover and catalog data across various systems such
as ERP, CRM, and so on.
• Automatically discover relationships between customer data and
match insights to specific people.
• Automate data integration and data quality tasks, intelligent policy
management and enforcement, and more.
Benefits of cloud data management

• Security
• Scalability and savings
• Anywhere access
• Automated backups and disaster recovery
• Improved data quality
• Automated updates
• Sustainability
• Pay as you go pricing
• Zero maintenance
Benefits of cloud data management

• Security
• Although cloud security has improved dramatically over the last several
years, it's ultimately up to each organization to establish data access
policies that ensure that only authorized users are able to access the data.
• Modern cloud data management often delivers better data protection than
on-premises solutions. In fact, 94% of cloud adopters report security
improvements. Why?
• First of all, cloud data management reduces the risk of data loss due to
device damage or hardware failure.
• Second, companies specializing in cloud hosting and data management
employ more advanced security measures and practices to protect sensitive
data than companies that manage their data on premises.
Benefits of cloud data management

• Scalability and savings


• Cloud data management lets users scale services up or down as needed.
• More storage or compute power can be added when needed to
accommodate changing workloads.
• Companies can then scale back after the completion of a big project to
avoid paying for services they don’t need.
• The cloud storage providers have a nearly unlimited amount of storage that
is readily available at any time.
• Anywhere access
• The very nature of the cloud means that data is accessible from anywhere.
• This access also supports a collaborative work culture, as employees can
work together on a dataset, easily share insights, and more.
Benefits of cloud data management

• Automated backups and disaster recovery


• Almost all cloud providers automatically back up data stored in the cloud.
• Some cloud backup services even provide immutable point-in-time data
backup capabilities, which can help keep data protected.
• Having an up-to-date backup at all times also speeds up the process of
disaster recovery.
• Improved data quality
• Many cloud data management platforms are designed to centralize data,
thereby enabling a single data set to be used throughout the organization.
• This approach helps eliminate duplicate data, driving down storage costs
while also eliminating the inconsistencies that so often exist across data
sets.
• Data remains clean, consistent, up-to-date, and accessible for every use
case, from real-time data analytics to advanced machine learning
applications to external sharing via APIs.
Benefits of cloud data management

• Automated updates
• Cloud data management providers are committed to providing the best
services and capabilities.
• When applications need updating, cloud providers run these updates
automatically. That means teams don’t need to pause work while
they wait for IT to update everyone’s system.
• Sustainability
• For companies committed to decreasing their environmental impact, cloud
data management is a key step in the process.
• The cloud data management providers always maintain a certain level of
QoS to have a sustainable system.
• It allows organizations to reduce the carbon footprint created by their own
facilities and to extend telecommuting options to their teams.
Benefits of cloud data management

• Pay as you go pricing


• Cloud service providers generally bill subscribers on a per gigabyte, per
month basis.
• This means that organizations don't have to endure the costs of purchasing
storage hardware. Instead, organizations pay only for the storage they
consume.
• Zero maintenance
• Public cloud providers handle all required maintenance, meaning that
organizations never have to worry about replacing failed hard disks,
performing hardware refreshes or installing firmware updates.
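The per-gigabyte, per-month billing described under pay-as-you-go pricing can be worked through with a short calculation. The pricing rule (bill on the average gigabytes stored over the month) and the rate are assumptions for illustration; actual providers publish their own metering rules.

```python
from decimal import Decimal


def monthly_storage_bill(gb_stored_per_day, price_per_gb_month):
    """Bill on the average number of gigabytes stored across the month."""
    if not gb_stored_per_day:
        return Decimal("0.00")
    total = sum(Decimal(str(gb)) for gb in gb_stored_per_day)
    average_gb = total / len(gb_stored_per_day)
    return (average_gb * price_per_gb_month).quantize(Decimal("0.01"))


# Hypothetical month: 100 GB for 15 days, then 300 GB for 15 days,
# at an assumed rate of 0.02 currency units per GB per month.
usage = [100] * 15 + [300] * 15
bill = monthly_storage_bill(usage, Decimal("0.02"))
```

The average over the month is 200 GB, so the bill is 200 x 0.02 = 4.00. The organization pays for what it actually stored, not for the 300 GB peak it would have had to provision on its own hardware.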
Best practices for a cloud data management
strategy
Managing data in the cloud is going to look very similar to managing
an on-premises data store, with a few extra considerations.
• Start with a plan
• Maintain healthy data
• Back up the data (often)
• Don’t forget about data governance
Best practices for a cloud data management
strategy
• Start with a plan/Define the goals of the project
• Dumping data into the cloud is not “cloud data management.”
• Will you move all data to the cloud, or create a hybrid environment?
• Who needs access to what data?
• Where should different processing tasks take place?
• The first step in any cloud data management project is to define what the
organization hopes to accomplish. Without clear goals, it's impossible to
implement a cloud data management strategy that's well suited to the
organization's unique needs.
• Maintain healthy data
• This is incredibly important, as other data management practices depend upon it.
• Keeping data healthy means ensuring that it’s valid, complete, and of sufficient
quality to produce analytics that decision-makers can feel comfortable relying on
for business decisions.
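Checking that data is valid and complete can be automated with a small report function. The field names, the age rule, and the `health_report` helper are hypothetical; the point is only to show completeness and validity checks producing a list of issues.

```python
def health_report(rows, required_fields, validators):
    """Flag missing required fields and values that fail their validator."""
    issues = []
    for index, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues.append((index, field, "missing"))
        for field, is_valid in validators.items():
            value = row.get(field)
            if value not in (None, "") and not is_valid(value):
                issues.append((index, field, "invalid"))
    rows_with_issues = {index for index, _, _ in issues}
    return {
        "rows": len(rows),
        "clean_rows": len(rows) - len(rows_with_issues),
        "issues": issues,
    }


# Hypothetical customer rows with one incomplete and one invalid record.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]
report = health_report(
    rows,
    required_fields=["id", "email"],
    validators={"age": lambda value: 0 <= value <= 130},
)
```

Running a report like this before analytics jobs gives decision-makers a concrete measure of how much of the data can be relied on.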
Best practices for a cloud data management
strategy
• Back up the data (often)
• Most cloud software-as-a-service (SaaS) providers will automatically run
regular backups.
• If a company is hosting its own cloud, however, make sure the IT
department is running regular backups.
• Don’t forget about data governance
• An existing data governance policy for on-premises data can be updated for
a hybrid or cloud data management architecture.
• Moving data to the cloud, however, often means extra compliance issues
need to be considered, so make sure those don’t slip through the cracks.
Real-world examples of successful cloud
data management
1. Cloud data management for cybersecurity
• Imperva is a cybersecurity leader whose mission is to protect data and all
paths to it — internally and for its clients.
• Recently, Imperva created a unified data warehouse stack using Talend,
AWS, Snowflake, and Tableau.
• This central hub integrates multiple data sources and is now used across
almost all of the company’s departments and lines of business.
• Implementing this single source of data across the organization has created
an environment where data is clean, healthy, and accessible for multiple
uses and fresh, agile business insights.
2. Cloud data management for services and media
• Infopro Digital
Reasons to Use Cloud Data Protection

• Assurance of Comprehensive Data Collection


• Simplifies Backup and Recovery
• Works across Locations Worldwide
• Easy to Analyze Data for Trends
• Makes Malware/Ransomware Recovery Easier
• Early Warning for Potential Data Access Anomalies
• Ensures Compliance with Regulations
• Makes E-Discovery Quicker and Easier
• Invisible to End-Users
• Saves Money Compared to Other Options
Data Model

• The primary abstraction is a table of items (or records) where each item is a
key-value pair or a row.
• In this abstraction, each record is identified by a unique key, and the value
can vary in its structure.
• The simplest, Blob Data Model, is one where the value is an uninterpreted
binary string object, i.e., a blob.
• A more structured Relational Data Model approach for the value is a flat
row-like structure similar to the relational model, where the value is
structured into multiple columns, each with its own attribute (or key) name.
• Finally, the Column Family Data Model is one where the columns in the
value field are grouped together into column families, each consisting of a
set of columns.
• Multiple versions of each record in the key-value store can be maintained
and indexed by a system or a user-defined timestamp.
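The three value shapes and timestamped versions described above can be sketched with a small in-memory store. `VersionedKVStore` and the sample keys are hypothetical, and timestamps are supplied by the caller (the user-defined case) to keep the example deterministic.

```python
class VersionedKVStore:
    """Key-value table that keeps every version of a record, indexed by timestamp."""

    def __init__(self):
        self._items = {}  # key -> {timestamp: value}

    def put(self, key, value, timestamp):
        self._items.setdefault(key, {})[timestamp] = value

    def get(self, key, as_of=None):
        """Return the newest version, or the newest one at or before as_of."""
        versions = self._items[key]
        eligible = [ts for ts in versions if as_of is None or ts <= as_of]
        if not eligible:
            raise KeyError(key)
        return versions[max(eligible)]


store = VersionedKVStore()

# Blob data model: the value is an uninterpreted byte string.
store.put("img:42", b"\x89PNG\r\n", timestamp=1)

# Row-like model: the value is a flat set of named columns.
store.put("user:7", {"name": "Ada", "city": "London"}, timestamp=1)

# Column family model: columns are grouped into families.
store.put(
    "user:8",
    {"profile": {"name": "Lin"}, "activity": {"last_login": "2024-01-01"}},
    timestamp=1,
)

# A later write creates a new version instead of replacing the old one.
store.put("user:7", {"name": "Ada", "city": "Paris"}, timestamp=2)
```

In all three cases the store only understands the key; how much structure the value has is what distinguishes the models.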
Data Model

• In general, the systems allow large rows, so a logical entity
can be represented as a single row. However, a single row
typically must reside on a single server.
• The systems can scale to billions of key-value pairs using horizontal
partitioning, where the rows of the key-value store are distributed
among multiple servers.
• This differs from RDBMSs, which treat data as a cohesive whole,
so that a failure in one component results in overall system
unavailability.
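Horizontal partitioning can be illustrated by hashing each row key to pick a server. This is a minimal sketch using simple hash-modulo placement, which is an assumption; production systems often use range partitioning or consistent hashing instead. `PartitionedStore` and its methods are hypothetical names.

```python
import hashlib


class PartitionedStore:
    """Distributes rows across servers by hashing the row key."""

    def __init__(self, num_servers):
        # Each dict stands in for one server's local storage.
        self._servers = [dict() for _ in range(num_servers)]

    def _server_for(self, key):
        # A stable hash, so the same key always maps to the same server.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._servers)

    def put(self, key, row):
        self._servers[self._server_for(key)][key] = row

    def get(self, key):
        return self._servers[self._server_for(key)][key]

    def servers_holding(self, key):
        """Count how many servers store this key (one, for any stored key)."""
        return sum(1 for server in self._servers if key in server)

    def rows_per_server(self):
        return [len(server) for server in self._servers]


store = PartitionedStore(num_servers=4)
for n in range(1000):
    store.put(f"user:{n}", {"n": n})
```

Each row lives whole on exactly one server, matching the single-row, single-server constraint above, while the table as a whole spreads across all servers.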
