Unit 1
UNIT I
Introduction to Distributed File Systems and Cloud: Introduction to Distributed
File Systems, Cloud Computing, Cloud Data Management and its Goals &
Challenges, Models of Cloud Data Management, Cloud Data Management Basics,
Cloud Data Storage, Reasons to Use Cloud Data Management.
UNIT II
Cloud Data Management & its Applications: Large data processing using Map-
Reduce; big data technologies and tools; data modelling, storage, indexing, and
query processing for big data; key-value storage systems, columnar databases,
NoSQL systems; big data applications. Multi-tenant database systems: Multitenancy,
Scalable, Consistent, database elasticity in the cloud.
UNIT III
Azure database service platform: Understanding the Service, Designing SQL
Database, Migrating an Existing Database, Using SQL Database, Scaling SQL
Database, Governing SQL Database. MySQL and PostgreSQL.
Syllabus
UNIT IV
Cloud Data Management Techniques: Hybrid cloud features, migrate databases to
Azure IaaS, Run SQL Server on Microsoft Azure Virtual Machines, Considerations on
High Availability and Disaster Recovery Options with SQL Server on Hybrid Cloud
and Azure IaaS, Working with NoSQL Alternatives.
UNIT V
Cloud Data Security and Privacy: Aspects of Data Security, Defining Organizational
Cloud Security Responsibilities, Assessing Risk in the Cloud, Existing Security Tools,
Building a Security Strategy.
TEXTBOOKS
1. Faithe Wempen. Cloud Data Management For Dummies®, Druva Special Edition. John Wiley & Sons, Inc.,
2017.
2. Lawrence Miller. Cloud Security & Compliance For Dummies®, Palo Alto Networks® Special Edition. John
Wiley & Sons, Inc., 2019.
3. Divyakant Agrawal, Sudipto Das, Amr El Abbadi. Data Management in the Cloud: Challenges and Opportunities. 2013.
4. Francesco Diaz, Roberto Freato. Cloud Data Design, Orchestration, and Management Using Microsoft Azure.
Apress, Springer publications, 2018.
REFERENCES
5. Andrew S. Tanenbaum, Maarten Van Steen. Distributed Systems: Principles and Paradigms, 2nd edition. Pearson
publications.
6. Francesco Diaz, Roberto Freato. Cloud Data Design, Orchestration, and Management Using Microsoft Azure.
Apress, Springer publications, 2018.
7. Lee Chao. Cloud Database Development and Management. CRC Press, Taylor and Francis Group, 2014.
8. Liang Zhao, Sherif Sakr, Anna Liu, Athman Bouguettaya. Cloud Data Management. Springer publications,
2014.
Unit 1: Introduction to Distributed File Systems and Cloud
Storage Models
[Figure: remote disk model. Each client sends Read (RPC) and Write (RPC) requests to the server cache; the server answers with Return (Data) and ACK.]
• Remote Disk: Reads and writes forwarded to server
• Advantage: Server provides completely consistent view of file system to multiple clients
• Problems? Performance!
• Going over network is slower than going to local memory
• Lots of network traffic/not well pipelined
• Server can be a bottleneck
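The remote disk model above can be sketched in a few lines. This is a minimal in-process illustration, not a real RPC implementation: the class names `FileServer` and `RemoteDiskClient` are hypothetical, and the `requests` counter stands in for network round trips.

```python
class FileServer:
    """Holds the single authoritative copy of every file."""

    def __init__(self):
        self._files = {}
        self.requests = 0  # counts simulated network round trips

    def read(self, path):
        self.requests += 1
        return self._files.get(path, b"")

    def write(self, path, data):
        self.requests += 1
        self._files[path] = data
        return "ACK"


class RemoteDiskClient:
    """Client with no local cache: every call is forwarded to the server."""

    def __init__(self, server):
        self._server = server

    def read(self, path):
        return self._server.read(path)

    def write(self, path, data):
        return self._server.write(path, data)


server = FileServer()
alice = RemoteDiskClient(server)
bob = RemoteDiskClient(server)

alice.write("/notes.txt", b"v1")
first = bob.read("/notes.txt")    # bob sees alice's write immediately
alice.write("/notes.txt", b"v2")
second = bob.read("/notes.txt")   # and the update, since nothing is cached
```

Because no client keeps a copy, both clients always agree on file contents, but every single operation costs a round trip to the one server, which is where the performance and bottleneck problems come from.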
DISTRIBUTED FILE SYSTEM ARCHITECTURE
Architecture
• Client-Server Architectures
• Cluster-Based Distributed File Systems
• Symmetric Architectures
Distributed Computing Vs Cloud Computing
Distributed computing
• Distributed computing is the use of distributed systems to solve a single large problem by distributing tasks to the individual computers in the distributed system.
• Put simply, distributed computing is a computing technique that allows multiple computers to communicate and work together to solve a single problem.
• Distributed computing completes computational tasks faster than a single computer, which would take much longer.
• The goal of distributed computing is to distribute a single task among multiple computers and to solve it quickly by maintaining coordination between them.
Cloud computing
• Cloud computing is the use of network-hosted servers to perform tasks such as storage, processing, and management of data.
• Put simply, cloud computing is a computing technique that delivers hosted services over the internet to its users/customers.
• Cloud computing provides services such as hardware, software, and networking resources over the internet.
• The goal of cloud computing is to provide on-demand computing services over the internet on a pay-per-use model.
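The distributed computing goal described above (split one task, solve the parts in parallel, coordinate the results) can be sketched as follows. This is only an analogy: worker threads from Python's `concurrent.futures` stand in for separate computers, and the helper names `split` and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut the input into at most `parts` contiguous chunks."""
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]


def distributed_sum(data, workers=4):
    """Sum `data` by handing one chunk to each worker and combining the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))  # each worker solves a sub-task
    return sum(partials)                        # coordinator combines partial results


total = distributed_sum(list(range(1, 101)))
```

In a real distributed system the chunks would travel over the network to other machines, so the coordinator would also have to handle slow or failed workers.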
Cloud Computing
1. On Demand Self-Service
Customers can self-provision computing resources such as server time, storage,
network, and applications as needed, without requiring human interaction with
the cloud service provider.
2. Broad Network Access
Computing resources are available over the network and can be accessed using
heterogeneous client platforms like mobiles, laptops, desktops, PDAs, etc.
3. Resource Pooling
Computing resources such as storage, processing, and network are pooled to serve
multiple clients. For this, cloud computing adopts a multitenant model where the
computing resources of service providers are dynamically assigned to customers
on demand.
The customer is generally not aware of the exact physical location of these resources.
However, at a higher level of abstraction (for example, country or data center), the
location of resources can be specified.
Essential Characteristics
4. Rapid Elasticity
Computing resources for a cloud customer often appear limitless because
cloud resources can be rapidly and elastically provisioned. Resources can be
scaled out and released quickly to match customer demand.
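Pooling and elasticity can be illustrated together with a toy model. The sketch below assumes a single pool of identical resource units shared by several tenants; `ResourcePool` and its `scale` method are hypothetical names, not any provider's API.

```python
class ResourcePool:
    """A shared pool of identical resource units (e.g. vCPUs) serving many tenants."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.allocated = {}  # tenant name -> units currently held

    def free(self):
        return self.capacity - sum(self.allocated.values())

    def scale(self, tenant, units):
        """Grow or shrink a tenant's allocation to exactly `units`."""
        if units < 0:
            raise ValueError("units must be non-negative")
        current = self.allocated.get(tenant, 0)
        if units - current > self.free():
            raise RuntimeError("pool exhausted")
        if units == 0:
            self.allocated.pop(tenant, None)
        else:
            self.allocated[tenant] = units


pool = ResourcePool(capacity=10)
pool.scale("tenant_a", 6)   # tenant A scales out
pool.scale("tenant_b", 4)   # the pool is now fully used
pool.scale("tenant_a", 2)   # tenant A releases 4 units
pool.scale("tenant_b", 8)   # tenant B grows into the freed capacity
```

The same physical capacity serves different tenants at different times, which is why resources look limitless to each customer as long as aggregate demand stays within the pool.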
1. Private Cloud
• A cloud environment deployed for the exclusive use of a single organization is
a private cloud. An organization can have multiple cloud users belonging to
different business units.
• Private cloud infrastructure can be located either on or off premises, depending
on the organization's needs. The organization may own and manage the
private cloud itself, assign this responsibility to a third party, i.e., a cloud
provider, or use a combination of both.
2. Public Cloud
• A cloud infrastructure deployed for use by the general public is a
public cloud. The public cloud model is deployed by cloud vendors, government
organizations, or both.
• The public cloud is typically deployed at the cloud vendor's premises.
Deployment Models
3. Community Cloud
• A cloud infrastructure shared by multiple organizations that form a
community and share common interests is a community cloud. Community
Cloud is owned, managed, and operated by organizations or cloud vendors,
i.e., third parties.
• The community cloud infrastructure may be located on the premises of the
community's member organizations or on the cloud provider's premises.
4. Hybrid Cloud
• A cloud infrastructure composed of two or more distinct cloud models, such as
private, public, and community, is a hybrid cloud.
• While these distinct cloud structures remain unique entities, they can be
bound together by specialized technology enabling data and application
portability.
Service Models
4. Elasticity
• Cloud computing resources should be elastic, which means that the user
should be free to attach and release computing resources on their demand.
5. Business Orientation
• Companies must verify the QoS that providers offer before moving mission-critical
applications to the cloud.
• The CSP should develop a mechanism to understand the exact business
requirement of the customer and customize the service parameters as per
the customer's requirement.
6. Trust
• Trust is the most important factor that drives any customer to move their
computing to the cloud. For the cloud to be successful, trust must be
maintained to create a federation between the cloud customer, the cloud
vendor, and the various cloud providers.
Advantages of Cloud Computing
Disadvantages of Cloud Computing
1. Internet Connectivity
2. Vendor lock-in
3. Limited Control
4. Security
Cloud Infrastructure
Virtualization
• The cloud can drive innovation, uncover efficiencies, and help redefine business
processes. But you can only achieve these benefits when your cloud infrastructure
allows you to integrate, synchronize, and relate all data, applications, and processes—
on-premises or in any part of your multi-cloud environment.
• At a more granular level, businesses may be looking to design, run, and automate
business processes that span applications. They might want to integrate applications
in real time using APIs and messaging, or run extract, transform, load (ETL) batch
integration jobs to keep application data synchronized.
• For these situations, organizations need intelligent data and application integration
and API management tools, as well as a broad set of connectivity capabilities—all of
which form the core components of a modern integration platform as a service
(iPaaS).
iPaaS is a hosted service offering in which a third-party provider delivers infrastructure
and middleware to manage, develop and integrate data and applications.
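The extract, transform, load batch jobs mentioned above follow a simple three-stage shape. The sketch below is a minimal illustration with made-up data: the CSV text, the field names, and the in-memory `target_store` dictionary are all assumptions, standing in for a source application and a target system.

```python
import csv
import io

# Hypothetical export from a source application.
SOURCE_CSV = """id,email,country
1,  Alice@Example.com ,in
2,bob@example.com,US
"""


def extract(text):
    """Read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize types, whitespace, and letter case."""
    return [
        {
            "id": int(r["id"]),
            "email": r["email"].strip().lower(),
            "country": r["country"].strip().upper(),
        }
        for r in rows
    ]


def load(rows, target):
    """Upsert rows into the target, keyed by id."""
    for row in rows:
        target[row["id"]] = row
    return target


target_store = {}
load(transform(extract(SOURCE_CSV)), target_store)
```

Running the same job again with updated source data overwrites the matching ids, which is how a periodic batch keeps two applications synchronized.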
Cloud Data Quality and Governance
• With the rise of cloud, data is becoming more exposed to the possibility of
abuse and attacks beyond the traditional firewall.
• Privacy assurance helps you to use safe data, accelerate and unblock cloud
workload migration, and deliver innovative products and services that
build on customer trust.
• Integrated cloud data privacy and protection tools can help you:
• Automate discovery and classification of sensitive data.
• Map identities for clear ownership and support data access rules.
• Operationalize privacy policies.
• Model and analyze data risk exposure across data stores and locations.
• An integrated approach to cloud data privacy based on metadata-driven
intelligence and automation helps you take quick action by providing data
use transparency, protecting personal information with data masking, and
monitoring for the effectiveness of controls in place for audit reporting.
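Automated discovery, classification, and masking of sensitive data can be sketched with simple pattern matching. This is a deliberately small illustration: the two regular expressions, the field names, and the masking rule (keep the first two characters) are assumptions, and real tools use far richer detection than this.

```python
import re

# Hypothetical detection rules: label -> pattern.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def classify(record):
    """Return {field: label} for every field whose value matches a pattern."""
    found = {}
    for field, value in record.items():
        for label, pattern in PATTERNS.items():
            if isinstance(value, str) and pattern.search(value):
                found[field] = label
                break
    return found


def mask(value, keep=2):
    """Keep the first `keep` characters and replace the rest with '*'."""
    return value[:keep] + "*" * max(0, len(value) - keep)


def mask_record(record):
    """Mask only the fields that were classified as sensitive."""
    sensitive = classify(record)
    return {k: mask(v) if k in sensitive else v for k, v in record.items()}


record = {"name": "Asha", "contact": "asha@example.com", "mobile": "555-010-9999"}
labels = classify(record)
masked = mask_record(record)
```

Classification runs on the values rather than the field names, so a sensitive value is still caught when it sits in a column with an uninformative name such as `contact`.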
Cloud Master Data Management
• With all the data being generated across business lines, you need a
complete, 360-degree view of any domain and any relationship in the
cloud. Furthermore, there is a push for intelligent data stewardship
and improved search and visualization of data, as well as improved
verification and enrichment.
• Cloud master data management (MDM) capabilities synchronize the
most critical data across various systems in your organization into a
single, validated record, enabling AI and analytics teams to derive
deep insights from that data to power your business.
• A modern cloud-based MDM has to apply AI and ML to automate
data stewardship processes as well as provide actionable insights to
business users.
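The idea of synchronizing data from several systems into a single, validated record can be sketched as a merge with a survivorship rule. The rule used here (the newest non-empty value wins) is an assumption chosen for illustration, as are the system names and fields; real MDM products support many other rules.

```python
def merge_golden_record(records):
    """Merge per-system records for one entity into a single record.

    Each record is a dict with an 'updated' ISO date string plus attributes.
    Assumed survivorship rule: the newest non-empty value wins.
    """
    golden = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if field == "updated":
                continue
            if value not in (None, ""):
                golden[field] = value  # later (newer) records overwrite older ones
    return golden


crm = {"customer_id": "C42", "email": "old@example.com", "phone": "",
       "updated": "2023-01-10"}
billing = {"customer_id": "C42", "email": "new@example.com", "phone": None,
           "updated": "2024-03-05"}
support = {"customer_id": "C42", "email": "", "phone": "555-010-1234",
           "updated": "2023-09-01"}

golden = merge_golden_record([crm, billing, support])
```

The email comes from the most recently updated system, while the phone number survives from the only system that has one.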
Cloud Metadata Management and Data
Cataloging
• All business transformations depend on good, trusted data. But as
the data landscape grows more complex, diverse, and distributed,
across many different departments, applications, data warehouses,
and data lakes (some on-premises, others in the cloud), it becomes
difficult to know exactly what data you have, where it resides, and
how best to manage it.
• By leveraging a combination of technical, business, operational, and
usage metadata, intelligent data catalogs help build a robust data
foundation to support cloud modernization, data governance, and
other business priorities.
• A comprehensive enterprise data catalog solution uses machine
learning-based data discovery to scan and catalog data assets across
the enterprise.
AI-Driven Enhanced Intelligence
• Security
• Scalability and savings
• Anywhere access
• Automated backups and disaster recovery
• Improved data quality
• Automated updates
• Sustainability
• Pay as you go pricing
• Zero maintenance
Benefits of cloud data management
• Security
• Although cloud security has improved dramatically over the last several
years, it's ultimately up to each organization to establish data access
policies that ensure that only authorized users are able to access the data.
• Modern cloud data management often delivers better data protection than
on-premises solutions. In fact, 94% of cloud adopters report security
improvements. Why?
• First of all, cloud data management reduces the risk of data loss due to
device damage or hardware failure.
• Second, companies specializing in cloud hosting and data management
employ more advanced security measures and practices to protect sensitive
data than most companies apply to data they keep on premises.
• Automated updates
• Cloud data management providers are committed to providing the best
services and capabilities.
• When applications need updating, cloud providers run these updates
automatically. That means employees don't need to pause work while
they wait for IT to update everyone's system.
• Sustainability
• For companies committed to decreasing their environmental impact, cloud
data management is a key step in the process.
• The cloud data management providers always maintain a certain level of
QoS to have a sustainable system.
• It allows organizations to reduce the carbon footprint created by their own
facilities and to extend telecommuting options to their teams.
• Invisible to End-Users
Reasons to Use Cloud Data Protection
• The primary abstraction is a table of items (or records) where each item is a
key-value pair or a row.
• In this abstraction, each record is identified by a unique key, and the value
can vary in its structure.
• The simplest, the Blob Data Model, is one where the value is an uninterpreted
binary string object, i.e., a blob.
• A more structured approach, the Relational Data Model, gives the value a flat
row-like structure similar to the relational model, where the value is
structured into multiple columns, each with its own attribute (or key) name.
• Finally, the Column Family Data Model is one where the columns in the
value field are grouped together into column families, each consisting of a
set of columns.
• Multiple versions of each record in the key-value store can be maintained
and indexed by a system or a user-defined timestamp.
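The three value shapes and the timestamped versions described above can be sketched with one small store. `VersionedKVStore` is a hypothetical class written for this illustration; the keys, column names, and integer timestamps are made up.

```python
class VersionedKVStore:
    """Key-value table that keeps every version of a record, indexed by timestamp."""

    def __init__(self):
        self._data = {}  # key -> list of (timestamp, value), ascending by timestamp

    def put(self, key, value, timestamp):
        versions = self._data.setdefault(key, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])

    def get(self, key, timestamp=None):
        """Return the latest value, or the latest value at or before `timestamp`."""
        versions = self._data.get(key, [])
        if timestamp is not None:
            versions = [tv for tv in versions if tv[0] <= timestamp]
        return versions[-1][1] if versions else None


store = VersionedKVStore()

# Blob data model: the value is an uninterpreted byte string.
store.put("img:1", b"\x89PNG...", timestamp=1)

# Relational-style data model: the value is a flat row of named columns.
store.put("user:1", {"name": "Asha", "city": "Pune"}, timestamp=1)
store.put("user:1", {"name": "Asha", "city": "Delhi"}, timestamp=2)

# Column family data model: columns are grouped into named families.
store.put(
    "page:1",
    {"content": {"html": "<p>hi</p>"}, "anchor": {"example.org": "link text"}},
    timestamp=1,
)
```

In all three cases the store itself only sees a key and a value; the data models differ in how much structure the system is told about inside the value.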
Data Model
• In general, the systems allow large rows, so a whole logical
entity can be represented as a single row. However, a single row
typically must reside on a single server.
• The systems can scale to billions of key-value pairs using horizontal
partitioning, where the rows of the key-value store are distributed
among multiple servers.
• This is different from RDBMSs, which treat the data as a cohesive whole,
so that a failure in one component results in overall system
unavailability.
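Horizontal partitioning can be sketched by hashing each key to pick a server. This is a minimal sketch under stated assumptions: plain dictionaries stand in for servers, the `PartitionedStore` class is hypothetical, and hash-modulo placement is just one possible scheme (range partitioning and consistent hashing are common alternatives).

```python
import hashlib


class PartitionedStore:
    """Distributes rows across servers by hashing the row key."""

    def __init__(self, num_servers):
        self.servers = [{} for _ in range(num_servers)]  # one dict per "server"

    def _server_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.servers)

    def put(self, key, row):
        # A row lives on exactly one server, chosen by its key.
        self.servers[self._server_for(key)][key] = row

    def get(self, key):
        return self.servers[self._server_for(key)].get(key)


store = PartitionedStore(num_servers=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
```

Each row is stored whole on one server, so losing a server makes only that server's keys unavailable while the other partitions keep serving requests.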