09 - Cloud-Enabling Technologies - v2
HKBU - COMP7940

Cloud-Enabling Technologies: Storage and Computing

Slides are modified from several sources.
Contents

1. Data storage in the age of cloud computing.


2. Evolution of storage systems.
3. Storage and data models.
4. Database management systems.
5. Unix file system (UFS).
6. Network file system (NFS).
7. General parallel file system (GPFS).
8. Google file system (GFS).
9. Apache Hadoop.
10. Locks; Chubby a locking service.
11. Online transaction processing.

2
Big data

• No single standard definition…

“Big Data” is data whose scale, complexity, and speed require new
architecture, techniques, algorithms, and analytics to manage it and
extract value and hidden knowledge from it…
• New concept ➔ reflects the fact that many
applications use data sets that cannot be
stored and processed using local resources.

3
4Vs of Big Data

• Volume – size of data
• Variety – different types of data
• Velocity – speed at which data is generated
• Veracity – trustworthiness of data

4
Characteristics of Big Data:
1-Scale (Volume)

• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes to 35 ZB
• Data volume is increasing exponentially

Exponential increase in
collected/generated data

5
Data storage on a cloud
• An ever increasing number of cloud-based services collect detailed data about
their services and information about the users of these services. The service
providers use the clouds to analyze the data.
• Humongous amounts of data - in 2013
• Internet video will generate over 18 EB/month.
• Global mobile data traffic will reach 2 EB/month.
(1 EB = 10^18 bytes, 1 PB = 10^15 bytes, 1 TB = 10^12 bytes, 1 GB = 10^9 bytes)

6
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and structures
• Numerical, text, images, audio, video,
sequences, time series, social media data,
multi-dim arrays, etc…
• A single application can be
generating/collecting many types of data

To extract knowledge ➔ all these types of data need to be linked together

7
Characteristics of Big Data:
3-Speed (Velocity)
• Data is being generated fast and needs to be processed fast
• Static data ➔ Streaming data
• Online Data Analytics
• Late decisions ➔ missing opportunities
• Examples
• E-Promotions: Based on your current location, your purchase history, and what
you like ➔ send promotions right now for the store next to you

• Healthcare monitoring: sensors monitoring your activities and body ➔ any
abnormal measurements require immediate reaction

8
Characteristics of Big Data:
4-Uncertainty (Veracity)*
• Some make it 4V’s
• In the big data era, data can be in doubt
• Uncertainty due to data inconsistency & incompleteness
• For example, in the concurrency scenario
• Model approximations

9
Data storage in the age of
cloud computing

• The volume of data generated by human
activities is growing about 40% per year;
• The network-centric data storage model is
particularly useful for mobile devices (e.g. iCloud
vs local storage)
• Big Data reflects the reality that many
applications use data sets so large that they
cannot be stored and processed on local
computers.
• The management of the large collection of
storage systems poses significant challenges and
requires novel approaches to system design.
• Effective data replication and storage
management strategies are critical to the
computations performed on the cloud.

10
Major challenges
• The storage system design philosophy has shifted from performance-
at-any-cost to reliability-at-the-lowest-possible-cost.
• Important implications on software complexity.
• Maintaining consistency among multiple copies of data records
• increases the data management software complexity
• could negatively affect the storage system performance if data is frequently
updated.
• Sophisticated strategies to reduce the access time for data streaming
and content delivery.
• Data replication allows concurrent access to data from multiple
processors and decreases the chances of data loss.

11
Data Base Management System (DBMS)
• Database ➔ a collection of logically-related records.
• Data Base Management System (DBMS) ➔ the software that controls the
access to the database.
• Query language ➔ a dedicated programming language used to develop
database applications.
• Most cloud applications do not interact directly with the file systems, but
through a DBMS.
• Database models ➔ reflect the limitations of the hardware available at the
time and the requirements of the most popular applications of each period.
• navigational model of the 1960s.
• relational model of the 1970s.
• object-oriented model of the 1980s.
• NoSQL model of the first decade of the 2000s.

16
Storage
requirements of
cloud applications
• Most cloud applications are data-intensive and
test the limitations of the existing infrastructure.
Requirements:
• Rapid application development and short
time to market.
• Low latency.
• Scalability.
• High availability.
• Consistent view of the data.
• These requirements cannot be satisfied
simultaneously by existing database models; e.g.,
relational databases are easy to use for
application development but do not scale well.
• Joining tables takes time!

17
CAP Theorem

• A storage system can achieve at most two out of the three properties:
• Consistency – every read receives the most recent write or an error
• Availability – every read receives a response, but it might not contain
the most recent data
• Partition tolerance – the system continues to operate despite network
partitions (lost or delayed messages between nodes)

[Figure: Venn diagram of Consistency, Availability, and Partition tolerance]

18
https://toppertips-bx67a.ondigitalocean.app/cap-theorem/ 19
CAP Theorem Proof

[Figure: three replica groups receiving an update while one node is cut off by
a partition – C+P: the isolated node cannot answer (blocks or returns an
error); A+P: the isolated node answers with out-dated data; C+A: possible only
when there is no partition.]

20
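To make the trade-off concrete, here is a small Python sketch (my own illustration; the class and method names are invented, not from any real system) of two replicas separated by a partition: a CP store refuses reads it cannot guarantee to be fresh, while an AP store answers with possibly out-dated data.

```python
# Toy model of a CP vs. AP replicated key-value store during a partition.
class Replica:
    def __init__(self):
        self.data = {}

class PartitionedStore:
    def __init__(self, mode):
        self.mode = mode              # "CP" or "AP"
        self.primary = Replica()      # receives the update
        self.secondary = Replica()    # cut off by the partition
        self.partitioned = False

    def write(self, key, value):
        self.primary.data[key] = value
        if not self.partitioned:      # replication only succeeds without a partition
            self.secondary.data[key] = value

    def read_from_secondary(self, key):
        if self.partitioned and self.mode == "CP":
            # C+P: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # A+P: always answer, possibly with out-dated data.
        return self.secondary.data.get(key)

cp, ap = PartitionedStore("CP"), PartitionedStore("AP")
for store in (cp, ap):
    store.write("x", 1)
    store.partitioned = True
    store.write("x", 2)               # the update cannot reach the secondary

print(ap.read_from_secondary("x"))    # 1 -> stale but available
try:
    cp.read_from_secondary("x")
except RuntimeError as e:
    print(e)                          # unavailable -> consistent but not available
```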
Case study: SQL vs NoSQL

21
ACID property of Relational Database –
SQL

• Atomicity - All changes to data are performed as if they are a single operation. That is, all the
changes are performed, or none of them are.
• For example, in an application that transfers funds from one account to another, the
atomicity property ensures that, if a debit is made successfully from one account, the
corresponding credit is made to the other account.
• Consistency - Data is in a consistent state when a transaction starts and when it ends.
• For example, in an application that transfers funds from one account to another, the
consistency property ensures that the total value of funds in both the accounts is the same
at the start and end of each transaction.
• Isolation - The intermediate state of a transaction is invisible to other transactions. As a result,
transactions that run concurrently appear to be serialized.
• For example, in an application that transfers funds from one account to another, the
isolation property ensures that another transaction sees the transferred funds in one
account or the other, but not in both, nor in neither.
• Durability - After a transaction successfully completes, changes to data persist and are not
undone, even in the event of a system failure.
• For example, in an application that transfers funds from one account to another, the
durability property ensures that the changes made to each account will not be reversed.

22
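As a concrete illustration of atomicity and consistency (not part of the original slides; it uses Python's built-in sqlite3 module and a made-up accounts table), the funds-transfer example above can be sketched like this:

```python
# Either both the debit and the credit are applied, or neither is.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")   # forces a rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # the debit was rolled back; total funds are unchanged (consistency)

transfer(conn, "alice", "bob", 60)    # succeeds: balances become 40 / 60
transfer(conn, "alice", "bob", 100)   # fails: rolled back, still 40 / 60
print(list(conn.execute("SELECT * FROM accounts")))
```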
NoSQL databases

• The name NoSQL is misleading. Stonebraker notes that “blinding performance depends
on removing overhead. Such overhead has nothing to do with SQL, it revolves around
traditional implementations of ACID transactions, multi-threading, and disk
management.”
• NoSQL provides fewer assurances than a relational database, but it scales very well and
reacts well to rapid data changes.
• Attributes:
• Scale well.
• Do not exhibit a single point of failure.
• Have built-in support for consensus-based decisions.
• Support partitioning and replication as basic primitives.

23
BASE property of NoSQL databases

• Basically Available – the database appears to work most of the time
• Soft state – the state of the system may change over time, even without input,
as the system spreads updates internally over time
• Eventually consistent – nodes may be inconsistent at first, but the system
guarantees they become consistent after some period of time (self-healing). E.g.:
• I watch the weather report and learn that it's going to rain tomorrow.
• I tell you that it's going to rain tomorrow.
• Your neighbor tells his wife that it's going to be sunny tomorrow.
• You tell your neighbor that it is going to rain tomorrow.
• Eventually, your neighbor's wife will know it is going to rain tomorrow.

24
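A minimal simulation of eventual consistency (assumed names and a deliberately naive gossip scheme, for illustration only): a write lands on one node and the others converge on it over repeated exchanges, much like the weather-report example above.

```python
# Eventual consistency via random pairwise gossip: all nodes converge.
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0          # higher version wins, like a simple timestamp

    def merge(self, other):
        if other.version > self.version:
            self.value, self.version = other.value, other.version

nodes = [Node(f"n{i}") for i in range(5)]
nodes[0].value, nodes[0].version = "rain tomorrow", 1   # the initial write

rounds = 0
while any(n.value != "rain tomorrow" for n in nodes):
    a, b = random.sample(nodes, 2)   # two random nodes exchange state
    a.merge(b)
    b.merge(a)
    rounds += 1

print(f"all nodes consistent after {rounds} gossip exchanges")
```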
Four main types of NoSQL databases

• Document databases — the document can vary from record to record. They
store data in JSON (JavaScript Object Notation) or BSON (Binary JSON) data
formats, which provide flexibility in working with data of all types.
• Support structured or semi-structured data
• Key-value stores employ a simple schema, with data stored as a simple
collection of key-value pairs (e.g. Redis) – see the sketch after this list.
• Keys are unique, and the value associated with a key can range from simple
primitives to complex objects.
• Wide column stores capture huge volumes of data in a row-and-column
format. While they are considered NoSQL databases, their format makes
them similar to relational databases.
• They differ from relational databases in that every row is not required to have the
same number of columns.
• Wide column stores are often built to handle big data use cases that require
aggregation for queries.
• Graph databases have a fundamentally different structure, in that data
elements and their relationships are stored as a graph (e.g. Amazon
Neptune)

25
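The sketch below (plain Python standing in for a real document or key-value store such as MongoDB or Redis) contrasts the first two models listed above: schema-free JSON documents versus opaque values addressed by a unique key.

```python
import json

# Document model: records in the same collection need not share a schema.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Bob", "tags": ["admin"], "last_login": "2024-01-01"},
]
print(json.dumps(users[1], indent=2))   # each document is self-describing JSON

# Key-value model: a unique key maps to an opaque value.
store = {}
store["session:42"] = json.dumps({"user_id": 2, "expires": 3600})
print(json.loads(store["session:42"]))

# With the redis-py client the key-value calls look similar, e.g.:
#   r = redis.Redis(); r.set("session:42", value); r.get("session:42")
```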
File and File System

26
Logical and physical organization of a file
• File ➔ a linear array of cells stored on a persistent storage device. Viewed
by an application as a collection of logical records; the file is stored on a
physical device as a set of physical records, or blocks, of size dictated by the
physical media.
• File pointer➔ identifies a cell used as a starting point for a read or write
operation.
• The logical organization of a file ➔ reflects the data model, the view of the
data from the perspective of the application.
• The physical organization of a file ➔ reflects the storage model and
describes the manner the file is stored on a given storage media.

27
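A small sketch of the distinction above, under assumed record and block sizes: the application addresses logical records, while the storage device is addressed in fixed-size physical blocks, and a file pointer marks where the next read or write starts.

```python
BLOCK_SIZE = 4096          # size dictated by the physical media (assumed)
RECORD_SIZE = 100          # size chosen by the application's data model (assumed)

def locate(record_number):
    """Map a logical record number to (physical block number, offset in block)."""
    byte_offset = record_number * RECORD_SIZE     # the file pointer for this record
    return divmod(byte_offset, BLOCK_SIZE)

for r in (0, 40, 41, 1000):
    block, offset = locate(r)
    print(f"logical record {r:>4} -> physical block {block}, offset {offset}")
```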
File systems
• File system ➔ collection of directories; each directory provides information
about a set of files.
• Traditional – Unix File System.
• Distributed file systems.
• Network File Systems (NFS) - very popular, have been used for some time, but do not
scale well and have reliability problems; an NFS server could be a single point of failure.
• Storage Area Networks (SAN) - allow cloud servers to deal with non-disruptive
changes in the storage configuration. The storage in a SAN can be pooled and
then allocated based on the needs of the servers. A SAN-based implementation
of a file system can be expensive, as each node must have a Fibre Channel
adapter to connect to the network.
• Parallel File Systems (PFS) - scalable, capable of distributing files across a large
number of nodes, with a global naming space. Several I/O nodes serve data to all
computational nodes; it includes also a metadata server which contains
information about the data stored in the I/O nodes. The interconnection network
of a PFS could be a SAN.

28
Unix File System (UFS)

• The layered design provides flexibility.


• The layered design allows UFS to separate the concerns for the
physical file structure from the logical one.
• The vnode layer allows UFS to treat local and
remote file access uniformly.
• The hierarchical design supports scalability reflected by the file
naming convention. It allows grouping of files into directories, supports
multiple levels of directories, and collections of directories and
files, the so-called file systems.
• The metadata supports a systematic design philosophy of the file
system and device-independence.
• Metadata includes: file owner, access rights, creation time,
time of the last modification, file size, the structure of the file
and the persistent storage device cells where data is stored.
• The inodes contain information about individual files and
directories. The inodes are kept on persistent media together
with the data.

29
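As a quick illustration of the metadata kept in an inode (owner, access rights, timestamps, size), the standard-library snippet below reads it through os.stat; the path is just an example.

```python
import os, stat, time

info = os.stat("/etc/hosts")          # any existing file will do
print("inode number :", info.st_ino)
print("owner (uid)  :", info.st_uid)
print("access rights:", stat.filemode(info.st_mode))
print("size (bytes) :", info.st_size)
print("last modified:", time.ctime(info.st_mtime))
```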
UFS layering

[Figure: the UFS layers, from top to bottom – symbolic path name layer,
absolute path name layer, path name layer, file name layer, inode layer,
file layer, block layer. The file name layer maps the logical file structure
(logical records) onto the physical file structure (blocks).]

UFS layered design separates the physical file structure from the logical one.
The lower three layers are related to the physical file structure, while the
upper three layers are related to logical organization.
30
Network File System (NFS)

• Design objectives:
• Provide the same semantics as a local Unix File System
(UFS) to ensure compatibility with existing applications.
• Facilitate easy integration into existing UFS.
• Ensure that the system will be widely used; thus, support
clients running on different operating systems.
• Accept a modest performance degradation due to remote
access over a network with a bandwidth of several Mbps.
• NFS is based on the client-server paradigm. The client runs on
the local host while the server is at the site of the remote file
system; they interact by means of Remote Procedure Calls
(RPC).
• A remote file is uniquely identified by a file handle (fh) - a 32-
byte internal name.

31
[Figure: NFS client-server interaction on a local and a remote host –
application → file system API interface → vnode layer → NFS client stub →
communication network → NFS server stub → vnode layer → remote file system;
local files go through the local file system instead.]

The NFS client-server interaction. The vnode layer implements file operation in a uniform
manner, regardless of whether the file is local or remote.
An operation targeting a local file is directed to the local file system, while one for a remote
file involves NFS; an NFS client packages the relevant information about the target and the
NFS server passes it to the vnode layer on the remote host which, in turn, directs it to the
remote file system.
32
Application API call → NFS client RPC(s) → NFS server action:

• OPEN(fname,flags,mode) → LOOKUP(dfh,fname), then READ(fh,offset,count) or
CREATE(dfh,fname,mode) → look up fname in directory dfh and return fh (the
file handle) and file attributes, or create a new file.
• CLOSE(fh) → no RPC → remove fh from the open file table of the process.
• READ(fd,buf,count) → READ(fh,offset,count) → read data from file fh at
offset and length count and return it.
• WRITE(fd,buf,count) → WRITE(fh,offset,count,buf) → write count bytes of
data to file fh at the location given by offset.
• SEEK(fd,buf,whence) → no RPC → update the file pointer in the open file
table of the process.
• FSYNCH(fd) – write all cached data to persistent storage → write data.
• CHMOD(fd,mode) → SETATTR(fh,mode) → update inode information.
• RENAME(fromfname,tofname) → RENAME(dfh,fromfname,tofh,tofname) → rename file.
• STAT(fname) → GETATTR(fh) → get metadata.
• MKDIR(dname) → MKDIR(dfh,dname,attr); RMDIR(dname) → RMDIR(dfh,dname) →
create/delete directory.
• LINK(fname,linkname) → LOOKUP(dfh,fname), READLINK(fh), LINK(dfh,fname) →
create a link.
• MOUNT(fsname,device) → LOOKUP(dfh,fname) → check the pathname and the
sender’s IP address and return the fh of the export root directory.

33
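The toy client below (invented class names, not a real NFS implementation) shows the pattern behind the table above: resolve a name to a file handle once with LOOKUP, then send stateless READ requests carrying (file handle, offset, count), with the file pointer kept on the client so SEEK needs no RPC.

```python
class ToyNFSClient:
    def __init__(self, server):
        self.server = server          # object exposing LOOKUP/READ "RPCs"
        self.open_files = {}          # fd -> [file handle, file pointer]
        self.next_fd = 3

    def open(self, dfh, fname):
        fh, _attrs = self.server.LOOKUP(dfh, fname)   # one RPC to get the handle
        fd = self.next_fd
        self.open_files[fd] = [fh, 0]
        self.next_fd += 1
        return fd

    def read(self, fd, count):
        fh, pos = self.open_files[fd]
        data = self.server.READ(fh, pos, count)       # stateless: offset sent each time
        self.open_files[fd][1] = pos + len(data)      # file pointer kept on the client
        return data

    def seek(self, fd, offset):
        self.open_files[fd][1] = offset               # no RPC needed, as in the table

class FakeServer:
    """Stands in for the remote NFS server; one hard-coded file for the demo."""
    def LOOKUP(self, dfh, fname):
        return ("fh-1", {"size": 11})
    def READ(self, fh, offset, count):
        return b"hello world"[offset:offset + count]

client = ToyNFSClient(FakeServer())
fd = client.open("dfh-root", "greeting.txt")
print(client.read(fd, 5), client.read(fd, 6))   # b'hello' b' world'
```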
General Parallel File System
(GPFS)

• Parallel I/O implies concurrent execution of


multiple input/output operations. Support for
parallel I/O is essential for the performance of
many applications.
• GPFS.
• Developed at IBM in the early 2000s as a
successor of the TigerShark multimedia file
system.
• Designed for optimal performance of large
clusters; it can support a file system of up to
4 PB consisting of up to 4,096 disks of 1 TB
each.
• Maximum file size is (2^63 - 1) bytes.
• A file consists of blocks of equal size, ranging
from 16 KB to 1 MB, striped across several
disks.

34
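A minimal sketch of striping, with assumed block size and disk count (this is an illustration, not the actual GPFS allocation algorithm): consecutive blocks go to different disks round-robin, so one large sequential read can be served by many disks in parallel.

```python
BLOCK_SIZE = 256 * 1024        # e.g. 256 KB blocks (GPFS allows 16 KB - 1 MB)
NUM_DISKS = 8                  # assumed number of disks

def block_location(byte_offset):
    """Return (disk index, block number on that disk) for a file offset."""
    block = byte_offset // BLOCK_SIZE
    return block % NUM_DISKS, block // NUM_DISKS

for offset in range(0, 6 * BLOCK_SIZE, BLOCK_SIZE):
    print(f"offset {offset:>8} -> disk {block_location(offset)[0]}")
```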
SAN: Storage Area
Networks

35
Case Study: GFS

36
Google File System (GFS)

• Why distributed file system?


• Can we have a big disk?
• I/O is a bottleneck
• Suppose we have crawled 100 Tb worth of data
• State-of-the-art 7,200 rpm SATA drive has 3 Gbps I/O
• It takes 100 Tb / 3 Gbps = 9.3 hours to scan through
the entire data!

37
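The arithmetic above, worked out in a few lines (illustrative numbers only), also shows why spreading the data over many disks read in parallel helps:

```python
data_bits = 100e12            # 100 Tb of crawled data
single_disk_bps = 3e9         # 3 Gbps I/O on one SATA drive

hours_single = data_bits / single_disk_bps / 3600
print(f"single disk : {hours_single:.1f} hours")        # ~9.3 hours

for machines in (10, 100, 1000):
    print(f"{machines:>5} disks : {hours_single / machines * 60:.1f} minutes")
```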
I/O throughput lags behind

Source: R1Soft, http://wiki.r1soft.com/pages/viewpage.action?pageId=3016608 38
Other issues
• Building a high-end supercomputer is very costly

• Storing all data in one place adds the risk of hardware failures

• Never put all eggs in one basket!

39
Google datacenter
• Lots of cheap, commodity PCs, each with disk and CPU

• 100s to 1000s of PCs in cluster in early days

• High aggregate storage capacity


• No costly “big disk”

• Spread processing across many machines


• High I/O bandwidth, proportional to the # of machines

• Parallel data processing

40
A cool idea! But wait…

• Stuff breaks
• If you have one server, it may stay up 3 years (1,000 days)
• If you have 10k servers, expect to lose 10 a day

41
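A rough worked version of the estimate above, under the stated assumptions (about 1,000 days of uptime per server):

```python
servers = 10_000
mean_lifetime_days = 1_000      # "one server may stay up ~3 years"

failures_per_day = servers / mean_lifetime_days
print(f"expected failures per day: {failures_per_day:.0f}")   # ~10
```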
GFS: The Google File System

A highly reliable storage system built atop highly unreliable hardware

42
Target environment
• Thousands of computers

• Distributed

• Computers have their own disks, and the file system spans
those disks

• Failures are the norm

• Disks, networks, processors, power supplies, application


software, OS software, human errors

43
Target environment

•Files are huge, but not many


•> 100 MB, usually multi-gigabyte
• Read/write characteristics (write-once, read-many)
•Files are mutated by appending
•Once written, files are typically only read
•Large streaming reads and small random reads
are typical
• I/O bandwidth is more important than latency
•Suitable for batch processing and log analytics
44
GFS – design decisions
• Segment a file in large chunks.
• Implement an atomic file append operation allowing multiple applications
operating concurrently to append to the same file.
• Build the cluster around a high-bandwidth rather than low-latency
interconnection network. Separate the flow of control from the data flow.
Pipeline data transfer over TCP connections. Exploit network topology by
sending data to the closest node in the network.
• Eliminate caching at the client site. Caching increases the overhead for
maintaining consistency among cached copies.
• Ensure consistency by channeling critical file operations through a master,
a component of the cluster which controls the entire system.
• Minimize the involvement of the master in file access operations to avoid
hot-spot contention and to ensure scalability.
• Support efficient checkpointing and fast recovery mechanisms.
• Support an efficient garbage collection mechanism.

46
GFS chunks
• GFS files are collections of fixed-size segments called chunks.
• The chunk size is 64 MB; this choice is motivated by the desire to optimize
the performance for large files and to reduce the amount of metadata
maintained by the system.
• A large chunk size increases the likelihood that multiple operations will be
directed to the same chunk thus, it reduces the number of requests to
locate the chunk and, at the same time, it allows the application to
maintain a persistent network connection with the server where the chunk
is located.
• A chunk consists of 64 KB blocks and each block has a 32 bit checksum.
• Ensure reliability through replication with 3+ copies.
• Chunks are stored on Linux files systems and are replicated on multiple sites; a user
may change the number of the replicas, from the standard value of three, to any
desired value.
• At the time of file creation each chunk is assigned a unique chunk handle.

47
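Two of the numbers above can be made concrete with a short sketch (an illustration, not Google's code): mapping a byte offset to its 64 MB chunk, and computing one 32-bit checksum per 64 KB block (crc32 is used here merely as a stand-in checksum).

```python
import zlib

CHUNK_SIZE = 64 * 1024 * 1024    # 64 MB chunks
BLOCK_SIZE = 64 * 1024           # 64 KB blocks inside a chunk

def chunk_index(file_offset):
    """Which chunk of the file holds this byte offset?"""
    return file_offset // CHUNK_SIZE

def block_checksums(chunk_data):
    """One 32-bit checksum per 64 KB block (crc32 as a stand-in)."""
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

print(chunk_index(200 * 1024 * 1024))          # byte 200 MB lies in chunk 3
print(len(block_checksums(b"x" * CHUNK_SIZE))) # 1024 checksums per full chunk
```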
[Figure: GFS cluster architecture – the application sends a file name & chunk
index to the master and receives a chunk handle & chunk location; it then
sends the chunk handle & data count to a chunk server over the communication
network and receives chunk data. The master holds the meta-information and
exchanges state information and instructions with the chunk servers, each of
which runs on top of a Linux file system.]
• The architecture of a GFS cluster; the master maintains state information about all
system components; it controls a number of chunk servers. A chunk server runs
under Linux; it uses metadata provided by the master to communicate directly with
the application. The data and the control paths are shown separately, data paths
with thick lines and the control paths with thin lines. Arrows show the flow of
control between the application, the master and the chunk servers.

48
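A simplified sketch of the read path in the figure above (invented toy classes; the real protocol adds leases, replica choice, and retries): the client asks the master only for metadata, then reads the data directly from a chunk server.

```python
CHUNK_SIZE = 64 * 1024 * 1024

class ToyMaster:
    def __init__(self, chunk_table):
        self.chunk_table = chunk_table   # (file name, chunk index) -> (handle, servers)
    def lookup(self, fname, chunk_index):
        return self.chunk_table[(fname, chunk_index)]

class ToyChunkServer:
    def __init__(self, chunks):
        self.chunks = chunks             # chunk handle -> bytes
    def read(self, handle, offset, count):
        return self.chunks[handle][offset:offset + count]

def gfs_read(master, servers, fname, file_offset, count):
    index = file_offset // CHUNK_SIZE                    # 1. compute the chunk index
    handle, locations = master.lookup(fname, index)      # 2. ask the master for metadata
    server = servers[locations[0]]                       # 3. pick one replica
    return server.read(handle, file_offset % CHUNK_SIZE, count)  # 4. read data directly

cs = ToyChunkServer({"h-42": b"log line 1\nlog line 2\n"})
m = ToyMaster({("crawl.log", 0): ("h-42", ["cs1"])})
print(gfs_read(m, {"cs1": cs}, "crawl.log", 0, 10))      # b'log line 1'
```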
Distributed File System

⚫ Chunk Servers
– File is split into contiguous chunks
– Typically each chunk is 16-64MB
– Each chunk replicated (usually 2x or 3x)
– Try to keep replicas in different racks

[Figure: chunks C0…C5 and D0, D1 distributed with replicas across
chunk server 1, chunk server 2, chunk server 3, …, chunk server N.]

⚫ Chunk servers also serve as compute servers
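A toy sketch of the rack-aware placement idea above (hypothetical cluster layout and a deliberately simple policy): replicas of a chunk go to different servers and, when possible, different racks, so one rack failure cannot take out all copies.

```python
servers = [  # (server name, rack) - hypothetical cluster layout
    ("cs1", "rackA"), ("cs2", "rackA"),
    ("cs3", "rackB"), ("cs4", "rackB"),
    ("cs5", "rackC"),
]

def place_replicas(chunk_id, replication=3):
    """Pick `replication` servers, preferring servers in distinct racks."""
    start = hash(chunk_id) % len(servers)
    rotated = servers[start:] + servers[:start]
    chosen, racks_used = [], set()
    # First pass: at most one replica per rack.
    for name, rack in rotated:
        if rack not in racks_used and len(chosen) < replication:
            chosen.append(name)
            racks_used.add(rack)
    # Second pass: fill remaining replicas, reusing racks if necessary.
    for name, rack in rotated:
        if name not in chosen and len(chosen) < replication:
            chosen.append(name)
    return chosen

print(place_replicas("chunk-0007"))   # e.g. three servers in three different racks
```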


How to compute over GFS with chunks?

• Map-Reduce!
• Will be covered next week.

50
