
Big Data - Part II: Amjed Almousa, Ph.D. Cloud Computing & Big Data

The document discusses big data and distributed file systems (DFS). It explains that a DFS divides files into chunks, which are stored (with replicas) across multiple chunk servers and tracked by a master server. The best chunk size depends on file size: small chunks suit small files because they avoid wasted space, while large chunks suit large files because they keep the number of chunks low and so avoid overloading the master server. It also describes the lease mechanism, in which a server temporarily leases a chunk from the master server, edits it, and then propagates the changes to the other copies.

Uploaded by Farooq Bushnaq
Copyright: © Attribution Non-Commercial (BY-NC)

Big Data Part II

Cloud Computing & Big Data. Amjed Almousa, Ph.D.

Why is a DBMS not enough?

- Need for parallel processing
- Amount of data (especially with data replication)
- Cost of licensing for a large number of machines



Material for DFS

Material from CMU:
http://qatar.cmu.edu/~msakr/15319-s10/lectures/lecture12.pdf

Slides not required: 3-13, 17-21.



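Figure: DFS layout. A client contacts the master server, which records where each chunk of file a is stored (chunk a.1 => chunk servers 1, 2; chunk a.2 => chunk servers 1, 3). Chunk server 1 holds a.1 and a.2, chunk server 2 holds a.1 and b.1, and chunk server 3 holds b.1 and a.2.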
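The layout in the figure can be sketched in a few lines of Python. This is a minimal, hypothetical model, not the API of any real DFS: the class names `MasterServer` and `ChunkServer`, the function `read_chunk`, and the chunk contents are invented for illustration, and the locations of `b.1` are read off the chunk servers in the figure because the master's entry for file b is not shown. The point it illustrates is that the master stores only metadata (which chunks make up a file and where their replicas live), while the chunk servers store the data.

```python
from dataclasses import dataclass, field


@dataclass
class MasterServer:
    """Hypothetical master: keeps only metadata, never the chunk data itself."""
    # file name -> ordered list of chunk handles, e.g. "a" -> ["a.1", "a.2"]
    files: dict[str, list[str]] = field(default_factory=dict)
    # chunk handle -> IDs of the chunk servers holding a replica
    locations: dict[str, list[int]] = field(default_factory=dict)

    def lookup(self, file_name: str, chunk_index: int) -> tuple[str, list[int]]:
        """Return the chunk handle and its replica servers for one chunk of a file."""
        handle = self.files[file_name][chunk_index]
        return handle, self.locations[handle]


@dataclass
class ChunkServer:
    """Hypothetical chunk server: stores chunk contents keyed by handle."""
    server_id: int
    chunks: dict[str, bytes] = field(default_factory=dict)


def read_chunk(master: MasterServer, servers: dict[int, ChunkServer],
               file_name: str, chunk_index: int) -> bytes:
    """Client read path: ask the master where the chunk is, then read from a replica."""
    handle, replicas = master.lookup(file_name, chunk_index)
    # Any replica would do; this sketch simply uses the first one listed.
    return servers[replicas[0]].chunks[handle]


# Layout from the figure: a.1 on servers 1 and 2, a.2 on servers 1 and 3,
# b.1 (inferred from the chunk servers) on servers 2 and 3.
master = MasterServer(
    files={"a": ["a.1", "a.2"], "b": ["b.1"]},
    locations={"a.1": [1, 2], "a.2": [1, 3], "b.1": [2, 3]},
)
servers = {
    1: ChunkServer(1, {"a.1": b"first part of a", "a.2": b"second part of a"}),
    2: ChunkServer(2, {"a.1": b"first part of a", "b.1": b"all of b"}),
    3: ChunkServer(3, {"b.1": b"all of b", "a.2": b"second part of a"}),
}

print(read_chunk(master, servers, "a", 1))
```

Because the client fetches the data directly from a chunk server, the master only handles small metadata lookups, which is why the number of chunks it must track matters when choosing a chunk size.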


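File / Chunk Size Comparison

|                                | Small chunk size (e.g., 1 MB)                                         | Large chunk size (e.g., 100 MB) |
|--------------------------------|-----------------------------------------------------------------------|---------------------------------|
| Small file size (e.g., 2 MB)   | No problem: 2 chunks per file                                         | Lots of wasted space            |
| Large file size (e.g., 300 MB) | Large number of chunks: 300 chunks per file => overloads the master  | No problem: 3 chunks per file   |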
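The numbers in the table follow from simple arithmetic: a file needs ceil(file size / chunk size) chunks, and every chunk is one more entry the master must track. The sketch below assumes, as the slide does, that a chunk occupies its full size even when it is only partly filled; the function names are hypothetical.

```python
import math


def chunks_per_file(file_size_mb: int, chunk_size_mb: int) -> int:
    """Number of chunks needed to hold the file (the last one may be partly full)."""
    return math.ceil(file_size_mb / chunk_size_mb)


def unused_space_mb(file_size_mb: int, chunk_size_mb: int) -> int:
    """Space left empty, assuming every chunk is allocated at its full size."""
    return chunks_per_file(file_size_mb, chunk_size_mb) * chunk_size_mb - file_size_mb


# Reproduce the four cells of the table (all sizes in MB).
for file_size in (2, 300):
    for chunk_size in (1, 100):
        print(f"file {file_size} MB, chunk {chunk_size} MB: "
              f"{chunks_per_file(file_size, chunk_size)} chunk(s), "
              f"{unused_space_mb(file_size, chunk_size)} MB unused")
```

The two failure modes pull in opposite directions: shrinking the chunk size cuts unused space but multiplies the metadata entries on the master, while growing it does the reverse.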



Lease Mechanism

If a node (server) needs to edit or append to a chunk, it must first lease that chunk from the master server. Once the node has finished editing, the changed data is propagated to the other copies of the chunk on other nodes, and the lease is returned to the master server.
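A simplified, single-process sketch of the lease steps follows. It is hypothetical: `Master`, `grant_lease`, `return_lease`, and `append_to_chunk` are invented names, leases here never expire, and propagation just copies the whole edited chunk to the other replicas, whereas a real DFS would also have to handle lease timeouts and failed replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master that hands out at most one lease per chunk at a time."""
    replicas: dict[str, list[str]]                        # chunk handle -> node names
    leases: dict[str, str] = field(default_factory=dict)  # chunk handle -> lease holder

    def grant_lease(self, handle: str, node: str) -> bool:
        if handle in self.leases:          # another node is already editing this chunk
            return False
        self.leases[handle] = node
        return True

    def return_lease(self, handle: str, node: str) -> None:
        if self.leases.get(handle) == node:
            del self.leases[handle]


def append_to_chunk(master: Master, storage: dict[str, dict[str, bytes]],
                    node: str, handle: str, data: bytes) -> None:
    """Lease the chunk, edit the local copy, propagate to other copies, return the lease."""
    if not master.grant_lease(handle, node):
        raise RuntimeError(f"{handle} is leased by {master.leases[handle]}")
    try:
        storage[node][handle] += data                 # 1. edit the leaseholder's copy
        for other in master.replicas[handle]:         # 2. propagate the change
            if other != node:
                storage[other][handle] = storage[node][handle]
    finally:
        master.return_lease(handle, node)             # 3. give the lease back


master = Master(replicas={"a.1": ["server1", "server2"]})
storage = {"server1": {"a.1": b"hello"}, "server2": {"a.1": b"hello"}}
append_to_chunk(master, storage, "server1", "a.1", b" world")
print(storage["server2"]["a.1"])
```

Granting only one lease per chunk at a time is what keeps two nodes from editing the same chunk concurrently; the `finally` clause makes sure the lease is returned even if the edit fails.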

References

1. EMC Academic Alliance Training Material
2. http://qatar.cmu.edu/~msakr/15319-s10/
3. http://strata.oreilly.com/2012/01/what-is-big-data.html
4. http://www.slideshare.net/larsga/big-data-101-17253939
