Big Data Past Questions
Big Data Technologies
1. What are the 5 Vs of Big Data? Explain with examples. Justify how Distributed Systems
play a vital role in Big Data. [4+5]
2. List the various types of NoSQL database with their data structure and usage. [8]
3. Let us assume that YouTube as a client is still using GFS. A few hundred videos on YouTube
are really popular and 10,000 users are accessing them simultaneously. How do you think
GFS handles this read scenario? Explain this process along with GFS architecture.
4. What are the main components of a Map-Reduce job? Explain in brief the Data Flow technique
of the Map-Reduce Framework. What happens if a Data Node fails during the write process? [4+4+3]
5. You are appointed by the Ministry of Health as an Engineer to support its plan to perform
data analytics of all governmental hospitals using a columnar NoSQL database, HBase.
Explain the architecture of HBase and discuss how you design and deploy to fully
exploit its features in this scenario. [9]
6. Explain the architecture of Lucene and its role in a typical text search application. Also,
describe the different analyzers available in Lucene in brief. [6+5]
7. Compare and contrast the different modes of Hadoop environment setup. Can we set up
Hadoop in the cloud? If so, how? Explain the scenarios when these different modes are
preferable. [4+4]
8. Write short notes on:
[1x11
a) Functional Programming
b) Fault tolerance in GFS
c) Recent Trends in Big Data
d) MongoDB
***
TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
Examination Control Division
Level BE, Full Marks 80
Programme BEI, BCT
1. What do you know about the term "Big Data" and what are the five V's of Big Data?
How is Big Data helpful in increasing business revenue? [6+6]
2. Explain GFS architecture. Why is a single master not a bottleneck in a GFS cluster? [8+5]
3. Explain how MapReduce works with a suitable example. Explain distributed execution in
MapReduce with an example. [6+6]
4. Explain HBase with architecture. How can you model an RDBMS table in MongoDB? Give
an example. [4+6]
5. Explain the process of indexing and searching in Lucene with proper diagrams. [8]
6. Explain various components of Hadoop in brief.
[10]
7. Write short notes on:
[3x5]
a) Elastic Search
b) CAP Theorem
c) JSON and XML
***
NATIONAL COLLEGE OF ENGINEERING
QUESTION BANK
Big Data Technologies
(Elective II)(CT7 6501)
BCT IV/II
TRIBHUVAN UNIVERSITY Exam. Regular / Back
INSTITUTE OF ENGINEERING Level BE Full Marks 80
Subject: - Big Data Technologies
✓ Candidates are required to give their answers in their own words as far as practicable.
✓ Attempt All questions.
✓ The figures in the margin indicate Full Marks.
✓ Assume suitable data if necessary.
4. How does GFS differ from other file systems? List out five distinct differences. [5]
5. What is the main role of the GFS Master during read and write processes? How do data and
control messages flow in GFS architecture? Explain with a suitable flow diagram. [10]
6. MapReduce is the heart of the Hadoop ecosystem. Define the workflow of MapReduce with
suitable examples. [10]
7. Clock synchronization in DFS may be a big challenge. How can this clock synchronization
problem be solved? [10]
8. Why are HBase, Cassandra and MongoDB called column-oriented NoSQL databases? How does a
row-oriented database differ from a column-oriented database? Explain with suitable
examples. [10]
9. Write short notes on: [5x2]
a) Sqoop and Flume
b) Zookeeper
c) Oozie
d) Pig and Hive
e) Client-Server and Master-Slave architecture
***
35 F TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
Examination Control Division
2074 Bhadra
Subject: - Big Data Technologies
1. a) Explain with example about the distributed system in Big Data. [8]
b) What is the role of a Data Scientist? [4]
2. a) Explain the architecture of Google File System (GFS). [8]
b) What is availability and fault tolerance in Google File System? [5]
3. a) Explain in brief the Data Flow technique of the Map-Reduce Framework. [8]
b) What is Optimization and Data Locality in MapReduce? [4]
4. Differentiate between structured and unstructured data and discuss the Taxonomy of
NoSQL. [8]
5. Explain the components of Indexing and Searching. [8]
6. a) Explain in brief the five daemons of Hadoop. [8]
b) What is the role of Hadoop Distributed File System in Hadoop? [4]
7. Write short notes on: [5x3]
i) Elastic Search
ii) Hbase Architecture
iii) Functional Programming
***
35 F TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
Examination Control Division
2073 Magh
1. Why do we need a data analytics process? Explain the role of distributed computing in Big
Data. [5+5]
2. Why do we have large and fixed sized Chunks in GFS? What can be the demerits of that
design?
[10]
3. How is the MapReduce library designed to tolerate failure of different machines (map/reduce
nodes) while executing a MapReduce job? [10]
4. For the following data, list the input to/output from both the map and reduce functions for
getting the maximum marks of each college. [10]
Student Name    College Name    Final Marks in %
Ram             ABC             70
Sita            ABC             80
Hari            ABC             60
Gita            XYZ             90
Rita            XYZ             80
Shyam           PQR             90
Laxmi           PQR             70
Gopal           PQR             60
OR
What is the combiner function in MapReduce? Explain its purpose with a suitable example. [10]
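One possible way to sketch the data flow asked for in question 4 is a small in-memory simulation in Python. This is only an illustration, not a Hadoop program: the function names `map_fn`, `shuffle`, `reduce_fn` and `run_job` are hypothetical, and Shyam's college is assumed to read PQR.

```python
from collections import defaultdict

# Records from the question: (student, college, final marks in %).
# Assumption: Shyam's college is PQR.
RECORDS = [
    ("Ram", "ABC", 70), ("Sita", "ABC", 80), ("Hari", "ABC", 60),
    ("Gita", "XYZ", 90), ("Rita", "XYZ", 80),
    ("Shyam", "PQR", 90), ("Laxmi", "PQR", 70), ("Gopal", "PQR", 60),
]

def map_fn(record):
    """Map input: one record. Map output: (college, marks)."""
    _student, college, marks = record
    yield college, marks

def shuffle(pairs):
    """Group map output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(college, marks_list):
    """Reduce input: (college, [marks, ...]). Reduce output: (college, max marks)."""
    return college, max(marks_list)

def run_job(records):
    mapped = [pair for record in records for pair in map_fn(record)]
    grouped = shuffle(mapped)
    return dict(reduce_fn(college, marks) for college, marks in grouped.items())

result = run_job(RECORDS)
print(result)
```

Because taking a maximum is associative and commutative, the same `reduce_fn` could also serve as a combiner on each mapper's local output, which is the idea behind the alternative question.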
5. Explain the term NoSQL. Explain CAP theorem with a suitable block diagram. [3+7]
6. Describe the typical components involved in a search application. [10]
7. What are the different daemons in a Hadoop cluster? Explain each in detail. [3+7]
8. Write short notes on any two of following. [2x5]
a) Shadow Master and Cluak services
b) Analyzers available in Lucene
c) Vertical and Horizontal Scalability
***
35 F TRIBHUVAN UNIVERSITY Exam.
INSTITUTE OF ENGINEERING
Examination Control Division
2073 Bhadra
1. What are the current trends in big data analytics? What are the technical challenges and
characteristics of big data? [10]
2. Explain the GFS architecture. Why is a single master not a bottleneck in a GFS cluster? [5+5]
3. How does MapReduce work? Explain each step with a suitable example. [5+5]
4. Discuss the architecture of HBase in short. Explain eventual consistency and tunable
consistency in the context of Cassandra. [10]
5. Explain Lucene architecture and its data indexing approach. [10]
6. What are the components of Hadoop? Explain each in brief. [10]
7. How do you find the max and min occurrence of the words in a given text document?
Explain. [10]
8. Write short notes on: (any two) [2x5]
a) CAP theorem
b) Role of Data Scientist in Big data
c) Amazon cloud
***
35 F TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
Examination Control Division
2074 Magh
Subject: - Big Data Technologies
1. How does big data differ from traditional data? List out five distinct differences. [1+5]
2. What are the sources of structured, semi-structured and unstructured data in the
real world? [10]
3. Define DFS. How does a client write data in HDFS? Explain with the help of a suitable block
diagram. [10]
4. Clock synchronization in DFS may be a big challenge. How can this clock synchronization
problem be solved? [1+5]
5. How do data and control messages flow in GFS architecture? Explain with a suitable flow
diagram. [10]
6. How does GFS provide fault tolerance? How does it tolerate chunk server
failures? [10]
7. How is a word count job performed for the following file in HDFS using a MapReduce
flow chart? [10]
File.txt (file size: 200 MB)
Hi how are you
How is your job
How is your family
How is your brother
How is your sister
What is the time now
What is the strength of Hadoop
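The word count flow for File.txt can be sketched as a small in-memory simulation in Python. This is an illustration under stated assumptions rather than actual Hadoop code: the helper names `map_fn`, `shuffle`, `reduce_fn` and `word_count` are hypothetical, words are split on whitespace and compared case-insensitively, and a 128 MB HDFS block size is assumed when estimating the number of input splits.

```python
import math
from collections import defaultdict

# Contents of File.txt as given in the question.
LINES = [
    "Hi how are you",
    "How is your job",
    "How is your family",
    "How is your brother",
    "How is your sister",
    "What is the time now",
    "What is the strength of Hadoop",
]

def map_fn(line):
    """Map: emit (word, 1) for every word. Lowercasing is an assumption."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle/sort: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(word, ones):
    """Reduce: sum the grouped counts for one word."""
    return word, sum(ones)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_fn(line)]
    return dict(reduce_fn(word, ones) for word, ones in shuffle(mapped).items())

counts = word_count(LINES)

# Assumed 128 MB block size: a 200 MB file spans 2 blocks, so 2 map tasks.
num_splits = math.ceil(200 / 128)
print(counts, num_splits)
```

With a 64 MB block size instead, the same arithmetic would give 4 input splits.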
8. What are the differences between row-oriented and column-oriented databases? Why are HBase,
Cassandra and MongoDB called column-oriented NoSQL databases? [10]
9. Write short notes on: [4x2]
i) Zookeeper and Oozie
ii) Pig and Hive
***
TRIBHUVAN UNIVERSITY Exam. Regular
INSTITUTE OF ENGINEERING Level BE Full Marks 80
1. What are the technical challenges and characteristics of big data? Who are the data
scientists? List out their roles and skills. [6+6]
2. With a diagram, explain the general architecture of Google File System. [10]
OR
a) Why do we have a single master in GFS and millions of chunk servers? [4]
b) A cluster contains 1500 machines, each having 500 GB disk capacity. Calculate
approximately the number of chunk servers, the blocks and the total available size
if the default chunk replica count is 3 and 5 respectively. [6]
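The arithmetic behind part b) can be sketched in Python as follows. This is a rough estimate under explicit assumptions that the question does not state: a 64 MB chunk size, 1 GB = 1024 MB, and every machine acting as a chunk server (ignoring the master). The function name `gfs_capacity` is hypothetical.

```python
def gfs_capacity(machines, disk_gb, replicas, chunk_mb=64):
    """Estimate usable size and chunk counts for a GFS-like cluster.

    Assumptions: chunk_mb=64, 1 GB = 1024 MB, all machines store chunks.
    """
    raw_gb = machines * disk_gb                # total raw disk space
    usable_gb = raw_gb / replicas              # space left after replication
    chunk_slots = raw_gb * 1024 // chunk_mb    # chunk replicas that fit on disk
    unique_chunks = chunk_slots // replicas    # distinct chunks that can be stored
    return usable_gb, chunk_slots, unique_chunks

for replicas in (3, 5):
    usable, slots, unique = gfs_capacity(1500, 500, replicas)
    print(f"replicas={replicas}: usable={usable:.0f} GB, "
          f"chunk slots={slots}, unique chunks={unique}")
```

Under these assumptions the raw capacity is 750,000 GB, leaving about 250,000 GB usable with 3 replicas and 150,000 GB with 5.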
3. a) What is MapReduce? Explain the execution overview of MapReduce. [6]
b) Draw the output of MapReduce for the following lines: [4]
"big users big volume data cloud contributes bid data"
"facebook has big users facebook operates big data"
4. a) Explain the CAP theorem. [5]
b) Differentiate between RDBMS and NoSQL databases. [3]
5. Explain the taxonomy of NoSQL databases. Explain the Cassandra database in brief. [10]
OR
Using a MongoDB database,
a) Create a collection named "posts" and insert the following record: [3]
title: MongoDB, description: MongoDB is a NoSQL database, by: Tom,
comments: We use MongoDB for unstructured data, likes: 100
i) Now write a query to search the title of the post written by Tom. [3]
ii) Write a mapReduce function to count the number of posts created by various users. [4]
6. What is Lucene? Describe the typical components involved in a search application. [10]
7. Explain various components of Hadoop in brief. [10]
8. Write short notes on: (any two) [5x2]
i) Combiner Functions
ii) Fault tolerant systems
iii) JSON
iv) Unstructured data
35 F TRIBHUVAN UNIVERSITY Exam.
INSTITUTE OF ENGINEERING Level BE Full Marks 80
Examination Control Division
2072 Magh
:. '..: n3-r tS 1"3:: :-ta::",'l-,ci.,'d:sllibuted s1'siems hclp iJ j-i,e u.; ijig fi..,a a,r.l.eii..l
2. Explain how the master implements garbage collection and detects stale replicas in GFS.
3. Why do we have large and fixed-sized chunks in GFS? What are the demerits of that
design? [10]
4. How is the MapReduce library designed to tolerate failure of different machines (map/reduce
nodes) while executing a MapReduce job? [8]
5. For the following data, list the input to/output from both the map and reduce functions for
getting the maximum marks of each college. [10]
Student Name    College Name    Final Marks in %
Ram             ABC             70
Sita            ABC             80
Hari            ABC             60
Gita            XYZ             90
Rita            XYZ             80
Shyam           PQR             90
Laxmi           PQR             70
Gopal           PQR             60
OR
What is the combiner function in mapreduce? Explain its purpose with suitable example.
6. What is the difference between structured and unstructured data? Explain eventual
consistency and tunable consistency in the context of Cassandra. [10]
7. What is Elastic Search? Explain various types of analyzers.
8. What are the components of Hadoop? For a Hadoop cluster with 128 MB block size,
how many mappers will run when executing the mapper function on 1 GB of data?
Justify with explanation. [10]
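The mapper count asked for in the last question can be estimated with a short calculation, sketched here in Python. It assumes one map task per input split, a split size equal to the block size, 1 GB = 1024 MB, and a single input file; the function name `num_mappers` is hypothetical.

```python
import math

def num_mappers(data_mb, block_mb=128):
    """Estimate map tasks as one per block-sized input split (an assumption)."""
    return math.ceil(data_mb / block_mb)

# 1 GB = 1024 MB of input with 128 MB blocks.
mappers_for_1gb = num_mappers(1024)
print(mappers_for_1gb)
```

Many small files would produce more splits than this estimate, since each file gets at least one split of its own.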