Mid-Term Exam (30%) PROFESSOR: Oussama Derbel SECTION: 11112
Student’s Name: Tarandeep Singh
Student’s ID Number: 5354120
Date: 2022-01-03
This Exam paper should be uploaded on Omnivox via Lea (No Mio)
Exercise 1 (40%):
a- What is data?
Ans- In computing, data is information that has been translated into a form
that is efficient for movement or processing. On modern computers and
transmission media, data is information converted into binary digital form.
The word "data" may be used as either a singular or a plural subject.
Raw data is a term used to describe data in its most basic digital format.
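As a minimal sketch of what "converted into binary digital form" means, the Python snippet below encodes a short string as UTF-8 bytes and prints each byte as eight binary digits. The helper names to_binary and from_binary are hypothetical and chosen only for this illustration.

```python
def to_binary(text: str) -> str:
    """Encode text as UTF-8 and render each byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_binary(bits: str) -> str:
    """Reverse to_binary: parse each 8-bit group and decode the bytes as UTF-8."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")


encoded = to_binary("Hi")
print(encoded)               # 01001000 01101001
print(from_binary(encoded))  # Hi
```

The round trip shows that the binary form carries the same content as the original text, only in a representation a machine can store and transmit.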
b- What is Big Data?
Ans- Big data refers to large, diverse sets of information that grow at
ever-increasing rates. The term covers the volume of information, the
velocity (speed) at which it is created and collected, and the variety or
scope of the data points being covered.
c- What is information?
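Ans- Information is data that has been processed, organized, and given
context so that it is meaningful to the person using it. Big data, by
contrast, involves data sets so large and complex that traditional data
processing software cannot capture, filter, manage, and process them within
a reasonable amount of time. Analyzing big data can turn that raw data into
information, for example by predicting and analyzing user behavior.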
d- What is Hadoop?
Ans- Apache Hadoop is an open-source framework used to store and
process large data sets ranging from gigabytes to petabytes of data. Instead
of using a single large computer to store and process the data, Hadoop
clusters multiple computers so that they can analyze large data sets in
parallel, and therefore more quickly.
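The idea of dividing the work among several machines can be sketched in plain Python. This is only an illustration under stated assumptions, not Hadoop's API: worker threads stand in for cluster nodes, and the function names split_into_blocks, process_block, and parallel_total are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Cut the data set into fixed-size blocks, one unit of work each."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def process_block(block):
    """Work done independently on one block (here: a partial sum)."""
    return sum(block)


def parallel_total(records, block_size=4, workers=3):
    """Process every block on a pool of workers, then combine the partial results."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(process_block, blocks))
    return sum(partial_sums)


records = list(range(1, 11))
print(parallel_total(records))  # 55
```

Each block is processed without knowledge of the others, which is what allows the work to be spread over many computers in a real cluster.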
e- List the 5 components of the Hadoop ecosystems and briefly describe the
functionality of each component:
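Ans- The following components together form the Hadoop ecosystem:
HDFS (Hadoop Distributed File System): stores large files as blocks that are distributed and replicated across the machines of a cluster.
YARN (Yet Another Resource Negotiator): manages cluster resources and schedules jobs.
MapReduce: programming-based data processing, in which map and reduce tasks run in parallel.
Spark: in-memory data processing.
Pig, Hive: query-based processing of data.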
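The MapReduce component listed above can be illustrated with the classic word-count example. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process. It is a simplified illustration rather than Hadoop's own API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


lines = ["big data big ideas", "data moves fast"]
print(word_count(lines))
# {'big': 2, 'data': 2, 'ideas': 1, 'moves': 1, 'fast': 1}
```

In a real cluster the map calls would run on the nodes that hold each block of input, and the shuffle step would move the grouped pairs across the network to the reducers.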
Big Data
420-BZ2-GX
STUDENT’S NAME:_____________ _____________________________________________________________________________________________________________
Exercise 3 (70%):