Chapter 1

The document consists of a series of questions related to Big Data, Hadoop, and various processing models such as MapReduce, Apache Spark, and Stream processing. It covers topics including the characteristics of Big Data, the benefits of YARN in Hadoop, and the suitability of different processing patterns for various applications. The questions aim to test knowledge on the technical aspects and functionalities of these technologies.


CHAPTER 1

Question 1:

What is Big Data?

a) Data with small scale and limited distribution.

b) Data that doesn't require new technical architectures.

c) Data that requires new technical architectures and analytics.

d) Data with a simple structure.

Question 2:

What is the primary benefit of YARN in Hadoop 2?

a) Lowering data storage costs.

b) Enabling new processing models in Hadoop.

c) Enhancing data security.

d) Improving batch processing.

Question 3:

Which processing framework is known for its support of interactive analysis in Hadoop?

a) MapReduce.

b) Apache Spark.

c) Apache Flink.

d) Both b and c.

Question 4:

What is stream processing essential for?


a) Long-term data storage.

b) Lower latency and quick responses.

c) Offline data analysis.

d) Batch processing.

Question 5:

Which processing pattern is suitable for machine learning algorithms?

a) Batch processing.

b) Interactive SQL.

c) Stream processing.

d) Search.

Question 6:

Which characteristic of Big Data refers to its ability to predict future events?

a) Scale.

b) Distribution.

c) Diversity.

d) Timeliness.

Question 7:

What are the main components of big data?


A. HDFS
B. MapReduce
C. YARN
D. All of the above
Question 8:

Data of ____ byte size is called big data


a) Meta
b) Giga
c) Tera
d) Peta

Question 9:

Bank transaction data is a type of:


A. Unstructured data
B. Structured data
C. Both a and b
D. None of the above

Question 10:

The total number of forms of big data is ____


a) 1
b) 2
c) 3
d) 4

Question 11:

In which language is Hadoop written?


A. C++
B. Java
C. Rust
D. Python

Question 12:
___________ is a collection of data that is huge in volume, yet growing exponentially
with time
a) Big Database
b) Big DBMS
c) Big Datafile
d) Big Data

Question 13:

Choose the primary characteristics of big data among the following


A. Value
B. Variety
C. Volume
D. All of the above

Question 14:

Identify the different features of Big Data Analytics.


A. Open-source
B. Data recovery
C. Scalability
D. All of the above

Question 15:

Which of the following is the correct reason why big data analysis is difficult to
optimize?

a) The technology to mine data
b) Both data and cost-effective ways to mine data to make business sense out of it
c) Big data is not difficult to optimize
d) None of the above

Question 16:
What is the primary use case for MapReduce?

A) Real-time processing

B) Interactive analysis

C) Batch processing

D) Stream processing
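Question 16 contrasts MapReduce's batch orientation with other models. The map-shuffle-reduce pattern it refers to can be sketched in plain Python, with no Hadoop involved; all function names here are illustrative, not part of any framework API:

```python
from collections import defaultdict

def map_phase(records):
    # Emit (word, 1) pairs, as a word-count mapper would.
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Group values by key, mimicking the framework's shuffle step.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts per word, as a reducer would.
    return {key: sum(values) for key, values in groups.items()}

logs = ["big data needs batch processing", "batch jobs scan big data"]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts["big"])  # 2
```

The key property is that the whole input is scanned in one pass and results appear only at the end, which is why the pattern fits batch workloads rather than low-latency queries.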

Question 17:

Which of the following is suitable for real-time processing?

A) MapReduce

B) Apache Spark

C) Apache Flink

D) Batch processing

Question 18:

In which execution model is Apache Flink most proficient?

A) Batch processing

B) Near real-time processing

C) Real-time processing

D) Interactive analysis

Question 19:
Which processing model is not supported by MapReduce?

A) Real-time processing

B) Interactive analysis

C) Batch processing

D) Stream processing

Question 20:

What is the primary use case for Apache Spark?

A) Batch processing

B) Interactive analysis

C) Real-time processing

D) Stream processing

Question 21:

Which processing model is suitable for iterative processing?

A) MapReduce

B) Apache Spark

C) Apache Flink

D) Batch processing

Question 23:
Which processing model provides native support for iterative processing?

A) MapReduce

B) Apache Spark

C) Apache Flink

D) Real-time processing

Question 24:

What is the primary use case for Stream processing?

A) Processing large batch data

B) Low-latency response for queries

C) Exploratory data analysis

D) Indexing documents

Question 25:

What introduced the capability for different processing models in Hadoop?

A) Hadoop Distributed File System (HDFS)

B) YARN (Yet Another Resource Negotiator)

C) MapReduce

D) Hive
Question 26:

Which of the following processing models is not suitable for interactive analysis?

A) MapReduce

B) Real-time processing

C) Apache Flink

D) Interactive analysis

Question 27:

Which processing pattern allows for low-latency responses to SQL queries on Hadoop?

A) Batch processing

B) Interactive SQL

C) Iterative processing

D) Stream processing

Question 28:

In which processing pattern is it more efficient to hold intermediate working sets in
memory?

A) Batch processing

B) Stream processing

C) Iterative processing

D) Search
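Question 28 points at iterative processing, where the same working set is revisited on every pass. A toy sketch of that access pattern, assuming a simple gradient-descent loop (purely illustrative, not Spark's API):

```python
def gradient_step(theta, data, lr=0.1):
    # One pass of a toy gradient update; stands in for any
    # iterative algorithm (k-means, PageRank, model training).
    grad = sum(2 * (theta * x - y) * x for x, y in data) / len(data)
    return theta - lr * grad

# The working set (`data`) stays in memory across iterations --
# the property that makes in-memory caching (as in Spark's RDDs)
# a better fit than rereading the data from disk every pass,
# as a chain of MapReduce jobs would.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
theta = 0.0
for _ in range(50):
    theta = gradient_step(theta, data)
print(round(theta, 2))  # approx 2.0
```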
Question 29:

Which processing pattern is suitable for running real-time distributed computations on
unbounded data streams?

A) Interactive SQL

B) Iterative processing

C) Stream processing

D) Search

Question 30:

What type of processing pattern is associated with the use of Solr on a Hadoop cluster
for indexing and search?

A) Interactive SQL

B) Iterative processing

C) Stream processing

D) Search

Question 31:

What is the primary use case for the Iterative processing pattern?

A) Real-time processing

B) Large batch data analysis


C) Exploratory data analysis

D) Low-latency search queries

Question 32:

In which processing pattern does data exploration play a significant role?

A) Interactive SQL

B) Iterative processing

C) Stream processing

D) Search

Question 33:

What is the primary use case for Stream processing?

A) Large batch data analysis

B) Low-latency response for queries

C) Real-time processing

D) Exploratory data analysis

Question 34:

In which processing pattern does Storm, Spark Streaming, or Samza play a role?

A) Interactive SQL

B) Iterative processing

C) Stream processing

D) Search
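The frameworks named in Question 34 all run continuous computations over unbounded input. The essence of that model can be sketched with a plain Python generator; no framework is used, and `running_average` is an illustrative name:

```python
def running_average(stream):
    # Consume an unbounded stream one event at a time, emitting an
    # updated aggregate after each event -- the core idea behind
    # Storm, Spark Streaming, and Samza jobs.
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count

events = iter([10, 20, 30])  # stands in for an unbounded source
averages = list(running_average(events))
print(averages)  # [10.0, 15.0, 20.0]
```

Because each result is emitted as soon as its event arrives, latency stays low, which is what distinguishes this pattern from batch processing.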
Question 35:

What is the benefit of using a distributed query engine in the Interactive SQL pattern?

A) High-latency responses

B) Low-latency responses

C) Batch processing capabilities

D) Stream processing support

Question 36:

What is the primary strength of a Relational Database Management System (RDBMS)?

A) Real-time point queries

B) Batch processing of the entire dataset

C) Low-latency retrieval and updates

D) Continuously updated datasets

Question 37:

Which type of data is best suited for Hadoop's schema-on-read approach?

A) Structured data

B) Semi-structured data

C) Unstructured data

D) Relational data
Question 38:

What is a major advantage of using MapReduce for analyzing web server logs?

A) High-level of data normalization

B) Efficient data loading phase

C) Capability for nonlocal operations

D) Ability to perform joins easily

Question 39:

Why are web server log files well suited for analysis with Hadoop?

A) They are highly normalized

B) They are in a structured format

C) They are large and continuously updated

D) They contain detailed relational data
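Questions 37-39 touch on schema-on-read: raw log lines are stored untouched, and structure is imposed only when the data is read. A minimal sketch of that idea in Python; the log format and field names here are assumptions for illustration:

```python
import re

# Structure is applied at read time by a parser, not enforced at
# load time as a relational schema would be.
LOG_PATTERN = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')

def parse(line):
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # malformed lines are skipped, not rejected on load
    host, timestamp, method, path = m.groups()
    return {"host": host, "timestamp": timestamp,
            "method": method, "path": path}

line = '203.0.113.9 - - [12/Mar/2024:10:01:22 +0000] "GET /index.html HTTP/1.1" 200 512'
rec = parse(line)
print(rec["method"], rec["path"])  # GET /index.html
```

Storing the raw lines keeps the expensive load phase trivial; the cost of interpretation is paid at query time, which suits large, append-only data such as web server logs.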

Question 40:

How does the scalability of MapReduce compare to SQL queries?

A) MapReduce scales linearly with the data size and cluster size

B) SQL queries scale linearly with the data size but not cluster size

C) MapReduce and SQL queries scale linearly with cluster size

D) Neither MapReduce nor SQL queries scale linearly
Question 41:

In what direction are Hadoop systems like Hive evolving with respect to features?

A) They are moving towards becoming batch processing systems

B) They are becoming more like traditional RDBMS with indexes

C) They are eliminating the need for data indexing

D) They are abandoning schema-on-read approach

Question 42:

What characterizes semi-structured data?

A) It has a defined format and strict schema

B) It is used only as a guide to data structure

C) It is highly normalized and structured

D) It is suitable for high-speed streaming reads

Question 43:

Why does MapReduce suit applications where data is written once and read many times?

A) It supports high-speed data loading

B) It allows for non-local operations

C) It works well with continuously updated datasets

D) It scales linearly with data size


Question 44:

When is a relational database a good choice for data analysis?

A) When the dataset is large and continuously updated

B) When low-latency retrieval and updates are required

C) When schema-on-read is preferred

D) When batch processing of the entire dataset is needed

Question 45:

What distinguishes Hadoop from Grid Computing with respect to data flow management?

A) Hadoop uses low-level programming for data flow management

B) Grid Computing employs high-level programming for data flow

C) Hadoop is based on explicit management of check pointing and recovery

D) Grid Computing is managed by the MapReduce processing engine

Question 46:

How does Hadoop conserve network bandwidth compared to Grid Computing?

A) By using low-level programming for network topology modeling

B) By relying on a shared-nothing architecture

C) By explicitly managing check pointing and recovery

D) By co-locating data with compute nodes


Question 47:

What is one of the main reasons for Hadoop's good performance?

A) High CPU utilization

B) Expensive resources

C) Data locality

D) Check pointing and recovery

Question 48:

In a distributed computation, what is the most challenging aspect related to process
coordination?

A) Network topology modeling

B) Detecting failed tasks

C) Handling partial failure gracefully

D) Shared-nothing architecture

Question 49:

What architecture is MapReduce based on, making it easier for programmers to handle failure?

A) Shared-everything

B) Shared-something

C) Shared-nothing

D) Shared-all
Question 50:

What distinguishes MPI programs from MapReduce in terms of check pointing and recovery?

A) MPI programs rely on the MapReduce system for recovery

B) MPI programs explicitly manage their own check pointing

C) MapReduce programs have more control over recovery

D) MPI programs are easier to write than MapReduce programs

Question 51:

What is one of the advantages of using a shared-nothing architecture, as seen in MapReduce?

A) Greater dependence on network bandwidth

B) Improved network topology modeling

C) Easier management of data co-location

D) Reduced need for high CPU utilization
