Hadoop Interview Questions
HDFS is fault-tolerant because it replicates data across different DataNodes. By default, each block of data is replicated on three DataNodes, placed on different nodes (and typically different racks). If one node crashes, the data can still be retrieved from another DataNode that holds a replica.
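As a quick illustration, the replication of a file can be inspected or changed from the command line; this is a minimal sketch assuming a standard Hadoop installation, and the path is a placeholder:

    hdfs dfs -setrep -w 3 /user/hadoop/sample.txt    # set the replication factor to 3 and wait until done
    hdfs fsck /user/hadoop/sample.txt -files -blocks # report the file's blocks and their replication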
3 If you have an input file of 350 MB, how many input splits would HDFS
create and what would be the size of each input split?
By default, the HDFS block size is 128 MB, and every block except the last one is exactly 128 MB. For an input file of 350 MB, there are three input splits in total: 128 MB, 128 MB, and 94 MB (since 128 + 128 + 94 = 350).
5 How do you copy data from the local system onto HDFS?
Use the copyFromLocal or put command.
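For example, assuming a local file named sample.txt and a standard Hadoop installation, either command copies it into HDFS:

    hdfs dfs -copyFromLocal sample.txt /user/hadoop/   # source must be a local file
    hdfs dfs -put sample.txt /user/hadoop/             # put is more general and can also read from stdin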
7 What is a Combiner?
This is an optional phase; it is like a mini reducer. The combiner receives data from the map tasks, works
on it, and then passes its output to the reducer phase.
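To make the idea concrete, here is a toy sketch in plain Python (not the Hadoop API) of how a combiner pre-aggregates each mapper's word-count output before it is shuffled to the reducer:

    from collections import Counter

    # Hypothetical output of two map tasks: (word, 1) pairs
    map_task_1 = [("cat", 1), ("dog", 1), ("cat", 1)]
    map_task_2 = [("cat", 1), ("dog", 1)]

    def combine(pairs):
        # Runs on each mapper node: locally sums counts per word,
        # shrinking the data sent over the network to the reducer.
        counts = Counter()
        for word, n in pairs:
            counts[word] += n
        return list(counts.items())

    combined_1 = combine(map_task_1)  # [('cat', 2), ('dog', 1)]
    combined_2 = combine(map_task_2)  # [('cat', 1), ('dog', 1)]

    # The reducer applies the same aggregation to the combined outputs.
    print(combine(combined_1 + combined_2))  # [('cat', 3), ('dog', 2)]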
Common Writable data types in Hadoop include:
IntWritable
FloatWritable
LongWritable
DoubleWritable
BooleanWritable
Yes, Hadoop v2 allows more than one ResourceManager. You can run a high-availability YARN cluster with an active ResourceManager and a standby ResourceManager, where ZooKeeper handles the coordination and failover between them.
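For illustration, ResourceManager HA is typically enabled through properties in yarn-site.xml like the following; the cluster id, hostnames, and ZooKeeper addresses below are placeholders:

    yarn.resourcemanager.ha.enabled = true
    yarn.resourcemanager.cluster-id = my-yarn-cluster
    yarn.resourcemanager.ha.rm-ids = rm1,rm2
    yarn.resourcemanager.hostname.rm1 = master1.example.com
    yarn.resourcemanager.hostname.rm2 = master2.example.com
    yarn.resourcemanager.zk-address = zk1.example.com:2181,zk2.example.com:2181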
The main components of Hive architecture are:
User Interface
Metastore
Compiler
Execution Engine
External tables in Hive refer to data at an existing location outside the warehouse directory; Hive does not move or manage this data. Internal (managed) tables, by contrast, manage their data and move it into the Hive warehouse directory by default.
If one drops an external table, Hive deletes only the metadata of the table and does not change the table data present in HDFS. If one drops a managed table, the metadata along with the table data is deleted from the Hive warehouse directory.
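The difference is easiest to see in HiveQL; in this sketch the table names, columns, and HDFS path are placeholders:

    -- Managed (internal) table: Hive owns the data; DROP removes data and metadata
    CREATE TABLE managed_logs (id INT, msg STRING);

    -- External table: Hive only tracks metadata; DROP leaves /data/logs untouched
    CREATE EXTERNAL TABLE external_logs (id INT, msg STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/logs';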
Script file
Complex data types in Pig include:
Bag
Map
18 What are the relational operators in Pig?
COGROUP
CROSS
FOREACH
JOIN
LIMIT
SPLIT
UNION
ORDER
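As a quick illustration, here is a hypothetical Pig Latin script that uses a few of these operators; the file path and field names are made up:

    users = LOAD '/data/users.csv' USING PigStorage(',') AS (name:chararray, amount:int);
    grouped = GROUP users BY name;                          -- one bag of rows per name
    totals = FOREACH grouped GENERATE group, SUM(users.amount);
    ordered = ORDER totals BY $1 DESC;                      -- sort by the summed amount
    top5 = LIMIT ordered 5;
    DUMP top5;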
Key components of HBase architecture include:
HMaster
ZooKeeper
Column families are groups of columns that are defined during table creation. Each column family can hold many column qualifiers, and a column is addressed by combining the family and qualifier with a colon delimiter, as in family:qualifier.
scan 'table_name'
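For context, a minimal HBase shell session around that command might look like this; the table, column family, and values are placeholders:

    create 'table_name', 'cf'                         # table with one column family
    put 'table_name', 'row1', 'cf:col1', 'value1'     # write one cell
    get 'table_name', 'row1'                          # read a single row
    scan 'table_name'                                 # read all rows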
26 What are the default file formats to import data using Sqoop?
Sqoop imports data in two file formats: Delimited Text File Format (the default) and SequenceFile Format.
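For example, a hypothetical import where the connection string, credentials file, and table name are placeholders; text format is the default, and --as-sequencefile switches to SequenceFile:

    sqoop import \
      --connect jdbc:mysql://dbhost/salesdb \
      --username sqoop_user --password-file /user/hadoop/.pw \
      --table customers \
      --as-sequencefile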
Spark supports the following cluster managers:
Standalone Mode
Apache Mesos
Hadoop YARN
Kubernetes
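The cluster manager is selected with the --master flag of spark-submit; in this sketch the application file and host addresses are placeholders:

    spark-submit --master yarn app.py                       # Hadoop YARN
    spark-submit --master spark://host:7077 app.py          # Spark standalone
    spark-submit --master mesos://host:5050 app.py          # Apache Mesos
    spark-submit --master k8s://https://host:6443 app.py    # Kubernetes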
Transformations are lazy operations (such as map and filter) that define a new RDD from an existing one. Actions (such as count, collect, take, and reduce) trigger the actual computation and return a result to the driver or write it to storage.
The filter() function is used to create a new RDD by selecting the elements of an existing RDD that satisfy the predicate function passed as its argument.
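A minimal PySpark sketch, assuming an existing SparkSession named spark (created as shown in the next answer) and made-up data:

    # Keep only the even numbers; filter() builds the new RDD lazily
    rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5, 6])
    evens = rdd.filter(lambda x: x % 2 == 0)
    print(evens.collect())  # [2, 4, 6]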
30 What is a SparkSession?
SparkSession is the unified entry point for reading data and working with DataFrames in Spark. Introduced in Spark 2.0, it consolidates the older SQLContext and HiveContext.
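A typical way to create one in PySpark; the application name and file path are placeholders:

    from pyspark.sql import SparkSession

    # getOrCreate() returns the existing session if one is already active
    spark = SparkSession.builder \
        .appName("InterviewExamples") \
        .getOrCreate()

    df = spark.read.json("/path/to/data.json")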
Lazy evaluation in Spark means that transformations are not executed until an action (such as count() or collect()) is triggered.
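A small sketch of this behavior, assuming the SparkSession above:

    rdd = spark.sparkContext.parallelize(range(1000))
    doubled = rdd.map(lambda x: x * 2)   # transformation: nothing executes yet
    total = doubled.count()              # action: the job actually runs here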
Parquet is a columnar storage file format optimized for use with big data processing frameworks like
Apache Spark. It provides efficient data compression and encoding schemes with enhanced performance
to handle complex nested data structures.
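Reading and writing Parquet in PySpark, with placeholder data and paths:

    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    df.write.mode("overwrite").parquet("/tmp/people.parquet")   # columnar, compressed on disk
    people = spark.read.parquet("/tmp/people.parquet")          # schema is preserved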
DataFrame is a distributed collection of data organized into named columns, similar to a table in a
relational database.
Benefits of DataFrames and Datasets in Spark include:
Compile-time analysis
Faster Computation
Less Memory consumption
Query Optimization
Qualified Persistent storage
Single Interface for multiple languages
The take(n) function is an action that returns the first n elements of an RDD to the driver (the local node).
The reduce() function is an action that aggregates the elements of an RDD by repeatedly applying a binary function until a single value remains.
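Both actions in a short PySpark sketch, with made-up data:

    rdd = spark.sparkContext.parallelize([5, 1, 4, 2, 3])
    print(rdd.take(3))                     # [5, 1, 4]: the first three elements
    print(rdd.reduce(lambda a, b: a + b))  # 15: pairwise sums until one value remains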