100 Interview Questions On Hadoop - Hadoop Online Tutorials
c) Discarded hardware
6. Which of the following are example(s) of Real Time Big Data Processing? (D)
a) OLTP Transactions
b) MapReduce wrapper
10. Which of the following are NOT true for Hadoop? (D)
a) HDFS
b) Map Reduce
c) HBase
a) ALWAYS True
d) ALWAYS False
a) TRUE
b) FALSE
What is the default HDFS block size? (D)
a) 32 MB
b) 64 KB
c) 128 KB
d) 64 MB
What is the default HDFS replication factor? (C)
a) 4
b) 1
c) 3
d) 2
16. Which of the following is NOT a type of metadata in NameNode? (C)
a) List of files
a) Gossip protocol
b) Replicate protocol
c) HDFS protocol
19. NameNode tries to keep the first copy of data nearest to the client machine. (C)
a) ALWAYS true
b) ALWAYS False
a) TRUE
b) FALSE
a) mapred-site.xml
b) yarn-site.xml
c) core-site.xml
d) hdfs-site.xml
a) True
b) False
23. Which of the following Hadoop config files is used to define the heap size? (C)
a) hdfs-site.xml
b) core-site.xml
c) hadoop-env.sh
d) Slaves
a) mapred-site.xml
b) hadoop-site.xml
c) core-site.xml
d) Masters
a) True
b) False
26. From the options listed below, select the suitable data sources for Flume. (D)
a) True
b) False
28. Which of the following statement(s) are true about the distcp command? (A)
a) Sink
b) Database
c) Source
d) Channel
31. Which of the following can be used to control the number of part files in a MapReduce program output directory? (B)
a) Number of Mappers
b) Number of Reducers
c) Counter
d) Partitioner
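Since each reduce task writes exactly one part-r-NNNNN file, the number of part files in the output directory equals the number of reducers. A minimal sketch of setting this through the MapReduce Job API (the job name is a placeholder; mapper, reducer, and paths are omitted):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PartFilesExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "part-files-demo");
        // Each reduce task writes exactly one part-r-NNNNN file, so four
        // reducers yield four part files in the output directory.
        job.setNumReduceTasks(4);
        // ... set mapper, reducer, input/output paths as usual ...
    }
}
```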
32. For which of the following operations can the Reducer NOT also be used as the Combiner? (D)
a) Group by Minimum
b) Group by Maximum
c) Group by Count
d) Group by Average
d) In either phase
a) True
b) False
a) <=10 MB
b) >=250 MB
c) <=100 MB
d) <=35 MB
d) Input Splits
e) Input Format
39. Which of the following types of joins can be performed in a Reduce side join operation? (E)
a) Equi Join
40. What should be the upper limit for the number of counters of a MapReduce job? (D)
a) ~5
b) ~15
c) ~150
d) ~50
41. Which of the following classes is responsible for converting inputs to key-value pairs in MapReduce? (C)
a) FileInputFormat
b) InputSplit
c) RecordReader
d) Mapper
42. Which of the following Writables can be used when no value needs to be emitted from a mapper/reducer? (C)
a) Text
b) IntWritable
c) NullWritable
d) String
a) True
b) False
44. Only one distributed cache file can be used in a MapReduce job. (B)
a) True
b) False
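The answer is False: any number of files can be registered in the distributed cache, and each is copied to every node before tasks start. A minimal sketch using the Job API (the job name and file paths are hypothetical):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheFilesExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cache-files-demo");
        // More than one file can be placed in the distributed cache;
        // both are shipped to every task node. Paths are placeholders.
        job.addCacheFile(new URI("/apps/lookup/countries.txt"));
        job.addCacheFile(new URI("/apps/lookup/currencies.txt"));
        // ... set mapper, reducer, input/output paths as usual ...
    }
}
```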
a) Java
b) Ruby
c) Python
a) Programming Language
c) Query Language
d) Database
a) True
b) False
49. Pig jobs have the same run time as the native Map Reduce jobs. (B)
a) True
b) False
50. Which of the following is the correct representation to access "Skill" from the (A)
a) $3.$1
b) $3.$0
c) $2.$0
d) $2.$1
51. Replicated joins are useful for dealing with data skew. (B)
a) True
b) False
52. Maximum size allowed for small dataset in replicated join is: (C)
a) 10KB
b) 10 MB
c) 100 MB
d) 500 MB
b) Shell Script
c) Command Line
d) Configuration File
a) ILLUSTRATE
b) DESCRIBE
c) DUMP
d) EXPLAIN
a) True
b) False
56. Data can be supplied to PigUnit tests from: (C)
a) HDFS Location
b) Within Program
57. Which of the following constructs are valid Pig Control Structures? (D)
a) If-else
b) For Loop
c) Until Loop
58. Which of the following is the return data type of a Filter UDF? (C)
a) String
b) Integer
c) Boolean
a) True
b) False
a) Creating Tables
b) Creating Indexes
c) Creating Synonym
a) Task tracker
b) Job tracker
c) Combiner
d) Reducer
63. Which of the following are the Big Data Solutions Candidates? (E)
64. Hadoop is a framework that allows the distributed processing of: (C)
65. Where does Sqoop ingest data from? (B) & (D)
b) Oracle
c) HBase
d) MySQL
e) MongoDB
66. Identify the batch processing scenarios from following: (C) & (E)
67. Which of the following is not true about the NameNode? (B) & (C) & (D)
d) Access Rights
a) File Location
b) mapred.map.tasks parameter
d) Input Splits
a) TRUE
b) FALSE
71. Which of the following are true for Hadoop Pseudo Distributed Mode? (C)
74. Which of the following is the highest level of Data Model in Hive? (C)
a) Table
b) View
c) Database
d) Partitions
a) Hours at least
b) Minutes at least
c) Seconds at least
d) Milliseconds at least
c) Are not useful if the filter columns for query are different from the partition columns
a) True
b) False
a) Sparse
b) Sorted Map
c) Distributed
d) Consistent
e) Multi-dimensional
81. Which of the following is the outermost part of the HBase data model? (A)
a) Database
b) Table
c) Row key
d) Column family
a) PigStorage
b) SqoopStorage
c) BinStorage
d) HBaseStorage
a) True
b) False
85. Which of the following APIs can be used for exploring HBase tables? (D)
a) HBaseDescriptor
b) HBaseAdmin
c) Configuration
d) HTable
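HTable is the classic client-side handle for exploring a table's rows (later HBase versions replace it with Table obtained from a Connection). A minimal sketch against the older API, assuming a table named "mytable" already exists:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class TableScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Open a handle to an existing table and scan every row.
        HTable table = new HTable(conf, "mytable");
        try {
            ResultScanner scanner = table.getScanner(new Scan());
            for (Result row : scanner) {
                System.out.println(row);
            }
            scanner.close();
        } finally {
            table.close();
        }
    }
}
```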
86. Which of the following tables in HBase holds the region to key mapping? (B)
a) ROOT
b) .META.
c) MAP
d) REGIONS
a) INT
b) LONG
c) STRING
d) DATE
a) INT
b) STRING
c) BYTE
d) BYTE[]
a) Block Cache
b) Memstore
c) HFile
d) WAL
91. The Application Master monitors all MapReduce applications in the cluster. (B)
a) True
b) False
92. HDFS Federation is useful for clusters of size: (C)
a) >500 nodes
b) >900 nodes
95. Managed tables don’t allow loading data from other tables. (B)
a) True
b) False
96. External tables can load data from the Hive warehouse directory. (A)
a) True
b) False
98. Partitioned tables can't load data from normal (non-partitioned) tables. (B)
a) True
b) False
a) Table in Metastore DB
b) Table in HDFS
c) Directories in HDFS
When is the earliest point at which the reduce method of a given Reducer can be called?
A. As soon as at least one mapper has finished processing its input split.
Answer: C
Explanation:
In a MapReduce job, reducers do not start executing the reduce method until all Map jobs have completed.
Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The
programmer defined reduce method is called only after all the mappers have finished.
Note: The reduce phase has 3 steps: shuffle, sort, and reduce. Shuffle is where the data is collected by the
reducer from each mapper. This can happen while mappers are generating data since it is only a data transfer.
On the other hand, sort and reduce can only start once all the mappers are done.
Why is starting the reducers early a good thing? Because it spreads out the data transfer from the mappers to
the reducers over time, which is a good thing if your network is the bottleneck.
Why is starting the reducers early a bad thing? Because they “hog up” reduce slots while only copying data.
Another job that starts later that will actually use the reduce slots now can’t use them.
We can customize when the reducers startup by changing the default value of
mapred.reduce.slowstart.completed.maps in mapred-site.xml. A value of 1.00 will wait for all the mappers to
finish before starting the reducers. A value of 0.0 will start the reducers right away. A value of 0.5 will start the
reducers when half of the mappers are complete. You can also change
mapred.reduce.slowstart.completed.maps on a job-by-job basis.
Typically, keep mapred.reduce.slowstart.completed.maps above 0.9 if the system ever has multiple jobs
running at once. This way the job doesn’t hog up reducers when they aren’t doing anything but copying data. If
we have only one job running at a time, doing 0.1 would probably be appropriate.
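For example, the threshold can be set per job through the job's Configuration before submission. A minimal sketch (the job name is a placeholder; on Hadoop 2.x the property is renamed mapreduce.job.reduce.slowstart.completedmaps):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowStartExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Start reducers only after 90% of the map tasks have completed,
        // so reduce slots are not held while they only copy data.
        conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.90f);
        Job job = Job.getInstance(conf, "slow-start-demo");
        // ... set mapper, reducer, input/output paths as usual ...
    }
}
```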
How does a client read a file from HDFS?
A. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly
to the client. The client reads the data directly off the DataNode.
C. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for
block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the
DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
D. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds
the requested data block. Data is transferred from the DataNode to the NameNode, and then from the
NameNode to the client.
Answer: A
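This read path is exactly what the FileSystem client API does under the hood: open() fetches the block locations from the NameNode, and the returned stream pulls the bytes straight from the DataNodes without routing data through the NameNode. A minimal sketch, assuming a hypothetical file /data/sample.txt in HDFS:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // open() asks the NameNode for block locations; the returned
        // stream then reads the bytes directly off the DataNodes.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/data/sample.txt"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```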
103. You are developing a combiner that takes as input Text keys and IntWritable values, and emits Text keys and IntWritable values. Which interface should your class implement?
A. Combiner<Text, Text, IntWritable, IntWritable>
Answer: B
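Option B is missing above, but since a combiner must accept and emit the map output types, in the current API it is written against the Reducer interface with matching input and output types. A minimal sketch of such a sum combiner (the class name is illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A combiner must accept and emit the map output types, so it is written
// as a Reducer<Text, IntWritable, Text, IntWritable> subclass.
public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        // Partial sums keep the same key/value types the mapper produced,
        // so the real reducer can aggregate them further.
        context.write(key, result);
    }
}
```

It would be wired into a job with job.setCombinerClass(SumCombiner.class).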
104. Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
E. mapred
Answer: D
105. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
A. Keys are presented to reducer in sorted order; values for a given key are not sorted.
B. Keys are presented to reducer in sorted order; values for a given key are sorted in ascending order.
C. Keys are presented to a reducer in random order; values for a given key are not sorted.
D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
Answer: A
106. Assuming default settings, which best describes the order of data provided to a reducer’s reduce method?
A. The keys given to a reducer aren’t in a predictable order, but the values associated with those keys always
are.
B. Both the keys and values passed to a reducer always appear in sorted order.
D. The keys given to a reducer are in sorted order but the values associated with each key are in no predictable order.
Answer: D