Big Data Technologies_PGDBDA_Feb20
Q.4 The clause used to limit the number of rows returned by a query is
a. MAXROW b. RESTRICT c. ROWNUM d. LIMIT
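For reference, the LIMIT clause caps the rows a query returns. A minimal sketch run through the hive CLI; the employees table name is hypothetical:

    # Return at most 10 rows from a (hypothetical) employees table
    hive -e "SELECT name, salary FROM employees LIMIT 10;"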
Q.5 Hive-specific commands can be run from Beeline when the Hive _______ driver is used.
a. ODBC-JDBC b. NONE c. ODBC d. JDBC
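As an illustration, Beeline reaches HiveServer2 through its JDBC driver; the host, port, and credentials below are placeholders:

    # Connect Beeline to HiveServer2 over JDBC (host, port, and login are examples)
    beeline -u jdbc:hive2://localhost:10000 -n hiveuser -p hivepass
    # Hive-specific commands can then be issued at the beeline prompt, e.g. SHOW DATABASES;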
Q.7 The query SHOW DATABASES LIKE 'h.*'; gives as output the database names
a. ending with h b. containing h in their name c. containing 'h' d. starting with h
Q.8 You can run Pig in interactive mode using the ______ shell.
a. FS b. NONE OF THE ABOVE c. Grunt d. HDFS
Q.11 The two default TBLPROPERTIES added by Hive when a table is created are
a. hive_version and last_modified by b. last_modified_by and table_location
c. last_modified_by and last_modified_time d. last_modified_time and hive_version
Q.14 Which one of the following stores data?
a. Data node b. Master node c. Name node d. None of these
Q.15 Which of the following operators is used to view the MapReduce execution plan?
a. EXPLAIN b. STORE c. DUMP d. DESCRIBE
Q.17 __________ operator is used to view the step-by-step execution of a series of statements.
a. ILLUSTRATE b. DESCRIBE c. LOAD d. STORE
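A short sketch of both operators, wrapped in a Pig script run in local mode; the file name and alias are made up for illustration:

    # Inspect a Pig relation's execution plan and a sample step-by-step run
    cat > plan_demo.pig <<'EOF'
    A = LOAD 'input.txt' AS (word:chararray);
    EXPLAIN A;     -- prints the MapReduce execution plan for A
    ILLUSTRATE A;  -- walks sample data through each statement
    EOF
    pig -x local plan_demo.pig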
Q.18 _______ supports a new command shell Beeline that works with HiveServer2.
a. HiveServer4 b. none of the above c. HiveServer3 d. HiveServer2
Q.19 When writing data to HDFS, what is true if the replication factor is three? (Choose 2 answers)
1. Data is written to DataNodes on three separate racks (if Rack Aware).
2. The data is stored on each DataNode in a separate file which contains a checksum value.
3. Data is written to blocks on three different DataNodes.
4. The client is returned a success upon the successful writing of the first block and checksum check.
a. 1 & 4 b. 2 & 3 c. 1 & 3 d. 3 & 4
Q.25 The Hadoop tool used for uniformly spreading the data across the data nodes is named ______
a. Balancer b. Scheduler c. Reporter d. Spreader
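For reference, the balancer is started from the command line; the threshold value is just an example:

    # Move blocks until every DataNode is within 10% of the cluster's average utilization
    hdfs balancer -threshold 10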
Q.26 Which property is used to specify the block size of a file stored in HDFS?
Ans : dfs.blocksize
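The property belongs in hdfs-site.xml; it can also be overridden per command, as in this sketch (path and size are examples):

    # Write one file with a 256 MB block size instead of the configured default
    hdfs dfs -D dfs.blocksize=268435456 -put bigfile.dat /data/bigfile.dat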
Q.30 Which component of HDFS is responsible for maintaining the namespace of the distributed filesystem?
Ans : NameNode
Q.31 What tool would work best for putting a file on your local filesystem into HDFS?
Ans : The Hadoop client
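A minimal example; both paths are placeholders:

    # Copy a local file into HDFS (-copyFromLocal is an equivalent alternative)
    hdfs dfs -put /tmp/sales.csv /user/hadoop/sales.csv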
Q.33 What is the default number of map tasks for a Sqoop job?
Ans : Four map tasks by default
Q.37 What tool would work best for importing data from a relational database into HDFS?
Ans : Sqoop
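A hedged sketch of a Sqoop import tying Q.33 and Q.37 together; the JDBC URL, credentials, and table name are placeholders:

    # Import a relational table into HDFS; Sqoop uses 4 parallel map tasks unless -m overrides it
    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username dbuser --password dbpass \
      --table customers \
      --target-dir /user/hadoop/customers \
      -m 4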
D. The framework groups Reducer inputs by key (since different mappers may have output the same key) in the sort stage.
Q40. The need for data replication can arise in various scenarios like:
A. Replication Factor is changed B. Data Blocks get corrupted
C. DataNode goes down D. All of the mentioned
Q41. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
A. DataNode B. NameNode C. Data block D. Replication
Q42. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.
A. DataNodes B. TaskTracker C. ActionNodes D. All of the mentioned
Q43 InputFormat class calls the ________ function and computes splits for each file and then sends them to the
jobtracker.
A. puts B. gets C. getSplits D. All of the mentioned
Q44. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain
a _________ for that split.
A. InputReader B. RecordReader C. OutputReader D. None of the mentioned
Q45. The default InputFormat is __________, which treats each line of the input as a separate value, with the byte offset as the associated key.
A. TextFormat B. TextInputFormat C. InputFormat D. All of the mentioned
Q46. __________ controls the partitioning of the keys of the intermediate map-outputs.
A. Collector B. Partitioner C. InputFormat D. None of the mentioned
Q47. Output of the mapper is first written on the local disk for sorting and _________ process.
A. shuffling B. secondary sorting C. forking D. reducing
Q48. The __________ is a framework-specific entity that negotiates resources from the ResourceManager.
A. NodeManager B. ResourceManager C. ApplicationMaster D. All of the mentioned
Q50. The ____________ is the ultimate authority that arbitrates resources among all the applications in the system.
A. NodeManager B. ResourceManager C. ApplicationMaster D. All of the mentioned
Q51. The __________ is responsible for allocating resources to the various running applications subject to familiar
constraints of capacities, queues etc.
A. Manager B. Master C. Scheduler D. None of the mentioned
Q52. ZooKeeper allows distributed processes to coordinate with each other through registers, known as :
A. znodes B. hnodes C. vnodes D. rnodes
Q53. In Hive, SerDe stands for
A. Serialize and Deserialize B. Serializer and Deserializer
C. Serialize and Destruct D. Serve and Destruct
Q57. In the case of one large table and two small tables, for optimized query performance
A. The large table should be cached in memory and the small ones should be streamed
B. The small ones should be cached and the large one should be streamed
C. All of the tables should be cached
D. All the tables should be streamed.
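In Hive this is the map-join pattern: the small tables are cached in memory and the large one is streamed past them. A sketch with hypothetical tables big_sales and small_dim, using the explicit MAPJOIN hint:

    # Cache the small dimension table in memory; stream the large fact table
    hive -e "SELECT /*+ MAPJOIN(small_dim) */ b.id, s.label
             FROM big_sales b JOIN small_dim s ON (b.dim_id = s.id);"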
Q60. Which of the following options most aptly explains the reason behind the creation of Map Reduce?
A. Need to increase the processing power of new hardware
B. Need to perform complex analysis of structured data
C. Need to increase the number of web users
D. Need to spread distributed computing
Q62. In the MapReduce framework, map and reduce functions can be run in any order. Do you agree and why?
A. Yes, because in functional programming, the order of execution is not important.
B. Yes, because the functions use KVPs (key-value pairs) as input and output; order is not important.
C. No, because the output of map function is the input for the reduce function.
D. No, because the output of the reduce function is the input for the map function.
Q64. What role does the map function play in a word count query?
A. It sorts the words alphabetically and returns a list of the most frequently used words
B. It creates a list with each word as a key and the number of occurrences as the value.
C. It creates a list with each word as a key and every occurrence as value 1.
D. It returns a list with each document as a key and the number of words in it as the value.
Q65. The role of the following layer is to absorb the huge inflow of data and sort it into different categories.
A. Data sources B. Ingestion C. Security D. Visualization
Q68. Which utility allows you to create and run MapReduce jobs with any executable or script as the mapper and/or
the reducer?
A. Hadoop Streaming B. Oozie C. Flume D. Sqoop
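A minimal word-count sketch with Hadoop Streaming, using standard Unix tools as mapper and reducer; the jar location varies by distribution, and the HDFS paths are examples:

    # Mapper splits lines into words; the framework sorts them; the reducer counts duplicates
    hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -input /user/hadoop/books \
      -output /user/hadoop/wordcount-out \
      -mapper 'tr -s " " "\n"' \
      -reducer 'uniq -c'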
Q69. To overcome the NameNode problem of being a single point of failure, in Gen2 we have:
A. Standby Namenode B. Active Namenode
C. Secondary Namenode D. All of the above
Q70. To schedule the job's component tasks on the slaves, monitor them, and re-execute the failed tasks, we have:
A. Datanode B. Master C. Namenode and Datanode D. Slave
Q76. The syntax ALTER TABLE old_table_name RENAME TO new_table_name; is used for
A. Renaming an existing database
B. Renaming an existing field name
C. Renaming an existing table name with a new name
D. None of the above
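A one-line example with hypothetical table names:

    hive -e "ALTER TABLE staging_orders RENAME TO orders;"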
Q78. The Pig Latin scripting language is not only a higher-level data flow language but also has
operators similar to :
A. SQL B. JSON C. XML D. All of the mentioned
Q79. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the
JobTracker.
A. MapReduce B. Mapper C. TaskTracker D. JobTracker
Q81. _________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks.
A. Reduce B. Map C. Reducer D. All of the mentioned
Q.86 _________ is the primary interface for a user to describe a MapReduce job to the Hadoop
framework for execution.
A. Map Parameters B. JobConf C. MemoryConf D. None of the mentioned
Q90. What are the default ports of the filesystem URI ("fs.defaultFS") and the HDFS web UI ("dfs.namenode.http-address")?
A. 8020/9000 and 50060 B. 8010/9000 and 50070
C. 8020/9000 and 50070 D. 8000/9000 and 50060
Q.91) You are a Hadoop administrator with a 25-DataNode Hadoop cluster, one NameNode, and one Secondary NameNode. While you are copying a file from local to HDFS, HDFS goes into read-only mode and errors out with "Name node is in safe mode". How do you solve this issue?
A. hdfs dfsadmin -safemode leave B. hdfs dfsadmin -safemode off
C. dfsadmin -safemode leave D. dfsadmin -safemode off
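A short sketch of checking and then leaving safe mode from the command line:

    hdfs dfsadmin -safemode get    # reports whether safe mode is ON or OFF
    hdfs dfsadmin -safemode leave  # forces the NameNode out of safe mode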
Q.92) You are running a 100-node Hadoop cluster holding Big Data. What happens when a DataNode is marked as dead in your Hadoop cluster?
A) The NameNode informs the client which wrote the blocks that they are no longer available; the client then re-writes the blocks to a different DataNode.
B) The NameNode forces re-replication of all the blocks which were stored on the dead DataNode.
C) The set replication criteria of Hadoop for files which had blocks stored on the dead DataNode is reduced temporarily, until the dead DataNode is recovered and rejoins the cluster.
D) The next time a client submits a job that requires blocks from the dead DataNode, the JobTracker receives no heartbeats from the DataNode. The JobTracker tells the NameNode that the DataNode is dead, which triggers block re-replication on the cluster.
Q.94) Which of the following statements correctly describes how a client reads a file from HDFS?
A. First the client requests the Name Node to get the block locations of the file. Then Name Node contacts
the Data Node that holds the requested data block. Data is transferred from the Data Node to the Name
Node,and then from the Name Node to the client.
B. The client queries all Data Nodes in parallel. The Data Node that contains the requested data responds directly to the client. The client reads the data directly off the Data Node.
C. The client queries the Name Node for the block locations. The Name Node returns the block locations to the client. The client reads the data directly off the Data Nodes.
D. First the client requests the Name Node to get the block locations of the file. Then the Name Node queries
the Data Nodes for block locations. The Data Nodes respond to the Name Node,and the Name Node
redirects the client to the Data Node that holds the requested data blocks. The client then reads the data
directly off the Data Node.
Q.95) Hive is a Hadoop data warehouse technology. By default, where is Hive table data stored in HDFS?
A. hdfs://namenode_server/user/hive/warehouse
B. hdfs://namenode_server/hive/warehouse
C. hdfs://namenode_server/users/hive/warehouse
D. hdfs://namenode_server/home/hive/warehouse
Q.97) For your Hadoop cluster you set the HDFS default block size to 128 MB. What does this mean?
A. The block size of files in the cluster can be determined as the block is written.
B. The block size of files in the cluster will all be multiples of 128 MB.
C. The block size of files in the cluster will all be at least 128 MB.
D. The block size of files in the cluster will all be exactly 128 MB.
Q.98) You are copying a file into HDFS from your local computer which is smaller than the default HDFS block size. Which of the following statements is true?
A. The file can span multiple blocks.
B. The file cannot be stored in HDFS.
C. The file occupies the full block's size.
D. The file occupies only the size it needs and not the full block.
Q.99) You are running a MapReduce job. At which point can the reduce method of a given Reducer be called?
A. After all mappers have finished processing all records.
B. As soon as a mapper has emitted at least one record.
C. It depends on the Input Format of the job.
D. As soon as at least one mapper has finished processing its input split.
Q.100) In the sort and shuffle phase of MapReduce, how are keys and values passed to the reducers?
A. Keys are passed to reducer in sorted order; values for a given key are sorted in ascending order.
B. Keys are passed to a reducer in random order; values for a given key are not sorted.
C. Keys are passed to a reducer in random order; values for a given key are sorted in ascending order.
D. Keys are passed to reducer in sorted order; values for a given key are not sorted.
Q.101) Which of the following files is used to control the replication factor of HDFS?
A. hdfs-site.xml B. core-site.xml C. mapred-site.xml D. yarn-site.xml
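The cluster-wide default is the dfs.replication property in hdfs-site.xml; for an existing file it can also be changed from the shell, as in this sketch (the path is an example):

    # Set the replication factor of one file to 2 and wait until re-replication completes
    hdfs dfs -setrep -w 2 /user/hadoop/sales.csv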
Q.102) You have two Hive tables, CUSTOMERS and ORDERS, and you want to join both tables by customer ID. Which is the most appropriate syntax to join the tables?
A. hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.ID = o.ID);
B. hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
C. hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (o.CUSTOMER_ID = o.ID);
D. hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (o.ID = c.ID);
Q.103) What is the role of the slave nodes in a Hadoop cluster?
A. Slave nodes run a TaskTracker, but only a fraction of the slave nodes run DataNode daemons.
B. Slave nodes run a JobTracker and a DataNode daemon.
C. Slave nodes run a DataNode daemon, but only a fraction of the slave nodes run TaskTrackers.
D. Slave nodes run a TaskTracker and a DataNode daemon.
Q.104) You are a Big Data consultant and your client wants a technology that allows random, real-time read/write access to hundreds of terabytes of data in a distributed, scalable data store. Which of the following would you use?
A. Pig B. HBase C. Hive D. HDFS
Q.105) Your Hadoop cluster has 25 nodes with a total of 250 TB (10 TB per node) of raw disk space allocated to HDFS storage. Assuming Hadoop's default configuration, how much data will you be able to store?
A. Approximately 90 TB B. Approximately 250TB
C. Approximately 83 TB D. Approximately 10TB
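As a sanity check: with Hadoop's default replication factor of 3, usable capacity is roughly the raw capacity divided by 3, i.e. 250 TB / 3 ≈ 83 TB.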
Q.109) In HBase table creation, which of the following must be declared during schema definition?
A. Column names B. Column families C. Column data types D. Column definition
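This is why the HBase shell's create command takes only the table name and its column families; column qualifiers can be added freely at write time. Table and family names below are hypothetical:

    # Create a table with one column family; no column names or types are declared
    echo "create 'orders', 'details'" | hbase shell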
Q.111) Where does HBase store data?
A. One filesystem per column family
B. As many filesystems as the number of RegionServers
C. As a single filesystem available to all RegionServers
D. One filesystem per table
Q.113) You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two daemons need to be installed on your cluster's master nodes?
A. ResourceManager and DataNode B. ResourceManager and NameNode
C. TaskTracker and NameNode D. JobTracker and NameNode
Q.114) What are the right steps to follow to deploy a Big Data solution?
A. Data Ingestion -> Data Processing -> Data Storage
B. Data Ingestion -> Data Storage -> Data Processing
C. Data Storage -> Data Ingestion -> Data processing
D. Data Storage -> Data Processing -> Data Ingestion
Q.115) Pig is a
A. Query Language B. Programming Language C. Database D. Data Flow Language
Q.118) You have a single-NameNode Hadoop cluster. What is the function performed by the Secondary NameNode daemon?
A. The Secondary NameNode serves as an alternate data channel for clients to reach HDFS, should the NameNode become too busy.
B. The Secondary NameNode is a standby NameNode, ready to fail over and provide high availability.
C. The Secondary NameNode performs real-time backups of the NameNode.
D. The Secondary NameNode performs a checkpoint operation on the files used by the NameNode.
Q.119) Which of the following characteristics of big data is relatively more of a concern to data science?
A. Variety B. Velocity C. Volume D. None of the Mentioned
A. The file's contents can be modified by the owner, but no-one else
B. The file cannot be run as a MapReduce job
C. The file cannot be deleted by anyone but the owner
D. The file cannot be deleted by anyone
Q.124) In Pig, which of the following functions is used to read data?
A. LOAD B. WRITE C. READ D. STORE
Q.126) You want to run Pig in a testing environment. Which command is used to start Pig?
A. pig -x local B. pig C. pig mapreduce D. pig -x map
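Tying Q.124 and Q.126 together, a minimal local-mode session; the script and input file names are made up:

    # Run a tiny Pig Latin script against the local filesystem (no cluster required)
    cat > demo.pig <<'EOF'
    lines = LOAD 'input.txt' AS (line:chararray);  -- LOAD reads the data
    DUMP lines;                                    -- print the relation to the console
    EOF
    pig -x local demo.pig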