Big Data Technologies_PGDBDA_Feb20

This document is a question bank for a Big Data Technologies course at USM's Shriram Mantri Vidyanidhi Info Tech Academy. It contains multiple-choice questions covering various topics such as Hive, HDFS, MapReduce, and YARN. The questions assess knowledge on concepts, commands, and functionalities related to big data technologies.

Uploaded by

Gaurav Rahane

USM’s Shriram Mantri Vidyanidhi Info Tech Academy

PG DBDA Feb 20 Big Data Technologies Question Bank


Q.1 What is the main problem faced while reading and writing data in parallel from multiple disks?
a. The hardware required to do this task is extremely costly.
b. Processing high volume of data faster.
c. Combining data from multiple disks.
d. The software required to do this task is extremely costly.

Q.2 What is Hive?


a. A relational database b. A language c. An open-source data warehouse system d. OLTP

Q.3 Hive uses _________ for logging.


a. log4J b. log4i c. logj4 d. log4l

Q.4 The clause used to limit the number of rows returned by a query is
a. MAXROW b. RESTRICT c. ROWNUM d. LIMIT

Q.5 Hive specific commands can be run from Beeline, when the Hive _______ driver is used.
a. ODBC-JDBC b. NONE c. ODBC d. JDBC

Q.6 _________ operator is used to review the schema of a relation.


a. DUMP b. EXPLAIN c. STORE d. DESCRIBE

Q.7 The query "SHOW DATABASES LIKE 'h.*';" gives the output with database names
a. ending with h b. containing h in their name c. containing 'h' d. starting with h

Q.8 You can run Pig in interactive mode using the ______ shell.
a. FS b. NONE OF THE ABOVE c. Grunt d. HDFS

Q.9 The purpose of starting namenode in the recovery mode is to


a. Recover a failed namenode
b. Recover data from one of the metadata storage locations
c. Recover data when there is only one metadata storage location
d. Recover a failed datanode

Q.10 In Hive SerDe stands for


a. serialize and Deserialize b. serve and destruct
c. serializer and Deserializer d. Serialize and Destruct

Q.11 The two default TBLPROPERTIES added by Hive when a table is created are
a. hive_version and last_modified_by b. last_modified_by and table_location
c. last_modified_by and last_modified_time d. last_modified_time and hive_version

Q.12 A View in Hive can be dropped by using


a. DROP TABLE b. DELETE VIEW c. DROP VIEW d. REMOVE VIEW

Q.13 When a file in HDFS is deleted by a user


a. it is lost forever
b. It becomes hidden from the user but stays in the file system
c. It goes to trash if configured.
d. Files in HDFS cannot be deleted

Q.14 Which one of the following stores data?
a. Data node b. Master node c. Name node d. None of these

Q.15 Which of the following operators is used to view the MapReduce execution plans?
a. EXPLAIN b. STORE c. DUMP d. DESCRIBE

Q.16 Each database created in Hive is stored as


a. a file b. an HDFS block c. a jar file d. a directory

Q.17 __________ operator is used to view the step-by-step execution of a series of statements.
a. ILLUSTRATE b. DESCRIBE c. LOAD d. STORE

Q.18 _______ supports a new command shell Beeline that works with HiveServer2.
a. HiveServer4 b. none of the above c. HiveServer3 d. HiveServer2

Q.19 When writing data to HDFS, what is true if the replication factor is three? (Choose 2 answers)
1. Data is written to DataNodes on three separate racks (if Rack Aware).
2. The data is stored on each DataNode with a separate file which contains a checksum value.
3. Data is written to blocks on three different DataNodes.
4. The client is returned a success upon the successful writing of the first block and checksum check.
a. 1 & 4 b. 2 & 3 c. 1 & 3 d. 3 & 4

Q.20 On dropping a managed table


a. The data gets dropped without dropping the schema
b. Both the schema and the data is dropped
c. The schema gets dropped without dropping the data
d. An error is thrown

Q.21 The main role of the secondary namenode is to


a. Periodically merge the namespace image with the edit log.
b. Copy the filesystem metadata from NFS stored by primary namenode
c. Monitor if the primary namenode is up and running.
d. Copy the filesystem metadata from primary namenode.

Q.22 Which of the following is not a Hadoop operation mode?


a. Fully-Distributed mode b. Stand alone mode
c. Pseudo distributed mode d. Globally distributed mode

Q.23 Point out the correct statement :


a. None of the mentioned
b. You can run Pig in either mode using the “pig” command
c. You can run Pig in interactive mode using the FS shell
d. You can run Pig in batch mode using the Grunt shell

Q.24 The partitioning of a table in Hive creates more


a. subdirectories under the database name
b. subdirectories under the table name
c. files under the table name
d. files under database name

Q.25 The Hadoop tool used for uniformly spreading the data across the data nodes is named −

a. Balancer b. Scheduler c. Reporter d. Spreader

Q.26 Which property is used to specify the block size of a file stored in HDFS?
Ans : dfs.blocksize

Q.27 What does the following command do?


hdfs dfs -ls
Ans : Lists the contents of the user's HDFS home directory (e.g. /user/root)

Q.28 True or False:


To input a file into HDFS, the client application passes the data to the NameNode, which then divides the data into
blocks and passes the blocks to the DataNodes.
Ans : False

Q.29 What is the default file-replication factor in HDFS?


Ans : 3

Q.30 Which component of HDFS is responsible for maintaining the namespace of the distributed filesystem?
Ans : NameNode

Q.31 What tool would work best for putting a file on your local filesystem into HDFS?
Ans : The Hadoop client

Q.32 True or False : HDFS is a processing engine?


Ans : False

Q.33 What is the default number of map tasks for a Sqoop job?
Ans : Four map tasks by default

Q.34 True or False : MAPREDUCE is a storage engine?


Ans : False

Q.35 Which command is used to view the Java processes running on the cluster nodes?


Ans : jps

Q.36 Which user is the superuser of HDFS?


Ans : hdfs

Q.37 What tool would work best for importing data from a relational database into HDFS?
Ans : Sqoop

Q38. Which of the following phases occur simultaneously?


A. Shuffle and Sort B. Reduce and Sort C. Shuffle and Map D. All of the mentioned

Q39. Point out the wrong statement :


A. Reducer has 2 primary phases
B. Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the
cost of failures
C. It is legal to set the number of reduce-tasks to zero if no reduction is desired

D. The framework groups Reducer inputs by keys (since different mappers may have output the same
key) in sort stage

Q40. The need for data replication can arise in various scenarios like :
A. Replication Factor is changed B. Data Blocks get corrupted
C. DataNode goes down D. All of the mentioned

Q41. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
A. DataNode B. NameNode C. Data block D. Replication

Q42. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to
the data as possible
A. DataNodes B. TaskTracker C. ActionNodes D. All of the mentioned

Q43 InputFormat class calls the ________ function and computes splits for each file and then sends them to the
jobtracker.
A. puts B. gets C. getSplits D. All of the mentioned

Q44. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain
a _________ for that split.
A. InputReader B. RecordReader C. OutputReader D. None of the mentioned

Q45. The default InputFormat is __________ which treats each line of the input as a new value, with the
byte offset as the associated key.
A. TextFormat B. TextInputFormat C. InputFormat D. All of the mentioned

Q46. __________ controls the partitioning of the keys of the intermediate map-outputs.
A. Collector B. Partitioner C. InputFormat D. None of the mentioned
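A minimal Python sketch of how a hash-based partitioner (Hadoop's default HashPartitioner) assigns an intermediate key to a reduce task; this is an illustrative re-implementation, not Hadoop's actual API:

```python
def partition(key: str, num_reducers: int) -> int:
    """Assign an intermediate map-output key to one of num_reducers partitions.

    Hadoop's HashPartitioner computes
    (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    Python's built-in hash() stands in for hashCode() here.
    """
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# Every value sharing the same key lands in the same partition,
# so a single reducer sees all values for that key.
```

Because the mapping is deterministic within a run, repeated calls with the same key always select the same reducer.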

Q47. Output of the mapper is first written on the local disk for sorting and _________ process.
A. shuffling B. secondary sorting C. forking D. reducing

Q48. The __________ is a framework-specific entity that negotiates resources from the ResourceManager.
A. NodeManager B. ResourceManager C. ApplicationMaster D. All of the mentioned

Q49. Apache Hadoop YARN stands for :


A. Yet Another Reserve Negotiator B. Yet Another Resource Network
C. Yet Another Resource Negotiator D. All of the mentioned

Q50. The ____________ is the ultimate authority that arbitrates resources among all the applications in the system.
A. NodeManager B. ResourceManager C. ApplicationMaster D. All of the mentioned

Q51. The __________ is responsible for allocating resources to the various running applications subject to familiar
constraints of capacities, queues etc.
A. Manager B. Master C. Scheduler D. None of the mentioned

Q52. ZooKeeper allows distributed processes to coordinate with each other through registers, known as :
A. znodes B. hnodes C. vnodes D. rnodes

Q53. In Hive SerDe stands for
A. serialize and Deserialize B. serializer and Deserializer
C. Serialize and Destruct D. serve and destruct

Q56. The drawback of managed tables in hive is


A. they are always stored under default directory
B. they cannot grow bigger than a fixed size of 100GB
C. they can never be dropped
D. they cannot be shared with other applications

Q57. In case of one large table and 2 small tables, for an optimized query performance
A. The largest one should be cached to memory and small ones should be streamed
B. The small Ones should be cached and large one should be streamed
C. All of the table should be cached
D. All the tables should be streamed.

Q59. Which of the following services is provided by YARN?


A. Global resource management B. Record reader
C. MapReduce engine D. Data mining

Q60. Which of the following options most aptly explains the reason behind the creation of Map Reduce?
A. Need to increase the processing power of new hardware
B. Need to perform complex analysis of structured data
C. Need to increase the number of web users
D. Need to spread distributed computing

Q61. Which of the following describes the Map function?


A. It processes data to create a list of key-value pairs.
B. It indexes the data to list all the words occurring in it
C. It converts a relational database to key value pairs.
D. It tracks data across multiple tables and clusters in Hadoop

Q62. In the MapReduce framework, map and reduce functions can be run in any order. Do you agree and why?
A. Yes, because in functional programming, the order of execution is not important.
B. Yes, because the functions use KVP (key value pair)as input and output; order is not important.
C. No, because the output of map function is the input for the reduce function.
D. No, because the output of the reduce function is the input for the map function.

Q63. Which of the following describes the reduce function?


A. It analyzes the map function results to show the most frequently occurring values.
B. It combines the map function results to return a list of the best matches for the query.
C. It adds the results of the map function to convert the key value pair lists to columnar databases.
D. It processes map function results and creates a new key-value pair list to answer the query.

Q64. What role does the map function play in a word count query?
A. It sorts the words alphabetically and returns a list of the most frequently used words
B. It creates a list with each word as a key and the number of occurrences as the value.
C. It creates a list with each word as a key and every occurrence as value 1.
D. It returns a list with each document as a key and the number of words in it as the value.
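The word-count map and reduce functions described in Q61–Q64 can be sketched in Python (an illustrative sketch, not Hadoop's Java API):

```python
def word_count_map(document: str):
    # Map: emit (word, 1) for every occurrence of a word in the input.
    for word in document.split():
        yield (word, 1)

def word_count_reduce(word: str, counts):
    # Reduce: sum the 1s emitted for a word to get its total count.
    return (word, sum(counts))
```

For example, word_count_map("to be or not to be") emits ("to", 1) twice, and the reducer later sums those 1s per word.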

Q65. The role of the following layer is to absorb the huge inflow of data and sort it into different categories.
A. Data sources B. Ingestion C. Security D. Visualization

Q66. Which of the following is not an example of NoSQL database?


A. HBase B. MongoDB C. AllegroGraph D. Oracle

Q67. Which of the following is an example of a non-relational database?


A. SQL B. Oracle C. Mongo D. SQL SERVER 2012

Q68. Which utility allows you to create and run MapReduce jobs with any executable or script as the mapper and/or
the reducer?
A. Hadoop Streaming B. Oozie C. Flume D. Sqoop

Q69. To overcome the Namenode problem of single point of failure, in Gen2 we have:
A. Standby Namenode B. Active Namenode
C. Secondary Namenode D. All of the above

Q70. To schedule the job's component tasks on the slaves, monitor them and re-execute the failed tasks, we
have:
A. Datanode B. Master C. Namenode and Datanode D. Slave

Q71. Pig was developed by:


A. Yahoo B. Gmail C. Twitter D. Facebook

Q72. Which of the following operators is used for performing iteration?


A. FOREACH B. ASSERT C. FILTER D. GROUP

Q73. UDF stands for:


A. User Defined Function B. Unique Defined Function
C. Universal Disk Format D. Unique Definition of Function

Q74. Which mode of pig is also known as the Hadoop mode?


A. Local mode B. MapReduce mode C. Global mode D. universal mode

Q75. LOAD command in Pig is used for:


A. to read data from the file system B. to display the results on your terminal screen
C. to save the results D. None of the above

Q76. The syntax ALTER TABLE old_table_name RENAME TO new_table_name; is used for
A. Renaming an existing database
B. Renaming an existing field name
C. Renaming an existing table name with a new name
D. None of the above

Q77. What is true about Big Data?

A. Hadoop ecosystem handles Big Data B. It is represented by 4 V's
C. It references OLAP system D. All of the above

Q78. The Pig Latin scripting language is not only a higher-level data flow language but also has
operators similar to :
A. SQL B. JSON C. XML D. All of the mentioned

Q79. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the
JobTracker.
A. MapReduce B. Mapper C. TaskTracker D. JobTracker

Q80. Point out the correct statement :


A. MapReduce tries to place the data and the compute as close as possible
B. Map Task in MapReduce is performed using the Mapper() function
C. Reduce Task in MapReduce is performed using the Map() function D. All of the mentioned

Q81. _________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks.
A. Reduce B. Map C. Reducer D. All of the mentioned

Q82. _________ is the default Partitioner for partitioning key space.


A. HashPar B. Partitioner C. HashPartitioner D. None of the mentioned

Q83. Input to the _______ is the sorted output of the mappers.


A. Reducer B. Mapper C. Shuffle D. All of the mentioned

Q84. Point out the wrong statement:


A. Reducer has 2 primary phases
B. Increasing the number of reduces increases the framework overhead, but increases load balancing
and lowers the cost of failures
C. It is legal to set the number of reduce-tasks to zero if no reduction is desired
D. The framework groups Reducer inputs by keys (since different mappers may have output the same
key) in sort stage

Q85. Which of the following phases occur simultaneously?


A. Shuffle and Sort B. Reduce and Sort C. Shuffle and Map D. All of the mentioned

Q.86 _________ is the primary interface for a user to describe a MapReduce job to the Hadoop
framework for execution.
A. Map Parameters B. JobConf C. MemoryConf D. None of the mentioned

Q87. Which of the following phases occur simultaneously?


A. Shuffle and Sort B. Reduce and Sort C. Shuffle and Map D.All of the mentioned

Q88. The need for data replication can arise in various scenarios like :
A. Replication Factor is changed B. DataNode goes down
C. Data Blocks get corrupted D. All of the mentioned

Q89. Which brackets are used to represent a Bag in Pig?


A. {} B. [] C. () D. <>

Q90. What is the Default Port of File system embedded in URI "fs.defaultFS" and HDFS web UI
"dfs.namenode.http.address"
A. 8020/9000 and 50060 B. 8010/9000 and 50070
C. 8020/9000 and 50070 D. 8000/9000 and 50060

Q.91) You are a Hadoop administrator and you have a 25-DataNode Hadoop cluster with one Name Node and
one Secondary Name Node. While you are copying a file from local to HDFS, HDFS goes into read-only mode
and errors out with "Name node is in safe mode". How do you solve this issue?
A. hdfs dfsadmin -safemode leave B. hdfs dfsadmin -safemode off
C. dfsadmin -safemode leave D. dfsadmin -safemode off

Q.92) You are running a 100-node Hadoop cluster holding Big Data. What happens when a DataNode is
marked as dead in your Hadoop cluster?
A) The NameNode informs the client which wrote the blocks that they are no longer available; the client then
re-writes the blocks to a different DataNode.

B) The NameNode forces re-replication of all the blocks which were stored on the dead DataNode.

C) The set replication criteria of the Hadoop cluster for files which had blocks stored on the dead DataNode is
reduced temporarily, until the dead DataNode is recovered and returns to the cluster.

D) The next time a client submits a job that requires blocks from the dead DataNode, the JobTracker
receives no heartbeats from the DataNode. The JobTracker tells the NameNode that the DataNode is
dead, which triggers block re-replication on the cluster.

Q.93) The main role of the secondary Namenode is to
A) Copy the filesystem metadata from primary Namenode.
B) Copy the filesystem metadata from NFS stored by primary Namenode
C) Periodically merge the namespace image with the edit log.
D) Monitor if the primary Namenode is up and running.

Q.94) Which of the following statements correctly describes how a client reads a file from HDFS?
A. First the client requests the Name Node to get the block locations of the file. Then the Name Node contacts
the Data Node that holds the requested data block. Data is transferred from the Data Node to the Name
Node, and then from the Name Node to the client.
B. The client queries all Data Nodes in parallel. The Data Node that contains the requested data responds
directly to the client. The client reads the data directly off the Data Node.
C. The client queries the Name Node for the block locations. The Name Node returns the block locations to
the client. The client reads the data directly off the Data Nodes.
D. First the client requests the Name Node to get the block locations of the file. Then the Name Node queries
the Data Nodes for block locations. The Data Nodes respond to the Name Node, and the Name Node
redirects the client to the Data Node that holds the requested data blocks. The client then reads the data
directly off the Data Node.

Q.95) Hive is Hadoop's data warehouse technology. By default, where is Hive table data stored in HDFS?
A. hdfs://namenode_server/user/hive/warehouse
B. hdfs://namenode_server/hive/warehouse
C. hdfs://namenode_server/users/hive/warehouse
D. hdfs://namenode_server/home/hive/warehouse

Q.96) What is the Metastore in Hive?


A. Metastore is a central repository in Hive. It is used for storing schema information or metadata in the
HDFS file.
B. Metastore is a central repository in Hive. It is used for storing schema information or metadata in the
external database.
C. Metastore is a central repository in Hive. It is used for storing Hive Table Data in the external database.
D. Metastore is a central repository in Hive. It is used for storing Hive Table Data in the HDFS File.

Q.97) For your Hadoop cluster you set the HDFS default block size to 128 MB. What does this mean?
A. The block size of files in the cluster can be determined as the block is written.
B. The block size of files in the cluster will all be multiples of 128 MB.
C. The block size of files in the cluster will all be at least 128 MB.
D. The block size of files in the cluster will all be exactly 128 MB.

Q.98) You are copying a file into HDFS from your local computer which is smaller than the default block size
of HDFS. Which of the following statements is true?
A. File can span over multiple blocks.
B. File cannot be stored into HDFS.
C. File occupies the full block's size.
D. File occupies only the size it needs and not the full block.
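The arithmetic behind Q.97 and Q.98 can be checked with a short sketch (a hypothetical 300 MB file against a 128 MB default block size):

```python
import math

block_size_mb = 128           # dfs.blocksize expressed in MB
file_size_mb = 300            # hypothetical example file

num_blocks = math.ceil(file_size_mb / block_size_mb)             # 3 blocks
last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb  # 44 MB
# The last block occupies only 44 MB on disk, not the full 128 MB --
# which is why a file smaller than one block uses only the space it needs.
```

The same logic shows why answer D is correct for Q.98: blocks are an upper bound on size, not a fixed allocation.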

Q.99) You are running Map Reduce Job at which point the reduce method of a given Reducer can be
called?
A. After all mappers have finished processing all records.
B. As soon as a mapper has emitted at least one record.
C. It depends on the Input Format of the job.
D. As soon as at least one mapper has finished processing its input split.

Q.100) In the sort and shuffle phase of MapReduce, how are keys and values passed to the reducers?
A. Keys are passed to reducer in sorted order; values for a given key are sorted in ascending order.
B. Keys are passed to a reducer in random order; values for a given key are not sorted.
C. Keys are passed to a reducer in random order; values for a given key are sorted in ascending order.
D. Keys are passed to reducer in sorted order; values for a given key are not sorted.
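The behavior in option D (keys sorted, values for a key not sorted) can be sketched with Python's itertools, as a stand-in for the framework's sort-and-group step:

```python
from itertools import groupby

# Intermediate (key, value) pairs as emitted by the mappers, in arbitrary order.
mapped = [("b", 2), ("a", 1), ("a", 3)]

# The framework sorts by key only; value order within a key is not guaranteed.
mapped.sort(key=lambda kv: kv[0])
grouped = [(k, [v for _, v in pairs])
           for k, pairs in groupby(mapped, key=lambda kv: kv[0])]
# Each reducer now receives its keys in sorted order,
# with all values for a given key grouped together.
```

Grouping only works after sorting, which is why sort must complete before a key's values can be handed to reduce.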

Q.101) Which of the following files is used to control the replication factor of HDFS?
A. hdfs-site.xml B. core-site.xml C. mapred-site.xml D. yarn-site.xml

Q.102) You have two Hive tables, CUSTOMERS and ORDERS, and you want to join both tables by customer ID. Which is
the most appropriate syntax to join them?
A. hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN
ORDERS o ON (c.ID = o.ID);
B. hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.ID =
o.CUSTOMER_ID);
C. hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (o.CUSTOMER_ID
= o.ID);
D. hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (o.ID = c.ID);

Q.103) What is the role of the slave nodes in a Hadoop cluster?
A. Slave node runs a TaskTracker, but only a fraction of the slave nodes run DataNode daemons.
B. Slave node runs a JobTracker and a DataNode daemon.
C. Slave node runs a DataNode daemon, but only a fraction of the slave nodes run TaskTrackers.
D. Slave node runs a TaskTracker and a DataNode daemon.

Q.104) You are Big Data consultant and your client want technology which allow to store random, Real
time read/write access to hundreds of terabytes of data in distributed, scalable, data Store. Which of the
following would you use?
A. Pig B. HBase C. Hive D. HDFS

Q.105) Your Hadoop cluster has 25 nodes with a total of 250 TB (10 TB per node) of raw disk space
allocated HDFS storage. Assuming Hadoop's default configuration, how much data will you be able to
store?
A. Approximately 90 TB B. Approximately 250TB
C. Approximately 83 TB D. Approximately 10TB
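The arithmetic behind Q.105, with Hadoop's default replication factor of 3:

```python
nodes = 25
disk_per_node_tb = 10
replication = 3                       # Hadoop default (dfs.replication)

raw_tb = nodes * disk_per_node_tb     # 250 TB of raw disk
usable_tb = raw_tb / replication      # every block is stored 3 times
# 250 / 3 is approximately 83 TB of user data, matching option C.
```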

Q.106) What does 'Velocity' in Big Data mean?


A. Speed of storing and processing data B. Speed of ONLY storing data
C. Speed of input data generation D. Speed of individual machine processors

Q.107) What is a Sequence File?


A. A Sequence File is a flat file consisting of binary encoding of an arbitrary number of homogeneous
writable objects.
B. A Sequence File is a flat file consisting of binary encoding of an arbitrary number of heterogeneous
writable objects.
C. A Sequence File is a flat file consisting of binary encoding of an arbitrary number of Writable Comparable
objects, in sorted order.
D. A Sequence File is a flat file consisting of binary key/value pairs of an arbitrary number. Each key must
be the same type. Each value must be same type.

Q.108) What is the role of the RegionServers?


A. Communicate with the client and handle data-related operations. Handle read and write requests
for all the regions under it.
B. Is responsible for schema changes and other metadata operations such as creation of tables and
column families.
C. Provides services like maintaining configuration information, naming, providing distributed
synchronization, etc.
D. None of the above.

Q.109) In HBase Table creation Which of the following must be declared during the schema definition
creation?
A. Column names B. Column families C. Column data types D. Column definition

Q.110) In any MapReduce Job HBase can be used as a


A. Metadata store B. Datanode C. Metadata node D. Data source

10
USM’s Shriram Mantri Vidyanidhi Info Tech Academy
PG DBDA Feb 20 Big Data Technologies Question Bank
Q.111) Where does HBase store data?
A. One filesystem per column family
B. As many filesystems as the number of RegionServers
C. As a single filesystem available to all RegionServers
D. One filesystem per table.

Q.112) Which daemon spawns child JVMs to perform MapReduce processing?


A. JobTracker B. NameNode C. DataNode D. TaskTracker

Q.113) You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two
daemons need to be installed on your cluster's master nodes?
A. ResourceManager and DataNode B. ResourceManager and NameNode
C. TaskTracker and NameNode D. JobTracker and NameNode

Q.114) What are the right steps to be followed to deploy a Big Data solution?
A. Data Ingestion -> Data Processing -> Data Storage
B. Data Ingestion -> Data Storage -> Data Processing
C. Data Storage -> Data Ingestion -> Data processing
D. Data Storage -> Data Processing -> Data Ingestion

Q.115) Pig Is a
A. Query Language B. Programming Language C. Database D. Data Flow Language

Q.116) In Pig how the schema of a relation can be examined?


A. ILLUSTRATE B. DESCRIBE C. DUMP D. EXPLAIN

Q.117) HDFS is designed for files with which access pattern?


A. Write once and Read many B. Read Write Many times
C. Write once and Read Once D. None of the above

Q.118) You have a single-NameNode Hadoop cluster. What is the function performed by the Secondary
NameNode daemon?
A. The Secondary NameNode serves as an alternate data channel for clients to reach HDFS, should the
NameNode become too busy.
B. The Secondary NameNode is a standby NameNode, ready to fail over and provide high availability.
C. The Secondary NameNode performs real-time backups of the NameNode.
D. The Secondary NameNode performs a checkpoint operation on the files of the NameNode.

Q.119) Which of the following characteristics of big data is relatively more concerned with data science?
A. Variety B. Velocity C. Volume D. None of the Mentioned

Q.120) When a block is written, what metadata is stored on a DataNode?


A. Information on the file's location in HDFS.
B. Node location of each block belonging to the same namespace.
C. Checksums for the data in the block, as a separate file.
D. None. Only the block itself is written.

Q.121) In HDFS, a file with rw-r--r-- set as its permissions.

A. The file's contents can be modified by the owner, but no-one else
B. The file cannot be run as a MapReduce job
C. The file cannot be deleted by anyone but the owner
D. The file cannot be deleted by anyone
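The rw-r--r-- permission string in Q.121 corresponds to octal mode 644, which can be verified with Python's stat module:

```python
import stat

mode = 0o644  # rw-r--r--
# A regular file with this mode renders exactly as rw-r--r--:
assert stat.filemode(0o100644) == "-rw-r--r--"

owner_can_write = bool(mode & stat.S_IWUSR)  # True: owner may modify the file
group_can_write = bool(mode & stat.S_IWGRP)  # False: group has read-only access
other_can_write = bool(mode & stat.S_IWOTH)  # False: others have read-only access
```

Only the owner holds the write bit, which is what makes option A the correct answer.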

Q.122) Which of the following scenarios makes HDFS unavailable?


A. DataNode failure B. Secondary NameNode failure
C. TaskTracker failure D. NameNode failure

Q.123) What is the relationship between MapReduce and Pig?


A. Pig programs rely on MapReduce but are extensible, allowing developers to do special-purpose
processing not provided by MapReduce.
B. Pig provides no additional capabilities to MapReduce. Pig programs are executed as MapReduce
jobs via the Pig interpreter.
C. Pig provides the additional capability of allowing you to control the flow of multiple MapReduce
jobs.
D. Pig provides additional capabilities that allow certain types of data manipulation not possible with
MapReduce.

Q.124) In PIG to read data which of the function you will use?
A. LOAD B. WRITE C. READ D. STORE

Q.125) Which of the following statements is true for fsimage_N?


A. Contains the data blocks of files, and is stored on the DataNode.
B. Contains the entire file system namespace, including the mapping of blocks to files and file system
properties.
C. Contains the log of each data block stored on a DataNode.
D. None of the above

Q.126) You want to run Pig in a testing environment. Which command is used to start Pig?
A. pig -x local B. pig C. pig mapreduce D. pig -x map

Q.127) In HBase, a table can be dropped in the following way –


A. Dropped directly B. Only disable, not dropped
C. Only compressed, not dropped D. Dropped after disabling

Q.128) In Hadoop 2.0, JobTracker and TaskTracker are replaced by:


A. Yet Another Resource Negotiator. B. Yet Another Resource Narrator.
C. Yet Another Resource Manager. D. Yet Another Resource Node.

Q.129) Define the difference between Hive and HBase.


A. Hive is used to support record-level operations but HBase does not support record-level operations.
B. HBase is used to support record-level operations but Hive does not support record-level operations.
C. Hive can be used as a live data system and HBase is a data warehouse technology.
D. None of the above.

