Question Bank For PUT
2-mark questions
2. Discuss one difference between MapReduce and YARN. (2 marks, CO: 2, B.T.: K2)
3. Describe the concept of data partitioning in Hadoop. (2 marks, CO: 3, B.T.: K1)
4. Define Hadoop and discuss its significance in handling big data. (2 marks, CO: 1, B.T.: K1)
5. Write down any four industry examples of big data. (2 marks, CO: 1, B.T.: K2)
7. Discuss the different types of data that can be handled with Hive. (2 marks, CO: 5, B.T.: K2)
8. What are the key components of the Hadoop ecosystem? Briefly explain each. (2 marks, CO: 2, B.T.: K3)
9. Describe the role of NameNode and DataNode in HDFS. (2 marks, CO: 3, B.T.: K1)
10. Discuss two main uses of Apache Pig. (2 marks, CO: 5, B.T.: K2)
12. Compare and contrast NoSQL with Relational Databases. (2 marks, CO: 4, B.T.: K2)
13. Discuss the advantages and disadvantages of using Hadoop for big data processing. (2 marks, CO: 4, B.T.: K2)
15. Explain the concept of shuffling and sorting in the MapReduce framework. (2 marks, CO: 2, B.T.: K1)
16. Compare and contrast Hadoop 1.x with Hadoop 2.x. (2 marks, CO: 2, B.T.: K2)
17. Name two types of nodes in Hadoop. (2 marks, CO: 3, B.T.: K1)
18. Discuss the concept of data locality in Hadoop and its importance in distributed computing. (2 marks, CO: 3, B.T.: K1)
19. Describe the role of ZooKeeper in the Hadoop ecosystem and its importance for distributed coordination. (2 marks, CO: 4, B.T.: K2)
20. Explain the concept of data replication in HDFS and its significance for fault tolerance. (2 marks, CO: 3, B.T.: K1)
10-mark questions
2. Discuss the advantages and disadvantages of using Apache Kafka as a real-time data streaming platform in a big data ecosystem.
3. Examine the process by which a client reads and writes data in HDFS.
4. Analyze the role of Apache Spark in processing big data, highlighting its components and workflow.
6. With examples, illustrate the use cases and benefits of implementing Apache HBase as a NoSQL database in a big data environment.
7. With the help of a suitable example, explain how CRUD operations are performed in MongoDB.
8. Examine the architecture of Apache Cassandra and explain how it ensures high availability and scalability in distributed database systems.
10. Analyze the architecture of Apache Storm for processing streaming data and compare it with Apache Flink.
12. Discuss the significance of data governance in big data environments and outline its key components.
14. Explore the role of data lakes in modern big data architectures and compare them with traditional data warehouses.
16. Analyze the impact of data replication strategies on fault tolerance and data availability in distributed systems like Hadoop.
17. Differentiate between scale-up and scale-out. Explain with an example how Hadoop uses scale-out to improve performance.
18. Compare and contrast the CAP theorem and the BASE principles in the context of distributed database systems.
25. Explain the concept of data sharding and its importance in achieving scalability and performance in NoSQL databases like Apache Cassandra.