Chapter 3: The Hadoop Distributed File System 

Easy Level Questions

These questions aim to assess the students’ ability to recall fundamental facts and basic concepts related to HDFS.

1.     Identify the two main daemons (components) that constitute the core architecture of HDFS and briefly describe their primary roles. [CO2]

2.     What is a ‘block’ in HDFS, and why is its size typically much larger than blocks in traditional file systems? [CO3]

3.     List two fundamental design goals of HDFS that distinguish it from conventional file systems. [CO3]  

4.     How does HDFS achieve fault tolerance to prevent data loss in case of node failures? [CO3]  

Moderate Level Questions

These questions require students to apply their knowledge, interpret information, and make connections between different HDFS concepts.

5.     Explain the data flow when a client writes a new file to HDFS, detailing the interaction between the client, the NameNode, and the DataNodes. [CO3]

6.     Compare and contrast HDFS Federation with a single-NameNode architecture in terms of scalability and availability. [CO2, CO3]

7.     Describe how the distcp command is used in HDFS and explain its advantages for copying large datasets in parallel. [CO3]
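For reference, a typical distcp invocation looks like the following (cluster addresses and paths are illustrative, and the commands require a running Hadoop cluster):

```shell
# distcp runs as a MapReduce job, so the copy is split across
# many mappers that transfer files in parallel between clusters.
hadoop distcp hdfs://nn1:8020/data/logs hdfs://nn2:8020/backup/logs

# -update copies only files that differ from the destination;
# -p preserves file attributes such as permissions and timestamps.
hadoop distcp -update -p hdfs://nn1:8020/data/logs hdfs://nn2:8020/backup/logs
```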

8.     Given that HDFS prioritizes high throughput over low-latency access, explain the types of applications for which HDFS is well suited. [CO1, CO3]

Difficult Level Questions

These questions challenge students to synthesize information, critically evaluate concepts, and design solutions, demonstrating a deeper understanding of HDFS.

9.     Discuss the coherency model of HDFS. How does it balance consistency, availability, and partition tolerance in its design, especially during concurrent write operations? [CO3]

10.   A Hadoop cluster experiences frequent NameNode failures. Propose a solution involving HDFS High Availability features to mitigate this issue, detailing the components and their roles in ensuring continuous operation. [CO2, CO5]
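For reference, an HA deployment pairs an active and a standby NameNode that share edit logs through a JournalNode quorum, with ZooKeeper Failover Controllers handling automatic failover. A minimal hdfs-site.xml sketch (the nameservice name, NameNode IDs, and hostnames below are illustrative assumptions) might look like:

```xml
<configuration>
  <!-- Logical name for the HA nameservice (illustrative) -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- Two NameNodes: one active, one standby -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- Shared edit log stored on a JournalNode quorum -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
  </property>
  <!-- Enable automatic failover via ZooKeeper Failover Controllers -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```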

11.   Design a sequence of command-line interface operations a user would perform to: create a directory, upload a local file into it, verify the file's presence, download it back, and finally delete the file and the directory. Explain the purpose of each command. [CO3]
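For reference, one possible answer sequence uses the HDFS file system shell (the paths and file names are illustrative, and the commands assume a running HDFS cluster):

```shell
# 1. Create a directory in HDFS (-p creates parent directories as needed)
hdfs dfs -mkdir -p /user/student/demo

# 2. Upload a local file into the new directory
hdfs dfs -put localfile.txt /user/student/demo/

# 3. Verify the file's presence by listing the directory
hdfs dfs -ls /user/student/demo

# 4. Download the file back to the local file system
hdfs dfs -get /user/student/demo/localfile.txt downloaded.txt

# 5. Delete the file, then remove the now-empty directory
hdfs dfs -rm /user/student/demo/localfile.txt
hdfs dfs -rmdir /user/student/demo
```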

12.   Analyze a scenario where an HDFS cluster becomes unbalanced (i.e., data is unevenly distributed across DataNodes). Explain the potential consequences of this imbalance and outline strategies for keeping an HDFS cluster balanced. [CO3, CO5]