Chapter 3: The Hadoop Distributed File System
Easy Level Questions
These questions aim to assess students’ ability to recall fundamental facts and basic concepts related to HDFS.
1. Identify the two main daemons (components) that constitute the core architecture of HDFS and briefly describe their primary roles. [CO2]
2. What is a ‘block’ in HDFS, and why is its size typically much larger than blocks in traditional file systems? [CO3] (A way to inspect the configured block size is sketched after this list.)
3. List two fundamental design goals of HDFS that distinguish it from conventional file systems. [CO3]
4. How does HDFS achieve fault tolerance to prevent data loss in case of node failures? [CO3] (Replication commands relevant here are sketched after this list.)
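Example sketch for Question 2 — one way to inspect the block size on a running cluster. The command is standard; the 128 MB figure is the Hadoop 2.x+ default and may differ on a given installation:

    $ hdfs getconf -confKey dfs.blocksize   # prints the configured value in bytes (134217728 bytes = 128 MB default)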
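Example sketch for Question 4 — commands for examining and adjusting replication, the mechanism behind HDFS fault tolerance (the path /user/student/data.txt is hypothetical):

    $ hdfs fsck /user/student/data.txt -files -blocks -locations   # list each block and the DataNodes holding its replicas
    $ hdfs dfs -setrep -w 3 /user/student/data.txt                 # set the replication factor to 3 and wait for it to take effect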
Moderate Level Questions
These questions require students to apply their knowledge, interpret information, and make connections between different HDFS concepts.
5. Explain the data flow when a client writes a new file to HDFS, detailing the interaction between the client, the NameNode, and the DataNodes. [CO3] (Commands for observing the outcome of a write are sketched after this list.)
6. Compare and contrast HDFS Federation with a single-NameNode architecture in terms of scalability and availability. [CO2, CO3]
7. Describe how the distcp command is used in HDFS and explain its advantages for parallel copying of large datasets. [CO3] (A sample invocation is sketched after this list.)
8. Given that HDFS prioritizes high throughput over low-latency access, explain the types of applications for which HDFS is well suited. [CO1, CO3]
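Example sketch for Question 5 — a write can be exercised and its result inspected from the shell (file and directory names are hypothetical; the internal client–NameNode–DataNode pipeline is what the answer should describe):

    $ hdfs dfs -put report.bin /user/student/                  # client obtains block allocations from the NameNode, then streams data through a DataNode pipeline
    $ hdfs fsck /user/student/report.bin -blocks -locations    # confirm which DataNodes received each block replica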
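Example sketch for Question 7 — typical distcp invocations (the cluster addresses nn1 and nn2 are hypothetical). distcp runs as a MapReduce job, so the file set is copied by many map tasks in parallel:

    $ hadoop distcp hdfs://nn1:8020/data/logs hdfs://nn2:8020/backup/logs           # copy a directory tree between clusters
    $ hadoop distcp -update hdfs://nn1:8020/data/logs hdfs://nn2:8020/backup/logs   # copy only files missing or changed at the target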
Difficult Level Questions
These questions challenge students to synthesize information, critically evaluate concepts, and design solutions, demonstrating a deeper understanding of HDFS.
9. Discuss the coherency model of HDFS. How does it balance consistency, availability, and partition tolerance in its design, especially during concurrent write operations? [CO3]
10.
A Hadoop cluster experiences frequent NameNode failures. Propose a solution involving HDFS
High-Availability features to mitigate this issue, detailing the components and
their roles in ensuring continuous operation [CO2, CO5]
11. Design a sequence of command-line interface operations a user would perform to: create a directory, upload a local file into it, verify the file’s presence, and then download it back, followed by deleting the file and directory. Explain the purpose of each command. [CO3] (One possible sequence is sketched after this list.)
12. Analyze a scenario where an HDFS cluster becomes unbalanced (i.e., data is unevenly distributed across DataNodes). Explain the potential consequences of this imbalance and outline strategies for keeping an HDFS cluster balanced. [CO3, CO5] (The balancer invocation is sketched after this list.)
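Example sketch for Question 10 — administrative commands for a configured HA NameNode pair (the NameNode IDs nn1 and nn2 would come from the cluster’s hdfs-site.xml and are hypothetical here):

    $ hdfs haadmin -getServiceState nn1   # report whether this NameNode is currently active or standby
    $ hdfs haadmin -failover nn1 nn2      # initiate a manual failover from nn1 to nn2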
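Example sketch for Question 11 — one possible command sequence (all paths and file names are hypothetical):

    $ hdfs dfs -mkdir /user/student/demo               # create a directory in HDFS
    $ hdfs dfs -put report.txt /user/student/demo/     # upload a local file into it
    $ hdfs dfs -ls /user/student/demo                  # verify the file’s presence
    $ hdfs dfs -get /user/student/demo/report.txt .    # download it back to the local filesystem
    $ hdfs dfs -rm /user/student/demo/report.txt       # delete the file
    $ hdfs dfs -rmdir /user/student/demo               # delete the now-empty directory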
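Example sketch for Question 12 — the HDFS balancer redistributes blocks until DataNode utilizations fall within a threshold of the cluster average:

    $ hdfs balancer -threshold 10   # move blocks until each DataNode’s disk usage is within 10 percentage points of the cluster-wide average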