Chapter 2: MapReduce
Easy Level Questions
These questions aim to assess students’ ability to recall facts and basic concepts, and to explain ideas related to MapReduce.
1. What is the primary purpose of the MapReduce programming model in Big Data processing? [CO3]
2. Describe the function of the ‘Map’ phase in a MapReduce job. [CO3]
3. Describe the function of the ‘Reduce’ phase in a MapReduce job. [CO3]
4. Briefly explain what Hadoop Streaming allows developers to do. [CO5]
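As background for question 4: Hadoop Streaming lets any executable that reads records from standard input and writes tab-separated key/value lines to standard output act as a mapper or reducer. A minimal word-count sketch in Python, with a local `sorted()` standing in for Hadoop's shuffle-and-sort (the sample input is invented for illustration):

```python
from itertools import groupby

def mapper(lines):
    """Mapper: in a real streaming job this would read sys.stdin and
    print; here it yields one tab-separated 'word\t1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Reducer: Hadoop sorts mapper output by key before the reduce
    phase, so equal keys arrive contiguously and can be summed."""
    parsed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(n) for _, n in group)}"

# Local simulation: sorted() plays the role of the shuffle-and-sort.
counts = list(reducer(sorted(mapper(["to be or not to be"]))))
```

In a real job the two functions would live in separate scripts passed to Hadoop Streaming via its `-mapper` and `-reducer` options, and Hadoop, not the script, performs the sort between them.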
Moderate Level Questions
These questions require students to apply their knowledge to new situations, interpret information, and draw connections between different concepts.
5.
Given a weather dataset containing temperature readings, explain how you
would structure the key-value pairs generated by the Map phase to find the
average temperature for each day. [CO3]
6. Discuss the concept of "scaling out" in the context of MapReduce and why it is crucial for handling Big Data. [CO3, CO5]
7. Explain the role of a ‘Combiner’ function in a MapReduce job, and describe how it can optimize performance. [CO5]
8. Outline the typical data flow within a MapReduce job, from input data to final output. [CO3, CO5]
Difficult Level Questions
These questions challenge students to make judgments, synthesize information, and design solutions, demonstrating a deeper understanding of the subject matter.
9.
Compare and contrast the process of analyzing
a large weather dataset using traditional Unix tools
versus using the Hadoop MapReduce
framework. Discuss the advantages and disadvantages of each approach for very
large datasets. [CO3]
10. Design a pseudo-code MapReduce program (specifying Map and Reduce logic) to calculate the maximum temperature recorded each year from a weather dataset. [CO3, CO5]
11. Discuss the implications of choosing an appropriate data format for input to a MapReduce job. How can an inefficient data format impact the performance and scalability of the job? [CO3, CO5]
12. A distributed MapReduce job is failing due to out-of-memory errors on some nodes during the shuffle phase. Propose potential solutions, including how a Combiner might be used and other strategies to mitigate such issues, linking them to efficient job execution. [CO3, CO5]
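A sketch along the lines question 10 asks for, with the shuffle simulated locally and a `year,temp` input line format assumed purely for illustration:

```python
from collections import defaultdict

def map_line(line):
    """Map: parse one record and emit (year, temperature)."""
    year, temp = line.split(",")
    yield year, int(temp)

def reduce_max(year, temps):
    """Reduce: the maximum of all temperatures seen for one year."""
    return year, max(temps)

def run_job(lines):
    # Simulated shuffle-and-sort: group mapper output by key.
    by_year = defaultdict(list)
    for line in lines:
        for year, temp in map_line(line):
            by_year[year].append(temp)
    return dict(reduce_max(y, ts) for y, ts in by_year.items())
```

Because `max` is associative and commutative, the same reduce logic could also be registered as a Combiner, which ties back to question 12: pre-aggregating on the map side shrinks the shuffle and eases memory pressure on the reducers.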