Big Data with Hadoop
Practical 4: Student Grading System using MapReduce
Objective:
To implement and execute a MapReduce program that reads student scores from an input file, assigns a grade to each student based on a predefined grading scale, and outputs the student ID along with their calculated grade. This practical demonstrates data parsing, conditional logic, and simple data transformation within a distributed environment.
Introduction:
Building on your experience with word counting and finding maximum values, this practical introduces a common data processing task: assigning grades. You will write a MapReduce job that processes student records, where each record contains a student ID and their score. The Mapper will extract these details, and the Reducer will apply grading logic to determine the final letter grade for each student. This further solidifies your understanding of how MapReduce can be used for custom business logic.
Step-by-Step Guide:

1. Prepare Your Development Environment:
- Continue using your existing Java project setup from previous practicals, ensuring Hadoop client dependencies are included.
2. Input Data Preparation:
- Create a sample text file (student_scores.txt) with student records. Each line should contain a student ID and their score, separated by a comma.

Example student_scores.txt:
S001,85
S002,92
S003,67
S004,78
S005,55
S006,90
S007,72
S008,61
- Upload this file to HDFS:

hdfs dfs -mkdir /student_input
hdfs dfs -put student_scores.txt /student_input/student_scores.txt
3. Crafting the Mapper Class:
- The Mapper parses each line, extracts the student ID and score, and emits <StudentID, Score> as key-value pairs.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StudentGradeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text studentId = new Text();
    private IntWritable score = new IntWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] parts = line.split(","); // Format: StudentID,Score
        if (parts.length == 2) {
            try {
                studentId.set(parts[0].trim());
                score.set(Integer.parseInt(parts[1].trim()));
                context.write(studentId, score); // Emit <StudentID, Score>
            } catch (NumberFormatException e) {
                System.err.println("Skipping malformed record: " + line);
            }
        } else {
            System.err.println("Skipping malformed record: " + line);
        }
    }
}
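Before wiring the Mapper into a job, its parsing rules can be tried in plain Java with no cluster at all. The helper below is a hypothetical sketch for illustration (ParseCheck and its parse method are not part of the practical's classes); it applies the same split/trim logic and returns null for the lines the Mapper would skip:

```java
// ParseCheck.java - standalone sketch of the Mapper's parsing rules
// (hypothetical helper, not one of the practical's classes).
public class ParseCheck {

    // Returns "id=score" for a well-formed "StudentID,Score" line, or null
    // when the Mapper would skip it (wrong field count or non-numeric score).
    static String parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2) return null;
        try {
            int score = Integer.parseInt(parts[1].trim());
            return parts[0].trim() + "=" + score;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("S001,85"));      // well-formed record
        System.out.println(parse(" S002 , 92 "));  // surrounding whitespace is trimmed
        System.out.println(parse("S003"));         // missing score -> skipped (null)
        System.out.println(parse("S004,abc"));     // non-numeric score -> skipped (null)
    }
}
```

Note that trim() is applied after the split, which is why records with stray spaces around the comma still parse cleanly.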
4. Developing the Reducer Class:
- The Reducer receives <StudentID, list_of_scores>. Assuming one score per student, it applies grading logic and emits <StudentID, Grade>.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class StudentGradeReducer extends Reducer<Text, IntWritable, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int studentScore = 0;
        // Take the first (and only) score
        for (IntWritable val : values) {
            studentScore = val.get();
            break;
        }
        String grade;
        if (studentScore >= 90) {
            grade = "A";
        } else if (studentScore >= 80) {
            grade = "B";
        } else if (studentScore >= 70) {
            grade = "C";
        } else if (studentScore >= 60) {
            grade = "D";
        } else {
            grade = "F";
        }
        context.write(key, new Text(grade)); // Emit <StudentID, Grade>
    }
}
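The grade boundaries deserve a quick sanity check, since off-by-one mistakes at the cutoffs (90, 80, 70, 60) are easy to make. The sketch below is a hypothetical helper, not part of the practical's classes; it mirrors the Reducer's if/else chain in a plain method so each boundary can be exercised without Hadoop:

```java
// GradeLogicCheck.java - standalone sketch mirroring the Reducer's grading
// chain (hypothetical helper, not one of the practical's classes).
public class GradeLogicCheck {

    // Same thresholds as StudentGradeReducer: A >= 90, B >= 80, C >= 70, D >= 60, else F.
    static String gradeFor(int score) {
        if (score >= 90) return "A";
        else if (score >= 80) return "B";
        else if (score >= 70) return "C";
        else if (score >= 60) return "D";
        else return "F";
    }

    public static void main(String[] args) {
        // Exercise the score at and just below each cutoff.
        int[] scores = {100, 90, 89, 80, 79, 70, 69, 60, 59, 0};
        for (int s : scores) {
            System.out.println(s + " -> " + gradeFor(s));
        }
    }
}
```

Because every comparison uses >=, a score exactly on a cutoff (e.g. 90) takes the higher grade, matching the Reducer's behavior.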
5. Configuring and Submitting the MapReduce Job:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class StudentGradeDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: studentgrade <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "student grading");
        job.setJarByClass(StudentGradeDriver.class);
        job.setMapperClass(StudentGradeMapper.class);
        job.setReducerClass(StudentGradeReducer.class);
        // Mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Reducer output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
6. Compiling and Packaging Your Application:
- Compile your Java code into a .jar file:

mkdir -p classes
javac -cp "$(hadoop classpath)" -d classes StudentGradeMapper.java StudentGradeReducer.java StudentGradeDriver.java
jar -cvf studentgrades.jar -C classes/ .
7. Running the Job on Hadoop:

hdfs dfs -rm -r /student_output # Remove output directory if it exists
hadoop jar studentgrades.jar StudentGradeDriver /student_input/student_scores.txt /student_output
8. Verifying the Output:

hdfs dfs -cat /student_output/part-r-00000

- Sample output (order may vary):
S001 B
S002 A
S003 D
S004 C
S005 F
S006 A
S007 C
S008 D
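The expected grades above can be cross-checked without a cluster by chaining the same parsing and grading steps in a single process. The sketch below is a hypothetical local simulation for verifying expectations only; LocalGradeRun and its run method are not part of the practical's classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LocalGradeRun.java - hypothetical local simulation of the job: the loop
// body parses each "StudentID,Score" record (map step) and assigns a grade
// (reduce step), in one process.
public class LocalGradeRun {

    // Same thresholds as StudentGradeReducer.
    static String gradeFor(int score) {
        if (score >= 90) return "A";
        else if (score >= 80) return "B";
        else if (score >= 70) return "C";
        else if (score >= 60) return "D";
        else return "F";
    }

    static Map<String, String> run(String[] records) {
        Map<String, String> grades = new LinkedHashMap<>();
        for (String line : records) {
            String[] parts = line.split(",");             // map: parse the record
            int score = Integer.parseInt(parts[1].trim());
            grades.put(parts[0].trim(), gradeFor(score)); // reduce: assign grade
        }
        return grades;
    }

    public static void main(String[] args) {
        String[] records = {"S001,85", "S002,92", "S003,67", "S004,78",
                            "S005,55", "S006,90", "S007,72", "S008,61"};
        // Print in the tab-separated form the real job writes to part-r-00000.
        run(records).forEach((id, g) -> System.out.println(id + "\t" + g));
    }
}
```

Running this over the eight sample records should yield the same grades as the HDFS output (note the real part file separates key and value with a tab).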
Conclusion:
You have successfully implemented a MapReduce program to process student scores and assign grades based on custom grading logic. This practical demonstrated how MapReduce can transform raw data into meaningful insights using business-specific rules, reinforcing your understanding of Mapper and Reducer design.