Big Data with Hadoop

Practical 4:

Student Grading System using MapReduce

Objective:
To implement and execute a MapReduce program that reads student scores from an input file, assigns a grade to each student based on a predefined grading scale, and outputs the student ID along with their calculated grade. This practical demonstrates data parsing, conditional logic, and simple data transformation within a distributed environment.

Introduction:
Building on your experience with word counting and finding maximum values, this practical introduces a common data processing task: assigning grades. You will write a MapReduce job that processes student records, where each record contains a student ID and their score. The Mapper will extract these details, and the Reducer will apply grading logic to determine the final letter grade for each student. This further solidifies your understanding of how MapReduce can be used for custom business logic.

Step-by-Step Guide:

1. Prepare Your Development Environment:

  • Continue using your existing Java project setup from previous practicals, ensuring Hadoop client dependencies are included.
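
  • A quick way to check that the Hadoop client libraries are available on your machine is to print the classpath used by the hadoop command itself (the same classpath is reused when compiling in step 6):

hadoop classpath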

2. Input Data Preparation:

  • Create a sample text file (student_scores.txt) with student records. Each line should contain a student ID and their score, separated by a comma.

Example student_scores.txt:

S001,85
S002,92
S003,67
S004,78
S005,55
S006,90
S007,72
S008,61
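
  • One way to create this file locally is with a shell here-document (any plain-text editor works just as well):

cat > student_scores.txt <<'EOF'
S001,85
S002,92
S003,67
S004,78
S005,55
S006,90
S007,72
S008,61
EOF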

  • Upload this file to HDFS:

hdfs dfs -mkdir /student_input

hdfs dfs -put student_scores.txt /student_input/student_scores.txt
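
  • To confirm the upload, you can list the directory and print the file back from HDFS:

hdfs dfs -ls /student_input
hdfs dfs -cat /student_input/student_scores.txt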

3. Crafting the Mapper Class:

  • The Mapper parses each line, extracts the student ID and score, and emits <StudentID, Score> as key-value pairs.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StudentGradeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text studentId = new Text();
    private IntWritable score = new IntWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] parts = line.split(","); // Format: StudentID,Score
        if (parts.length == 2) {
            try {
                studentId.set(parts[0].trim());
                score.set(Integer.parseInt(parts[1].trim()));
                context.write(studentId, score); // Emit <StudentID, Score>
            } catch (NumberFormatException e) {
                System.err.println("Skipping malformed record: " + line);
            }
        } else {
            System.err.println("Skipping malformed record: " + line);
        }
    }
}

4. Developing the Reducer Class:

  • The Reducer receives <StudentID, list_of_scores>. Assuming one score per student, it applies grading logic and emits <StudentID, Grade>.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class StudentGradeReducer extends Reducer<Text, IntWritable, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int studentScore = 0;
        // Take the first (and only) score
        for (IntWritable val : values) {
            studentScore = val.get();
            break;
        }

        String grade;
        if (studentScore >= 90) {
            grade = "A";
        } else if (studentScore >= 80) {
            grade = "B";
        } else if (studentScore >= 70) {
            grade = "C";
        } else if (studentScore >= 60) {
            grade = "D";
        } else {
            grade = "F";
        }

        context.write(key, new Text(grade)); // Emit <StudentID, Grade>
    }
}

5. Configuring and Submitting the MapReduce Job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class StudentGradeDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: studentgrade <in> <out>");
            System.exit(2);
        }

        Job job = Job.getInstance(conf, "student grading");
        job.setJarByClass(StudentGradeDriver.class);
        job.setMapperClass(StudentGradeMapper.class);
        job.setReducerClass(StudentGradeReducer.class);

        // Mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Reducer output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
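
  • Note that the Mapper's output value type (IntWritable) differs from the job's final output value type (Text); this is why the driver calls setMapOutputKeyClass()/setMapOutputValueClass() in addition to setOutputKeyClass()/setOutputValueClass().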

6. Compiling and Packaging Your Application:

  • Compile the Java classes and package them into a .jar file:

mkdir -p classes

javac -cp "$(hadoop classpath)" -d classes StudentGradeMapper.java StudentGradeReducer.java StudentGradeDriver.java

jar cvf studentgrades.jar -C classes/ .
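
  • Optionally, confirm that all three classes were packaged into the jar:

jar tf studentgrades.jar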

7. Running the Job on Hadoop:

hdfs dfs -rm -r /student_output  # Remove the output directory if it already exists

hadoop jar studentgrades.jar StudentGradeDriver /student_input/student_scores.txt /student_output
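
  • If the job runs on a YARN cluster (rather than in local mode), you can also monitor it from the command line and, using the application ID printed on the console, retrieve the task logs, which is where the Mapper's System.err messages for malformed records appear:

yarn application -list
yarn logs -applicationId <application_id>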

8. Verifying the Output:

hdfs dfs -cat /student_output/part-r-00000

  • Sample output (order may vary; in the actual output the student ID and grade are separated by a tab):

S001 B
S002 A
S003 D
S004 C
S005 F
S006 A
S007 C
S008 D
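
  • A successful run also leaves an empty _SUCCESS marker file in the output directory. To list the directory, or to merge the results into a single local file (the local file name below is only an example), you can use:

hdfs dfs -ls /student_output
hdfs dfs -getmerge /student_output student_grades_local.txt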

Conclusion:
You have successfully implemented a MapReduce program to process student scores and assign grades based on custom grading logic. This practical demonstrated how MapReduce can transform raw data into meaningful insights using business-specific rules, reinforcing your understanding of Mapper and Reducer design.