Big Data with Hadoop

Practical 10:

Music Track Analytics using MapReduce

Objective:
Compute, per track:

  • Number of unique listeners
  • Total shares
  • Radio plays
  • Total listens

Input: music_logs.txt, a pipe-separated log file in which each line records one listen event:

UserId | TrackId | Shared | Radio | Skip
111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0
111118 | 222 | 0 | 0 | 0
111119 | 223 | 0 | 1 | 0
111115 | 222 | 0 | 0 | 0

Steps:

1. Upload Input to HDFS

hdfs dfs -mkdir /music_input

hdfs dfs -put music_logs.txt /music_input/music_logs.txt

2. Custom Writable Classes

  • TrackActivityWritable: holds the per-record activity flags and the userId
  • TrackStatsWritable: holds the final aggregated metrics per track (both are sketched below)
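
A minimal sketch of the first class. Only the class names are fixed by this practical; the field names (userId, shared, radio) and the set/get methods are illustrative choices.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Per-record activity emitted by the mapper: one listen event for one track.
// Field layout is illustrative; only the class name is fixed by the practical.
public class TrackActivityWritable implements Writable {
    private long userId;
    private int shared;   // 1 if this record was a share
    private int radio;    // 1 if this record was a radio play

    public TrackActivityWritable() {}   // Hadoop requires the no-arg constructor

    public void set(long userId, int shared, int radio) {
        this.userId = userId;
        this.shared = shared;
        this.radio = radio;
    }

    public long getUserId() { return userId; }
    public int getShared()  { return shared; }
    public int getRadio()   { return radio; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(userId);
        out.writeInt(shared);
        out.writeInt(radio);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        userId = in.readLong();
        shared = in.readInt();
        radio  = in.readInt();
    }
}

TrackStatsWritable follows the same pattern, with a toString() that prints the metrics tab-separated so the default TextOutputFormat produces the columns shown in the expected output:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Final aggregated metrics for one track, emitted by the reducer.
public class TrackStatsWritable implements Writable {
    private int uniqueListeners;
    private int shares;
    private int radioPlays;
    private int totalListens;

    public TrackStatsWritable() {}

    public void set(int uniqueListeners, int shares, int radioPlays, int totalListens) {
        this.uniqueListeners = uniqueListeners;
        this.shares = shares;
        this.radioPlays = radioPlays;
        this.totalListens = totalListens;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(uniqueListeners);
        out.writeInt(shares);
        out.writeInt(radioPlays);
        out.writeInt(totalListens);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        uniqueListeners = in.readInt();
        shares = in.readInt();
        radioPlays = in.readInt();
        totalListens = in.readInt();
    }

    // Tab-separated so the final output matches the expected columns.
    @Override
    public String toString() {
        return uniqueListeners + "\t" + shares + "\t" + radioPlays + "\t" + totalListens;
    }
}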

3. Mapper

  • Skip the header line
  • Parse each pipe-delimited log line
  • Emit (TrackId, TrackActivityWritable); see the sketch after this list
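
A mapper sketch. The class name MusicAnalyticsMapper is an assumed choice (any name works as long as the driver references it); the split-and-trim parsing matches the spaces around each pipe in the sample input.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Class name is an assumption; reuse of the output objects across calls is
// safe because context.write serializes them immediately.
public class MusicAnalyticsMapper
        extends Mapper<LongWritable, Text, Text, TrackActivityWritable> {

    private final Text trackId = new Text();
    private final TrackActivityWritable activity = new TrackActivityWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.startsWith("UserId")) {
            return;                      // skip the header row
        }
        String[] fields = line.split("\\|");
        if (fields.length < 5) {
            return;                      // ignore blank or malformed lines
        }
        long userId = Long.parseLong(fields[0].trim());
        trackId.set(fields[1].trim());
        int shared = Integer.parseInt(fields[2].trim());
        int radio  = Integer.parseInt(fields[3].trim());
        activity.set(userId, shared, radio);
        context.write(trackId, activity);
    }
}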

4. Reducer

  • Aggregate the metrics per track:
    • Unique listeners (collected in a HashSet of user IDs)
    • Total listens
    • Shared count
    • Radio plays
  • Emit (TrackId, TrackStatsWritable); a sketch follows this list
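
A matching reducer sketch (MusicAnalyticsReducer is again an assumed name). Every incoming value represents one listen event, so the total-listens counter is incremented once per record, while the HashSet deduplicates user IDs.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Class name is an assumption. Primitives are copied out of each value, so
// Hadoop's reuse of the value object during iteration causes no problems.
public class MusicAnalyticsReducer
        extends Reducer<Text, TrackActivityWritable, Text, TrackStatsWritable> {

    private final TrackStatsWritable stats = new TrackStatsWritable();

    @Override
    protected void reduce(Text trackId, Iterable<TrackActivityWritable> values,
                          Context context) throws IOException, InterruptedException {
        Set<Long> listeners = new HashSet<>();
        int shares = 0;
        int radioPlays = 0;
        int totalListens = 0;
        for (TrackActivityWritable activity : values) {
            listeners.add(activity.getUserId());   // dedupe listeners
            shares     += activity.getShared();
            radioPlays += activity.getRadio();
            totalListens++;                        // every record is one listen
        }
        stats.set(listeners.size(), shares, radioPlays, totalListens);
        context.write(trackId, stats);
    }
}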

5. Driver

  • Set the Mapper, Reducer, and output key/value classes
  • Specify the input and output paths
  • Submit the job (sketched below)
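
A driver sketch. The class name MusicAnalyticsDriver comes from the run command in step 7; the mapper and reducer names are the assumed ones used in the sketches above. Because the mapper's output value class differs from the final output value class, both pairs must be set explicitly.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MusicAnalyticsDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "music track analytics");
        job.setJarByClass(MusicAnalyticsDriver.class);

        job.setMapperClass(MusicAnalyticsMapper.class);    // assumed class names
        job.setReducerClass(MusicAnalyticsReducer.class);

        // Mapper output value class differs from the reducer's output value class.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(TrackActivityWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(TrackStatsWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}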

6. Compile & Package

mkdir -p classes

javac -cp "$(hadoop classpath)" -d classes *.java

jar -cvf music_analytics.jar -C classes/ .

7. Run Job

hdfs dfs -rm -r /music_output

hadoop jar music_analytics.jar MusicAnalyticsDriver /music_input/music_logs.txt /music_output

8. Verify Output

hdfs dfs -cat /music_output/part-r-00000

Example Output:

222  2  0  1  3
223  2  0  2  2
225  2  2  0  2

(Columns: TrackId, UniqueListeners, SharedCount, RadioListens, TotalListens)


Conclusion:
Students now know how to:

  • Use custom Writable classes for complex data structures
  • Aggregate multiple metrics in a single MapReduce job
  • Extract meaningful analytics from raw user logs