Big Data with Hadoop
Practical 10: Music Track Analytics using MapReduce
Objective:
Compute, per track:
- Number of unique listeners
- Total shares
- Radio plays
- Total listens
Input: music_logs.txt (pipe-separated log file):
UserId | TrackId | Shared | Radio | Skip
111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0
111118 | 222 | 0 | 0 | 0
111119 | 223 | 0 | 1 | 0
111115 | 222 | 0 | 0 | 0
Steps:
1. Upload Input to HDFS
hdfs dfs -mkdir /music_input
hdfs dfs -put music_logs.txt /music_input/music_logs.txt
2. Custom Writable Classes (sketched below)
- TrackActivityWritable: holds the per-record activity flags and the userId
- TrackStatsWritable: holds the final aggregated metrics
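A minimal sketch of the two classes, assuming int fields throughout; the field and method names (set, getUserId, and so on) are illustrative choices, not fixed by the practical:

// TrackActivityWritable.java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class TrackActivityWritable implements Writable {
    private int userId, shared, radio, skip;

    public TrackActivityWritable() {}   // Hadoop needs a no-arg constructor

    public void set(int userId, int shared, int radio, int skip) {
        this.userId = userId; this.shared = shared;
        this.radio = radio;   this.skip = skip;
    }

    public int getUserId() { return userId; }
    public int getShared() { return shared; }
    public int getRadio()  { return radio; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(userId); out.writeInt(shared);
        out.writeInt(radio);  out.writeInt(skip);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        userId = in.readInt(); shared = in.readInt();
        radio = in.readInt();  skip = in.readInt();
    }
}

// TrackStatsWritable.java (same imports as above)
public class TrackStatsWritable implements Writable {
    private int uniqueListeners, sharedCount, radioListens, totalListens;

    public TrackStatsWritable() {}

    public void set(int uniqueListeners, int sharedCount,
                    int radioListens, int totalListens) {
        this.uniqueListeners = uniqueListeners; this.sharedCount = sharedCount;
        this.radioListens = radioListens;       this.totalListens = totalListens;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(uniqueListeners); out.writeInt(sharedCount);
        out.writeInt(radioListens);    out.writeInt(totalListens);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        uniqueListeners = in.readInt(); sharedCount = in.readInt();
        radioListens = in.readInt();    totalListens = in.readInt();
    }

    @Override
    public String toString() {   // rendered as the reducer's output value
        return uniqueListeners + "\t" + sharedCount + "\t"
                + radioListens + "\t" + totalListens;
    }
}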
3. Mapper (sketched below):
- Skip the header line
- Parse each log line
- Emit (TrackId, TrackActivityWritable)
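A minimal mapper sketch for the input layout shown above; the class name MusicAnalyticsMapper and the startsWith header check are assumptions:

// MusicAnalyticsMapper.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MusicAnalyticsMapper
        extends Mapper<LongWritable, Text, IntWritable, TrackActivityWritable> {

    private final IntWritable trackId = new IntWritable();
    private final TrackActivityWritable activity = new TrackActivityWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.startsWith("UserId")) {
            return;   // skip the header line
        }
        String[] fields = line.split("\\|");
        if (fields.length != 5) {
            return;   // ignore malformed records
        }
        int userId = Integer.parseInt(fields[0].trim());
        trackId.set(Integer.parseInt(fields[1].trim()));
        activity.set(userId,
                Integer.parseInt(fields[2].trim()),   // Shared
                Integer.parseInt(fields[3].trim()),   // Radio
                Integer.parseInt(fields[4].trim()));  // Skip
        context.write(trackId, activity);
    }
}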
4. Reducer (sketched below):
- Aggregate metrics per track:
  - Unique listeners (HashSet)
  - Total listens
  - Shared count
  - Radio plays
- Emit (TrackId, TrackStatsWritable)
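A minimal reducer sketch reusing the Writables above; MusicAnalyticsReducer is an assumed name:

// MusicAnalyticsReducer.java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class MusicAnalyticsReducer
        extends Reducer<IntWritable, TrackActivityWritable, IntWritable, TrackStatsWritable> {

    private final TrackStatsWritable stats = new TrackStatsWritable();

    @Override
    protected void reduce(IntWritable trackId,
                          Iterable<TrackActivityWritable> activities,
                          Context context) throws IOException, InterruptedException {
        Set<Integer> listeners = new HashSet<>();   // distinct userIds for this track
        int shared = 0, radio = 0, listens = 0;
        for (TrackActivityWritable a : activities) {
            // Only primitives are read here, so Hadoop's reuse of the
            // value object across iterations is safe.
            listeners.add(a.getUserId());
            shared += a.getShared();
            radio += a.getRadio();
            listens++;   // every record counts as one listen
        }
        stats.set(listeners.size(), shared, radio, listens);
        context.write(trackId, stats);
    }
}

Note that no combiner is set for this job: the unique-listener count needs every userId for a track in one place, so partial HashSet sizes computed on the map side could not simply be added together.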
5. Driver (sketched below):
- Set the Mapper, Reducer, and output key/value classes
- Specify the input/output paths
- Submit the job
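A minimal driver sketch; MusicAnalyticsDriver matches the class name used in step 7, and the rest is the standard Job setup described above:

// MusicAnalyticsDriver.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MusicAnalyticsDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "music track analytics");
        job.setJarByClass(MusicAnalyticsDriver.class);

        job.setMapperClass(MusicAnalyticsMapper.class);
        job.setReducerClass(MusicAnalyticsReducer.class);

        // Mapper and reducer emit different value types, so set both explicitly.
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(TrackActivityWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(TrackStatsWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}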
6. Compile & Package
mkdir -p classes
javac -cp "$(hadoop classpath)" -d classes *.java
jar -cvf music_analytics.jar -C classes/ .
7. Run Job
hdfs dfs -rm -r -f /music_output   # -f avoids an error if the directory does not exist yet
hadoop jar music_analytics.jar MusicAnalyticsDriver /music_input/music_logs.txt /music_output
8. Verify Output
hdfs dfs -cat /music_output/part-r-00000
Example Output:
222    2    0    1    3
223    2    0    2    2
225    2    2    0    2
(Columns: TrackId, UniqueListeners, SharedCount, RadioListens, TotalListens)
Conclusion:
Students now know how to:
- Use custom Writable classes for complex data structures
- Aggregate multiple metrics in a single MapReduce job
- Extract meaningful analytics from raw user logs