Text and Social Media Analysis Lab Solutions
Introduction:
Sentiment Analysis: Imagine you’re trying to figure out if people are happy or angry about a new phone. You read their tweets and label each one as positive, negative, or neutral to get an overall sense of the mood.
Association Rule Mining: Think of a grocery store trying to figure out which items are often bought together so they can place them near each other. If people often buy coffee and milk together, the store might put them side by side.
Characterization: It’s like describing a typical student in your class. You might say they study hard, attend regularly, and participate in class discussions.
WhatsApp Media Analysis: It’s like looking at what types of files (pictures, videos, etc.) people are sharing in a group chat to understand what they are talking about and what’s important to them.
I. Sentiment Analysis of Tweets
Objective:
Perform sentiment analysis on a collection of tweets to understand public opinion on a particular topic.
Data:
Let’s use a small, illustrative set of tweets for simplicity. In a real lab, you would collect tweets using the Twitter API (Boehm & Hanlon, 2020, 2018).
| Tweet ID | Tweet Text |
| --- | --- |
| 1 | “I love the new update! It’s so much faster.” |
| 2 | “This product is terrible. What a waste of money.” |
| 3 | “The service was okay, nothing special.” |
| 4 | “Excited to try out the new features!” |
| 5 | “Feeling frustrated with the lack of support.” |
Tasks:
- Manual Sentiment Assignment:
  - Go through each tweet and manually assign a sentiment label.
  - Example:
    - Tweet 1: Positive
    - Tweet 2: Negative
    - Tweet 3: Neutral
    - Tweet 4: Positive
    - Tweet 5: Negative
- Simple Sentiment Scoring:
  - Assign scores: Positive = +1, Negative = -1, Neutral = 0.
  - Calculate the average sentiment score. In this case: (1 - 1 + 0 + 1 - 1) / 5 = 0. This indicates a neutral overall sentiment, because the two positive and two negative tweets cancel each other out.
- Interpretation:
  - Discuss the overall sentiment based on the average score and the distribution of positive, negative, and neutral tweets.
Even with this small dataset, we can get a rough sense of public opinion, but five tweets are too few to generalize from. A real dataset would provide more robust results.
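The scoring steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the manual labels from the example. The names `labels`, `SCORES`, `average_sentiment`, and `label_distribution` are made up for this sketch and are not part of any library.

```python
from collections import Counter

# Manual labels from the example above (tweet ID -> label).
labels = {1: "Positive", 2: "Negative", 3: "Neutral", 4: "Positive", 5: "Negative"}

# Scoring scheme from the lab: Positive = +1, Negative = -1, Neutral = 0.
SCORES = {"Positive": 1, "Negative": -1, "Neutral": 0}


def average_sentiment(labels):
    """Return the mean score over all labelled tweets."""
    return sum(SCORES[label] for label in labels.values()) / len(labels)


def label_distribution(labels):
    """Return the share of tweets (0 to 1) carrying each label."""
    counts = Counter(labels.values())
    return {label: count / len(labels) for label, count in counts.items()}


avg = average_sentiment(labels)
dist = label_distribution(labels)

print(f"Average sentiment score: {avg:.2f}")
for label, share in dist.items():
    print(f"{label}: {share:.0%}")
```

Reporting the distribution alongside the average is useful because an average of 0 can come from all-neutral tweets or, as here, from positive and negative tweets that cancel out.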
II. Association Rule Mining – Online Bookstore
Objective:
Reinforce association rule mining with a simpler example. This relates to finding rules that predict the occurrence of one item based on the presence of another item.
Data:
| Transaction ID | Items Purchased |
| --- | --- |
| 1 | “Coffee”, “Milk” |
| 2 | “Coffee”, “Sugar” |
| 3 | “Tea”, “Biscuits” |
| 4 | “Coffee”, “Milk” |
| 5 | “Tea” |
Tasks:
- Calculate Support and Confidence:
  - For the rule: {“Coffee”} -> {“Milk”}
    - Support: 2/5 = 0.4 (transactions 1 and 4 contain both items, out of 5 transactions)
    - Confidence: 2/3 ≈ 0.67 (of the 3 transactions that contain coffee, 2 also contain milk)
- Interpretation:
  - A confidence of 0.67 suggests that customers who buy coffee also buy milk about 67% of the time. The support shows that 40% of all transactions contain both items.
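The same calculation can be checked with a short Python sketch. It uses the standard definitions: the support of an itemset is the fraction of transactions containing it, and the confidence of X -> Y is support(X and Y together) divided by support(X). The function names `support` and `confidence` are chosen for this illustration and are not taken from any library.

```python
# Transactions from the table above, one set of items per transaction.
transactions = [
    {"Coffee", "Milk"},
    {"Coffee", "Sugar"},
    {"Tea", "Biscuits"},
    {"Coffee", "Milk"},
    {"Tea"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


rule_support = support({"Coffee", "Milk"}, transactions)
rule_confidence = confidence({"Coffee"}, {"Milk"}, transactions)

print(f"Support:    {rule_support:.2f}")
print(f"Confidence: {rule_confidence:.2f}")
```

For larger datasets, a dedicated algorithm such as Apriori would normally be used to enumerate candidate rules. The brute-force counting here is only meant to mirror the hand calculation.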
III. Characterization – Frequent Social Media Users
Objective:
Describe the characteristics of “Frequent Social Media Users”. This is about creating a detailed profile of a specific group.
Data:
- Average Time Spent per Day on Social Media: 3 hours
- Number of Social Media Accounts: 4
- Frequency of Posting: Multiple times per day
- Types of Content Shared: Mix of personal updates, news, and shared content
- Engagement with Other Users: High (frequent commenting and liking)
Tasks:
- Create a Characterization Summary:
  - “Frequent Social Media Users are characterized by spending an average of 3 hours per day on social media platforms, maintaining an average of 4 accounts, and posting multiple times per day. They share a mix of personal updates, news, and content from other sources, and actively engage with other users through comments and likes.”
IV. WhatsApp Media Analysis
Objective:
Analyze the types of media shared on WhatsApp and their potential impact.
Introduction:
WhatsApp is a popular platform for sharing various types of media, including images, videos, audio files, and documents. Analyzing the distribution and content of this media can provide insights into user behavior, trends, and potential issues like the spread of misinformation.
Data:
For this lab, we’ll simulate a dataset of media shared in a WhatsApp group. In a real-world scenario, gathering this data would require ethical considerations and adherence to privacy policies.
| Message ID | Media Type | File Size | Caption |
| --- | --- | --- | --- |
| 1 | Image | 0.5 | “Funny meme about the lecture” |
| 2 | Video | 5.2 | “News report on local elections” |
| 3 | Audio | 1.1 | “Voice note discussing project details” |
| 4 | Image | 0.8 | “Infographic about climate change” |
| 5 | Document | 0.3 | “PDF of the assigned reading” |
| 6 | Image | 0.6 | “Promotional poster for a local event” |
Tasks:
- Media Type Distribution:
  - Calculate the percentage of each media type (image, video, audio, document) in the dataset.
  - Example: If there are 6 messages in total:
    - Images: 3/6 = 50%
    - Videos: 1/6 = 16.7%
    - Audio: 1/6 = 16.7%
    - Documents: 1/6 = 16.7%
- Content Analysis of Captions:
  - Perform a basic text analysis of the captions to identify common keywords and topics.
  - For example, you might find that “lecture,” “election,” and “project” are frequent terms.
- Potential Impact Assessment:
  - Discuss the potential impact of the media being shared. For example:
    - Memes could be for entertainment or to spread opinions.
    - News reports could inform or misinform, depending on the source.
    - Voice notes facilitate quick communication.
    - Infographics can be educational but may also oversimplify complex issues.
    - Documents are used for sharing written information.
- Ethical Considerations:
  - Discuss the ethical considerations involved in analyzing WhatsApp media, including privacy, consent, and the potential for misuse of the data.
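The media type distribution and a basic keyword count over the captions can be sketched in Python as follows. This is a minimal illustration on the simulated dataset only. The stop-word list is an assumption made for this example, the helper names are hypothetical, and the file size is carried along unused because the source does not state its unit.

```python
import re
from collections import Counter

# Simulated dataset from the table above:
# (message ID, media type, file size, caption).
messages = [
    (1, "Image", 0.5, "Funny meme about the lecture"),
    (2, "Video", 5.2, "News report on local elections"),
    (3, "Audio", 1.1, "Voice note discussing project details"),
    (4, "Image", 0.8, "Infographic about climate change"),
    (5, "Document", 0.3, "PDF of the assigned reading"),
    (6, "Image", 0.6, "Promotional poster for a local event"),
]

# Assumed minimal stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "about", "for", "of", "on", "the"}


def media_type_distribution(messages):
    """Return the percentage of messages for each media type."""
    counts = Counter(media_type for _, media_type, _, _ in messages)
    total = len(messages)
    return {media_type: 100 * n / total for media_type, n in counts.items()}


def caption_keywords(messages):
    """Count lowercase caption words, ignoring the stop words above."""
    words = Counter()
    for _, _, _, caption in messages:
        for word in re.findall(r"[a-z]+", caption.lower()):
            if word not in STOPWORDS:
                words[word] += 1
    return words


distribution = media_type_distribution(messages)
keywords = caption_keywords(messages)

for media_type, pct in distribution.items():
    print(f"{media_type}: {pct:.1f}%")
print(keywords.most_common(5))
```

With only six captions, almost every word occurs once, and “local” is the only term that repeats. Keyword counts become informative only on a much larger set of messages.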