Chapter 1: Introduction to Big Data

Easy Level Questions

These questions assess students' ability to recall facts and basic concepts and to explain fundamental ideas related to Big Data and Hadoop.

1.     Define Big Data and list its three primary characteristics, commonly known as the "3 Vs". [CO1]

2.     Explain why traditional relational database management systems are not well-suited for processing and storing Big Data. [CO1]

3.     What is the fundamental purpose of Apache Hadoop in the context of Big Data processing? [CO2]

4.     Name and briefly describe two core components of the Hadoop Ecosystem. [CO2]

Moderate Level Questions

These questions require students to apply their knowledge to new situations, interpret information, and draw connections between different concepts.

5.     Compare and contrast the architectural approaches of Hadoop and Grid Computing for large-scale data processing, highlighting the key differences in how each handles data. [CO1]

6.     Describe a real-world business scenario where the inability to analyze Big Data could lead to significant competitive disadvantage. [CO1]

7.     Explain the concept of "data locality" in the Hadoop Distributed File System (HDFS) and elaborate on how it contributes to the overall efficiency of Big Data processing; a short illustrative code sketch follows this list of questions. [CO2]

8.     Given a dataset exhibiting high velocity and variety (e.g., social media feeds), discuss how a Big Data approach would be fundamentally different from a conventional data management strategy. [CO1]
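
For question 7, the sketch below may help make "data locality" concrete. It is an illustration only, not a model answer; it assumes a reachable HDFS cluster and a hypothetical file at /data/input/logs.txt. The public HDFS Java API is used to ask the NameNode which DataNodes store each block of the file; a scheduler uses exactly this information to place map tasks on (or near) those nodes, so computation moves to the data rather than the data to the computation.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocality {
      public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);   // connect to the default file system (HDFS)

        Path file = new Path("/data/input/logs.txt");   // hypothetical input file
        FileStatus status = fs.getFileStatus(file);

        // Ask the NameNode which DataNodes hold each block of the file.
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
          System.out.printf("offset=%d length=%d hosts=%s%n",
              block.getOffset(), block.getLength(),
              String.join(",", block.getHosts()));
        }
        fs.close();
      }
    }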

Difficult Level Questions

These questions challenge students to make judgments, synthesize information, and design solutions, demonstrating a deeper understanding of the subject matter.

9.     Critically evaluate the advantages and disadvantages of adopting a Big Data strategy for an organization, considering both technological and business implications. [CO1]

10.     Propose a simplified architectural diagram illustrating how data flows and is processed within the Hadoop Ecosystem, specifically showing the interaction between HDFS, MapReduce, and YARN for a data analysis task; a minimal reference job is sketched after this list of questions. [CO2]

11.     Discuss the historical milestones that led to the development of Apache Hadoop as a prominent Big Data framework, assessing the impact of these developments on the evolution of data storage and processing paradigms. [CO1, CO2]

12.     Imagine a smart city project collecting real-time traffic sensor data. Design a high-level conceptual framework for storing, processing, and analyzing this continuous stream of Big Data using principles introduced in the "Introduction to Big Data" chapter. Justify the choice of Big Data characteristics addressed. [CO1, CO2]
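
For question 10, the skeleton below can serve as a concrete reference point when drawing the diagram. It closely follows the standard Hadoop word-count example: input is read from HDFS, map and reduce classes are written against the org.apache.hadoop.mapreduce API, and the configured Job is submitted to the cluster's resource manager (YARN), which schedules the map and reduce tasks across the nodes. Class names and the input/output paths passed as arguments are illustrative assumptions, not a prescribed answer.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: runs on the nodes holding the HDFS blocks of the input;
      // emits (word, 1) for every token in a line.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer tokens = new StringTokenizer(value.toString());
          while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: receives all counts emitted for a word and sums them.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable value : values) {
            sum += value.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      // Driver: configures the job and submits it to YARN, which allocates
      // containers for the map and reduce tasks across the cluster.
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }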