Data Analytics: Ch. 1 Question Bank 2

1. Compare different evaluation metrics for classification models and their applications.

Classification models are evaluated using various metrics, each suited to specific scenarios depending on the problem and dataset characteristics. Key metrics and their applications:

- Accuracy: Measures the proportion of correctly classified instances. Suitable for balanced datasets but misleading for imbalanced ones.
- Precision: Measures the proportion of true positives among predicted positives. Useful when false positives must be minimized (e.g., spam detection).
- Recall (Sensitivity): Measures the proportion of actual positives correctly identified. Important when missing positives is costly (e.g., medical diagnosis).
- F1-Score: Harmonic mean of precision and recall. Best for imbalanced datasets where both precision and recall matter.
- ROC-AUC (Receiver Operating Characteristic – Area Under Curve): Evaluates model performance across all classification thresholds. Ideal for comparing models.
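To make the comparison concrete, here is a minimal sketch of how these five metrics can be computed. It assumes scikit-learn (not part of the original material) and uses small hypothetical label and score arrays:

```python
# Hypothetical example: computing the five metrics compared above with
# scikit-learn. The labels, predictions, and scores are made up for
# illustration only.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]    # ground-truth labels
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]    # hard predictions at a fixed threshold
y_scores = [0.1, 0.2, 0.6, 0.3, 0.9, 0.4, 0.8, 0.2, 0.7, 0.1]  # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))    # correct / total
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1-score :", f1_score(y_true, y_pred))          # harmonic mean of P and R
print("ROC-AUC  :", roc_auc_score(y_true, y_scores))   # uses scores, not hard labels
```

Note that ROC-AUC is computed from the probability scores rather than the hard predictions, which is exactly what makes it threshold-independent.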
2. Evaluate the role of prescriptive analytics in supply chain optimization.

Prescriptive analytics helps optimize supply chain operations by providing actionable recommendations based on data-driven insights. It integrates historical data, real-time information, and predictive models to suggest the best course of action. Key roles include:

- Demand Forecasting & Inventory Management – Recommends optimal stock levels to prevent overstocking or shortages.
- Route Optimization – Suggests the most efficient transportation routes to reduce costs and delivery times.
- Supplier Selection & Risk Management – Identifies the best suppliers based on cost, reliability, and risk assessment.
- Production Planning – Optimizes scheduling and resource allocation for improved efficiency.
- Cost Reduction – Minimizes operational costs through better decision-making on procurement, logistics, and warehousing.

3. How can predictive analytics be used to improve customer experience in e-commerce?

Predictive analytics can significantly enhance the customer experience in e-commerce by leveraging data to anticipate customer needs and provide personalized interactions:

- Personalized Recommendations – Predictive models analyze past purchase behavior, browsing history, and customer preferences to suggest relevant products, increasing engagement and conversion rates.
- Dynamic Pricing – Machine learning algorithms assess demand, competitor pricing, and customer behavior to optimize pricing strategies, offering competitive and personalized prices.
- Customer Churn Prediction – Identifying customers at risk of leaving allows businesses to take proactive measures such as targeted promotions, personalized emails, or loyalty rewards.
- Inventory Management – Demand forecasts ensure that popular products remain in stock while reducing excess inventory costs.
- Chatbots & Virtual Assistants – AI-powered chatbots use predictive analytics to provide relevant responses and anticipate customer inquiries, enhancing the overall shopping experience.
- Fraud Detection & Prevention – Analyzing transaction data can surface unusual patterns that may indicate fraudulent activity, ensuring secure transactions.
- Optimized Marketing Campaigns – Predictive models segment customers effectively and tailor promotions, emails, and advertisements to individual preferences and behavior.
- Improved Customer Support – Anticipating common issues and suggesting proactive solutions enables faster and more effective customer service responses.

4. Why is F1-score a better metric than accuracy for imbalanced datasets?

F1-score is a better metric than accuracy for imbalanced datasets because it provides a balanced evaluation of a model's performance by considering both precision and recall:

1. Accuracy Can Be Misleading – In highly imbalanced datasets, accuracy can be artificially high. For example, if 95% of the data belongs to one class, a model predicting only the majority class achieves 95% accuracy but never detects the minority class.
2. F1-Score Balances Precision & Recall – Precision measures how many of the predicted positive instances are actually correct; recall measures how many actual positive instances were correctly identified. The F1-score is their harmonic mean, so it stays high only when both are high.
3. Handles False Positives & False Negatives Well – In imbalanced datasets, false negatives (missed important cases) or false positives (false alarms) may have serious consequences. Accuracy does not distinguish between these errors, but the F1-score penalizes extreme cases where precision or recall is very low.
4. Example – Consider a fraud detection system where fraud cases are only 1% of the data. A model predicting "No Fraud" for every transaction achieves 99% accuracy but detects no fraud. Its recall (and therefore its F1-score) is 0, correctly indicating poor model performance. A quick numeric check of this example follows below.
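As a sanity check on the fraud example (hypothetical numbers, scikit-learn assumed), the degenerate "always predict no fraud" model scores 99% accuracy but an F1 of zero:

```python
# Hypothetical imbalanced dataset: 10 fraud cases among 1,000 transactions (1%).
# A degenerate model that never predicts fraud looks great on accuracy
# but is exposed by the F1-score.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1] * 10 + [0] * 990   # 1% positive (fraud) class
y_pred = [0] * 1000             # always predicts the majority class

print("Accuracy:", accuracy_score(y_true, y_pred))             # 0.99
print("F1-score:", f1_score(y_true, y_pred, zero_division=0))  # 0.0
```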
5. How does class imbalance affect the performance of machine learning models?

Class imbalance occurs when one class has significantly more samples than the other(s) in a dataset. It degrades model performance in the following ways:

- Bias Towards Majority Class – Models tend to favor the majority class, since predicting it for every instance can still yield high accuracy while the minority class is neglected.
- Poor Generalization for Minority Class – The model may fail to learn the patterns of the minority class, leading to poor performance on rare but important instances (e.g., fraud detection, disease diagnosis).
- Misleading Evaluation Metrics – Standard metrics like accuracy can be deceptive on imbalanced data: a model predicting only the majority class might appear to perform well while failing to detect the minority class.
- Increased False Negatives or False Positives – Depending on the direction of the imbalance, the model may misclassify minority-class instances as the majority class (false negatives) or vice versa (false positives).

6. What are the advantages of using AUC-ROC for evaluating classifiers?

- Threshold Independence – AUC-ROC evaluates a model's performance across all possible classification thresholds, giving a more comprehensive view than a metric computed at one fixed threshold (e.g., accuracy at 0.5).
- Works Well with Imbalanced Datasets – Unlike accuracy, AUC-ROC is largely unaffected by class imbalance because it is built from the true positive rate (recall) and the false positive rate, making it suitable when one class is far more prevalent than the other.
- Measures Discrimination Ability – The AUC (area under the curve) quantifies how well a model separates the positive and negative classes; a higher AUC indicates better ranking of positives above negatives.
- Provides a Single Metric for Comparison – AUC-ROC condenses performance into one scalar between 0 and 1, simplifying comparison and selection among classifiers.
- Interpretability – AUC-ROC values are intuitive: 0.5 corresponds to random guessing, values above roughly 0.7 indicate a good model, and 1.0 a perfect classifier.

The short sketch below illustrates the threshold sweep that underlies the curve.
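As a brief illustration of the threshold-independence point (scikit-learn assumed, data hypothetical), roc_curve sweeps every candidate threshold and roc_auc_score summarizes the resulting curve as a single number:

```python
# Hypothetical scores: roc_curve evaluates the classifier at every candidate
# threshold, and roc_auc_score condenses the whole curve into one scalar.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.3, 0.9]   # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("Thresholds swept:", thresholds)                  # one point per threshold
print("AUC:", roc_auc_score(y_true, y_scores))          # ~0.94 here
```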
7. Why is accuracy not always a reliable metric for evaluating classifiers?

Accuracy is not always a reliable metric because:

- Misleading in Imbalanced Datasets – Where one class dominates (e.g., 95% of the data is from the majority class), a model that always predicts the majority class can still achieve high accuracy while failing to detect the minority class. This does not reflect true model performance, especially for important rare events (e.g., fraud detection).
- Does Not Consider Class Distribution – Accuracy treats all misclassifications equally, without considering the severity of false positives or false negatives, which may have very different costs or consequences in real-world applications (e.g., medical diagnosis).
- Ignores False Positives and False Negatives – A model can have high accuracy but still make significant errors on important classes (e.g., predicting a disease as negative when it is actually present); accuracy does not differentiate between the two error types.
- Oversimplifies Model Performance – Accuracy provides no insight into the quality of predictions for each class and fails to reflect the model's balance between precision and recall, which are critical for many tasks.

8. How can false positives impact decision-making in classification models?

- Resource Wastage – False positives trigger unnecessary actions, such as investigating non-fraudulent transactions or treating a healthy patient as diseased.
- Increased Costs – Addressing false positives incurs additional time, money, or human resources, especially in fields like healthcare, finance, or security.
- Reduced Trust and Efficiency – Frequent false alarms lead to user frustration and system inefficiency, as people become less likely to trust the system or must process ever more spurious alerts.
- Potential Harm in Critical Domains – In safety-critical applications (e.g., medical diagnosis or autonomous driving), false positives can lead to unnecessary treatments or interventions that could themselves be harmful.

The sketch below shows the usual lever for managing this trade-off.
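To connect this to model tuning, here is a small hypothetical sketch (scikit-learn and NumPy assumed) of how raising the decision threshold reduces false positives at the cost of more false negatives:

```python
# Hypothetical scores: count false positives (false alarms) and false
# negatives (misses) at two different decision thresholds.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_scores = np.array([0.2, 0.4, 0.55, 0.3, 0.1, 0.6, 0.65, 0.8, 0.45, 0.9])

for threshold in (0.5, 0.7):
    y_pred = (y_scores >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"threshold={threshold}: FP={fp} (false alarms), FN={fn} (misses)")
```

Which threshold is appropriate depends on the relative cost of the two error types in the application at hand.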