Step 7 – Evaluation & Explainability (The Board Exam)
Name of Tool
Scikit-Learn (The Grader) & Seaborn (The Visualizer)
Technical Explanation
Scikit-Learn metrics compare predictions (y_pred) to ground truth (y_true), producing confusion matrices, F1 scores, and ROC curves. Seaborn builds on Matplotlib to visualize these metrics (for example, heatmaps) clearly.
Simplified Explanation
This is peer review / M&M for your model. Accuracy alone is not enough; you need to see the types of mistakes.
- Evaluation: “When wrong, how wrong?” (False Positives vs False Negatives).
- Explainability: “Why did you say that?” (heatmaps/saliency).
What can it do?
- Confusion Matrix: Counts TP/TN/FP/FN.
- ROC/AUC: Threshold-independent discrimination.
- Probability Calibration: Check if “90% confident” means ~90% correct (see the calibration sketch after this list).
- Saliency/Heatmap: Show hotspots that drove the prediction.
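Probability calibration is the one item above that Blocks A–C below do not demonstrate. A minimal sketch, using scikit-learn’s calibration_curve on invented labels and probabilities (the y_true/y_probs values here are purely illustrative):

from sklearn.calibration import calibration_curve

# Invented ground truth and predicted probabilities, for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
y_probs = [0.1, 0.2, 0.3, 0.4, 0.8, 0.7, 0.9, 0.6, 0.2, 0.95,
           0.3, 0.85, 0.75, 0.15, 0.9, 0.4, 0.65, 0.8, 0.25, 0.35]

# Bin the predicted probabilities and compare claimed confidence to observed frequency
prob_true, prob_pred = calibration_curve(y_true, y_probs, n_bins=5)

for claimed, observed in zip(prob_pred, prob_true):
    print(f"Model said ~{claimed:.0%} tumor; tumor was actually present in {observed:.0%} of those cases")

If the claimed and observed percentages diverge badly, the model’s confidence scores should not be read as literal probabilities at sign-out.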
Situations where it’s used (Medical Examples)
- Accuracy Paradox: 99 benign, 1 cancer → model predicts “benign” for all: 99% accuracy, 0% sensitivity. The confusion matrix exposes this (see the sketch after this list).
- Black Box Trust: Saliency highlights true tumor focus (trust ↑) or marker ink artifact (trust ↓).
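A minimal sketch of the accuracy paradox above, with invented labels (99 benign, 1 cancer) so the numbers match the example:

from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

# 99 benign slides (0) and 1 cancer slide (1); the model predicts "benign" for everything
y_true = [0] * 99 + [1]
y_pred = [0] * 100

print("Accuracy:", accuracy_score(y_true, y_pred))            # 0.99 -- looks excellent
print("Sensitivity (recall):", recall_score(y_true, y_pred))  # 0.0 -- the one cancer was missed
print(confusion_matrix(y_true, y_pred, labels=[0, 1]))
# [[99  0]
#  [ 1  0]]   <- the bottom-left cell is the missed cancer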
Why it’s important to pathologists
False negatives (missed cancer) and false positives (unneeded alarms) have different costs. Accuracy hides this; sensitivity and specificity reveal clinical utility, as sketched below.
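In a binary benign/tumor setup, the “recall” that classification_report prints for the tumor class is sensitivity, and the recall for the benign class is specificity. If you prefer to compute them by name, a minimal sketch that pulls them from the 2×2 confusion matrix (reusing the simulated labels from Block A below):

from sklearn.metrics import confusion_matrix

# Same simulated labels as Block A below (0 = benign, 1 = tumor)
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

# ravel() flattens the 2x2 matrix into TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # how many cancers were caught
specificity = tn / (tn + fp)  # how many benign cases were correctly cleared

print(f"Sensitivity: {sensitivity:.2f}")  # 0.80
print(f"Specificity: {specificity:.2f}")  # 0.80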
Installation Instructions
Run in terminal:
pip install scikit-learn seaborn matplotlib
Lego Building Blocks (Code)
Block A: The Truth Table (Confusion Matrix)
The Situation: You predicted diagnoses for 100 test slides. You need to see exactly which errors occurred.
The Solution: Build a confusion matrix and visualize it with Seaborn; print precision/recall/F1.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report

# 1. Simulate results
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]  # ground truth
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]  # model predictions
# Index 3 = FP, index 6 = FN

# 2. Confusion matrix
cm = confusion_matrix(y_true, y_pred)

# 3. Visualize
plt.figure(figsize=(6, 5))
sns.heatmap(
    cm,
    annot=True,
    fmt="d",
    cmap="Blues",
    xticklabels=["Predicted Benign", "Predicted Tumor"],
    yticklabels=["Actual Benign", "Actual Tumor"],
)
plt.ylabel("Actual Label")
plt.xlabel("Predicted Label")
plt.title("Confusion Matrix")
plt.show()

# 4. Clinical report
print(classification_report(y_true, y_pred, target_names=["Benign", "Tumor"]))

Simulated output:
              precision    recall  f1-score   support

      Benign       0.80      0.80      0.80         5
       Tumor       0.80      0.80      0.80         5

    accuracy                           0.80        10

Block B: The Confidence Check (ROC Curve)
The Situation: You have probabilities (for example 0.75 for tumor). Changing the decision threshold changes sensitivity/specificity.
The Solution: Plot ROC and compute AUC to summarize discrimination independent of threshold.
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# 1. Simulate probabilities and truths
y_probs = [0.1, 0.2, 0.4, 0.8, 0.9, 0.6, 0.3, 0.95, 0.1, 0.85]
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

# 2. ROC stats
fpr, tpr, thresholds = roc_curve(y_true, y_probs)
roc_auc = auc(fpr, tpr)

# 3. Plot
plt.figure(figsize=(6, 6))
plt.plot(fpr, tpr, color="darkorange", lw=2, label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], color="navy", lw=2, linestyle="--")  # random guess line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("1 - Specificity (False Positive Rate)")
plt.ylabel("Sensitivity (True Positive Rate)")
plt.title("Receiver Operating Characteristic (ROC)")
plt.legend(loc="lower right")
plt.show()

Simulated output: A curve rising toward the top-left; an AUC of 0.88 indicates strong discrimination.
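The ROC curve summarizes discrimination across all thresholds, but at sign-out you still have to pick one operating point. One common heuristic, shown here as an illustrative sketch rather than part of the original block, is Youden’s J statistic (sensitivity + specificity − 1), maximized over the thresholds that roc_curve returns:

import numpy as np
from sklearn.metrics import roc_curve

# Same simulated data as Block B
y_probs = [0.1, 0.2, 0.4, 0.8, 0.9, 0.6, 0.3, 0.95, 0.1, 0.85]
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

fpr, tpr, thresholds = roc_curve(y_true, y_probs)

# Youden's J = sensitivity + specificity - 1 = tpr - fpr
j_scores = tpr - fpr
best_idx = np.argmax(j_scores)

print(f"Best threshold by Youden's J: {thresholds[best_idx]:.2f}")
print(f"Sensitivity at that threshold: {tpr[best_idx]:.2f}")
print(f"Specificity at that threshold: {1 - fpr[best_idx]:.2f}")

In practice the operating point should reflect clinical costs (a missed cancer usually outweighs an extra review), not just a symmetric statistic.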
Block C: The “Where is it?” Map (Heatmap Overlay)
The Situation: The model predicts tumor with 95% confidence; you want to see where.
The Solution: Overlay a probability grid (heatmap) onto the slide thumbnail with transparency.
import numpy as np
import matplotlib.pyplot as plt

# 1. Dummy heatmap grid (10x10 patch scores)
heatmap_grid = np.zeros((10, 10))
heatmap_grid[2:5, 6:9] = 0.9  # hot zone
heatmap_grid[8:9, 1:3] = 0.5  # warm zone

# 2. Dummy background slide
slide_thumbnail = np.full((10, 10, 3), 200, dtype=np.uint8)

# 3. Overlay
plt.figure(figsize=(6, 6))
plt.imshow(slide_thumbnail)
plt.imshow(heatmap_grid, cmap="jet", alpha=0.5)
plt.title("Tumor Probability Heatmap")
plt.axis("off")
plt.colorbar(label="Cancer Probability")
plt.show()

Simulated output: Gray background with a red hotspot overlay where the model thinks cancer is most likely.
Resource Site
- Scikit-Learn Metrics Guide: https://scikit-learn.org/stable/modules/model_evaluation.html
- Seaborn Examples: https://seaborn.pydata.org/examples/index.html
- Understanding ROC Curves: Google ML Crash Course – ROC/AUC