Step 7 – Evaluation & Explainability (The Board Exam)
Name of Tool
Scikit-Learn (The Grader) & Seaborn (The Visualizer)
Technical Explanation
Scikit-Learn metrics compare predictions (y_pred) to ground truth (y_true), producing confusion matrices, F1 scores, and ROC curves. Seaborn builds on Matplotlib to visualize these metrics (for example, heatmaps) clearly.
Simplified Explanation
This is peer review / M&M for your model. Accuracy alone is not enough; you need to see the types of mistakes.
- Evaluation: “When wrong, how wrong?” (False Positives vs False Negatives).
- Explainability: “Why did you say that?” (heatmaps/saliency).
What can it do?
- Confusion Matrix: Counts TP/TN/FP/FN.
- ROC/AUC: Threshold-independent discrimination.
- Probability Calibration: Check if “90% confident” means ~90% correct (see the calibration sketch after this list).
- Saliency/Heatmap: Show hotspots that drove the prediction.
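Probability calibration is the one item above that Blocks A–C below do not demonstrate. A minimal sketch, using scikit-learn’s calibration_curve on invented labels and probabilities (the y_true/y_probs values here are purely illustrative):

from sklearn.calibration import calibration_curve

# Invented ground truth and predicted probabilities, for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
y_probs = [0.1, 0.2, 0.3, 0.4, 0.8, 0.7, 0.9, 0.6, 0.2, 0.95,
           0.3, 0.85, 0.75, 0.15, 0.9, 0.4, 0.65, 0.8, 0.25, 0.35]

# Bin the predicted probabilities and compare claimed confidence to observed frequency
prob_true, prob_pred = calibration_curve(y_true, y_probs, n_bins=5)

for claimed, observed in zip(prob_pred, prob_true):
    print(f"Model said ~{claimed:.0%} tumor; tumor was actually present in {observed:.0%} of those cases")

If the claimed and observed percentages diverge badly, the model’s confidence scores should not be read as literal probabilities at sign-out.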
Situations where it’s used (Medical Examples)
- Accuracy Paradox: 99 benign, 1 cancer → model predicts “benign” for all: 99% accuracy, 0% sensitivity. The confusion matrix exposes this (see the sketch after this list).
- Black Box Trust: Saliency highlights true tumor focus (trust ↑) or marker ink artifact (trust ↓).
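A minimal sketch of the accuracy paradox above, with invented labels (99 benign, 1 cancer) so the numbers match the example:

from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

# 99 benign slides (0) and 1 cancer slide (1); the model predicts "benign" for everything
y_true = [0] * 99 + [1]
y_pred = [0] * 100

print("Accuracy:", accuracy_score(y_true, y_pred))            # 0.99 -- looks excellent
print("Sensitivity (recall):", recall_score(y_true, y_pred))  # 0.0 -- the one cancer was missed
print(confusion_matrix(y_true, y_pred, labels=[0, 1]))
# [[99  0]
#  [ 1  0]]   <- the bottom-left cell is the missed cancer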
Why it’s important to pathologists
False negatives (missed cancer) and false positives (unneeded alarms) have different costs. Accuracy hides this; sensitivity and specificity reveal clinical utility, as sketched below.
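In a binary benign/tumor setup, the “recall” that classification_report prints for the tumor class is sensitivity, and the recall for the benign class is specificity. If you prefer to compute them by name, a minimal sketch that pulls them from the 2×2 confusion matrix (reusing the simulated labels from Block A below):

from sklearn.metrics import confusion_matrix

# Same simulated labels as Block A below (0 = benign, 1 = tumor)
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

# ravel() flattens the 2x2 matrix into TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # how many cancers were caught
specificity = tn / (tn + fp)  # how many benign cases were correctly cleared

print(f"Sensitivity: {sensitivity:.2f}")  # 0.80
print(f"Specificity: {specificity:.2f}")  # 0.80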
Installation Instructions
Run in terminal:
pip install scikit-learn seaborn matplotlib
Lego Building Blocks (Code)
Block A: The Truth Table (Confusion Matrix)
The Situation: You predicted diagnoses for 100 test slides. You need to see exactly which errors occurred.
The Solution: Build a confusion matrix and visualize it with Seaborn; print precision/recall/F1.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report

# 1. Simulate results
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]  # ground truth
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]  # model predictions
# Index 3 = FP, index 6 = FN

# 2. Confusion matrix
cm = confusion_matrix(y_true, y_pred)

# 3. Visualize
plt.figure(figsize=(6, 5))
sns.heatmap(
    cm,
    annot=True,
    fmt="d",
    cmap="Blues",
    xticklabels=["Predicted Benign", "Predicted Tumor"],
    yticklabels=["Actual Benign", "Actual Tumor"],
)
plt.ylabel("Actual Label")
plt.xlabel("Predicted Label")
plt.title("Confusion Matrix")
plt.show()

# 4. Clinical report
print(classification_report(y_true, y_pred, target_names=["Benign", "Tumor"]))

Simulated output:
              precision    recall  f1-score   support

      Benign       0.80      0.80      0.80         5
       Tumor       0.80      0.80      0.80         5

    accuracy                           0.80        10

Block B: The Confidence Check (ROC Curve)
The Situation: You have probabilities (for example 0.75 for tumor). Changing the decision threshold changes sensitivity/specificity.
The Solution: Plot ROC and compute AUC to summarize discrimination independent of threshold.
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# 1. Simulate probabilities and truths
y_probs = [0.1, 0.2, 0.4, 0.8, 0.9, 0.6, 0.3, 0.95, 0.1, 0.85]
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

# 2. ROC stats
fpr, tpr, thresholds = roc_curve(y_true, y_probs)
roc_auc = auc(fpr, tpr)

# 3. Plot
plt.figure(figsize=(6, 6))
plt.plot(fpr, tpr, color="darkorange", lw=2, label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], color="navy", lw=2, linestyle="--")  # random guess line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("1 - Specificity (False Positive Rate)")
plt.ylabel("Sensitivity (True Positive Rate)")
plt.title("Receiver Operating Characteristic (ROC)")
plt.legend(loc="lower right")
plt.show()

Simulated output: A curve rising toward the top-left; an AUC of 0.88 indicates strong discrimination.
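The ROC curve summarizes discrimination across all thresholds, but at sign-out you still have to pick one operating point. One common heuristic, shown here as an illustrative sketch rather than part of the original block, is Youden’s J statistic (sensitivity + specificity − 1), maximized over the thresholds that roc_curve returns:

import numpy as np
from sklearn.metrics import roc_curve

# Same simulated data as Block B
y_probs = [0.1, 0.2, 0.4, 0.8, 0.9, 0.6, 0.3, 0.95, 0.1, 0.85]
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

fpr, tpr, thresholds = roc_curve(y_true, y_probs)

# Youden's J = sensitivity + specificity - 1 = tpr - fpr
j_scores = tpr - fpr
best_idx = np.argmax(j_scores)

print(f"Best threshold by Youden's J: {thresholds[best_idx]:.2f}")
print(f"Sensitivity at that threshold: {tpr[best_idx]:.2f}")
print(f"Specificity at that threshold: {1 - fpr[best_idx]:.2f}")

In practice the operating point should reflect clinical costs (a missed cancer usually outweighs an extra review), not just a symmetric statistic.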
Block C: The “Where is it?” Map (Heatmap Overlay)
The Situation: The model predicts tumor with 95% confidence; you want to see where.
The Solution: Overlay a probability grid (heatmap) onto the slide thumbnail with transparency.
import numpy as np
import matplotlib.pyplot as plt

# 1. Dummy heatmap grid (10x10 patch scores)
heatmap_grid = np.zeros((10, 10))
heatmap_grid[2:5, 6:9] = 0.9  # hot zone
heatmap_grid[8:9, 1:3] = 0.5  # warm zone

# 2. Dummy background slide
slide_thumbnail = np.full((10, 10, 3), 200, dtype=np.uint8)

# 3. Overlay
plt.figure(figsize=(6, 6))
plt.imshow(slide_thumbnail)
plt.imshow(heatmap_grid, cmap="jet", alpha=0.5)
plt.title("Tumor Probability Heatmap")
plt.axis("off")
plt.colorbar(label="Cancer Probability")
plt.show()

Simulated output: Gray background with a red hotspot overlay where the model thinks cancer is most likely.
Resource Site
- Scikit-Learn Metrics Guide: https://scikit-learn.org/stable/modules/model_evaluation.html
- Seaborn Examples: https://seaborn.pydata.org/examples/index.html
- Understanding ROC Curves: Google ML Crash Course – ROC/AUC