
Step 7 – Evaluation & Explainability (The Board Exam)

Scikit-Learn (The Grader) & Seaborn (The Visualizer)

Scikit-Learn metrics compare predictions (y_pred) to ground truth (y_true), producing confusion matrices, F1 scores, and ROC curves. Seaborn builds on Matplotlib to visualize these metrics (for example, heatmaps) clearly.
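
A minimal sketch of that shared pattern (the toy labels and probabilities below are made up for illustration): every Scikit-Learn metric takes y_true plus predictions (or probabilities) and returns a score.

from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = [0, 1, 1, 0, 1]              # ground truth labels
y_pred = [0, 1, 0, 0, 1]              # hard predictions from the model
y_probs = [0.1, 0.9, 0.4, 0.2, 0.8]   # predicted probabilities for class 1

print(accuracy_score(y_true, y_pred))    # fraction of cases labelled correctly
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_probs))    # threshold-free discrimination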

This is the peer review / M&M (morbidity and mortality) conference for your model. Accuracy alone is not enough; you need to see the types of mistakes it makes.

  • Evaluation: “When wrong, how wrong?” (False Positives vs False Negatives).
  • Explainability: “Why did you say that?” (heatmaps/saliency).
  • Confusion Matrix: Counts TP/TN/FP/FN.
  • ROC/AUC: Threshold-independent discrimination.
  • Probability Calibration: Check if “90% confident” means ~90% correct (see the sketch after this list).
  • Saliency/Heatmap: Show hotspots that drove the prediction.
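
Probability calibration is the one item above that the blocks below do not cover. A minimal sketch with scikit-learn’s calibration_curve, using randomly generated toy probabilities (not real data):

import numpy as np
from sklearn.calibration import calibration_curve

# Toy data: 200 simulated cases whose probabilities loosely track the truth
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_probs = np.clip(y_true * 0.7 + rng.normal(0.15, 0.2, size=200), 0, 1)

# Bin predictions by confidence and compare claimed confidence to observed frequency
prob_true, prob_pred = calibration_curve(y_true, y_probs, n_bins=5)
for claimed, observed in zip(prob_pred, prob_true):
    print(f"model said ~{claimed:.2f} -> actually positive {observed:.2f} of the time")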

Situations where it’s used (Medical Examples)

  • Accuracy Paradox: 99 benign, 1 cancer → a model that predicts “benign” for every slide scores 99% accuracy with 0% sensitivity. The confusion matrix exposes this (reproduced in code after this list).
  • Black Box Trust: Saliency highlights true tumor focus (trust ↑) or marker ink artifact (trust ↓).
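
The accuracy-paradox numbers from the first bullet can be reproduced in a few lines (toy data, not a real dataset):

from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

# 99 benign slides, 1 cancer; a lazy model predicts "benign" for everything
y_true = [0] * 99 + [1]
y_pred = [0] * 100

print("Accuracy:   ", accuracy_score(y_true, y_pred))   # 0.99 -- looks great
print("Sensitivity:", recall_score(y_true, y_pred))     # 0.0 -- the cancer was missed
print(confusion_matrix(y_true, y_pred))                 # the false negative is visible here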

False negatives (missed cancer) and false positives (unneeded alarms) have different costs. Accuracy hides this. Sensitivity and specificity reveal clinical utility.
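
One way to get sensitivity and specificity directly is to unpack the confusion-matrix cells; a minimal sketch using the same toy labels as Block A below (variable names are my own):

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

# ravel() flattens the 2x2 matrix into TN, FP, FN, TP (in that order for binary labels 0/1)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # how many true tumors were caught
specificity = tn / (tn + fp)  # how many benign slides were correctly cleared
print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}")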

Run in terminal:

pip install scikit-learn seaborn matplotlib

Block A: The Truth Table (Confusion Matrix)

The Situation: You predicted diagnoses for 100 test slides. You need to see exactly which errors occurred.
The Solution: Build a confusion matrix and visualize it with Seaborn; print precision/recall/F1.

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
# 1. Simulate results
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1] # ground truth
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1] # model predictions
# Index 3 = FP, index 6 = FN
# 2. Confusion matrix
cm = confusion_matrix(y_true, y_pred)
# 3. Visualize
plt.figure(figsize=(6, 5))
sns.heatmap(
    cm,
    annot=True,
    fmt="d",
    cmap="Blues",
    xticklabels=["Predicted Benign", "Predicted Tumor"],
    yticklabels=["Actual Benign", "Actual Tumor"],
)
plt.ylabel("Actual Label")
plt.xlabel("Predicted Label")
plt.title("Confusion Matrix")
plt.show()
# 4. Clinical report
print(classification_report(y_true, y_pred, target_names=["Benign", "Tumor"]))

Simulated output:

              precision    recall  f1-score   support

      Benign       0.80      0.80      0.80         5
       Tumor       0.80      0.80      0.80         5

    accuracy                           0.80        10
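
One bridge to Block B before moving on: the matrix above reflects a single implicit decision threshold. If the model outputs probabilities (the same toy values used in Block B), the matrix can be recomputed at any cutoff, and lowering the cutoff trades false positives for sensitivity. A minimal sketch:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_probs = [0.1, 0.2, 0.4, 0.8, 0.9, 0.6, 0.3, 0.95, 0.1, 0.85]

# Re-threshold the same probabilities at two different cutoffs
for threshold in (0.5, 0.25):
    y_pred = (np.array(y_probs) >= threshold).astype(int)
    print(f"Threshold {threshold}:")
    print(confusion_matrix(y_true, y_pred))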

Block B: The Threshold Trade-off (ROC Curve & AUC)

The Situation: You have probabilities (for example, 0.75 for tumor) instead of hard labels, and changing the decision threshold changes sensitivity and specificity.
The Solution: Plot the ROC curve and compute the AUC to summarize discrimination independent of any single threshold.

from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
# 1. Simulate probabilities and truths
y_probs = [0.1, 0.2, 0.4, 0.8, 0.9, 0.6, 0.3, 0.95, 0.1, 0.85]
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
# 2. ROC stats
fpr, tpr, thresholds = roc_curve(y_true, y_probs)
roc_auc = auc(fpr, tpr)
# 3. Plot
plt.figure(figsize=(6, 6))
plt.plot(fpr, tpr, color="darkorange", lw=2, label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], color="navy", lw=2, linestyle="--") # random guess line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("1 - Specificity (False Positive Rate)")
plt.ylabel("Sensitivity (True Positive Rate)")
plt.title("Receiver Operating Characteristic (ROC)")
plt.legend(loc="lower right")
plt.show()

Simulated output: A curve rising toward the top-left; AUC near 0.88 indicates strong discrimination.
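As a follow-up, the thresholds array returned by roc_curve can be used to pick an operating point. One common heuristic, not from the original text, is Youden’s J (sensitivity + specificity − 1), sketched here on the same toy data:

import numpy as np
from sklearn.metrics import roc_curve

y_probs = [0.1, 0.2, 0.4, 0.8, 0.9, 0.6, 0.3, 0.95, 0.1, 0.85]
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

fpr, tpr, thresholds = roc_curve(y_true, y_probs)

# Youden's J = tpr - fpr; pick the threshold that maximizes it
j_scores = tpr - fpr
best = np.argmax(j_scores)
print(f"Best threshold: {thresholds[best]:.2f} "
      f"(sensitivity {tpr[best]:.2f}, 1-specificity {fpr[best]:.2f})")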

Block C: The “Where is it?” Map (Heatmap Overlay)

The Situation: The model predicts tumor with 95% confidence; you want to see where.
The Solution: Overlay a probability grid (heatmap) onto the slide thumbnail with transparency.

import numpy as np
import matplotlib.pyplot as plt
# 1. Dummy heatmap grid (10x10 patch scores)
heatmap_grid = np.zeros((10, 10))
heatmap_grid[2:5, 6:9] = 0.9 # hot zone
heatmap_grid[8:9, 1:3] = 0.5 # warm zone
# 2. Dummy background slide
slide_thumbnail = np.full((10, 10, 3), 200, dtype=np.uint8)
# 3. Overlay
plt.figure(figsize=(6, 6))
plt.imshow(slide_thumbnail)
plt.imshow(heatmap_grid, cmap="jet", alpha=0.5)
plt.title("Tumor Probability Heatmap")
plt.axis("off")
plt.colorbar(label="Cancer Probability")
plt.show()

Simulated output: Gray background with a red hotspot overlay where the model thinks cancer is most likely.
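
A small follow-up sketch (my own addition, reusing the same dummy grid): threshold the probability grid to flag which patches deserve a pathologist’s attention first.

import numpy as np

# Same dummy 10x10 patch-score grid as Block C
heatmap_grid = np.zeros((10, 10))
heatmap_grid[2:5, 6:9] = 0.9
heatmap_grid[8:9, 1:3] = 0.5

# Flag patches above a review threshold and report how much of the slide is "hot"
threshold = 0.7
hot_rows, hot_cols = np.where(heatmap_grid >= threshold)
print(f"{len(hot_rows)} of {heatmap_grid.size} patches flagged for review")
print("Patch coordinates (row, col):", list(zip(hot_rows.tolist(), hot_cols.tolist())))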