Step 6 – ML & Modeling (The Diagnosis Phase)

Scikit-Learn (The Statistician) & PyTorch (The Neural Network)

Scikit-Learn (sklearn) is a CPU-focused library for classical ML on structured data (tables). It offers regression, SVMs, random forests, and more. PyTorch (torch) is a deep learning framework that runs on CPUs and is accelerated by GPUs; it uses tensors and automatic differentiation to train neural networks, typically on unstructured data (raw images, text).

This is the “Medical Board Exam.”

  • Path A (Scikit-Learn): A checklist. You hand over measured features (for example, nucleus area, circularity) and it follows decision rules (like a flowchart) to classify. Fast and interpretable, limited by the features you provide.
  • Path B (PyTorch): A resident. You show 10,000 images and it learns patterns you did not explicitly describe (texture, chromatin). Heavier compute and more labeled data, but it can pick up signals that hand-measured features miss.
  • Classification: Tumor vs Normal (binary) or Grade 1/2/3 (multiclass).
  • Regression: Predict a continuous value (for example, survival months).
  • Segmentation (PyTorch): Pixel-level outlines (for example, tumor mask).
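
The three task types above differ mainly in the shape of what the model returns. The following is a minimal sketch in PyTorch, assuming a hypothetical batch of 4 feature maps with 16 channels; the tensor sizes and the three small "heads" are illustrative only and not part of any pipeline described here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical backbone output: 4 patches, 16 channels, 32x32 spatial grid
features = torch.randn(4, 16, 32, 32)

# Classification: one score per class per patch (here 3 grades)
cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 3))
# Regression: one continuous value per patch (for example, survival months)
reg_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))
# Segmentation: one score per class per pixel (here background vs tumor)
seg_head = nn.Conv2d(16, 2, kernel_size=1)

cls_out = cls_head(features)
reg_out = reg_head(features)
seg_out = seg_head(features)

print("classification:", tuple(cls_out.shape))  # (batch, classes)
print("regression:    ", tuple(reg_out.shape))  # (batch, 1)
print("segmentation:  ", tuple(seg_out.shape))  # (batch, classes, height, width)
```

Classification and regression collapse each patch to a few numbers, while segmentation keeps the spatial grid so that every pixel gets its own prediction.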

Situations where it’s used (Medical Examples)

  • Feature Approach (Path A): A Random Forest trained on the circularity measurements from Step 5 shows that “Circularity < 0.6” predicts malignancy.
  • End-to-End (Path B): Raw H&E patches are fed into a ResNet-50, which learns signals such as stromal orientation to predict metastasis.
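
To make the feature approach concrete, here is a small sketch of how a learned threshold on circularity can be read back out of a model. It uses a single depth-1 decision tree rather than a full Random Forest (a forest combines many such trees), and the measurements are made up for illustration, so the threshold it finds is not a clinical cutoff.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up measurements for illustration only (not real patient data)
df = pd.DataFrame({
    "circularity": [0.92, 0.88, 0.85, 0.81, 0.52, 0.45, 0.38, 0.30],
    "diagnosis":   [0, 0, 0, 0, 1, 1, 1, 1],  # 0=Benign, 1=Malignant
})

# A depth-1 tree learns exactly one "if circularity <= t" rule
tree = DecisionTreeClassifier(max_depth=1, random_state=0)
tree.fit(df[["circularity"]], df["diagnosis"])

# Print the learned rule as a text flowchart
rules = export_text(tree, feature_names=["circularity"])
print(rules)

# The split threshold of the root node
threshold = tree.tree_.threshold[0]
print(f"Learned threshold: {threshold:.3f}")
```

The threshold falls somewhere between the least circular benign nucleus and the most circular malignant one in the toy table; with real data it depends entirely on the measurements you provide.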

This is the engine. Everything before (tiling, normalizing, feature extraction) prepared fuel. Here the computer actually attempts a diagnosis.

Run in terminal:

pip install scikit-learn torch torchvision

The plain pip command above installs whichever default PyTorch build pip selects for your platform, which may be CPU-only. For GPU training, use the CUDA-specific install command generated at pytorch.org.
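
After installing, it can help to confirm what actually landed in your environment. The sketch below is one way to do that; `check_environment` is a hypothetical helper written for this example, not part of either library. It reports `None` for packages that are missing instead of crashing.

```python
import importlib
import importlib.util


def check_environment(packages=("sklearn", "torch", "torchvision")):
    """Return {package: version or None} without failing on missing packages."""
    report = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # package is not installed
            continue
        module = importlib.import_module(name)
        report[name] = getattr(module, "__version__", "unknown")
    return report


report = check_environment()
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")

# Only ask about the GPU if PyTorch is present
if report.get("torch"):
    import torch
    print(f"CUDA available: {torch.cuda.is_available()}")
```

If `CUDA available` prints `False` on a machine with an NVIDIA GPU, the installed build is probably CPU-only or the driver is not visible to PyTorch.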


Use Path A (Scikit-Learn) when you have a CSV of numeric features (for example, from Step 5).

Block A1: Train a Random Forest on tabular features


The Situation: You measured nuclei (area, perimeter, circularity) and have labels (0 = Benign, 1 = Malignant).
The Solution: Train a Random Forest, in which 100 decision trees each vote on the diagnosis.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load your feature table (replace this dummy data with pd.read_csv)
data = {
    "area": [100, 120, 110, 400, 420, 450],  # Smaller often benign
    "circularity": [0.9, 0.85, 0.92, 0.4, 0.3, 0.5],  # Round vs irregular
    "diagnosis": [0, 0, 0, 1, 1, 1],  # 0=Benign, 1=Malignant
}
df = pd.DataFrame(data)

# 2. Split into train/test (stratify keeps both classes in each split)
X = df[["area", "circularity"]]
y = df["diagnosis"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 3. Model
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 4. Train
clf.fit(X_train, y_train)

# 5. Predict and score
predictions = clf.predict(X_test)
acc = accuracy_score(y_test, predictions)
print(f"Model Accuracy: {acc:.2%}")

# 6. Inference on a new sample (same column names as the training data,
#    which avoids scikit-learn's feature-name warning)
new_cell = pd.DataFrame([[410, 0.35]], columns=["area", "circularity"])
result = clf.predict(new_cell)
print(f"Prediction for new cell: {'Malignant' if result[0] == 1 else 'Benign'}")

Simulated output:

Model Accuracy: 100.00%
Prediction for new cell: Malignant
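
With six rows and a 30% split, the test set in Block A1 holds only two samples, so a 100% score says little about real performance. A common complement is cross-validation, sketched below on a synthetic table. The feature distributions are invented for this example and only stand in for a real CSV, so the scores themselves carry no meaning.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in for a real feature table:
# 40 benign-like and 40 malignant-like nuclei (invented distributions)
n = 40
area = np.concatenate([rng.normal(110, 15, n), rng.normal(420, 40, n)])
circularity = np.concatenate([rng.normal(0.88, 0.05, n), rng.normal(0.40, 0.08, n)])
X = np.column_stack([area, circularity])
y = np.array([0] * n + [1] * n)  # 0=Benign, 1=Malignant

clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 5 folds: each sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

print(f"Fold accuracies: {np.round(scores, 2)}")
print(f"Mean accuracy: {scores.mean():.2%} (+/- {scores.std():.2%})")
```

Reporting the mean together with the spread across folds gives a more honest picture than a single small test split. For patient data, splitting by patient rather than by nucleus is usually needed to avoid leakage between folds.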

Path B: Deep Learning (The Convolutional Neural Network)


Use this when you have raw images (for example, patches from Step 2/3).

Block B1: Architecture setup with transfer learning (ResNet-18)

import torch
import torch.nn as nn
from torchvision import models
# 1. Load pretrained ResNet-18
model = models.resnet18(weights="DEFAULT")
# 2. Replace the final layer to predict 2 classes (Tumor/Normal)
num_features = model.fc.in_features # usually 512
model.fc = nn.Linear(num_features, 2)
# 3. Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
print("Model architecture modified for 2 classes (Tumor/Normal).")
print(f"Running on: {device}")

Simulated output:

Model architecture modified for 2 classes (Tumor/Normal).
Running on: cpu

Block B2: Inference on a single image patch

from pathlib import Path

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

# 1. Preprocessing pipeline (ImageNet statistics, matching the pretrained weights)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# 2. Load and preprocess an image (replace with a real patch path)
img_path = Path("path/to/project/data/raw_images/test_patch.jpg")
if img_path.exists():
    img = Image.open(img_path).convert("RGB")
else:
    img = Image.new("RGB", (300, 300), color="pink")  # demo fallback
input_tensor = preprocess(img)
input_batch = input_tensor.unsqueeze(0).to(device)  # add a batch dimension

# 3. Forward pass (model and device come from Block B1)
model.eval()
with torch.no_grad():
    output = model(input_batch)  # logits, shape (1, 2)

# 4. Probabilities
probabilities = F.softmax(output[0], dim=0)
print(f"Raw Output scores: {output}")
print(f"Probability Class 0 (Normal): {probabilities[0].item():.4f}")
print(f"Probability Class 1 (Tumor): {probabilities[1].item():.4f}")

Simulated output (values vary between runs because the new final layer is randomly initialized and not yet trained):

Raw Output scores: tensor([[-0.5612, 0.3421]])
Probability Class 0 (Normal): 0.2884
Probability Class 1 (Tumor): 0.7116
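
The step from raw scores (logits) to probabilities is a softmax: exponentiate each score and divide by the sum. As a sanity check, the sketch below recomputes it in plain Python from the two logits shown in the simulated output; the `softmax` function here is a hand-written illustration, not PyTorch's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax for a list of floats."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-0.5612, 0.3421]  # raw scores from the simulated output
probs = softmax(logits)

for index, p in enumerate(probs):
    print(f"Class {index}: {p:.4f}")
print(f"Sum: {sum(probs):.4f}")
```

The probabilities always sum to 1, and only the difference between the logits matters: adding the same constant to both scores leaves the result unchanged.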