Step 6 – ML & Modeling (The Diagnosis Phase)
Name of Tool
Scikit-Learn (The Statistician) & PyTorch (The Neural Network)
Technical Explanation
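Scikit-Learn (sklearn) is a CPU-focused library for classical machine learning on structured (tabular) data. It provides ready-made models for classification, regression, and clustering, such as logistic regression, support vector machines (SVMs), and random forests. PyTorch (torch) is a deep learning framework built around tensors (multi-dimensional arrays) and automatic differentiation, which computes the gradients needed to train neural networks. It runs on CPUs but is fastest on GPUs, and it is typically used for unstructured data such as raw images or text.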
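To make "tensors and automatic differentiation" concrete, here is a minimal toy sketch that is not part of the pathology pipeline. PyTorch records the operations applied to a tensor created with `requires_grad=True`, and `backward()` then computes the gradient automatically. The function y = x² + 3x is an arbitrary illustration.

```python
import torch

# A scalar tensor whose operations PyTorch should track for gradients
x = torch.tensor(2.0, requires_grad=True)

# Arbitrary toy function: y = x^2 + 3x  (y = 10 at x = 2)
y = x ** 2 + 3 * x

# Autograd computes dy/dx = 2x + 3, evaluated at x = 2
y.backward()
print(x.grad)  # tensor(7.)
```

Training a neural network works the same way at a much larger scale. The "function" is the network's loss, and the gradients tell the optimizer how to adjust each weight.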
Simplified Explanation
This is the "Medical Board Exam."
- Path A (Scikit-Learn): A checklist. You hand over measured features (for example, nucleus area and circularity). The model learns decision rules (like a flowchart) from labeled examples, then applies them to classify new cases. It is fast and relatively interpretable, but limited by the features you provide.
- Path B (PyTorch): A resident. You show it thousands of labeled images (for example, 10,000), and it learns patterns you never explicitly described (texture, chromatin). It needs more compute and more labeled data, but it can pick up visual cues that hand-measured features miss.
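The "flowchart" in Path A is literal for a single decision tree, the building block of a Random Forest. This sketch fits one shallow tree on the same toy nucleus measurements used in Block A1 and prints its learned rules with scikit-learn's `export_text`. The data are illustrative dummy values, not real measurements.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy nucleus measurements (same dummy values as Block A1); 0 = Benign, 1 = Malignant
df = pd.DataFrame({
    "area": [100, 120, 110, 400, 420, 450],
    "circularity": [0.9, 0.85, 0.92, 0.4, 0.3, 0.5],
    "diagnosis": [0, 0, 0, 1, 1, 1],
})
features = ["area", "circularity"]

# Limiting depth keeps the tree small enough to read as a flowchart
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(df[features], df["diagnosis"])

# Print the learned if/else rules (thresholds on area or circularity)
rules = export_text(tree, feature_names=features)
print(rules)
```

Because this toy data separates perfectly, a single split is enough. Real data usually needs deeper trees, and a forest of many trees trades some of this readability for accuracy.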
What can it do?
- Classification: Tumor vs Normal (binary) or Grade 1/2/3 (multiclass).
- Regression: Predict a continuous value (for example, survival months).
- Segmentation (PyTorch): Pixel-level outlines (for example, tumor mask).
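These task types differ mainly in the shape of what the model outputs. The sketch below uses a hypothetical two-layer convolutional network. It is randomly initialized and far smaller than a real segmentation model such as a U-Net. It shows that a segmentation model predicts a score per class for every pixel, and that taking the highest-scoring class at each pixel yields a mask.

```python
import torch
import torch.nn as nn

# Hypothetical toy segmentation network: 3 input channels (RGB) -> 2 class scores per pixel
seg_net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # padding=1 keeps height/width unchanged
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),             # 1x1 conv: per-pixel class scores
)

patch = torch.rand(1, 3, 64, 64)  # one random 64x64 RGB "patch": (batch, channels, H, W)

with torch.no_grad():
    logits = seg_net(patch)       # shape (1, 2, 64, 64): 2 scores for each pixel

mask = logits.argmax(dim=1)       # shape (1, 64, 64); our convention: 0 = background, 1 = tumor
print(logits.shape, mask.shape)

# For comparison: a classifier outputs one score per class for the whole image,
# e.g. shape (1, 2) for Tumor vs Normal, and a regressor outputs a single value per image.
```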
Situations where it’s used (Medical Examples)
- Feature Approach (Path A): A Random Forest trained on the circularity measurements from Step 5 shows that "Circularity < 0.6" predicts malignancy.
- End-to-End (Path B): Raw H&E patches are fed into a ResNet-50, which learns stromal-orientation signals that predict metastasis.
Why it’s important to pathologists
This is the engine. Everything before it (tiling, normalization, feature extraction) prepared the fuel; here, the computer actually attempts a diagnosis.
Installation Instructions
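Run in a terminal:
```shell
pip install scikit-learn pandas torch torchvision
```
pandas is included because the Path A example uses it. Whether the default PyTorch wheel from pip includes GPU (CUDA) support depends on your operating system. If you need a specific CUDA build, use the install command generated by the selector on pytorch.org.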
Path A: Classical ML (The Random Forest)
Use this when you have a CSV of numeric features (for example, from Step 5).
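If your Step 5 output is a CSV, loading it might look like the sketch below. The column names (`area`, `circularity`, `diagnosis`) are assumptions chosen to match Block A1. An in-memory string stands in for the file so the example runs as-is; with a real file, pass its path to `pd.read_csv`.

```python
import io

import pandas as pd

# Stand-in for a real file such as "nuclei_features.csv" (hypothetical name); one row per nucleus
csv_text = """area,circularity,diagnosis
100,0.90,0
120,0.85,0
400,0.40,1
,0.30,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Many scikit-learn estimators reject missing values: check, then drop (or impute) incomplete rows
print(df.isna().sum())
df = df.dropna()

X = df[["area", "circularity"]]   # feature matrix
y = df["diagnosis"].astype(int)  # labels: 0 = Benign, 1 = Malignant
print(X.shape, y.tolist())
```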
Block A1: Train a Random Forest on tabular features
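The Situation: You measured nuclei (area, perimeter, circularity) and have labels (0 = Benign, 1 = Malignant). The example below uses area and circularity.
The Solution: Train a Random Forest of 100 decision trees. Each tree gives its own verdict, and the forest combines them into one diagnosis. Scikit-learn does this by averaging the trees' predicted class probabilities rather than counting hard votes.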
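The sketch below checks this combining rule on the toy data from Block A1. `estimators_` is the fitted forest's list of individual trees. Each tree was trained on a bootstrap resample of the rows, so the trees can disagree on an ambiguous cell.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy features (area, circularity) and labels as in Block A1; NumPy arrays avoid feature-name warnings
X = np.array([[100, 0.9], [120, 0.85], [110, 0.92], [400, 0.4], [420, 0.3], [450, 0.5]])
y = np.array([0, 0, 0, 1, 1, 1])

forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

new_cell = np.array([[300, 0.6]])  # an in-between, ambiguous cell

# Each tree's own probability estimate for the new cell
per_tree = np.array([tree.predict_proba(new_cell)[0] for tree in forest.estimators_])

# The forest's probability is the mean over its trees
print("Forest:", forest.predict_proba(new_cell)[0])
print("Mean of trees:", per_tree.mean(axis=0))
print("Trees leaning Malignant:", int((per_tree[:, 1] > 0.5).sum()), "of", len(per_tree))
```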
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load your feature table (replace this dummy data with pd.read_csv)
data = {
    "area": [100, 120, 110, 400, 420, 450],           # Smaller nuclei: often benign
    "circularity": [0.9, 0.85, 0.92, 0.4, 0.3, 0.5],  # Round (near 1) vs irregular
    "diagnosis": [0, 0, 0, 1, 1, 1],                  # 0 = Benign, 1 = Malignant
}
df = pd.DataFrame(data)

# 2. Split into train/test (stratify keeps the class balance in both parts)
X = df[["area", "circularity"]]
y = df["diagnosis"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 3. Model: an ensemble of 100 decision trees
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 4. Train
clf.fit(X_train, y_train)

# 5. Predict on the held-out rows and score
predictions = clf.predict(X_test)
acc = accuracy_score(y_test, predictions)
print(f"Model Accuracy: {acc:.2%}")

# 6. Inference on a new sample, using the same column names as the training data
#    (avoids scikit-learn's "X does not have valid feature names" warning)
new_cell = pd.DataFrame([[410, 0.35]], columns=["area", "circularity"])
result = clf.predict(new_cell)
print(f"Prediction for new cell: {'Malignant' if result[0] == 1 else 'Benign'}")
```
Simulated output:
```
Model Accuracy: 100.00%
Prediction for new cell: Malignant
```
With only six rows, the 30% test split holds just two cells (one per class). A perfect score here only shows that the code runs and says nothing about real diagnostic performance. Real projects need many more samples and a more robust evaluation, such as cross-validation.

Path B: Deep Learning (The Convolutional Neural Network)
Use this when you have raw images (for example, patches from Step 2/3).
Block B1: Architecture setup with transfer learning (ResNet-18)
```python
import torch
import torch.nn as nn
from torchvision import models

# 1. Load ResNet-18 with ImageNet-pretrained weights (downloaded on first use)
model = models.resnet18(weights="DEFAULT")

# 2. Replace the final layer to predict 2 classes (0 = Normal, 1 = Tumor)
num_features = model.fc.in_features    # 512 for ResNet-18
model.fc = nn.Linear(num_features, 2)  # new layer is randomly initialized and must be trained

# 3. Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

print("Model architecture modified for 2 classes (Tumor/Normal).")
print(f"Running on: {device}")
```
Simulated output:
```
Model architecture modified for 2 classes (Tumor/Normal).
Running on: cpu
```
Block B2: Single-image inference
```python
import os

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

# Reuses `model` and `device` from Block B1.

# 1. Preprocessing pipeline (ImageNet statistics, matching the pretrained weights)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# 2. Load and preprocess an image (replace with a real patch path)
img_path = "path/to/project/data/raw_images/test_patch.jpg"
if os.path.exists(img_path):
    img = Image.open(img_path).convert("RGB")
else:
    img = Image.new("RGB", (300, 300), color="pink")  # demo fallback
input_tensor = preprocess(img)                       # shape (3, 224, 224)
input_batch = input_tensor.unsqueeze(0).to(device)   # add batch dimension: (1, 3, 224, 224)

# 3. Forward pass (eval mode + no_grad: inference only, no gradient tracking)
model.eval()
with torch.no_grad():
    output = model(input_batch)  # logits, shape (1, 2)

# 4. Convert logits to probabilities
probabilities = F.softmax(output[0], dim=0)

print(f"Raw Output scores: {output}")
print(f"Probability Class 0 (Normal): {probabilities[0].item():.4f}")
print(f"Probability Class 1 (Tumor): {probabilities[1].item():.4f}")
```
Simulated output:
```
Raw Output scores: tensor([[-0.5612, 0.3421]])
Probability Class 0 (Normal): 0.2883
Probability Class 1 (Tumor): 0.7117
```
The final layer added in Block B1 is randomly initialized and has not been trained. These probabilities therefore carry no diagnostic meaning yet; the model must first be fine-tuned on labeled patches.

Resource Site
- Scikit-Learn User Guide: https://scikit-learn.org/stable/user_guide.html
- PyTorch Beginner Tutorial: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
- TorchVision Models (ResNet): https://pytorch.org/vision/stable/models.html