Computational Pathology Pipeline
This site is a clinician‑friendly, versioned knowledge base for digital/computational pathology. It explains the purpose and scope of each pipeline stage, tells you where to start, and then walks through practical workflows with reproducibility in mind. Use it as a quick reference, from runnable code snippets to curated resources for each stage.
The computational pathology pipeline at a glance
Below is the high-level sequence that most digital and computational pathology projects follow. The rest of this documentation is organized around these steps.
- Step 0 – Initial System Setup: your lab computer and tools
  Install Python and a few core libraries, choose a notebook or editor, and set up a clean project folder so every file has a sensible home. You typically do this once per machine and then reuse it across projects.
- Step 1 – Data & Cohorts: deciding which cases are in your study
  Decide which patients, slides, or blocks to include and collect all key information in a “cohort table” (usually a CSV) with IDs, diagnoses, outcomes, and other clinicopathologic variables.
- Step 2 – Slides & Viewing: opening and exploring your digital slides
  Work directly with the digital slides: open whole-slide images, zoom and pan, and perform quick sanity checks that you are looking at the right tissue, stain, and a reasonably good scan.
- Step 3 – Preprocessing & Quality Control: cleaning bad images and files
  Detect and handle obviously bad data: out-of-focus scans, pen marks, folds, staining issues, or missing files. You can also run basic numeric checks (image size, brightness, file integrity) to automatically flag problems before modeling.
- Step 4 – Annotation & Labeling: telling the computer what is what
  Add human knowledge by marking regions, labeling tiles or patches, or assigning slide-level and case-level labels. These labels connect pixels to ground-truth diagnoses, grades, or outcomes.
- Step 5 – Feature Extraction: turning images into numbers
  Convert slides or patches into numeric descriptors such as color statistics, texture features, or deep features from a neural network. The end result is usually a table where each row is a slide or patch and each column is a feature.
- Step 6 – ML & Modeling: training models on those features
  Use the feature tables (and possibly clinical variables) to train machine-learning and deep-learning models, such as logistic regression, random forests, CNNs, or multiple-instance learning models, depending on the question.
- Step 7 – Evaluation & Explainability: checking if you trust the model
  Evaluate model performance using appropriate metrics (for example accuracy, AUC, sensitivity, specificity), inspect error patterns, and apply simple explainability tools to see what the model is focusing on, so you can judge whether you would trust it.
- Step 8 – Pipelines & Deployment: making your workflow reusable
  Turn your notebook or prototype into a repeatable pipeline or simple app. Organize your code, configuration, and models so that the same steps can be run on new data, whether on your workstation, a server, or eventually in a hospital environment.
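The cohort table from Step 1 is usually just a CSV with one row per case. A minimal sketch of loading and sanity-checking such a table with Python's standard library (the column names and data here are hypothetical examples, not a required schema):

```python
import csv
import io

# Hypothetical cohort CSV: one row per case, with IDs and clinicopathologic variables.
cohort_csv = """case_id,slide_id,diagnosis,grade
C001,S001,tumor,2
C002,S002,normal,1
C003,S003,tumor,
"""

REQUIRED_COLUMNS = {"case_id", "slide_id", "diagnosis"}

rows = list(csv.DictReader(io.StringIO(cohort_csv)))

# Fail early if a required column is missing from the header.
missing = REQUIRED_COLUMNS - set(rows[0].keys())
assert not missing, f"cohort table is missing columns: {missing}"

# Flag rows with empty key fields instead of letting them fail silently downstream.
incomplete = [r["case_id"] for r in rows if not r["grade"]]
print(f"{len(rows)} cases loaded, {len(incomplete)} with missing grade")
```

Catching a missing column or an empty cell at this stage is far cheaper than discovering it after hours of tiling and training.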
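The "basic numeric checks" in Step 3 can be surprisingly simple. A sketch of two such checks on hypothetical 8-bit grayscale patches (the thresholds are illustrative, not validated defaults):

```python
from statistics import mean, pstdev

# Hypothetical 8-bit grayscale patches: near-white background vs. stained tissue.
background_patch = [250, 252, 248, 251, 253, 249]
tissue_patch = [120, 90, 200, 60, 140, 110]

def qc_flags(pixels, bright_thresh=240.0, contrast_thresh=5.0):
    """Return simple QC flags: near-white (mostly background) or
    very low contrast (possible blur or blank scan)."""
    flags = []
    if mean(pixels) > bright_thresh:
        flags.append("mostly_background")
    if pstdev(pixels) < contrast_thresh:
        flags.append("low_contrast")
    return flags

print(qc_flags(background_patch))  # flags both checks
print(qc_flags(tissue_patch))      # passes both checks
```

In practice these checks run over every tile of a slide, and flagged tiles are excluded or reviewed before feature extraction.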
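The feature table described in Step 5 is easy to picture with hand-crafted color statistics. A minimal sketch, using hypothetical patch IDs and made-up grayscale intensities (deep features would simply replace these columns with network activations):

```python
from statistics import mean, pstdev

# Hypothetical patches represented as flat lists of 8-bit grayscale intensities.
patches = {
    "S001_tile_0": [120, 90, 200, 60, 140, 110],
    "S001_tile_1": [30, 45, 20, 60, 35, 50],
}

def extract_features(pixels):
    """Simple hand-crafted descriptors for one patch."""
    return {
        "mean_intensity": mean(pixels),
        "std_intensity": pstdev(pixels),
        "min_intensity": min(pixels),
        "max_intensity": max(pixels),
    }

# One row per patch, one column per feature -- the table Step 6 trains on.
feature_table = {pid: extract_features(px) for pid, px in patches.items()}
for pid, feats in feature_table.items():
    print(pid, feats)
```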
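Multiple-instance learning, mentioned in Step 6, treats a slide as a "bag" of patches with only a slide-level label. Its core pooling idea can be sketched without any ML library (the scores below are made-up patch-level tumor probabilities, and max-pooling is just one of several pooling choices):

```python
# Hypothetical patch-level tumor probabilities for two slides.
slide_patch_scores = {
    "S001": [0.05, 0.10, 0.92, 0.08],  # one strongly suspicious patch
    "S002": [0.03, 0.07, 0.12, 0.09],  # all patches look benign
}

def max_pool_prediction(scores, threshold=0.5):
    """Call a slide positive if its most suspicious patch exceeds the threshold."""
    slide_score = max(scores)
    return slide_score, slide_score > threshold

for slide_id, scores in slide_patch_scores.items():
    score, positive = max_pool_prediction(scores)
    print(slide_id, round(score, 2), "positive" if positive else "negative")
```

Max-pooling mirrors how a pathologist works: one convincing tumor focus is enough to call the slide positive, even if most of the tissue is unremarkable.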
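The Step 7 metrics all fall out of one confusion matrix. A worked sketch with made-up slide-level predictions (1 = tumor, 0 = normal):

```python
# Hypothetical slide-level predictions vs. ground truth (1 = tumor, 0 = normal).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

# Count the four cells of the confusion matrix.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)  # true positive rate: tumors caught
specificity = tn / (tn + fp)  # true negative rate: normals correctly cleared

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
# → accuracy=0.75 sensitivity=0.75 specificity=0.75
```

Reporting sensitivity and specificity separately matters in pathology, because a missed tumor (false negative) and an unnecessary workup (false positive) carry very different costs.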
Each section below dives deeper into one step. libvips is central to Preprocessing & QC for cropping/tiling/format conversion, and OpenSeadragon powers browser‑based viewing in Slides & Viewing.
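Smooth zooming in viewers like OpenSeadragon works because whole-slide images are stored as multi-resolution pyramids (the format libvips can generate). A back-of-the-envelope sketch of the pyramid for a hypothetical 100,000 × 80,000 pixel slide, halving each level until it fits in a single tile:

```python
# Hypothetical full-resolution dimensions of a whole-slide image, in pixels.
width, height = 100_000, 80_000
TILE_SIZE = 256  # a common tile edge length for deep-zoom pyramids

# Each pyramid level halves both dimensions until the image fits in one tile.
levels = []
w, h = width, height
while w > TILE_SIZE or h > TILE_SIZE:
    levels.append((w, h))
    w, h = max(1, w // 2), max(1, h // 2)
levels.append((w, h))

for i, (w, h) in enumerate(levels):
    print(f"level {i}: {w} x {h}")
```

The viewer only ever fetches the tiles visible at the current zoom level, which is why a multi-gigapixel slide pans and zooms responsively in a browser.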