Preprocessing & QC
Step 3 – Preprocessing & Quality Control: cleaning bad images and files
Here you deal with the messy reality of digital slides. You look for scans that are out of focus, covered in pen marks, affected by tissue folds, poorly stained, or simply missing. You may also run basic numeric checks (for example, image size, brightness, or file integrity) to flag obviously bad slides automatically before they ever reach a model.
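A minimal sketch of such numeric checks, using NumPy and the Python standard library. The function names `sha256_of` and `qc_flags`, the idea of passing in a pre-decoded thumbnail plus the slide's full-resolution size, and every threshold are hypothetical placeholders, not validated cut-offs. How the thumbnail and dimensions are read from a slide file is deliberately left out.

```python
from pathlib import Path
from typing import List, Optional, Tuple
import hashlib

import numpy as np


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large slide files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def qc_flags(
    path: Path,
    thumbnail: np.ndarray,
    full_size: Tuple[int, int],
    expected_sha256: Optional[str] = None,
    min_side: int = 1000,
    brightness_range: Tuple[float, float] = (30.0, 245.0),
) -> List[str]:
    """Return QC flags for one slide; an empty list means no check fired.

    `thumbnail` is a small RGB uint8 array decoded elsewhere (hypothetical input),
    `full_size` is the (width, height) of the full-resolution slide.
    All thresholds are illustrative only.
    """
    # File integrity: a missing or zero-byte file cannot be analysed at all.
    if not path.exists() or path.stat().st_size == 0:
        return ["missing_or_empty_file"]
    flags = []
    # Optional checksum comparison against a value recorded when the slide was scanned.
    if expected_sha256 is not None and sha256_of(path) != expected_sha256:
        flags.append("checksum_mismatch")
    # Image size: unusually small slides are often truncated or mis-exported.
    if min(full_size) < min_side:
        flags.append("too_small")
    # Brightness: near-black suggests a failed scan, near-white suggests empty glass.
    mean_brightness = float(thumbnail.mean())
    low, high = brightness_range
    if mean_brightness < low:
        flags.append("too_dark")
    elif mean_brightness > high:
        flags.append("too_bright")
    return flags
```

Returning a list of named flags, rather than a single pass/fail, keeps the reason for rejection visible when someone reviews the flagged slides by hand.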
Technical name: Preprocessing & QC
What this is
Prepare slides so they don’t waste compute time or confuse models:
- Keep only relevant parts.
- Remove obvious junk/background.
- Reduce cross‑site/scanner stain differences.
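The last point, reducing stain differences, can be illustrated with a deliberately crude sketch: shift and rescale each RGB channel of a slide so its mean and standard deviation match a reference image from the site or scanner you want to match. This is a simplification, not Reinhard normalization as usually implemented (which matches the same statistics in a perceptual colour space such as lαβ or CIELAB) and not a stain-separation method such as Macenko's. The function name is hypothetical.

```python
import numpy as np


def match_channel_stats(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rescale each colour channel of `source` so its mean and standard deviation
    match those of `target`, then clip back to the 0-255 uint8 range.

    Both inputs are H x W x C uint8 arrays; `target` is a reference image.
    """
    src = source.astype(np.float64)
    tgt = target.astype(np.float64)
    out = np.empty_like(src)
    for c in range(src.shape[-1]):
        src_mean, src_std = src[..., c].mean(), src[..., c].std()
        tgt_mean, tgt_std = tgt[..., c].mean(), tgt[..., c].std()
        # A flat channel has no spread to rescale, so only its mean is shifted.
        scale = tgt_std / src_std if src_std > 0 else 1.0
        out[..., c] = (src[..., c] - src_mean) * scale + tgt_mean
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Because RGB channels mix the haematoxylin and eosin signals, this kind of matching can shift colours in ways a stain-aware method would avoid; it is shown only to make the idea of "matching a reference" concrete.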
Typical questions
- “Can we crop to just the tumor, not blank glass?”
- “Can we remove tiles that are all white or out of focus?”
- “Why do these H&Es look different? Can we make them more uniform?”
Common tasks
- Crop to the tissue area or a region of interest (ROI).
- Split slides into tiles/patches (e.g., 256×256 pixels).
- Filter out tiles that contain no tissue, contain pen/marker ink, or are out of focus.
- Basic stain normalization across labs/scanners.
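The tiling and tile-filtering tasks can be sketched with NumPy on an in-memory RGB array. Real whole-slide images are usually too large to load whole, so in practice tiles are read region by region through a WSI reader; this sketch ignores that. The function names and the `white_level=220` / `min_tissue=0.5` thresholds are hypothetical, and it only drops blank-glass tiles, not pen marks or blur.

```python
from typing import Iterator, List, Tuple

import numpy as np


def iter_tiles(image: np.ndarray, tile_size: int = 256) -> Iterator[Tuple[int, int, np.ndarray]]:
    """Yield (row, col, tile) for every full, non-overlapping tile.

    Partial tiles at the right and bottom edges are dropped.
    """
    height, width = image.shape[:2]
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            yield row, col, image[row:row + tile_size, col:col + tile_size]


def tissue_fraction(tile: np.ndarray, white_level: int = 220) -> float:
    """Share of pixels whose mean RGB value is below `white_level`;
    bright, near-white pixels are treated as blank glass."""
    return float((tile.mean(axis=-1) < white_level).mean())


def keep_tissue_tiles(
    image: np.ndarray, tile_size: int = 256, min_tissue: float = 0.5, white_level: int = 220
) -> List[Tuple[int, int]]:
    """Return the top-left (row, col) of each tile with enough tissue to keep."""
    return [
        (row, col)
        for row, col, tile in iter_tiles(image, tile_size)
        if tissue_fraction(tile, white_level) >= min_tissue
    ]
```

Returning tile coordinates instead of pixel data keeps the filtered list small; the tiles themselves can be re-read or exported later from those positions.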
Core tools (examples)
- libvips — fast image-processing library and CLI to crop, resize, and tile whole-slide images (WSIs). See /tools/libvips/
- QuPath — detect tissue, export tiles, batch operations.
- ImageJ / Fiji — general‑purpose processing for smaller images.
- Histolab / TIAToolbox — WSI‑specific preprocessing and tiling.
Clinician mental model
Think of this step as grossing and QC for digital slides: trim, clean, and standardize before deeper analysis.
Ready-to-use code
- QC-01: Grossing & QC blocks — OpenCV/NumPy tissue masking with Otsu thresholding, morphological cleanup, and blur detection to trim background and flag bad scans before modeling.
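The three steps named for QC-01 can be sketched in NumPy alone so they run without OpenCV. This is not the QC-01 code itself. In OpenCV the same jobs are usually done with `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)`, `cv2.morphologyEx` with `cv2.MORPH_OPEN` / `cv2.MORPH_CLOSE`, and `cv2.Laplacian(gray, cv2.CV_64F).var()`; results will differ slightly at image borders. All function names here are hypothetical, and the `min_tissue=0.05` and `blur_threshold=100.0` defaults are illustrative, since Laplacian variance depends heavily on magnification and tissue content.

```python
from typing import List, Tuple

import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert RGB to uint8 grey levels with ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.clip(np.rint(rgb[..., :3].astype(np.float64) @ weights), 0, 255).astype(np.uint8)


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the grey level t maximising between-class variance, with
    pixels <= t in one class and pixels > t in the other (gray must be uint8)."""
    prob = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob /= prob.sum()
    omega = np.cumsum(prob)                    # weight of the <= t class
    mu = np.cumsum(prob * np.arange(256))      # un-normalised mean of that class
    mu_total = mu[-1]
    numerator = (mu_total * omega - mu) ** 2
    denominator = omega * (1.0 - omega)
    between = np.zeros(256)
    valid = denominator > 0                    # skip thresholds that leave a class empty
    between[valid] = numerator[valid] / denominator[valid]
    return int(np.argmax(between))


def _neighbourhood(mask: np.ndarray, radius: int) -> np.ndarray:
    """Stack every shift of `mask` within a (2r+1) x (2r+1) square; edges are replicated."""
    padded = np.pad(mask, radius, mode="edge")
    height, width = mask.shape
    size = 2 * radius + 1
    return np.stack([padded[dy:dy + height, dx:dx + width]
                     for dy in range(size) for dx in range(size)])


def erode(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    return _neighbourhood(mask, radius).all(axis=0)


def dilate(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    return _neighbourhood(mask, radius).any(axis=0)


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels; low means few sharp edges."""
    g = gray.astype(np.float64)
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4.0 * g[1:-1, 1:-1]
    return float(lap.var())


def tissue_mask_and_flags(
    rgb: np.ndarray, min_tissue: float = 0.05, blur_threshold: float = 100.0, radius: int = 1
) -> Tuple[np.ndarray, List[str]]:
    gray = to_gray(rgb)
    mask = gray <= otsu_threshold(gray)         # tissue is darker than background glass
    mask = dilate(erode(mask, radius), radius)  # opening: drop isolated specks
    mask = erode(dilate(mask, radius), radius)  # closing: fill small pinholes
    flags = []
    if mask.mean() < min_tissue:
        flags.append("little_tissue")
    if laplacian_variance(gray) < blur_threshold:
        flags.append("possibly_blurred")
    return mask, flags
```

Computing the Laplacian variance over the whole image, as here, lets large blank areas dilute the score; restricting it to pixels inside the tissue mask is a common refinement.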