Preprocessing & QC

Step 3 – Preprocessing & Quality Control: cleaning bad images and files

Here you deal with the messy reality of digital slides. You look for scans that are out of focus, covered in pen marks, folded, poorly stained, or simply missing. You may also run basic numeric checks (for example, image size, brightness, or file integrity) to automatically flag obviously bad slides before they ever reach a model.
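The basic numeric checks mentioned above could look something like the following minimal sketch. It uses only the Python standard library, assumes the slide is stored in a TIFF-based container (true of many WSI formats, but not all), and assumes a small grayscale thumbnail is already available as rows of 0-255 integers. The function names and every threshold here are illustrative assumptions, not values taken from this guide.

```python
import os

# Magic bytes for classic TIFF and BigTIFF, little- and big-endian.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def check_file(path, min_bytes=1024):
    """Return a list of file-level problems (empty list means no flag raised)."""
    if not os.path.isfile(path):
        return ["missing"]
    problems = []
    if os.path.getsize(path) < min_bytes:  # assumed cutoff for a truncated file
        problems.append("too_small")
    with open(path, "rb") as fh:
        if fh.read(4) not in TIFF_MAGIC:
            problems.append("not_tiff")
    return problems


def check_thumbnail(gray, min_side=64, dark=40, bright=245):
    """Flag a grayscale thumbnail (rows of 0-255 ints) by size and mean brightness."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    if height == 0 or width == 0:
        return ["empty"]
    problems = []
    if min(height, width) < min_side:
        problems.append("too_small_image")
    mean = sum(sum(row) for row in gray) / (height * width)
    if mean < dark:  # assumed cutoff: almost black, likely a failed scan
        problems.append("too_dark")
    elif mean > bright:  # assumed cutoff: almost all glass, little or no tissue
        problems.append("mostly_blank")
    return problems
```

A slide would be passed on only when both functions return an empty list; anything else is a flag for human review rather than an automatic rejection.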

Technical name: Preprocessing & QC

Prepare slides so they don’t waste time or confuse models:

  • Keep only relevant parts.
  • Remove obvious junk/background.
  • Reduce cross‑site/scanner stain differences.

Typical questions:

  • “Can we crop to just the tumor, not blank glass?”
  • “Can we remove tiles that are all white or out of focus?”
  • “Why do these H&Es look different? Can we make them more uniform?”

Typical tasks:

  • Crop to tissue area or ROI.
  • Split slides into tiles/patches (e.g., 256×256).
  • Filter out tiles with no tissue, pen‑marker artifacts, or poor focus.
  • Basic stain normalization across labs/scanners.

Common tools:

  • libvips — fast CLI to crop, resize, tile WSIs. See /tools/libvips/
  • QuPath — detect tissue, export tiles, batch operations.
  • ImageJ / Fiji — general‑purpose processing for smaller images.
  • Histolab / TIAToolbox — WSI‑specific preprocessing and tiling.
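
To make the tiling and filtering tasks above concrete, here is a dependency-free sketch of the logic. It is not how libvips, QuPath, or the other listed tools work internally; it only illustrates the idea. The names `tile_grid`, `tissue_fraction`, and `keep_tile`, the white level of 220, and the 50% tissue cutoff are all assumptions chosen for illustration.

```python
def tile_grid(width, height, tile=256, stride=None):
    """Yield (x, y) top-left corners of full tiles inside a width x height image.

    Partial tiles at the right and bottom edges are skipped.
    """
    stride = stride or tile
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield x, y


def tissue_fraction(gray_tile, white_level=220):
    """Fraction of pixels darker than white_level (tile given as rows of 0-255 ints)."""
    pixels = [p for row in gray_tile for p in row]
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p < white_level) / len(pixels)


def keep_tile(gray_tile, min_tissue=0.5):
    """Keep a tile only if enough of it looks like tissue rather than blank glass."""
    return tissue_fraction(gray_tile) >= min_tissue
```

For example, `list(tile_grid(1000, 600))` gives six 256×256 tile origins, from `(0, 0)` to `(512, 256)`. A fixed white level is the crudest possible tissue test; it will also keep pen marks and dark artifacts, so in practice it is combined with other filters.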

Think of this step as grossing and QC for digital slides: trim, clean, and standardize before deeper analysis.

  • QC-01: Grossing & QC blocks — OpenCV/NumPy tissue masking with Otsu, morphology cleanup, and blur detection to trim background and flag bad scans before modeling.
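
The QC-01 block itself is not reproduced here. As a rough illustration of two of the techniques it names, the sketch below implements Otsu thresholding and a Laplacian-variance blur score in plain Python, so the logic is visible without any dependencies. With OpenCV the usual equivalents are `cv2.threshold` with the `cv2.THRESH_OTSU` flag and the variance of `cv2.Laplacian`. Morphology cleanup is left out, and the function names are assumptions.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for 0-255 ints; pixels <= t form the darker class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the darker class so far
    sum0 = 0    # sum of intensities in the darker class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2  # between-class variance (unnormalized)
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values suggest few sharp edges, i.e. a possibly blurred image.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            values.append(lap)
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```

On an H&E thumbnail, pixels at or below the Otsu threshold are the darker class, which is usually tissue against bright glass. The blur score has no universal cutoff; a workable threshold has to be calibrated per scanner and magnification against slides a person has already judged sharp or blurred.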