Step 3 – Preprocessing & QC (The Grossing Room)
Name of Tool
OpenCV (The Scalpel) & NumPy (The Calculator)
Technical Explanation
OpenCV (cv2) is a highly optimized computer vision library for real-time image processing: filtering, thresholding, and morphology. NumPy (numpy) is the core package for scientific computing in Python. In Python, an image is a multi-dimensional array of numbers (height x width x color channels); NumPy lets you manipulate those pixels (subtraction, averaging, filtering) at high speed.
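To make "images are arrays of numbers" concrete, here is a minimal NumPy sketch. The tiny 4x4 image and its pixel values are made up for illustration; a real thumbnail has the same structure, only larger.

```python
import numpy as np

# A tiny synthetic 4x4 RGB "image": white glass with a 2x2 pink tissue square.
# (Hypothetical values chosen for illustration only.)
image = np.full((4, 4, 3), 255, dtype=np.uint8)
image[1:3, 1:3] = (230, 120, 180)  # pinkish pixels, RGB order

print(image.shape)  # (rows, columns, channels)
print(image.dtype)  # uint8: each value is 0-255

# Averaging: mean intensity of each color channel over the whole image.
channel_means = image.reshape(-1, 3).mean(axis=0)
print(channel_means)

# Subtraction: how far each pixel is from pure white.
# Cast to a signed type first, because uint8 arithmetic can wrap around.
distance_from_white = 255 - image.astype(np.int16)

# Pixels that differ from white in any channel are "not glass".
not_glass = distance_from_white.sum(axis=2) > 0
print(int(not_glass.sum()), "non-white pixels")
```

Every operation here runs on the whole array at once, which is why NumPy stays fast even on large thumbnails.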
Simplified Explanation
This is your “Digital Grossing Station.”
When you gross a specimen, you trim away fat, find the tumor, and cut clean blocks. OpenCV helps you digitally trim the fat: separate Tissue (important) from Glass/Dust/Marker Pen (garbage). NumPy is the math that decides: “Is this pixel pink enough to be tissue or white enough to be glass?”
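The "pink enough or white enough" question can be sketched in plain NumPy. This is only an illustration: the two pixel values, the helper name `saturation`, and the cutoff of 0.10 are assumptions, not values from a real scanner.

```python
import numpy as np

def saturation(rgb):
    """Return HSV-style saturation in [0, 1] for an RGB uint8 image."""
    rgb = rgb.astype(np.float32)
    brightest = rgb.max(axis=2)
    darkest = rgb.min(axis=2)
    # Pure black pixels have no defined saturation; treat them as 0.
    return np.where(brightest > 0,
                    (brightest - darkest) / np.maximum(brightest, 1),
                    0.0)

# Two hypothetical pixels: eosin-pink tissue and near-white glass.
pixels = np.array([[[230, 120, 180], [250, 250, 248]]], dtype=np.uint8)

sat = saturation(pixels)
SATURATION_CUTOFF = 0.10  # assumed value for illustration; tune on real slides
is_tissue = sat > SATURATION_CUTOFF

print(sat.round(2))
print(is_tissue)
```

Pink tissue is strongly colored (high saturation), while glass is nearly colorless (saturation close to zero), so a single number per pixel is enough to separate them in this toy case.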
What can it do?
- Tissue Detection: Outline tissue so the AI ignores empty glass.
- Artifact Removal: Detect and ignore marker pen ink, dust, or coverslip cracks.
- Blur Detection: Calculate focus sharpness and auto-reject out-of-focus scans.
- Color Space Conversion: Switch from RGB to HSV to better isolate stains.
Situations where it’s used (Medical Examples)
- The “Glass” Problem: A whole-slide image (WSI) is often around 80% white background. OpenCV builds a mask so the AI looks only at tissue.
- The “Blurry” Scan: A strip is out of focus. A Laplacian filter detects low sharpness and auto-rejects the slide.
Why it’s important to pathologists
Efficiency and Accuracy.
Training on empty glass teaches nothing; training on marker pen teaches the wrong thing. Preprocessing makes sure the data represents biology, not artifacts.
Installation Instructions
Run in terminal:
    pip install opencv-python numpy matplotlib
Lego Building Blocks (Code)
Block A: Tissue Detection (Otsu Thresholding)
The Situation: You have a thumbnail from Step 2 and need a binary mask: white = tissue, black = glass. Hard-coded thresholds fail because staining and brightness vary from slide to slide.
The Solution: Convert to HSV, use Saturation (S) to distinguish tissue (stained tissue is colorful, glass is not), and apply Otsu’s Method to auto-pick the threshold.
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    # 1. Load an image (thumbnail from Step 2)
    # TODO: replace with your thumbnail or patch path
    img_path = "thumbnail.png"
    img = cv2.imread(img_path)  # BGR format; returns None if the file cannot be read
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")

    # 2. Convert to HSV and grab Saturation channel
    hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    saturation_channel = hsv_img[:, :, 1]

    # 3. Apply Otsu Thresholding (the 0 is ignored; Otsu picks the threshold)
    threshold_val, mask = cv2.threshold(
        saturation_channel, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )

    # 4. Display
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    ax[0].set_title("Original Thumbnail")
    ax[0].axis("off")
    ax[1].imshow(mask, cmap="gray")
    ax[1].set_title(f"Tissue Mask (Otsu Thresh: {threshold_val:.1f})")
    ax[1].axis("off")
    plt.tight_layout()
    plt.show()

Simulated output: Two images. Left: original pink tissue on white glass. Right: black-and-white mask with tissue as solid white and glass as black.
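To see roughly what Otsu's Method does behind the scenes, here is a minimal NumPy sketch: it tries every possible threshold and keeps the one that maximizes the between-class variance of the two groups. The function name `otsu_threshold` and the synthetic saturation values (glass near 10, tissue near 120) are assumptions for illustration; in practice `cv2.threshold` with `cv2.THRESH_OTSU` does this for you.

```python
import numpy as np

def otsu_threshold(channel):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    weight_low = np.cumsum(prob)          # fraction of pixels <= t
    weight_high = 1.0 - weight_low        # fraction of pixels > t
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate threshold t.
    # Where one class is empty the denominator is 0; those entries stay 0.
    denom = weight_low * weight_high
    numer = (total_mean * weight_low - cum_mean) ** 2
    between = np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)
    return int(np.argmax(between))

# Synthetic saturation channel: 80% glass (low saturation), 20% tissue (high).
rng = np.random.default_rng(0)
glass = rng.normal(10, 3, size=8000)
tissue = rng.normal(120, 15, size=2000)
values = np.clip(np.rint(np.concatenate([glass, tissue])), 0, 255)
channel = values.astype(np.uint8).reshape(100, 100)

t = otsu_threshold(channel)
tissue_mask = channel > t
print("threshold:", t)
print(f"tissue fraction: {tissue_mask.mean():.2f}")
```

Because the threshold is derived from each image's own histogram, it adapts when one slide is stained more heavily than another.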
Block B: Cleaning the Mask (Morphology)
The Situation: The mask has dust specks (white dots) and tiny holes (fat/lumens).
The Solution: Morphological opening removes specks; closing fills holes. Then measure tissue area.
    # Assumes 'mask' from Block A exists

    # 1. Define a kernel (brush)
    kernel = np.ones((5, 5), np.uint8)

    # 2. Remove noise (Opening)
    mask_cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    # 3. Fill holes (Closing)
    mask_solid = cv2.morphologyEx(mask_cleaned, cv2.MORPH_CLOSE, kernel)

    # 4. Calculate tissue percentage
    tissue_pixels = cv2.countNonZero(mask_solid)
    total_pixels = mask_solid.size
    tissue_percentage = (tissue_pixels / total_pixels) * 100

    print(f"Tissue Area detected: {tissue_percentage:.2f}% of the slide.")

    # Display
    plt.figure(figsize=(5, 5))
    plt.imshow(mask_solid, cmap="gray")
    plt.title("Cleaned Solid Mask")
    plt.axis("off")
    plt.show()

Simulated output (text):
    Tissue Area detected: 14.35% of the slide.
Image shows a smooth white blob with holes filled compared to Block A.
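The same tissue-percentage idea can be applied tile by tile, so that mostly empty tiles are skipped before they reach the AI. This is a sketch under assumptions: the helper `tissue_fraction_per_tile`, the 8x8 stand-in mask, and the 0.5 cutoff are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def tissue_fraction_per_tile(mask, tile):
    """Fraction of non-zero (tissue) pixels in each tile x tile block of a mask."""
    h, w = mask.shape
    h_crop, w_crop = h - h % tile, w - w % tile  # drop partial tiles at the edges
    blocks = (mask[:h_crop, :w_crop] > 0).reshape(
        h_crop // tile, tile, w_crop // tile, tile
    )
    return blocks.mean(axis=(1, 3))

# Stand-in for 'mask_solid' from Block B: tissue in the top-left region only.
demo_mask = np.zeros((8, 8), dtype=np.uint8)
demo_mask[0:4, 0:6] = 255

fractions = tissue_fraction_per_tile(demo_mask, tile=4)
MIN_TISSUE_FRACTION = 0.5  # assumed cutoff for illustration
keep = fractions >= MIN_TISSUE_FRACTION

print(fractions)
print(keep)
```

Each entry of `keep` answers one question: does this tile contain enough tissue to be worth analyzing?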
Block C: Blur Detection (Quality Control)
The Situation: A scan strip may be blurry; nuclear detail is gone.
The Solution: Use Laplacian variance: high variance = sharp; low variance = blurry. Set your own cutoff.
    # Assumes 'img' (BGR) from Block A exists

    # 1. Convert to grayscale (color not needed for blur)
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # 2. Calculate Laplacian variance (edge strength)
    variance = cv2.Laplacian(gray_img, cv2.CV_64F).var()

    # 3. Decision rule (tune threshold for your scanner)
    blur_threshold = 100

    print(f"Sharpness Score: {variance:.2f}")
    if variance < blur_threshold:
        print("QC STATUS: FAIL (Blurry)")
    else:
        print("QC STATUS: PASS (Sharp)")

Simulated output (text):
    Sharpness Score: 1250.45
    QC STATUS: PASS (Sharp)
Resource Site
- OpenCV Official Tutorials: https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html
- NumPy User Guide: https://numpy.org/doc/stable/user/absolute_beginners.html
- Scikit-Image (alternative to OpenCV): https://scikit-image.org/docs/stable/user_guide.html