
Step 3 – Preprocessing & QC (The Grossing Room)

OpenCV (The Scalpel) & NumPy (The Calculator)

OpenCV (cv2) is a highly optimized computer vision library for real-time image processing—filtering, thresholding, and morphology. NumPy (numpy) is the core package for scientific computing in Python. Images are multi-dimensional arrays of numbers; NumPy lets you manipulate those pixels (subtraction, averaging, filtering) at high speed.

This is your “Digital Grossing Station.”
When you gross a specimen, you trim away fat, find the tumor, and cut clean blocks. OpenCV helps you digitally trim the fat: separate Tissue (important) from Glass/Dust/Marker Pen (garbage). NumPy is the math that decides: “Is this pixel pink enough to be tissue or white enough to be glass?”
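To make the "pink enough or white enough" question concrete, here is a minimal sketch on a tiny synthetic RGB patch. The pixel values and the cutoff of 220 are illustrative assumptions, not calibrated numbers; Block A later replaces the hand-picked cutoff with an automatic one.

```python
import numpy as np

# A 2x3 RGB "patch": an image is just an array of shape (rows, cols, channels).
patch = np.array(
    [
        [[255, 255, 255], [250, 248, 252], [230, 150, 190]],  # glass, glass, pink tissue
        [[225, 140, 185], [245, 245, 245], [120, 60, 140]],   # pink tissue, glass, purple nucleus
    ],
    dtype=np.uint8,
)
print(patch.shape)

# Glass is bright in every channel, so look at the darkest channel of each pixel.
# ASSUMPTION: 220 is a hand-picked cutoff for illustration only.
glass_cutoff = 220
darkest_channel = patch.min(axis=2)
is_tissue = darkest_channel < glass_cutoff

print(is_tissue)
print(f"Tissue fraction: {is_tissue.mean():.2f}")
```

The whole decision is one vectorized comparison, which is why NumPy stays fast even when the array has millions of pixels.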

  • Tissue Detection: Outline tissue so the AI ignores empty glass.
  • Artifact Removal: Detect and ignore marker pen ink, dust, or coverslip cracks.
  • Blur Detection: Calculate focus sharpness and auto-reject out-of-focus scans.
  • Color Space Conversion: Convert from RGB (BGR order in OpenCV) to HSV to better isolate stains.

Situations where it’s used (Medical Examples)

  1. The “Glass” Problem: A whole-slide image (WSI) is often mostly white background (around 80% in this example). OpenCV builds a mask so the AI looks only at tissue.
  2. The “Blurry” Scan: A strip is out of focus. A Laplacian filter detects low sharpness and auto-rejects the slide.

Efficiency and Accuracy.
Training on empty glass teaches nothing; training on marker pen teaches the wrong thing. Preprocessing makes sure the data represents biology, not artifacts.
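One way to act on this is to drop patches that are mostly glass before training. The sketch below assumes a binary mask (255 = tissue, 0 = glass) like the one built in Block A; the helper `tissue_fraction_per_tile` and the 50% cutoff are hypothetical choices for illustration.

```python
import numpy as np

def tissue_fraction_per_tile(mask: np.ndarray, tile: int) -> np.ndarray:
    """Return the fraction of tissue pixels in each tile x tile block of a binary mask.

    Hypothetical helper; edge pixels that do not fill a whole tile are cropped.
    """
    rows, cols = mask.shape[0] // tile, mask.shape[1] // tile
    cropped = mask[: rows * tile, : cols * tile] > 0
    # Reshape so each tile gets its own pair of axes, then average within tiles.
    blocks = cropped.reshape(rows, tile, cols, tile)
    return blocks.mean(axis=(1, 3))

# Synthetic 8x8 mask: tissue (255) fills the top-left 4x6 region.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[0:4, 0:6] = 255

fractions = tissue_fraction_per_tile(mask, tile=4)
print(fractions)

# ASSUMPTION: keep tiles that are at least 50% tissue; tune for your dataset.
min_tissue = 0.5
keep = np.argwhere(fractions >= min_tissue)
print("Tiles to keep (row, col):", keep.tolist())
```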

Run in terminal:

pip install opencv-python numpy matplotlib

Block A: Tissue Detection (Otsu Thresholding)

The Situation: You have a thumbnail from Step 2 and need a binary mask: white = tissue, black = glass. A fixed, hard-coded threshold fails because stain intensity and scanner brightness vary from slide to slide.
The Solution: Convert to HSV, use Saturation (S) to distinguish tissue, and apply Otsu’s Method to auto-pick the threshold.

import cv2
import numpy as np
import matplotlib.pyplot as plt
# 1. Load an image (thumbnail from Step 2)
# TODO: replace with your thumbnail or patch path
img_path = "thumbnail.png"
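img = cv2.imread(img_path)  # BGR format; returns None if the file cannot be read
if img is None:
    raise FileNotFoundError(f"Could not read image: {img_path}")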
# 2. Convert to HSV and grab Saturation channel
hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
saturation_channel = hsv_img[:, :, 1]
# 3. Apply Otsu Thresholding
threshold_val, mask = cv2.threshold(
    saturation_channel, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
)
# 4. Display
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
ax[0].set_title("Original Thumbnail")
ax[0].axis("off")
ax[1].imshow(mask, cmap="gray")
ax[1].set_title(f"Tissue Mask (Otsu Thresh: {threshold_val:.1f})")
ax[1].axis("off")
plt.tight_layout()
plt.show()

Simulated output: Two images. Left: original pink tissue on white glass. Right: black-and-white mask with tissue as white and glass as black.
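For readers curious how Otsu's method picks its threshold, the sketch below re-implements the idea in plain NumPy: try every possible cutoff and keep the one that maximizes the between-class variance of the two resulting groups. The function name `otsu_threshold` and the synthetic saturation values are assumptions for the demo; in practice `cv2.threshold` with `cv2.THRESH_OTSU` does this for you, and its result may differ slightly from this sketch.

```python
import numpy as np

def otsu_threshold(channel: np.ndarray) -> int:
    """Return the 8-bit level that maximizes between-class variance (Otsu's criterion)."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256, dtype=np.float64)

    weight_low = np.cumsum(prob)        # fraction of pixels at or below each level
    weight_high = 1.0 - weight_low      # fraction of pixels above each level
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # Between-class variance for each candidate split; skip splits with an empty class.
    numerator = (total_mean * weight_low - cum_mean) ** 2
    denominator = weight_low * weight_high
    between = np.zeros(256)
    valid = denominator > 1e-12
    between[valid] = numerator[valid] / denominator[valid]
    return int(np.argmax(between))

# Synthetic saturation channel: 800 low-saturation "glass" pixels, 200 "tissue" pixels.
rng = np.random.default_rng(0)
glass = rng.integers(0, 20, size=800)
tissue = rng.integers(90, 140, size=200)
saturation = np.concatenate([glass, tissue]).astype(np.uint8).reshape(25, 40)

t = otsu_threshold(saturation)
mask = np.where(saturation > t, 255, 0).astype(np.uint8)
print(f"Threshold: {t}, tissue pixels: {np.count_nonzero(mask)}")
```

Because the cutoff is derived from each image's own histogram, a pale slide and a dark slide each get a threshold that suits them.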

Block B: Mask Cleanup (Morphological Opening & Closing)

The Situation: The mask from Block A has dust specks (small white dots) and tiny holes (fat/lumens).
The Solution: Morphological opening removes specks; closing fills holes. Then measure tissue area.

# Assumes 'mask' from Block A exists
# 1. Define a kernel (brush)
kernel = np.ones((5, 5), np.uint8)
# 2. Remove noise (Opening)
mask_cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
# 3. Fill holes (Closing)
mask_solid = cv2.morphologyEx(mask_cleaned, cv2.MORPH_CLOSE, kernel)
# 4. Calculate tissue percentage
tissue_pixels = cv2.countNonZero(mask_solid)
total_pixels = mask_solid.size
tissue_percentage = (tissue_pixels / total_pixels) * 100
print(f"Tissue Area detected: {tissue_percentage:.2f}% of the slide.")
# Display
plt.figure(figsize=(5, 5))
plt.imshow(mask_solid, cmap="gray")
plt.title("Cleaned Solid Mask")
plt.axis("off")
plt.show()

Simulated output (text): Tissue Area detected: 14.35% of the slide.
The image shows a smoother white region than in Block A, with specks removed and holes filled.

Block C: Blur Detection (Laplacian Variance)

The Situation: A scan strip may be blurry; nuclear detail is gone.
The Solution: Use Laplacian variance: high variance = sharp; low variance = blurry. Set your own cutoff.

# Assumes 'img' from Block A exists
# 1. Convert to grayscale (color not needed for blur)
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# 2. Calculate Laplacian variance (edge strength)
variance = cv2.Laplacian(gray_img, cv2.CV_64F).var()
# 3. Decision rule (tune threshold for your scanner)
blur_threshold = 100
print(f"Sharpness Score: {variance:.2f}")
if variance < blur_threshold:
    print("QC STATUS: FAIL (Blurry)")
else:
    print("QC STATUS: PASS (Sharp)")

Simulated output (text):

Sharpness Score: 1250.45
QC STATUS: PASS (Sharp)
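The check above scores the whole image at once, so a single blurry strip can hide behind sharp neighbors. A per-strip score is one possible refinement. The sketch below is NumPy-only and uses a plain 4-neighbor Laplacian on interior pixels, so its numbers will not match `cv2.Laplacian` exactly; the helpers `laplacian_variance` and `soften`, the synthetic checkerboard, and the "below half of the best strip" rule are all assumptions for illustration.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbor Laplacian, computed on interior pixels only."""
    g = gray.astype(np.float64)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())

def soften(gray: np.ndarray, passes: int = 4) -> np.ndarray:
    """Crude demo blur: repeatedly average each pixel with its 4 neighbors."""
    g = gray.astype(np.float64)
    for _ in range(passes):
        g = (
            g + np.roll(g, 1, axis=0) + np.roll(g, -1, axis=0)
            + np.roll(g, 1, axis=1) + np.roll(g, -1, axis=1)
        ) / 5.0
    return g

# Synthetic "scan": a sharp checkerboard strip next to a blurred copy of it.
sharp = ((np.indices((64, 64)) // 4).sum(axis=0) % 2) * 255.0
blurry = soften(sharp)
scan = np.hstack([sharp, blurry])

# Score each vertical strip separately.
strips = np.hsplit(scan, 2)
scores = [laplacian_variance(s) for s in strips]
for i, score in enumerate(scores):
    print(f"Strip {i}: sharpness {score:.1f}")

# ASSUMPTION: flag strips scoring below half of the sharpest strip.
flagged = [i for i, s in enumerate(scores) if s < 0.5 * max(scores)]
print("Blurry strips:", flagged)
```

A relative rule like this avoids picking an absolute cutoff, but it fails when every strip is blurry, so an absolute floor tuned for your scanner is still worth keeping.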