
Step 4 – Annotation & Labeling (The Digital Stencil)

Shapely (The Geometry Engine) & GeoJSON (The Universal File Format)

GeoJSON is an open standard for representing geometries plus attributes in JSON. Shapely converts those JSON structures into in-memory geometric objects, enabling operations like intersection, union, containment, and area calculation.

This is your “Digital Tracing Paper.” When you draw a circle around a tumor in QuPath, it is a visual sketch; the computer needs math.

  • It is just text—you can open it in Notepad.
  • It handles shapes cleanly; unlike CSV (rows/cols), GeoJSON loves nested coordinate lists.
  • It is the Universal Adapter: QuPath writes it, Python reads it, Google Maps displays it. Your annotations are not locked inside QuPath.
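To make the “just text” point concrete, here is a tiny annotation built and serialized with nothing but Python’s standard json module. The “Tumor” label and the square coordinates are invented demo values:

```python
import json

# A minimal GeoJSON Feature: a 100 x 100 px square labeled "Tumor" (demo values)
feature = {
    "type": "Feature",
    "properties": {"classification": {"name": "Tumor"}},
    "geometry": {
        "type": "Polygon",
        # Nested lists: a list of rings, each ring a list of [x, y] points,
        # closed by repeating the first point
        "coordinates": [[[0, 0], [0, 100], [100, 100], [100, 0], [0, 0]]],
    },
}

text = json.dumps(feature, indent=2)  # plain text you could open in Notepad
parsed = json.loads(text)             # round-trips losslessly
print(parsed["geometry"]["type"])     # Polygon
```

The nested coordinate lists are exactly the structure CSV cannot express cleanly.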

Shapely turns GeoJSON coordinates into shape objects in Python so you can ask: “Is this cell inside the tumor polygon?”

  • Read: Import annotations from QuPath, ASAP, or Aperio.
  • Filter: Apply math checks (for example, “ignore any annotation smaller than 500 pixels”).
  • Rasterize: Convert line drawings into binary masks for AI training.
  • Crop: Cut out exact tumor regions to build focused datasets.
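The “Filter” step above can be sketched with plain Shapely. The labels, coordinates, and the 500-pixel threshold are invented for illustration:

```python
from shapely.geometry import Polygon

# Toy annotations: one tiny speck, one real region (coordinates invented)
annotations = [
    {"label": "Speck", "polygon": Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])},  # area 100
    {"label": "Tumor", "polygon": Polygon([(0, 0), (0, 50), (50, 50), (50, 0)])},  # area 2500
]

MIN_AREA = 500  # ignore any annotation smaller than 500 square pixels
kept = [a for a in annotations if a["polygon"].area >= MIN_AREA]

print([a["label"] for a in kept])  # ['Tumor']
```

Because every annotation is now a geometry object, the filter is one comparison on `.area` rather than any manual coordinate math.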

Situations where it’s used (Medical Examples)

  1. The “Donut” Problem: You circled a tumor with a necrotic center. GeoJSON supports polygons with holes; CSV would struggle.
  2. The “Gold Standard” Training: Use Shapely to extract only patches inside your hand-annotated “High Grade” regions.
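The “Donut” case can be reproduced directly: Shapely’s Polygon takes an exterior ring plus a list of interior rings (holes). The coordinates below are invented:

```python
from shapely.geometry import Point, Polygon

outer = [(0, 0), (0, 100), (100, 100), (100, 0)]  # tumor boundary
hole = [(40, 40), (40, 60), (60, 60), (60, 40)]   # necrotic center
donut = Polygon(outer, holes=[hole])

print(donut.area)                     # 9600.0 (10000 minus the 400 px hole)
print(donut.contains(Point(50, 50)))  # False: inside the necrotic hole
print(donut.contains(Point(10, 10)))  # True: viable tumor rim
```

Note that the area and containment checks automatically respect the hole, so necrotic cells are excluded without any extra logic.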

An AI only learns what you show it. If your circle never becomes math, the model sees the whole slide (including normal tissue) and gets confused. This step ensures the model sees exactly what you intended.

Run in terminal:

pip install shapely geojson numpy opencv-python matplotlib

Block A: The Bridge (Reading QuPath Annotations)

The Situation: You drew tumor regions in QuPath; by default they live in .qpdata, which Python cannot read.
The Solution: Export as GeoJSON in QuPath (File → Export → Annotations as GeoJSON), then load and convert to Shapely polygons.

import json
from pathlib import Path

from shapely.geometry import shape

# 1. Load the QuPath export (GeoJSON)
# TODO: export your annotations from QuPath as GeoJSON and set the path
geojson_path = Path("metadata/qupath_annotations.geojson")

if geojson_path.exists():
    with open(geojson_path) as f:
        data = json.load(f)
else:
    # Demo GeoJSON if the file is missing (tutorial only)
    data = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"classification": {"name": "Tumor"}},  # Label
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [[[0, 0], [0, 100], [100, 100], [100, 0], [0, 0]]],
                },
            }
        ],
    }

# 2. Extract polygons
shapes_list = []
for feature in data["features"]:
    label = feature["properties"]["classification"]["name"]
    geom = shape(feature["geometry"])  # text -> Shapely geometry
    shapes_list.append({"label": label, "polygon": geom})

print(f"Loaded {len(shapes_list)} annotations.")
print(f"First annotation label: {shapes_list[0]['label']}")
print(f"First annotation area: {shapes_list[0]['polygon'].area} pixels")

Simulated output:

Loaded 1 annotations.
First annotation label: Tumor
First annotation area: 10000.0 pixels
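One caveat: in real QuPath exports, unclassified annotations can lack the classification entry entirely, so the direct feature["properties"]["classification"]["name"] lookup above may raise an error. A defensive variant (the "Unclassified" fallback name is our own choice, not a QuPath convention):

```python
def get_label(feature: dict, default: str = "Unclassified") -> str:
    """Return the annotation class name, tolerating a missing classification."""
    classification = feature.get("properties", {}).get("classification")
    return classification["name"] if classification else default

labeled = {"properties": {"classification": {"name": "Tumor"}}}
unlabeled = {"properties": {}}

print(get_label(labeled))    # Tumor
print(get_label(unlabeled))  # Unclassified
```

Swapping this helper into the extraction loop keeps the script from crashing on a stray unclassified scribble.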

Block B: The Inclusion Check (Point-in-Polygon)

The Situation: You have detected cells (nuclei centroids) and must decide which are tumor cells and which are bystanders.
The Solution: Use contains to test whether each point falls inside the polygon.

from shapely.geometry import Point
# 1. Example cell coordinate
cell_x, cell_y = 50, 50
cell_point = Point(cell_x, cell_y)
# 2. Use the first polygon from Block A
tumor_polygon = shapes_list[0]["polygon"]
# 3. Test inclusion
is_inside = tumor_polygon.contains(cell_point)
print(f"Cell at ({cell_x}, {cell_y}) is inside Tumor? {is_inside}")
# Test a known outside point
outside_point = Point(200, 200)
print(f"Cell at (200, 200) is inside Tumor? {tumor_polygon.contains(outside_point)}")

Simulated output:

Cell at (50, 50) is inside Tumor? True
Cell at (200, 200) is inside Tumor? False
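A real slide has thousands of detected nuclei, and calling contains on the raw polygon repeats setup work for every point. Shapely’s prepared geometries (shapely.prepared.prep) build the polygon’s index once for fast repeated tests; the centroids below are invented:

```python
from shapely.geometry import Point, Polygon
from shapely.prepared import prep

tumor = Polygon([(0, 0), (0, 100), (100, 100), (100, 0)])
cell_centroids = [(50, 50), (10, 90), (200, 200), (150, 5)]  # toy nuclei positions

prepped = prep(tumor)  # one-time preparation, then cheap repeated contains()
inside = [xy for xy in cell_centroids if prepped.contains(Point(*xy))]

print(inside)  # [(50, 50), (10, 90)]
```

In Shapely 2.x you can also vectorize the check over NumPy arrays of points, but prepared geometries work across versions and keep the code close to Block B.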

Block C: The Coloring Book (Rasterizing to Mask)

The Situation: Deep learning models want pixel masks, not coordinate lists.
The Solution: Convert polygons to a binary mask with cv2.fillPoly.

import numpy as np
import cv2
import matplotlib.pyplot as plt
# 1. Blank canvas (match your patch size)
mask = np.zeros((200, 200), dtype=np.uint8)
# 2. Prepare coordinates for OpenCV
coords = np.array(tumor_polygon.exterior.coords, dtype=np.int32)
# 3. Draw the mask (255 = white)
cv2.fillPoly(mask, [coords], 255)
# 4. Display
plt.figure(figsize=(5, 5))
plt.imshow(mask, cmap="gray")
plt.title("Converted Binary Mask")
plt.axis("off")
plt.show()

Simulated output: Black image with a white square; your vector shape is now a raster mask the AI can read.