Step 4 – Annotation & Labeling (The Digital Stencil)
Name of Tool
Shapely (The Geometry Engine) & GeoJSON (The Universal File Format)
Technical Explanation
GeoJSON is an open standard for representing geometries plus attributes in JSON. Shapely converts those JSON structures into in-memory geometric objects, enabling operations like intersection, union, containment, and area calculation.
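A minimal sketch of those operations, with hypothetical coordinates (assumes shapely is installed):

```python
from shapely.geometry import shape

# Two overlapping squares expressed as GeoJSON-style dictionaries
geojson_a = {"type": "Polygon", "coordinates": [[[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]]}
geojson_b = {"type": "Polygon", "coordinates": [[[5, 5], [5, 15], [15, 15], [15, 5], [5, 5]]]}

a, b = shape(geojson_a), shape(geojson_b)

print(a.area)                  # 100.0
print(a.intersection(b).area)  # 25.0  (the 5x5 overlap)
print(a.union(b).area)         # 175.0 (100 + 100 - 25)
print(a.contains(b))           # False (b only partially overlaps a)
```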
Simplified Explanation
This is your “Digital Tracing Paper.” When you draw a circle around a tumor in QuPath, it is a visual sketch; the computer needs math.
Why export to .geojson?
- It is just text; you can open it in Notepad.
- It handles shapes cleanly; unlike CSV (rows and columns), GeoJSON loves nested coordinate lists.
- It is the Universal Adapter: QuPath writes it, Python reads it, Google Maps displays it. Your annotations are not locked inside QuPath.
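To make the “just text” point concrete, here is a complete single-feature GeoJSON document parsed in Python (the label and coordinates are hypothetical):

```python
import json
from shapely.geometry import shape

# A complete GeoJSON Feature, as plain text you could read in Notepad
text = """{
  "type": "Feature",
  "properties": {"classification": {"name": "Tumor"}},
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[0, 0], [0, 100], [100, 100], [100, 0], [0, 0]]]
  }
}"""

feature = json.loads(text)
print(feature["properties"]["classification"]["name"])  # Tumor
print(shape(feature["geometry"]).area)                  # 10000.0
```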
What does Shapely do?
It turns GeoJSON coordinates into Shape Objects in Python so you can ask: “Is this cell inside the tumor polygon?”
What can it do?
- Read: Import annotations from QuPath, ASAP, or Aperio.
- Filter: Apply math checks (for example, “ignore any annotation smaller than 500 pixels”).
- Rasterize: Convert line drawings into binary masks for AI training.
- Crop: Cut out exact tumor regions to build focused datasets.
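The Filter step above can be sketched in a few lines, mirroring the 500-pixel rule (the annotations here are hypothetical):

```python
from shapely.geometry import Polygon

# Hypothetical annotations: a large tumor region and a tiny stray scribble
annotations = [
    {"label": "Tumor", "polygon": Polygon([(0, 0), (0, 100), (100, 100), (100, 0)])},  # area 10000
    {"label": "Tumor", "polygon": Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])},      # area 100
]

MIN_AREA = 500  # the "ignore any annotation smaller than 500 pixels" rule

kept = [a for a in annotations if a["polygon"].area >= MIN_AREA]
print(f"Kept {len(kept)} of {len(annotations)} annotations")  # Kept 1 of 2 annotations
```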
Situations where it’s used (Medical Examples)
- The “Donut” Problem: You circled a tumor with a necrotic center. GeoJSON supports polygons with holes; CSV would struggle.
- The “Gold Standard” Training: Use Shapely to extract only patches inside your hand-annotated “High Grade” regions.
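The donut case maps directly onto Shapely’s polygon-with-holes support; a sketch with hypothetical coordinates:

```python
from shapely.geometry import Polygon, Point

# Outer tumor boundary with a necrotic hole cut out (a "donut")
outer = [(0, 0), (0, 100), (100, 100), (100, 0)]
hole = [(40, 40), (40, 60), (60, 60), (60, 40)]
donut = Polygon(outer, holes=[hole])

print(donut.area)                     # 9600.0 (10000 minus the 400-pixel hole)
print(donut.contains(Point(50, 50)))  # False -- inside the necrotic hole
print(donut.contains(Point(10, 10)))  # True  -- in the viable tumor rim
```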
Why it’s important to pathologists
An AI only learns what you show it. If your circle never becomes math, the model sees the whole slide (including normal tissue) and gets confused. This step ensures the model sees exactly what you intended.
Installation Instructions
Run in terminal:
```shell
pip install shapely geojson numpy opencv-python matplotlib
```

Lego Building Blocks (Code)
Section titled “Lego Building Blocks (Code)”Block A: The Bridge (Reading QuPath Annotations)
The Situation: You drew tumor regions in QuPath; by default they live in .qpdata, which Python cannot read.
The Solution: Export as GeoJSON in QuPath (File → Export → Annotations as GeoJSON), then load and convert to Shapely polygons.
```python
import json
from pathlib import Path

from shapely.geometry import shape

# 1. Load the QuPath export (GeoJSON)
# TODO: export your annotations from QuPath as GeoJSON and set the path
geojson_path = Path("metadata/qupath_annotations.geojson")

if not geojson_path.exists():
    # Demo GeoJSON if file is missing (tutorial only)
    demo_data = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"classification": {"name": "Tumor"}},  # Label
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [[[0, 0], [0, 100], [100, 100], [100, 0], [0, 0]]],
                },
            }
        ],
    }
else:
    with open(geojson_path) as f:
        demo_data = json.load(f)

# 2. Extract polygons
shapes_list = []
for feature in demo_data["features"]:
    label = feature["properties"]["classification"]["name"]
    geom = shape(feature["geometry"])  # text -> Shapely geometry
    shapes_list.append({"label": label, "polygon": geom})

print(f"Loaded {len(shapes_list)} annotations.")
print(f"First annotation label: {shapes_list[0]['label']}")
print(f"First annotation area: {shapes_list[0]['polygon'].area} pixels")
```

Simulated output:

```
Loaded 1 annotations.
First annotation label: Tumor
First annotation area: 10000.0 pixels
```

Block B: The Inclusion Check (Point-in-Polygon)
The Situation: You have detected cells (nuclei centroids) and must decide which are tumor vs bystander.
The Solution: Use contains to test if each point is inside the polygon.
```python
from shapely.geometry import Point

# 1. Example cell coordinate
cell_x, cell_y = 50, 50
cell_point = Point(cell_x, cell_y)

# 2. Use the first polygon from Block A
tumor_polygon = shapes_list[0]["polygon"]

# 3. Test inclusion
is_inside = tumor_polygon.contains(cell_point)
print(f"Cell at ({cell_x}, {cell_y}) is inside Tumor? {is_inside}")

# Test a known outside point
outside_point = Point(200, 200)
print(f"Cell at (200, 200) is inside Tumor? {tumor_polygon.contains(outside_point)}")
```

Simulated output:

```
Cell at (50, 50) is inside Tumor? True
Cell at (200, 200) is inside Tumor? False
```

Block C: The Coloring Book (Rasterizing to Mask)
The Situation: Deep learning models want pixel masks, not coordinate lists.
The Solution: Convert polygons to a binary mask with cv2.fillPoly.
```python
import numpy as np
import cv2
import matplotlib.pyplot as plt

# 1. Blank canvas (match your patch size)
mask = np.zeros((200, 200), dtype=np.uint8)

# 2. Prepare coordinates for OpenCV
coords = np.array(tumor_polygon.exterior.coords, dtype=np.int32)

# 3. Draw the mask (255 = white)
cv2.fillPoly(mask, [coords], 255)

# 4. Display
plt.figure(figsize=(5, 5))
plt.imshow(mask, cmap="gray")
plt.title("Converted Binary Mask")
plt.axis("off")
plt.show()
```

Simulated output: Black image with a white square; your vector shape is now a raster mask the AI can read.
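The Crop building block from the feature list can be sketched by slicing with the polygon’s bounding box (the region and polygon here are hypothetical; a real pipeline would crop from the slide image itself and usually also apply the Block C mask):

```python
import numpy as np
from shapely.geometry import Polygon

# Hypothetical grayscale slide region and tumor polygon
region = np.zeros((200, 200), dtype=np.uint8)
tumor = Polygon([(20, 30), (20, 130), (120, 130), (120, 30)])

# Bounding box: (minx, miny, maxx, maxy) -- note numpy indexes rows (y) first
minx, miny, maxx, maxy = map(int, tumor.bounds)
crop = region[miny:maxy, minx:maxx]

print(crop.shape)  # (100, 100)
```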
Resource Site
- GeoJSON Specification: https://geojson.org/
- Shapely Documentation: https://shapely.readthedocs.io/en/stable/manual.html