Step 5 – Feature Extraction (The Morphometrics Phase)
Name of Tool
Scikit-Image (The Measuring Tape) & Pillow (The Image Manipulator)
Technical Explanation
Scikit-Image (skimage) is a collection of scientific image-processing algorithms. It focuses on measurement: geometric features (area, perimeter, eccentricity) and texture features (Haralick/GLCM, Local Binary Patterns). Pillow handles basic image I/O and simple transformations before measurement.
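Local Binary Patterns (LBP) can be summarized as a histogram of pattern codes per patch. The following is a minimal sketch, assuming a grayscale uint8 patch; the helper name `lbp_histogram` and the two synthetic patches are illustrative and not part of scikit-image.

```python
import numpy as np
from skimage.feature import local_binary_pattern


def lbp_histogram(gray, n_points=8, radius=1):
    """Return a normalized histogram of uniform LBP codes (hypothetical helper)."""
    codes = local_binary_pattern(gray, n_points, radius, method="uniform")
    n_bins = n_points + 2  # the "uniform" method yields P + 2 distinct codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()


# Synthetic stand-ins for real tissue patches (assumption: replace with your own data)
rng = np.random.default_rng(0)
smooth_patch = np.full((64, 64), 128, dtype=np.uint8)               # flat texture
noisy_patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)   # chaotic texture

smooth_hist = lbp_histogram(smooth_patch)
noisy_hist = lbp_histogram(noisy_patch)

print("Smooth patch, largest bin:", round(float(smooth_hist.max()), 2))
print("Noisy patch, largest bin:", round(float(noisy_hist.max()), 2))
```

A flat patch concentrates almost all pixels in one code, while a chaotic patch spreads them across many codes, so the histogram itself can serve as a texture feature vector.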
Simplified Explanation
This is your “Digital Ruler and Scale.” A pathologist says “The nuclei are large, irregular, hyperchromatic.” A computer only understands numbers. Scikit-Image translates adjectives into measurements:
- “Large” → Area = 450 pixels
- “Irregular” → Circularity = 0.45
- “Hyperchromatic” → Mean_Intensity = 20 (dark)
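The three translations above can be sketched for a single nucleus. This is a rough illustration, assuming a synthetic dark disk on a bright background; the arrays `gray` and `mask` are made-up stand-ins for a real grayscale tile and its segmentation mask.

```python
import math

import numpy as np
from skimage import draw, measure

# Synthetic data (assumption): one dark, round "nucleus" on a bright background
gray = np.full((100, 100), 220, dtype=np.uint8)
mask = np.zeros((100, 100), dtype=np.uint8)
rr, cc = draw.disk((50, 50), 12)
mask[rr, cc] = 1
gray[rr, cc] = 20

# "Large" and "Irregular": area and circularity from regionprops
region = measure.regionprops(measure.label(mask))[0]
area = float(region.area)
circularity = 4 * math.pi * area / region.perimeter ** 2

# "Hyperchromatic": mean gray value inside the mask (lower = darker)
mean_intensity = float(gray[mask == 1].mean())

print(f"Area = {area:.0f} pixels")
print(f"Circularity = {circularity:.2f}")
print(f"Mean_Intensity = {mean_intensity:.0f}")
```

Because the perimeter is estimated from pixels, the circularity of a small disk is only approximately 1.0.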
What can it do?
- Morphometry: Measure size and shape of every cell.
- Texture Analysis: Quantify “roughness” vs “smoothness” (for example, stroma vs tumor).
- Color Quantification: Measure how “blue” a nucleus is (hematoxylin uptake, a rough proxy for DNA content).
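One way to put a number on “blueness” is color deconvolution, which separates an RGB image into stain channels. The sketch below assumes scikit-image's built-in H&E/DAB stain matrix (`rgb2hed`), which is generic rather than calibrated to any particular scanner or staining protocol. The patch is synthesized with `hed2rgb` so the example needs no image file; `nucleus_mask` and the stain amounts are invented for illustration.

```python
import numpy as np
from skimage.color import hed2rgb, rgb2hed

# Build a synthetic stained patch (assumption: stand-in for a real RGB tile).
# Channel order of the stain array: hematoxylin, eosin, DAB.
stains = np.zeros((40, 40, 3), dtype=float)
stains[..., 1] = 0.03                      # light eosin everywhere
nucleus_mask = np.zeros((40, 40), dtype=bool)
nucleus_mask[15:25, 15:25] = True
stains[nucleus_mask, 0] = 0.08             # hematoxylin only inside the "nucleus"
rgb = hed2rgb(stains)

# Color deconvolution: recover the per-pixel hematoxylin channel from RGB
hematoxylin = rgb2hed(rgb)[..., 0]

h_nucleus = float(hematoxylin[nucleus_mask].mean())
h_background = float(hematoxylin[~nucleus_mask].mean())

print(f"Hematoxylin inside nucleus: {h_nucleus:.3f}")
print(f"Hematoxylin in background: {h_background:.3f}")
```

Averaging the hematoxylin channel inside each nuclear mask gives one “blueness” value per nucleus that can be compared across cells from the same slide.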
Situations where it’s used (Medical Examples)
- Grading Cancer: For nuclear pleomorphism, measure 1,000 nuclei and compute size variability.
- Stromal Analysis: Use texture features (Haralick) to test whether tumor stroma is more chaotic than normal stroma.
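Size variability across many nuclei can be expressed as a coefficient of variation (standard deviation divided by mean). A minimal sketch, assuming a binary mask in which nuclei do not touch; the four synthetic disks stand in for the output of a real segmentation step, and the coefficient of variation is one possible variability measure, not a validated grading rule.

```python
import numpy as np
from skimage import draw, measure

# Synthetic mask (assumption): four separate "nuclei" with different radii
mask = np.zeros((200, 200), dtype=np.uint8)
centers_and_radii = [((40, 40), 8), ((40, 120), 12), ((120, 60), 16), ((150, 150), 10)]
for center, radius in centers_and_radii:
    rr, cc = draw.disk(center, radius, shape=mask.shape)
    mask[rr, cc] = 1

# Label each nucleus and collect its area in one call
labels = measure.label(mask)
table = measure.regionprops_table(labels, properties=("label", "area"))
areas = np.asarray(table["area"], dtype=float)

# Coefficient of variation: higher values mean more size variability
mean_area = float(areas.mean())
cv_area = float(areas.std(ddof=1) / mean_area)

print(f"Nuclei measured: {len(areas)}")
print(f"Mean area: {mean_area:.1f} pixels")
print(f"Area CV: {cv_area:.2f}")
```

`regionprops_table` returns a dictionary of arrays, which converts directly into a pandas DataFrame when thousands of nuclei need to be summarized.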
Why it’s important to pathologists
This step turns “It looks bad” into “Nuclei are 2.5× larger than normal.” Quantitative pathology needs numbers, not just adjectives.
Installation Instructions
Run in terminal:
pip install scikit-image pillow numpy matplotlib
Lego Building Blocks (Code)
Block A: Geometric Features (Measuring Shape)
The Situation: You have a binary mask of a nucleus and want to know if it is “good” (round) or “bad” (irregular).
The Solution: Use measure.regionprops to get area and perimeter, then compute circularity = 4π × area / perimeter² (about 1.0 for a circle, lower for irregular shapes).
import math

import numpy as np
from skimage import measure

# 1. Simulate an irregular nucleus mask (in practice, use your segmentation output)
mask = np.zeros((100, 100), dtype=np.uint8)
mask[30:70, 30:70] = 1  # square (not a circle)

# 2. Label blobs and compute properties
label_img = measure.label(mask)
props = measure.regionprops(label_img)
nucleus = props[0]

# 3. Circularity: (4 * pi * area) / (perimeter^2)
perimeter = nucleus.perimeter
area = nucleus.area
circularity = (4 * math.pi * area) / (perimeter ** 2)

# area may be an int or a float depending on the scikit-image version
print(f"Nucleus Area: {area:.0f} pixels")
print(f"Nucleus Perimeter: {perimeter:.2f} pixels")
print(f"Circularity Score: {circularity:.2f}")

if circularity < 0.8:
    print("Conclusion: Irregular Shape (Possible Atypia)")
else:
    print("Conclusion: Round Shape (Benign)")

Simulated output:
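Nucleus Area: 1600 pixels
Nucleus Perimeter: 156.00 pixels
Circularity Score: 0.83
Conclusion: Round Shape (Benign)

An ideal square scores π/4 ≈ 0.79, but regionprops estimates the perimeter through the centers of the boundary pixels (156 here, not 160), so this mask lands just above the 0.8 cutoff. Tune the threshold on real segmentation masks.

Block B: Texture Features (Haralick/GLCM)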
The Situation: You want to separate “Normal Collagen” (smooth) from “Desmoplasia” (chaotic).
The Solution: Use the gray-level co-occurrence matrix (GLCM) to measure contrast and homogeneity.
from skimage import data
from skimage.feature import graycomatrix, graycoprops

# 1. Load a sample texture (replace with your grayscale tissue patch)
image = data.gravel()

# 2. Calculate GLCM: distance=1 pixel, angle=0 degrees (to the right)
glcm = graycomatrix(
    image,
    distances=[1],
    angles=[0],
    levels=256,
    symmetric=True,
    normed=True,
)

# 3. Extract features
contrast = graycoprops(glcm, "contrast")[0, 0]  # roughness
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]  # smoothness

print(f"Texture Contrast: {contrast:.2f}")
print(f"Texture Homogeneity: {homogeneity:.2f}")

if contrast > 50:
    print("Conclusion: High texture variation (Rough/Chaotic)")
else:
    print("Conclusion: Low texture variation (Smooth)")

Simulated output:
Texture Contrast: 125.40
Texture Homogeneity: 0.35
Conclusion: High texture variation (Rough/Chaotic)

Block C: Intensity Features (Hyperchromasia)
The Situation: You need to quantify how dark nuclei are (hyperchromasia).
The Solution: Measure mean intensity; darker nuclei have lower mean values (0 = black, 255 = white).
import numpy as np

# 1. Simulate nuclei (grayscale)
nucleus_roi_dark = np.full((10, 10), 50, dtype=np.uint8)  # dark
nucleus_roi_light = np.full((10, 10), 200, dtype=np.uint8)  # light

# 2. Measure mean intensity
mean_intensity_dark = float(np.mean(nucleus_roi_dark))
mean_intensity_light = float(np.mean(nucleus_roi_light))

print(f"Nucleus A Intensity: {mean_intensity_dark} (Darker)")
print(f"Nucleus B Intensity: {mean_intensity_light} (Lighter)")

# 3. Decision rule (tune threshold to your data)
if mean_intensity_dark < 100:
    print("Nucleus A is Hyperchromatic.")
else:
    print("Nucleus A is Normal.")

Simulated output:
Nucleus A Intensity: 50.0 (Darker)
Nucleus B Intensity: 200.0 (Lighter)
Nucleus A is Hyperchromatic.

Resource Site
- Scikit-Image User Guide: https://scikit-image.org/docs/stable/user_guide.html
- Gallery of Examples: https://scikit-image.org/docs/stable/auto_examples/index.html