

Initial System Setup: your lab computer and tools


This is where you get your “computational lab” ready: installing Python and a few core libraries, choosing a notebook or editor, and creating a clean project folder so every file has a sensible home. Most of the time you only need to do this once per machine, and then you can reuse the same setup for many projects.

Technical name: Initial System Setup

All-in-one Linux setup script (Ubuntu & Arch)


This is a personal bootstrap script to get a fresh Linux machine ready for digital/computational pathology work.

It will:

  • detect whether you’re on Ubuntu/Debian or an Arch-based distro (Arch, Manjaro, EndeavourOS)
  • install:
    • basic build tools
    • git, curl, wget
    • Docker
    • Visual Studio Code
    • Miniconda (Python + conda env manager)

It does not install NVIDIA drivers or CUDA (those are hardware-specific).


Do this on the Linux machine you want to prepare (native Ubuntu/Arch or inside WSL2).

  1. Open a terminal

    • Ubuntu: “Terminal” from the app menu.
    • Arch: your usual terminal emulator.
    • WSL2: “Ubuntu” / your distro from the Start menu.
  2. Go to your home directory

    Terminal window
    cd ~
  3. Create a new file for the script

    Open it with a simple editor like nano:

    Terminal window
    nano setup_dcp_env.sh

    This opens an empty file called setup_dcp_env.sh.

  4. In your browser, select the entire script block below and copy it:

    #!/usr/bin/env bash
    # setup_dcp_env.sh - quick environment bootstrap for Ubuntu/Debian and Arch
    set -euo pipefail

    echo "=== Digital / computational pathology environment bootstrap ==="

    if [ ! -f /etc/os-release ]; then
        echo "Cannot detect Linux distribution (no /etc/os-release). Exiting."
        exit 1
    fi

    # shellcheck disable=SC1091
    . /etc/os-release
    DISTRO_ID="${ID:-unknown}"
    echo "Detected distro: ${DISTRO_ID}"

    install_ubuntu_like() {
        echo "Running Ubuntu/Debian setup..."
        sudo apt update
        sudo apt install -y \
            build-essential \
            git \
            curl \
            wget \
            ca-certificates \
            software-properties-common

        # VS Code repo
        wget -qO- https://packages.microsoft.com/keys/microsoft.asc | \
            gpg --dearmor | \
            sudo tee /usr/share/keyrings/packages.microsoft.gpg >/dev/null
        echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/packages.microsoft.gpg] https://packages.microsoft.com/repos/code stable main" | \
            sudo tee /etc/apt/sources.list.d/vscode.list >/dev/null
        sudo apt update
        sudo apt install -y code

        # Docker (simple Ubuntu Docker package; adjust as needed)
        sudo apt install -y docker.io
        sudo systemctl enable --now docker
        sudo usermod -aG docker "$USER" || true
    }

    install_arch_like() {
        echo "Running Arch setup..."
        sudo pacman -Syu --noconfirm
        sudo pacman -S --noconfirm --needed \
            base-devel \
            git \
            curl \
            wget \
            ca-certificates \
            code \
            docker
        sudo systemctl enable --now docker
        sudo usermod -aG docker "$USER" || true
    }

    case "$DISTRO_ID" in
        ubuntu|debian)
            install_ubuntu_like
            ;;
        arch|manjaro|endeavouros)
            install_arch_like
            ;;
        *)
            echo "This script only supports Ubuntu/Debian and Arch-like distros right now."
            echo "Detected ID='$DISTRO_ID'. Exiting."
            exit 1
            ;;
    esac

    # Miniconda (user install)
    if [ ! -d "$HOME/miniconda3" ]; then
        echo "Installing Miniconda into $HOME/miniconda3 ..."
        TMP_INSTALLER="/tmp/miniconda.sh"
        curl -fsSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o "$TMP_INSTALLER"
        bash "$TMP_INSTALLER" -b -p "$HOME/miniconda3"
        rm -f "$TMP_INSTALLER"
        if ! grep -q 'miniconda3' "$HOME/.bashrc" 2>/dev/null; then
            echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> "$HOME/.bashrc"
        fi
    else
        echo "Miniconda directory already exists at $HOME/miniconda3, skipping."
    fi

    echo
    echo "Done."
    echo "- You may need to log out and back in for Docker group changes to take effect."
    echo "- Open a new shell so the Miniconda PATH in .bashrc is picked up."
  5. Go back to the terminal where nano is open and paste

    • Click inside the terminal window with nano.
    • Paste with right-click → Paste, Shift+Insert, or Ctrl+Shift+V, depending on your terminal.

    You should now see the full script inside nano.

  6. Save and exit the editor

    In nano:

    • Ctrl+O → Enter (save)
    • Ctrl+X (exit)
  7. Make the script executable

    Terminal window
    chmod +x setup_dcp_env.sh
  8. Run the script

    Terminal window
    ./setup_dcp_env.sh
    • Enter your password when sudo prompts.
    • Let it finish; watch for obvious errors.
  9. Log out and back in

    • Log out of your session (or reboot) and log back in.
    • This makes sure:
      • your docker group membership is active
      • your shell picks up the Miniconda PATH added to .bashrc
  10. Quick checks

    Terminal window
    code --version
    git --version
    docker ps
    conda --version

    If those print versions or basic output, your base environment is ready.

At a glance, these tools fall into three groups that together make up a digital pathology workstation:

  • OS & Core: Linux/WSL or macOS terminal, Git, Docker, SSH/remote development.
  • Code & AI: VS Code, Python + Miniconda/conda, NVIDIA GPU & CUDA.
  • Pathology engines (new): OpenSlide and libvips, QuPath, and Java (JDK) so you can actually open, view, and annotate whole‑slide images.

Linux environment (native, WSL2, macOS terminal)


The OS context where all your tools run. In practice:

  • Native Linux (for example Ubuntu Desktop / Server)
  • Windows with WSL2 (Linux userland inside Windows)
  • macOS terminal (Unix-like, close enough for most user-level tasks)

Think of this as the building where your lab lives.

  • Windows, macOS, Linux are different buildings.
  • Inside you place your equipment: Python, Docker, libvips, etc.
  • Most hospital servers, research clusters, and serious back-end systems you will touch run Linux.
  • Most ML examples and GitHub repos assume Linux commands and paths.
  • If you get comfortable in a Linux shell (native, WSL2, or macOS), working on hospital servers feels much less alien.
  • For a dedicated GPU workstation, native Ubuntu keeps drivers and Docker simpler than Windows.

On Windows, run the following from an elevated PowerShell to enable WSL2 with the default distro:

Terminal window
wsl --install

On Ubuntu or Arch you normally install the whole OS from an ISO; there is no one-line install command beyond booting the installer and following the wizard. Use the official guides instead.
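
To confirm WSL2 is active, you can check from PowerShell and from inside the distro (exact output varies by machine):

Terminal window
# from PowerShell: list installed distros and their WSL version
wsl -l -v
# from inside the distro: the kernel name typically mentions WSL2
uname -r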


Visual Studio Code (VS Code)

A cross-platform editor / lightweight IDE to edit code, configs, and notes, and to use Git, with a built-in terminal and powerful extensions.

Think of VS Code as a good lab notebook and pen for code:

  • One place to write scripts and notes
  • See your project folders
  • Run commands in a small terminal pane
  • Same tool on Windows, macOS, Linux, and inside WSL2.
  • Nice Git integration for committing and reviewing changes.
  • Great Python support and Jupyter integration.
  • With “Remote” extensions, you can edit files on a remote GPU server from your laptop.

Normally you just download the installer from the website. On Ubuntu, after adding Microsoft’s repo (as the script does), you can:

Terminal window
sudo apt update
sudo apt install -y code

On Arch (the open-source “Code - OSS” build from the official repositories):

Terminal window
sudo pacman -Syu --noconfirm
sudo pacman -S --noconfirm code

After install:

  • Extensions: “Python”, “Docker”, “Remote - SSH”, “GitHub Pull Requests”, “YAML”, “Quarto” (if writing notebooks).
  • Enable autosave (File -> Auto Save) and set a default formatter (Prettier/Black).
  • Use “Remote - SSH” to work on GPU servers without copying files locally.
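
You can also install extensions from the command line once code is on your PATH; a quick sketch (the marketplace IDs below are the usual ones, but verify them in the Extensions view):

Terminal window
code --install-extension ms-python.python
code --install-extension ms-azuretools.vscode-docker
code --install-extension ms-vscode-remote.remote-ssh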

Python and Miniconda/conda

  • Python is the main programming language you will use for data handling, tiling, ML, and evaluation.
  • Miniconda / conda is a package and environment manager that installs Python and keeps each project’s dependencies in its own environment.
  • Python is the language you talk to the computer in.
  • conda is the medicine cabinet system that keeps each project’s drugs (packages) in its own labeled drawer, so they do not mix.
  • Most modern pathology ML tooling (PyTorch, MONAI, scikit-learn, etc.) is Python-based.
  • Separate environments prevent “I installed this for project A and broke project B”.
  • An environment.yml or requirements.txt makes it easy to recreate a working setup months later or on another machine.

Recommended base Conda environment for this pipeline

Most code examples in Steps 1–2 assume a small set of Python packages:

  • numpy
  • pandas
  • scikit-learn
  • matplotlib
  • Pillow
  • openslide-python

A minimal environment.yml that works for the examples up to Step 2 could look like this:

name: histopath-core
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pip
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - pillow
  - pip:
      - openslide-python

Save this file as environment.yml at the root of your project and create the environment with:

Terminal window
conda env create -f environment.yml
conda activate histopath-core

You can add extra packages later (for example opencv-python in Step 3), then export an updated file with:

Terminal window
conda env export > environment.yml

After installing Miniconda from the official installer:

Terminal window
# create a project environment
conda create -n dcp-env python=3.11
# activate it
conda activate dcp-env
# install some common packages
conda install numpy pandas matplotlib
# optional: add conda-forge for broader packages
conda config --add channels conda-forge
conda config --set channel_priority strict

The all-in-one script above already installs Miniconda into ~/miniconda3 and adds it to .bashrc.
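
If you want conda available in the current shell right away (instead of opening a new one), something like this should work with the install location used above:

Terminal window
# activate the base environment from the user install
source "$HOME/miniconda3/bin/activate"
# optional: let conda manage your shell startup file from now on
conda init bash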


Git

A version control system that tracks changes to files in a project, lets you create named snapshots (“commits”), and roll back if needed.

Git is a time machine for your project folder:

  • Every save (commit) records what changed and a short message.
  • You can later say “show me the project as it looked last Monday”.
  • Undo bad changes without losing everything.
  • See exactly what changed between two versions of a pipeline.
  • Share code and configs with collaborators via GitHub/GitLab.
  • Necessary if you want your computational pipelines to be reproducible and reviewable.

On Ubuntu/Debian:

Terminal window
sudo apt update
sudo apt install -y git

On Arch:

Terminal window
sudo pacman -Syu --noconfirm
sudo pacman -S --noconfirm git

Basic first-time use in a project directory:

Terminal window
git init
git add .
git commit -m "Initial snapshot"

Configure identity and GitHub access:

Terminal window
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
git config --global init.defaultBranch main # optional but recommended
# create an SSH key
ssh-keygen -t ed25519 -C "you@example.com"
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
# copy the public key and paste into GitHub Settings > SSH and GPG keys
cat ~/.ssh/id_ed25519.pub
# test
ssh -T git@github.com

Tips:

  • git status to see changes; git log --oneline to review history.
  • Use GitHub CLI (gh auth login) if you prefer HTTPS/pat flows.
  • Set git config --global pull.rebase false (or true) depending on your workflow.
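
Whole-slide images and derived tiles are far too large for Git, so it helps to exclude them from the start. A minimal .gitignore sketch (folder names are illustrative; adjust to your project layout):

# .gitignore - keep bulky or regenerable files out of version control
# raw slides and derived tiles: too large for Git
data/
*.svs
*.ndpi
# regenerable outputs and caches
outputs/
.ipynb_checkpoints/
__pycache__/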

Docker

Docker is a platform for building and running containers: packaged environments that include your code plus all required system and Python dependencies.

A container is like a small, self-contained lab room in a box:

  • Same tools and reagents wherever you ship it
  • Less “it works on my machine, not on the server”
  • Freeze a working environment so you can rerun or deploy it later without reinstalling everything.
  • Give IT a concrete artifact (“run this container”) instead of a long “how to set up my pipeline” document.
  • Aligns with how many hospital IT teams run internal services.

On Ubuntu (simple case):

Terminal window
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker "$USER" # then log out and back in

On Arch:

Terminal window
sudo pacman -Syu --noconfirm
sudo pacman -S --noconfirm docker
sudo systemctl enable --now docker
sudo usermod -aG docker "$USER"

Basic check:

Terminal window
docker run hello-world

If Docker needs sudo, add yourself to the docker group and re-login:

Terminal window
sudo usermod -aG docker "$USER"
newgrp docker
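
Once docker run hello-world works, you can describe your own environment as a Dockerfile. A minimal sketch, assuming a project with a requirements.txt and a run_pipeline.py entry script (both names are illustrative):

# Dockerfile - minimal containerized pipeline sketch
FROM python:3.11-slim

WORKDIR /app

# install Python dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# copy the rest of the project and set the default command
COPY . .
CMD ["python", "run_pipeline.py"]

Build and run it from the project directory with docker build -t my-pipeline . followed by docker run --rm my-pipeline.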

NVIDIA GPU & CUDA

  • An NVIDIA GPU is a hardware accelerator for heavy compute tasks (matrix multiplications, convolutions, etc.).
  • CUDA Toolkit is NVIDIA’s software stack that ML libraries use to talk to the GPU.
  • A GPU is like a room full of many simple assistants who all do tiny calculations at the same time.
  • CUDA is the instruction set and tools they understand.
  • Cuts training time from days to hours for realistic WSI-scale models.
  • Makes it feasible to run deep models on large cohorts on your own hardware.
  • Allows more experimentation with model architecture and hyperparameters on real data, not just toy patches.

This is hardware- and distro-specific. In practice you will:

  • Install an appropriate NVIDIA driver for your card and OS.
  • Install a CUDA Toolkit version compatible with your deep learning framework (or use a container image that bundles the right stack).

The exact commands change over time; rely on official docs and framework “get started” pages.

Typical sanity check once things are installed:

Terminal window
nvidia-smi

and in Python:

import torch
print(torch.cuda.is_available())
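
If you run models inside containers, Docker needs the NVIDIA Container Toolkit to see the GPU. Assuming the toolkit is installed (see NVIDIA’s docs) and using an illustrative CUDA image tag, a quick check looks like:

Terminal window
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi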

SSH and remote development

  • SSH (OpenSSH) is a protocol and toolset to securely log into another machine and run commands.
  • VS Code Remote SSH lets you use VS Code to edit files on that remote machine.

SSH is a secure hallway to another computer:

  • you stay at your desk,
  • but your commands run in the hospital server room where the slides and GPUs live.
  • Run code close to where WSIs and databases live instead of copying terabytes to your laptop.
  • Use your familiar editor (VS Code) while the work runs on a powerful remote server.
  • This is how IT will usually give you access to institutional compute resources.

On most Linux/macOS systems, an SSH client is already installed.

On Ubuntu (if needed):

Terminal window
sudo apt update
sudo apt install -y openssh-client

On Arch:

Terminal window
sudo pacman -Syu --noconfirm
sudo pacman -S --noconfirm openssh

Basic usage:

Terminal window
ssh yourname@your-hospital-server

For VS Code Remote SSH, install the “Remote - SSH” extension from the VS Code marketplace and follow its first-run prompts.
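
A ~/.ssh/config entry saves retyping host details, and the “Remote - SSH” extension reads the same file. A sketch with illustrative host and user names:

# ~/.ssh/config
Host gpu-server
    HostName gpu01.hospital.example.org
    User yourname
    IdentityFile ~/.ssh/id_ed25519

After that, ssh gpu-server is enough, and gpu-server appears as a Remote - SSH target in VS Code.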


OpenSlide and libvips

Two core C/C++ imaging libraries for whole-slide images:

  • OpenSlide – low-level reader for scanner formats (for example .svs, .ndpi).
  • libvips – high-performance image processing engine used for tiling, resizing, and format conversion.

Python packages such as openslide-python and command-line tiling tools depend on these system libraries being installed correctly.

These are the device drivers for your microscope images:

  • They translate vendor slide formats into pixels your code can read.
  • Without them, pip install openslide-python or libvips-based pipelines will error or crash.
  • Essential for Step 2 – Slides & Viewing and Step 3 – Preprocessing & QC: viewing slides, generating thumbnails, cropping/tiling, and converting formats.
  • Enable Python scripts (for example VIEW-01 and QC-01) to read the exact scanner files that pathology scanners produce.
  • Form the backbone for browser viewers like OpenSeadragon by preparing multi-resolution image tiles.

On Ubuntu / Debian:

Terminal window
sudo apt update
sudo apt install -y libopenslide0 libvips-tools

On macOS (Homebrew):

Terminal window
brew install openslide vips

On Windows:

  • Download the Windows binary archives for OpenSlide and libvips from their official release pages.
  • Install (crucial):
    • Unzip both archives to a permanent location, for example C:\tools\openslide and C:\tools\vips.
    • Update your PATH so Python and other tools can find the libraries:
      • Search Windows for “Edit the system environment variables” → “Environment Variables…” → under “System variables” select Path → “Edit” → “New”.
      • Add: C:\tools\openslide\bin
      • Add: C:\tools\vips\bin
    • Restart your terminal or VS Code so the changes take effect.

Basic check in Python (inside your environment):

import openslide
print(openslide.__version__)

If this import fails, your system libraries are not configured correctly.
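
Once the system libraries are in place, the libvips command-line tools give you a quick end-to-end test. A sketch with an illustrative slide file, assuming your libvips build includes OpenSlide support (the Ubuntu packages normally do):

Terminal window
# make a 1024-px thumbnail of a slide
vipsthumbnail slide.svs --size 1024 -o thumb.jpg
# export a DeepZoom tile pyramid (slide_tiles.dzi + slide_tiles_files/) for viewers like OpenSeadragon
vips dzsave slide.svs slide_tiles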


QuPath

QuPath is a cross-platform graphical application for viewing, annotating, and running basic analyses on whole-slide images.

Think of QuPath as your digital microscope and annotation pen:

  • Open and scroll around slides just like on a scanner viewer.
  • Draw regions, tiles, and cell-level annotations for training and evaluation.
  • You can actually see your data in Step 2, not just trust command-line tools.
  • Provides the main interface for Step 4 – Annotation & Labeling (drawing tumor regions, masks, ROIs).
  • Even in unsupervised workflows, you will often return to QuPath later (for example in Step 7) to visualise clusters, heatmaps, or model outputs.
  • Widely used in research and education; many tutorials and community scripts exist.

QuPath is distributed as a desktop application for Windows, macOS, and Linux. There is no single package-manager command for all platforms; instead:

  • Download the installer or archive for your OS from the QuPath website: https://qupath.github.io/
  • Follow the platform-specific instructions (for example .dmg on macOS, installer on Windows).
  • On Linux you typically unpack an archive and run the bundled launcher; details vary by distribution.

After installation, confirm that QuPath launches and can open a test .svs or .ndpi slide.


Java (JDK)

The Java Development Kit (JDK) provides the Java runtime and tooling that QuPath and many of its advanced plugins (for example StarDist) rely on.

Java is the engine room under QuPath:

  • QuPath needs Java to run at all.
  • Many plugins bundle or require specific Java versions for reliable behaviour.
  • Ensures QuPath starts reliably on all platforms.
  • Required for more advanced workflows: scripting, deep-learning-based segmentation plugins, or batch processing tools.
  • Avoids hard-to-debug crashes where QuPath fails silently because Java is missing or incompatible.

First, check whether Java is already available:

Terminal window
java -version

If this reports a Java version (for example openjdk version "17..."), you are generally ready for QuPath. If the command is not found or you have a very old version, install a current JDK:

On Ubuntu / Debian:

Terminal window
sudo apt update
sudo apt install -y default-jdk

On macOS (Homebrew):

Terminal window
brew install openjdk

On Windows:

  • Install a current LTS JDK (for example from Adoptium/Temurin or your institution’s preferred vendor).
  • After installation, open a new terminal and re-run java -version to confirm.

When configuring QuPath or plugins, prefer the Java version they recommend in their documentation.