

Initial System Setup: your lab computer and tools


This is where you get your “computational lab” ready: installing Python and a few core libraries, choosing a notebook or editor, and creating a clean project folder so every file has a sensible home. Most of the time you only need to do this once per machine, and then you can reuse the same setup for many projects.

Technical name: Initial System Setup

All-in-one Linux setup script (Ubuntu & Arch)


This is a personal bootstrap script to get a fresh Linux machine ready for digital/computational pathology work.

It will:

  • detect whether you’re on Ubuntu/Debian or an Arch-based distro (Arch, Manjaro, EndeavourOS)
  • install:
    • basic build tools
    • git, curl, wget
    • Docker
    • Visual Studio Code
    • Miniconda (Python + conda env manager)

It does not install NVIDIA drivers or CUDA (those are hardware-specific).


Do this on the Linux machine you want to prepare (native Ubuntu/Arch or inside WSL2).

  1. Open a terminal

    • Ubuntu: “Terminal” from the app menu.
    • Arch: your usual terminal emulator.
    • WSL2: “Ubuntu” / your distro from the Start menu.
  2. Go to your home directory

    Terminal window
    cd ~
  3. Create a new file for the script

    Open it with a simple editor like nano:

    Terminal window
    nano setup_dcp_env.sh

    This opens an empty file called setup_dcp_env.sh.

  4. In your browser, select the entire script block below and copy it:

    #!/usr/bin/env bash
    # setup_dcp_env.sh - quick environment bootstrap for Ubuntu/Debian and Arch
    set -euo pipefail

    echo "=== Digital / computational pathology environment bootstrap ==="

    if [ ! -f /etc/os-release ]; then
        echo "Cannot detect Linux distribution (no /etc/os-release). Exiting."
        exit 1
    fi

    # shellcheck disable=SC1091
    . /etc/os-release
    DISTRO_ID="${ID:-unknown}"
    echo "Detected distro: ${DISTRO_ID}"

    install_ubuntu_like() {
        echo "Running Ubuntu/Debian setup..."
        sudo apt update
        sudo apt install -y \
            build-essential \
            git \
            curl \
            wget \
            ca-certificates \
            software-properties-common

        # VS Code repo
        wget -qO- https://packages.microsoft.com/keys/microsoft.asc | \
            gpg --dearmor | \
            sudo tee /usr/share/keyrings/packages.microsoft.gpg >/dev/null
        echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/packages.microsoft.gpg] https://packages.microsoft.com/repos/code stable main" | \
            sudo tee /etc/apt/sources.list.d/vscode.list >/dev/null
        sudo apt update
        sudo apt install -y code

        # Docker (simple Ubuntu Docker package; adjust as needed)
        sudo apt install -y docker.io
        sudo systemctl enable --now docker
        sudo usermod -aG docker "$USER" || true
    }

    install_arch_like() {
        echo "Running Arch setup..."
        sudo pacman -Syu --noconfirm
        sudo pacman -S --noconfirm --needed \
            base-devel \
            git \
            curl \
            wget \
            ca-certificates \
            code \
            docker
        sudo systemctl enable --now docker
        sudo usermod -aG docker "$USER" || true
    }

    case "$DISTRO_ID" in
        ubuntu|debian)
            install_ubuntu_like
            ;;
        arch|manjaro|endeavouros)
            install_arch_like
            ;;
        *)
            echo "This script only supports Ubuntu/Debian and Arch-like distros right now."
            echo "Detected ID='$DISTRO_ID'. Exiting."
            exit 1
            ;;
    esac

    # Miniconda (user install)
    if [ ! -d "$HOME/miniconda3" ]; then
        echo "Installing Miniconda into $HOME/miniconda3 ..."
        TMP_INSTALLER="/tmp/miniconda.sh"
        curl -fsSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o "$TMP_INSTALLER"
        bash "$TMP_INSTALLER" -b -p "$HOME/miniconda3"
        rm -f "$TMP_INSTALLER"
        if ! grep -q 'miniconda3' "$HOME/.bashrc" 2>/dev/null; then
            echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> "$HOME/.bashrc"
        fi
    else
        echo "Miniconda directory already exists at $HOME/miniconda3, skipping."
    fi

    echo
    echo "Done."
    echo "- You may need to log out and back in for Docker group changes to take effect."
    echo "- Open a new shell so the Miniconda PATH in .bashrc is picked up."
  5. Go back to the terminal where nano is open and paste

    • Click inside the terminal window with nano.
    • Paste with right-click → Paste, Shift+Insert, or Ctrl+Shift+V, depending on your terminal.

    You should now see the full script inside nano.

  6. Save and exit the editor

    In nano:

    • Ctrl+O → Enter (save)
    • Ctrl+X (exit)
  7. Make the script executable

    Terminal window
    chmod +x setup_dcp_env.sh
  8. Run the script

    Terminal window
    ./setup_dcp_env.sh
    • Enter your password when sudo prompts.
    • Let it finish; watch for obvious errors.
  9. Log out and back in

    • Log out of your session (or reboot) and log back in.
    • This makes sure:
      • your docker group membership is active
      • your shell picks up the Miniconda PATH added to .bashrc
  10. Quick checks

    Terminal window
    code --version
    git --version
    docker ps
    conda --version

    If those print versions or basic output, your base environment is ready.

At a glance, these tools fall into three groups that together make up a digital pathology workstation:

  • OS & Core: Linux/WSL or macOS terminal, Git, Docker, SSH/remote development.
  • Code & AI: VS Code, Python + Miniconda/conda, NVIDIA GPU & CUDA.
  • Pathology engines (new): OpenSlide and libvips, QuPath, and Java (JDK) so you can actually open, view, and annotate whole‑slide images.

Linux environment (native, WSL2, macOS terminal)


The OS context where all your tools run. In practice:

  • Native Linux (for example Ubuntu Desktop / Server)
  • Windows with WSL2 (Linux userland inside Windows)
  • macOS terminal (Unix-like, close enough for most user-level tasks)

Think of this as the building where your lab lives.

  • Windows, macOS, Linux are different buildings.
  • Inside you place your equipment: Python, Docker, libvips, etc.
  • Most hospital servers, research clusters, and serious back-end systems you will touch run Linux.
  • Most ML examples and GitHub repos assume Linux commands and paths.
  • If you get comfortable in a Linux shell (native, WSL2, or macOS), working on hospital servers feels much less alien.
  • For a dedicated GPU workstation, native Ubuntu keeps drivers and Docker simpler than Windows.

On Windows, run the following from an elevated PowerShell to enable WSL2 with the default distro:

Terminal window
wsl --install

On Ubuntu or Arch you normally install the whole OS from an ISO; there is no one-line install command beyond booting the installer and following the wizard. Use the official guides instead.
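
To confirm WSL2 is active, you can check from PowerShell and from inside the distro (exact output varies by machine):

Terminal window
# from PowerShell: list installed distros and their WSL version
wsl -l -v
# from inside the distro: the kernel name typically mentions WSL2
uname -r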


Visual Studio Code (VS Code)

A cross-platform editor / lightweight IDE to edit code, configs, and notes, and to use Git, with a built-in terminal and powerful extensions.

Think of VS Code as a good lab notebook and pen for code:

  • One place to write scripts and notes
  • See your project folders
  • Run commands in a small terminal pane
  • Same tool on Windows, macOS, Linux, and inside WSL2.
  • Nice Git integration for committing and reviewing changes.
  • Great Python support and Jupyter integration.
  • With “Remote” extensions, you can edit files on a remote GPU server from your laptop.

Normally you just download the installer from the website. On Ubuntu, after adding Microsoft’s repo (as the script does), you can:

Terminal window
sudo apt update
sudo apt install -y code

On Arch (the open-source “Code - OSS” build from the official repositories):

Terminal window
sudo pacman -Syu --noconfirm
sudo pacman -S --noconfirm code

After install:

  • Extensions: “Python”, “Docker”, “Remote - SSH”, “GitHub Pull Requests”, “YAML”, “Quarto” (if writing notebooks).
  • Enable autosave (File -> Auto Save) and set a default formatter (Prettier/Black).
  • Use “Remote - SSH” to work on GPU servers without copying files locally.
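
You can also install extensions from the command line once code is on your PATH; a quick sketch (the marketplace IDs below are the usual ones, but verify them in the Extensions view):

Terminal window
code --install-extension ms-python.python
code --install-extension ms-azuretools.vscode-docker
code --install-extension ms-vscode-remote.remote-ssh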

Python and Miniconda/conda

  • Python is the main programming language you will use for data handling, tiling, ML, and evaluation.
  • Miniconda / conda is a package and environment manager that installs Python and keeps each project’s dependencies in its own environment.
  • Python is the language you talk to the computer in.
  • conda is the medicine cabinet system that keeps each project’s drugs (packages) in its own labeled drawer, so they do not mix.
  • Most modern pathology ML tooling (PyTorch, MONAI, scikit-learn, etc.) is Python-based.
  • Separate environments prevent “I installed this for project A and broke project B”.
  • An environment.yml or requirements.txt makes it easy to recreate a working setup months later or on another machine.

Recommended base Conda environment for this pipeline

Most code examples in Steps 1–2 assume a small set of Python packages:

  • numpy
  • pandas
  • scikit-learn
  • matplotlib
  • Pillow
  • openslide-python

A minimal environment.yml that works for the examples up to Step 2 could look like this:

name: histopath-core
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pip
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - pillow
  - pip:
      - openslide-python

Save this file as environment.yml at the root of your project and create the environment with:

Terminal window
conda env create -f environment.yml
conda activate histopath-core

You can add extra packages later (for example opencv-python in Step 3), then export an updated file with:

Terminal window
conda env export > environment.yml

After installing Miniconda from the official installer:

Terminal window
# create a project environment
conda create -n dcp-env python=3.11
# activate it
conda activate dcp-env
# install some common packages
conda install numpy pandas matplotlib
# optional: add conda-forge for broader packages
conda config --add channels conda-forge
conda config --set channel_priority strict

The all-in-one script above already installs Miniconda into ~/miniconda3 and adds it to .bashrc.
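
If you want conda available in the current shell right away (instead of opening a new one), something like this should work with the install location used above:

Terminal window
# activate the base environment from the user install
source "$HOME/miniconda3/bin/activate"
# optional: let conda manage your shell startup file from now on
conda init bash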


Git

A version control system that tracks changes to files in a project, lets you create named snapshots (“commits”), and roll back if needed.

Git is a time machine for your project folder:

  • Every save (commit) records what changed and a short message.
  • You can later say “show me the project as it looked last Monday”.
  • Undo bad changes without losing everything.
  • See exactly what changed between two versions of a pipeline.
  • Share code and configs with collaborators via GitHub/GitLab.
  • Necessary if you want your computational pipelines to be reproducible and reviewable.

On Ubuntu/Debian:

Terminal window
sudo apt update
sudo apt install -y git

On Arch:

Terminal window
sudo pacman -Syu --noconfirm
sudo pacman -S --noconfirm git

Basic first-time use in a project directory:

Terminal window
git init
git add .
git commit -m "Initial snapshot"

Configure identity and GitHub access:

Terminal window
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
git config --global init.defaultBranch main # optional but recommended
# create an SSH key
ssh-keygen -t ed25519 -C "you@example.com"
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
# copy the public key and paste into GitHub Settings > SSH and GPG keys
cat ~/.ssh/id_ed25519.pub
# test
ssh -T git@github.com

Tips:

  • git status to see changes; git log --oneline to review history.
  • Use GitHub CLI (gh auth login) if you prefer HTTPS/pat flows.
  • Set git config --global pull.rebase false (or true) depending on your workflow.
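
Whole-slide images and derived tiles are far too large for Git, so it helps to exclude them from the start. A minimal .gitignore sketch (folder names are illustrative; adjust to your project layout):

# .gitignore - keep bulky or regenerable files out of version control
# raw slides and derived tiles: too large for Git
data/
*.svs
*.ndpi
# regenerable outputs and caches
outputs/
.ipynb_checkpoints/
__pycache__/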

Docker

Docker is a platform for building and running containers: packaged environments that include your code plus all required system and Python dependencies.

A container is like a small, self-contained lab room in a box:

  • Same tools and reagents wherever you ship it
  • Less “it works on my machine, not on the server”
  • Freeze a working environment so you can rerun or deploy it later without reinstalling everything.
  • Give IT a concrete artifact (“run this container”) instead of a long “how to set up my pipeline” document.
  • Aligns with how many hospital IT teams run internal services.

On Ubuntu (simple case):

Terminal window
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker "$USER" # then log out and back in

On Arch:

Terminal window
sudo pacman -Syu --noconfirm
sudo pacman -S --noconfirm docker
sudo systemctl enable --now docker
sudo usermod -aG docker "$USER"

Basic check:

Terminal window
docker run hello-world

If Docker needs sudo, add yourself to the docker group and re-login:

Terminal window
sudo usermod -aG docker "$USER"
newgrp docker
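
Once docker run hello-world works, you can describe your own environment as a Dockerfile. A minimal sketch, assuming a project with a requirements.txt and a run_pipeline.py entry script (both names are illustrative):

# Dockerfile - minimal containerized pipeline sketch
FROM python:3.11-slim

WORKDIR /app

# install Python dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# copy the rest of the project and set the default command
COPY . .
CMD ["python", "run_pipeline.py"]

Build and run it from the project directory with docker build -t my-pipeline . followed by docker run --rm my-pipeline.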

NVIDIA GPU & CUDA

  • An NVIDIA GPU is a hardware accelerator for heavy compute tasks (matrix multiplications, convolutions, etc.).
  • CUDA Toolkit is NVIDIA’s software stack that ML libraries use to talk to the GPU.
  • A GPU is like a room full of many simple assistants who all do tiny calculations at the same time.
  • CUDA is the instruction set and tools they understand.
  • Cuts training time from days to hours for realistic WSI-scale models.
  • Makes it feasible to run deep models on large cohorts on your own hardware.
  • Allows more experimentation with model architecture and hyperparameters on real data, not just toy patches.

This is hardware- and distro-specific. In practice you will:

  • Install an appropriate NVIDIA driver for your card and OS.
  • Install a CUDA Toolkit version compatible with your deep learning framework (or use a container image that bundles the right stack).

The exact commands change over time; rely on official docs and framework “get started” pages.

Typical sanity check once things are installed:

Terminal window
nvidia-smi

and in Python:

import torch
print(torch.cuda.is_available())
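
If you run models inside containers, Docker needs the NVIDIA Container Toolkit to see the GPU. Assuming the toolkit is installed (see NVIDIA’s docs) and using an illustrative CUDA image tag, a quick check looks like:

Terminal window
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi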

SSH and remote development

  • SSH (OpenSSH) is a protocol and toolset to securely log into another machine and run commands.
  • VS Code Remote SSH lets you use VS Code to edit files on that remote machine.

SSH is a secure hallway to another computer:

  • you stay at your desk,
  • but your commands run in the hospital server room where the slides and GPUs live.
  • Run code close to where WSIs and databases live instead of copying terabytes to your laptop.
  • Use your familiar editor (VS Code) while the work runs on a powerful remote server.
  • This is how IT will usually give you access to institutional compute resources.

On most Linux/macOS systems, an SSH client is already installed.

On Ubuntu (if needed):

Terminal window
sudo apt update
sudo apt install -y openssh-client

On Arch:

Terminal window
sudo pacman -Syu --noconfirm
sudo pacman -S --noconfirm openssh

Basic usage:

Terminal window
ssh yourname@your-hospital-server

For VS Code Remote SSH, install the “Remote - SSH” extension from the VS Code marketplace and follow its first-run prompts.
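
A ~/.ssh/config entry saves retyping host details, and the “Remote - SSH” extension reads the same file. A sketch with illustrative host and user names:

# ~/.ssh/config
Host gpu-server
    HostName gpu01.hospital.example.org
    User yourname
    IdentityFile ~/.ssh/id_ed25519

After that, ssh gpu-server is enough, and gpu-server appears as a Remote - SSH target in VS Code.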


OpenSlide and libvips

Two core C/C++ imaging libraries for whole-slide images:

  • OpenSlide – low-level reader for scanner formats (for example .svs, .ndpi).
  • libvips – high-performance image processing engine used for tiling, resizing, and format conversion.

Python packages such as openslide-python and command-line tiling tools depend on these system libraries being installed correctly.

These are the device drivers for your microscope images:

  • They translate vendor slide formats into pixels your code can read.
  • Without them, pip install openslide-python or libvips-based pipelines will error or crash.
  • Essential for Step 2 – Slides & Viewing and Step 3 – Preprocessing & QC: viewing slides, generating thumbnails, cropping/tiling, and converting formats.
  • Enable Python scripts (for example VIEW-01 and QC-01) to read the exact scanner files that pathology scanners produce.
  • Form the backbone for browser viewers like OpenSeadragon by preparing multi-resolution image tiles.

On Ubuntu / Debian:

Terminal window
sudo apt update
sudo apt install -y libopenslide0 libvips-tools

On macOS (Homebrew):

Terminal window
brew install openslide vips

On Windows:

  • Download the Windows binary archives for OpenSlide and libvips from their official release pages.
  • Install (crucial):
    • Unzip both archives to a permanent location, for example C:\tools\openslide and C:\tools\vips.
    • Update your PATH so Python and other tools can find the libraries:
      • Search Windows for “Edit the system environment variables” → “Environment Variables…” → under “System variables” select Path → “Edit” → “New”.
      • Add: C:\tools\openslide\bin
      • Add: C:\tools\vips\bin
    • Restart your terminal or VS Code so the changes take effect.

Basic check in Python (inside your environment):

import openslide
print(openslide.__version__)

If this import fails, your system libraries are not configured correctly.
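
Once the system libraries are in place, the libvips command-line tools give you a quick end-to-end test. A sketch with an illustrative slide file, assuming your libvips build includes OpenSlide support (the Ubuntu packages normally do):

Terminal window
# make a 1024-px thumbnail of a slide
vipsthumbnail slide.svs --size 1024 -o thumb.jpg
# export a DeepZoom tile pyramid (slide_tiles.dzi + slide_tiles_files/) for viewers like OpenSeadragon
vips dzsave slide.svs slide_tiles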


QuPath

QuPath is a cross-platform graphical application for viewing, annotating, and running basic analyses on whole-slide images.

Think of QuPath as your digital microscope and annotation pen:

  • Open and scroll around slides just like on a scanner viewer.
  • Draw regions, tiles, and cell-level annotations for training and evaluation.
  • You can actually see your data in Step 2, not just trust command-line tools.
  • Provides the main interface for Step 4 – Annotation & Labeling (drawing tumor regions, masks, ROIs).
  • Even in unsupervised workflows, you will often return to QuPath later (for example in Step 7) to visualise clusters, heatmaps, or model outputs.
  • Widely used in research and education; many tutorials and community scripts exist.

QuPath is distributed as a desktop application for Windows, macOS, and Linux. There is no single package-manager command for all platforms; instead:

  • Download the installer or archive for your OS from the QuPath website: https://qupath.github.io/
  • Follow the platform-specific instructions (for example .dmg on macOS, installer on Windows).
  • On Linux you typically unpack an archive and run the bundled launcher; details vary by distribution.

After installation, confirm that QuPath launches and can open a test .svs or .ndpi slide.


Java (JDK)

The Java Development Kit (JDK) provides the Java runtime and tooling that QuPath and many of its advanced plugins (for example StarDist) rely on.

Java is the engine room under QuPath:

  • QuPath needs Java to run at all.
  • Many plugins bundle or require specific Java versions for reliable behaviour.
  • Ensures QuPath starts reliably on all platforms.
  • Required for more advanced workflows: scripting, deep-learning-based segmentation plugins, or batch processing tools.
  • Avoids hard-to-debug crashes where QuPath fails silently because Java is missing or incompatible.

First, check whether Java is already available:

Terminal window
java -version

If this reports a Java version (for example openjdk version "17..."), you are generally ready for QuPath. If the command is not found or you have a very old version, install a current JDK:

On Ubuntu / Debian:

Terminal window
sudo apt update
sudo apt install -y default-jdk

On macOS (Homebrew):

Terminal window
brew install openjdk

On Windows:

  • Install a current LTS JDK (for example from Adoptium/Temurin or your institution’s preferred vendor).
  • After installation, open a new terminal and re-run java -version to confirm.

When configuring QuPath or plugins, prefer the Java version they recommend in their documentation.