Reproducibility

Environment and Run Workflow

The benchmark is designed for deterministic reruns with explicit config files, exported artifacts, and HPC-friendly launch scripts.

Local

Developer Setup (uv + Python ≥3.12)

# set up the local environment with uv
uv sync --extra dev --python 3.12

# run unit tests, integration tests, then the full suite with coverage
uv run pytest -m unit --no-cov
uv run pytest -m integration --no-cov
uv run pytest

# run ViT training as an example (not recommended on a local machine)
uv run python scripts/train_vit.py --config config/vit_b16.yml --threshold 0.5

HPC

Monsoon Runtime Flow

# on Monsoon (HPC)
module purge
module load anaconda3
source "$(conda info --base)/etc/profile.d/conda.sh"
conda env create -f environment.yml -p /scratch/$USER/conda/envs/lesionshiftai
conda activate /scratch/$USER/conda/envs/lesionshiftai

# on local machine, build and transfer the baseline CNN archive, launch script, and config as an example
# (assumes ~/lesionshiftai already exists as a directory on Monsoon)
uv run python scripts/build_pyz.py
scp dist/lesionshiftai.pyz <USER>@monsoon.hpc.nau.edu:~/lesionshiftai/
scp scripts/hpc/train_baseline_cnn.sh <USER>@monsoon.hpc.nau.edu:~/lesionshiftai/
scp config/baseline_cnn.yml <USER>@monsoon.hpc.nau.edu:~/lesionshiftai/
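
Because this workflow targets reproducible reruns, it can help to confirm that the archive on Monsoon is byte-identical to the local build. The following is a minimal sketch, not part of the repository: `file_sha256` is a hypothetical helper name, `sha256sum` is assumed to be installed on both machines (it ships with GNU coreutils), and the remote path assumes `~/lesionshiftai` is an existing directory.

```shell
# Hypothetical helper: print only the SHA-256 hex digest of a file.
file_sha256() {
  sha256sum "$1" | awk '{print $1}'
}

# Example usage on the local machine, after the scp commands have run
# (left commented out because it needs network access and a real <USER>):
#   local_sum=$(file_sha256 dist/lesionshiftai.pyz)
#   remote_sum=$(ssh <USER>@monsoon.hpc.nau.edu \
#     "sha256sum ~/lesionshiftai/lesionshiftai.pyz" | awk '{print $1}')
#   [ "$local_sum" = "$remote_sum" ] && echo "archive matches" || echo "archive differs"
```

A mismatch usually points to an interrupted transfer or a rebuild between the two steps, so rebuilding and copying again is a reasonable first response.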

Launch

Training Jobs and Artifacts

Baseline

Run sbatch train_baseline_cnn.sh to produce per-split metrics, predictions, and a generalization-gap JSON file.
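
When submitting the baseline job, capturing the job ID makes it easier to find the matching logs and artifacts later. The sketch below assumes it is run from the directory holding `train_baseline_cnn.sh`; `parse_job_id` is a hypothetical helper, while `--parsable` and `squeue -j` are standard Slurm options. It only submits when `sbatch` is actually present.

```shell
# Hypothetical helper: extract the job ID from `sbatch --parsable` output,
# which is "<jobid>" or "<jobid>;<cluster>" on multi-cluster setups.
parse_job_id() {
  printf '%s\n' "${1%%;*}"
}

# Submit only when Slurm and the script are available; otherwise report and skip.
if command -v sbatch >/dev/null 2>&1 && [ -f train_baseline_cnn.sh ]; then
  job_id=$(parse_job_id "$(sbatch --parsable train_baseline_cnn.sh)")
  echo "submitted job $job_id"
  squeue -j "$job_id"   # show the queue state for this job
else
  echo "sbatch or train_baseline_cnn.sh not found; nothing submitted"
fi
```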

Ensemble

Set ENSEMBLE_RUN_ID, run sbatch train_ensemble_cnn.sh, then inspect the per-member and aggregate artifacts.
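
One way to set the run ID and submit in a single step is sketched here. The timestamp-based ID format is purely illustrative, since the format the script expects is not specified; the sketch also assumes the script reads `ENSEMBLE_RUN_ID` from the environment, which `sbatch` passes through by default unless the site or the script overrides `--export`.

```shell
# Assumption: the run ID is a free-form label; this timestamp format is illustrative.
ENSEMBLE_RUN_ID="ensemble-$(date +%Y%m%d-%H%M%S)"
export ENSEMBLE_RUN_ID

# Submit only when Slurm and the script are available; otherwise report and skip.
if command -v sbatch >/dev/null 2>&1 && [ -f train_ensemble_cnn.sh ]; then
  sbatch train_ensemble_cnn.sh
else
  echo "would submit train_ensemble_cnn.sh with ENSEMBLE_RUN_ID=$ENSEMBLE_RUN_ID"
fi
```

Recording the ID alongside the job output keeps the member and aggregate artifacts of one ensemble run easy to group.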

ViT

Run sbatch train_vit.sh to produce ViT checkpoints, metrics, curves, and resume metadata.