Cross-dataset skin lesion classification under dataset shift

Problem
Binary skin-lesion classifiers can appear strong on internal validation while degrading on external datasets with different acquisition and patient characteristics.
Project Site
LesionShiftAI benchmarks how skin-lesion classifiers trained on ISIC 2019 generalize to HAM10000. This site extends the repository README with methods, pinned results, reproducibility flow, and implementation context.
Overview
Training and validation sets are built from ISIC 2019; external generalization is measured on HAM10000 without any domain adaptation.
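A minimal sketch of this evaluation protocol, assuming per-image probabilities from an ISIC-trained model scored on both the internal validation split and the external HAM10000 set (the function name, array names, and toy data below are illustrative, not from the repository):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def generalization_gap(y_val, p_val, y_ext, p_ext):
    """ROC AUC on internal validation vs. external test, plus the drop."""
    auc_val = roc_auc_score(y_val, p_val)
    auc_ext = roc_auc_score(y_ext, p_ext)
    return {"auc_internal": auc_val, "auc_external": auc_ext,
            "gap": auc_val - auc_ext}

# Toy data standing in for real model outputs: the "external" scores are
# deliberately noisier to mimic dataset shift.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
p_internal = np.clip(y * 0.6 + rng.normal(0.2, 0.2, 200), 0, 1)
p_external = np.clip(y * 0.3 + rng.normal(0.35, 0.3, 200), 0, 1)
print(generalization_gap(y, p_internal, y, p_external))
```

The same labels are reused for both splits only to keep the toy example short; in the real pipeline each dataset has its own labels.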
Four pipelines are benchmarked: a baseline CNN, a 5-fold ensemble CNN, a pretrained ViT-B16, and a pretrained ViT-L16.
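For the 5-fold ensemble pipeline, a common design is to average the per-fold probability outputs at inference time; a hedged sketch of that averaging (an assumed design, not confirmed repository code):

```python
import numpy as np

def ensemble_predict(fold_probs):
    """Average per-fold probabilities.

    fold_probs: list of arrays, one per fold, each shape (n_samples,).
    Returns the ensemble probability per sample.
    """
    return np.mean(np.stack(fold_probs, axis=0), axis=0)

# Illustrative probabilities for 2 samples from 5 fold models.
folds = [np.array([0.90, 0.20]), np.array([0.80, 0.40]),
         np.array([0.70, 0.30]), np.array([0.85, 0.25]),
         np.array([0.75, 0.35])]
print(ensemble_predict(folds))
```

Averaging probabilities (rather than hard votes) preserves calibration information from each fold and tends to smooth out fold-specific overfitting.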
Each run exports checkpoints, prediction CSVs, metrics JSON, ROC and PR curves, and generalization-gap summaries.
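A minimal sketch of a per-run artifact export covering the metrics JSON and prediction CSV pieces; the directory layout, file names, and CSV schema here are illustrative assumptions, not the repository's actual format:

```python
import csv
import json
import os
import tempfile

def export_run(run_id, metrics, preds, out_root):
    """Write metrics.json and predictions.csv under out_root/run_id.

    metrics: dict of scalar metrics; preds: iterable of (image_id, prob) rows.
    """
    run_dir = os.path.join(out_root, run_id)
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, "metrics.json"), "w") as f:
        json.dump(metrics, f, indent=2)
    with open(os.path.join(run_dir, "predictions.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image_id", "prob_malignant"])  # assumed schema
        writer.writerows(preds)
    return run_dir

run_dir = export_run("demo_run", {"roc_auc": 0.9658},
                     [("ISIC_0000000", 0.91)], tempfile.mkdtemp())
print(sorted(os.listdir(run_dir)))  # → ['metrics.json', 'predictions.csv']
```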
Pinned Snapshot
Best ISIC ROC AUC: 0.9658 (ViT-L16, internal validation)
Best HAM10000 ROC AUC: 0.8470 (ViT-B16, external test)
Largest recall drop: 32.03% (validation-to-external shift across models)
Pinned runs: Baseline 20260428_042520; Ensemble ensemble_ens_20260429_b; ViT-B16 20260423_105922; ViT-L16 large_vit_b16_isic_to_ham.
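The recall-drop figure above can be computed in one line once recall is measured on each split. A hedged sketch, assuming the drop is reported in absolute percentage points (whether the pinned 32.03% is absolute or relative is not stated, so this definition is an assumption):

```python
def recall_drop_pct(recall_internal, recall_external):
    # Assumed definition: absolute drop in percentage points from
    # internal-validation recall to external-test recall.
    return round((recall_internal - recall_external) * 100, 2)

# Illustrative values only, not the pinned models' actual recalls.
print(recall_drop_pct(0.90, 0.70))  # → 20.0
```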