Cross-dataset skin lesion classification under dataset shift

Problem
Binary skin-lesion classifiers can appear strong on internal validation while degrading on external datasets with different acquisition and patient characteristics.
Project Site
LesionShiftAI benchmarks how skin-lesion classifiers trained on ISIC 2019 generalize to HAM10000. This site extends the repository README with methods, pinned results, reproducibility flow, and implementation context.
Overview
Training and validation sets are built from ISIC 2019; external generalization is measured on HAM10000 without any domain adaptation.
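A minimal sketch of this evaluation protocol, assuming per-image probabilities from an ISIC-trained model scored on both the internal validation split and the external HAM10000 set (the function name, array names, and toy data below are illustrative, not from the repository):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def generalization_gap(y_val, p_val, y_ext, p_ext):
    """ROC AUC on internal validation vs. external test, plus the drop."""
    auc_val = roc_auc_score(y_val, p_val)
    auc_ext = roc_auc_score(y_ext, p_ext)
    return {"auc_internal": auc_val, "auc_external": auc_ext,
            "gap": auc_val - auc_ext}

# Toy data standing in for real model outputs: the "external" scores are
# deliberately noisier to mimic dataset shift.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
p_internal = np.clip(y * 0.6 + rng.normal(0.2, 0.2, 200), 0, 1)
p_external = np.clip(y * 0.3 + rng.normal(0.35, 0.3, 200), 0, 1)
print(generalization_gap(y, p_internal, y, p_external))
```

The same labels are reused for both splits only to keep the toy example short; in the real pipeline each dataset has its own labels.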
Four pipelines are benchmarked: a baseline CNN, a 5-fold ensemble CNN, a pretrained ViT-B16, and a pretrained ViT-L16.
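For the 5-fold ensemble pipeline, a common design is to average the per-fold probability outputs at inference time; a hedged sketch of that averaging (an assumed design, not confirmed repository code):

```python
import numpy as np

def ensemble_predict(fold_probs):
    """Average per-fold probabilities.

    fold_probs: list of arrays, one per fold, each shape (n_samples,).
    Returns the ensemble probability per sample.
    """
    return np.mean(np.stack(fold_probs, axis=0), axis=0)

# Illustrative probabilities for 2 samples from 5 fold models.
folds = [np.array([0.90, 0.20]), np.array([0.80, 0.40]),
         np.array([0.70, 0.30]), np.array([0.85, 0.25]),
         np.array([0.75, 0.35])]
print(ensemble_predict(folds))
```

Averaging probabilities (rather than hard votes) preserves calibration information from each fold and tends to smooth out fold-specific overfitting.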
Each run exports checkpoints, prediction CSVs, metrics JSON, ROC and PR curves, and generalization-gap summaries.
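A minimal sketch of a per-run artifact export covering the metrics JSON and prediction CSV pieces; the directory layout, file names, and CSV schema here are illustrative assumptions, not the repository's actual format:

```python
import csv
import json
import os
import tempfile

def export_run(run_id, metrics, preds, out_root):
    """Write metrics.json and predictions.csv under out_root/run_id.

    metrics: dict of scalar metrics; preds: iterable of (image_id, prob) rows.
    """
    run_dir = os.path.join(out_root, run_id)
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, "metrics.json"), "w") as f:
        json.dump(metrics, f, indent=2)
    with open(os.path.join(run_dir, "predictions.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image_id", "prob_malignant"])  # assumed schema
        writer.writerows(preds)
    return run_dir

run_dir = export_run("demo_run", {"roc_auc": 0.9658},
                     [("ISIC_0000000", 0.91)], tempfile.mkdtemp())
print(sorted(os.listdir(run_dir)))  # → ['metrics.json', 'predictions.csv']
```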
Pinned Snapshot
Best ISIC ROC AUC: 0.9658 (ViT-L16, internal validation)
Best HAM10000 ROC AUC: 0.8470 (ViT-B16, external test)
Largest recall drop: 32.03% (validation-to-external shift across models)
Pinned runs: Baseline 20260428_042520; Ensemble ensemble_ens_20260429_b; ViT-B16 20260423_105922; ViT-L16 large_vit_b16_isic_to_ham.
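The recall-drop figure above can be computed in one line once recall is measured on each split. A hedged sketch, assuming the drop is reported in absolute percentage points (whether the pinned 32.03% is absolute or relative is not stated, so this definition is an assumption):

```python
def recall_drop_pct(recall_internal, recall_external):
    # Assumed definition: absolute drop in percentage points from
    # internal-validation recall to external-test recall.
    return round((recall_internal - recall_external) * 100, 2)

# Illustrative values only, not the pinned models' actual recalls.
print(recall_drop_pct(0.90, 0.70))  # → 20.0
```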