Results

Pinned Benchmark Snapshot

This page hard-codes the v1 benchmark snapshot from pinned experiment runs. This page gets updated when new runs outperform old runs.

Pinned Runs

Run IDs Used in This Snapshot

Baseline CNN

Run ID: 20260428_042520

Single-model ResNet50 baseline with a fixed 0.5 threshold.

Ensemble CNN (5-fold)

Run ID: ensemble_ens_20260429_b

Five member folds aggregated via mean malignancy probability.

Vision Transformer (ViT-B16)

Run ID: 20260423_105922

Pretrained ViT with warmup and cosine-style learning-rate decay.

Vision Transformer (ViT-L16)

Run ID: 20260430_165245

Pretrained larger ViT with warmup and cosine-style learning-rate decay.

Metric Tables

Validation and External-Test Performance

ISIC Validation Metrics

Model	Accuracy	Precision	Recall	F1	ROC AUC	PR AUC	Confusion Matrix
Baseline CNN	85.14%	55.56%	83.87%	66.84%	0.9292	0.8064	TN 3555 \| FP 607 \| FN 146 \| TP 759
Ensemble CNN (5-fold)	83.41%	52.78%	67.07%	59.08%	0.8527	0.6511	TN 3622 \| FP 543 \| FN 298 \| TP 607
Vision Transformer (ViT-B16)	92.40%	79.82%	76.91%	78.33%	0.9554	0.8713	TN 3986 \| FP 176 \| FN 209 \| TP 696
Vision Transformer (ViT-L16)	93.86%	84.45%	80.44%	82.40%	0.9658	0.9014	TN 4028 \| FP 134 \| FN 177 \| TP 728

HAM10000 External-Test Metrics

Model	Accuracy	Precision	Recall	F1	ROC AUC	PR AUC	Confusion Matrix
Baseline CNN	84.34%	61.77%	51.84%	56.37%	0.7965	0.6263	TN 7434 \| FP 627 \| FN 941 \| TP 1013
Ensemble CNN (5-fold)	83.69%	62.75%	40.43%	49.18%	0.7966	0.5739	TN 7592 \| FP 469 \| FN 1164 \| TP 790
Vision Transformer (ViT-B16)	88.96%	91.01%	48.16%	62.99%	0.8470	0.7059	TN 7968 \| FP 93 \| FN 1013 \| TP 941
Vision Transformer (ViT-L16)	89.85%	95.00%	50.61%	66.04%	0.8355	0.7159	TN 8009 \| FP 52 \| FN 965 \| TP 989

Shift Analysis

Generalization Gap Summary

Generalization Gap (Validation - External Test)

Model	Accuracy Gap	Precision Gap	Recall Gap	F1 Gap	ROC AUC Gap	PR AUC Gap
Baseline CNN	+0.80%	-6.20%	+32.03%	+10.47%	+13.27%	+18.01%
Ensemble CNN (5-fold)	-0.28%	-9.97%	+26.64%	+9.90%	+5.61%	+7.72%
Vision Transformer (ViT-B16)	+3.45%	-11.19%	+28.75%	+15.35%	+10.84%	+16.54%
Vision Transformer (ViT-L16)	+4.02%	-10.55%	+29.83%	+16.36%	+13.03%	+18.55%

Positive values indicate the model performed better on validation than on external test. This is expected under dataset shift and is one of the core benchmark signals.

Curves