novo_boot_ci() estimates the precision of an observed
binary classification effect by resampling the confusion matrix. The
model is held fixed; only sampling variation in the classification
outcome is modelled.
Fixed-confusion bootstrap: concept and scope
Standard bootstrap confidence intervals for classification performance resample observations and refit the model on each resample, so they capture both sampling variation and model-selection variability. The fixed-confusion NOVOmetric bootstrap asks a narrower question: once the model is trained and the confusion matrix is observed, how precisely does that matrix characterise the true classification performance in the population?
The fixed-confusion approach:
- Expands the 2 x 2 confusion matrix to a row-per-observation table of (actual, predicted) pairs.
- On each bootstrap replicate, resamples k rows with replacement from that table, where k is sample_frac times n (k = 93 of n = 186 in the worked example). Resampling whole rows preserves the observed joint distribution.
- Computes ESS, sensitivity, specificity, and other metrics for each replicate, yielding a model distribution.
- Separately resamples actual and predicted labels independently (breaking their association) to build a chance distribution.
- Declares novometric significance (Axiom 1) when the model 95% CI lower bound exceeds the chance 95% CI upper bound.
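The steps above can be sketched in base R. This is an illustration under stated assumptions, not the novo_boot_ci() source: the helper ess(), the rounding rule for k, and the replicate count are all choices made here for the sketch, and the numbers it produces need not match the package output.

```r
# Illustrative sketch only; not the package implementation.
conf <- matrix(c(92, 43,
                 21, 30), nrow = 2L, byrow = TRUE)

# Step 1: expand the 2 x 2 matrix to one (actual, predicted) row per observation.
cells  <- expand.grid(actual = 0:1, predicted = 0:1)
counts <- as.vector(conf)  # column-major order matches expand.grid row order
pairs  <- cells[rep(seq_len(nrow(cells)), times = counts), ]

# Hypothetical helper: two-class ESS (%) = 100 * (sensitivity + specificity - 1).
ess <- function(actual, predicted) {
  sens <- mean(predicted[actual == 1] == 1)
  spec <- mean(predicted[actual == 0] == 0)
  100 * (sens + spec - 1)
}

set.seed(42L)
n     <- nrow(pairs)
k     <- round(0.5 * n)  # assumed rounding rule for sample_frac * n
nboot <- 2000L

# Steps 2-3: model distribution, resampling whole rows (association preserved).
model_ess <- replicate(nboot, {
  idx <- sample.int(n, k, replace = TRUE)
  ess(pairs$actual[idx], pairs$predicted[idx])
})

# Step 4: chance distribution, resampling each label independently
# (association broken).
chance_ess <- replicate(nboot, {
  a <- sample(pairs$actual,    k, replace = TRUE)
  p <- sample(pairs$predicted, k, replace = TRUE)
  ess(a, p)
})

# Step 5: compare the two 95% percentile intervals.
model_ci    <- quantile(model_ess,  c(0.025, 0.975), na.rm = TRUE)
chance_ci   <- quantile(chance_ess, c(0.025, 0.975), na.rm = TRUE)
significant <- unname(model_ci[1] > chance_ci[2])
```

The only difference between the two loops is whether actual and predicted are drawn together or separately, which is what separates the model distribution from the chance distribution.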
What this CI captures:
- Sampling precision of the observed confusion matrix proportions.
- Whether the observed effect could plausibly be a chance result.
What it does not capture:
- Model-selection variability (different data -> different rule).
- Overfitting: if the model was overfit, the confusion matrix used as input understates the true error rate.
- Uncertainty in the confusion matrix structure (cells are treated as fixed parameters, not as random variables with their own uncertainty).
novo_boot_ci() call structure
novo_boot_ci(
  x,                   # 2x2 integer matrix: actual (rows) x predicted (cols)
  nboot = 5000L,       # bootstrap replicates
  seed = NULL,         # integer seed for reproducibility
  sample_frac = 0.5,   # fraction of n sampled per replicate (NOVOboot default)
  probs = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
  alternative = "two.sided"
)
The input x must be a 2 x 2 integer matrix constructed with byrow = TRUE using the [actual, predicted] convention:
            Predicted 0   Predicted 1
Actual 0    TN            FP
Actual 1    FN            TP
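When the counts are not already known, a matrix in this layout can be built from label vectors with base R. The vectors below are made up purely for illustration; fixing the factor levels keeps the row and column order at 0 then 1 even if a class is missing from one vector.

```r
# Hypothetical labels, invented for this example.
actual    <- c(0, 0, 0, 1, 1, 0, 1, 0)
predicted <- c(0, 1, 0, 1, 0, 0, 1, 0)

# Cross-tabulate with explicit levels: rows = actual, columns = predicted.
tab <- table(Actual    = factor(actual,    levels = c(0, 1)),
             Predicted = factor(predicted, levels = c(0, 1)))

# Convert to a plain integer matrix; both table and matrix are column-major,
# so the cell layout is unchanged.
x <- matrix(as.integer(tab), nrow = 2L, dimnames = dimnames(tab))

# The same matrix written cell by cell with byrow = TRUE: TN, FP, FN, TP.
x_byrow <- matrix(c(4L, 1L,
                    1L, 2L), nrow = 2L, byrow = TRUE)
```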
Worked example: myeloma MINDENOM=1 confusion
The myeloma MINDENOM=1 pruned tree produces this confusion matrix (verified against CTA.exe):
library(oda)
# Myeloma MINDENOM=1 pruned tree confusion (actual x predicted, byrow = TRUE)
# Rows: actual class (0 and 1). Columns: predicted class (0 and 1).
conf <- matrix(
  c(92, 43,   # actual 0: 92 correct (TN), 43 wrong (FP)
    21, 30),  # actual 1: 21 wrong (FN), 30 correct (TP)
  nrow = 2L, byrow = TRUE,
  dimnames = list(Actual = c("0", "1"),
                  Predicted = c("0", "1"))
)
print(conf)
#>       Predicted
#> Actual  0  1
#>      0 92 43
#>      1 21 30
Note: no myeloma fit is needed - the confusion matrix is hardcoded from the known CTA.exe canonical output.
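The point estimates for this matrix follow directly from its four cells and can be checked by hand. The sketch below assumes the usual two-class definitions: mean PAC is the average of sensitivity and specificity, and ESS rescales mean PAC so that 50% (chance) maps to 0 and 100% maps to 100. The results agree with the "Observed" line printed by novo_boot_ci().

```r
conf <- matrix(c(92, 43,
                 21, 30), nrow = 2L, byrow = TRUE)

TN <- conf[1, 1]; FP <- conf[1, 2]
FN <- conf[2, 1]; TP <- conf[2, 2]

specificity <- 100 * TN / (TN + FP)               # 92 / 135  -> 68.15
sensitivity <- 100 * TP / (TP + FN)               # 30 / 51   -> 58.82
mean_pac    <- (sensitivity + specificity) / 2    # 63.49
ess         <- 100 * (mean_pac - 50) / 50         # 26.97

round(c(sensitivity = sensitivity, specificity = specificity,
        mean_pac = mean_pac, ess = ess), 2)
```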
set.seed(42L)
ci <- novo_boot_ci(conf, nboot = 500L, seed = 42L)
print(ci)
#> Novometric fixed-confusion bootstrap
#> n = 186 k = 93 nboot = 500 sample_frac = 0.50
#>
#> Confusion matrix (actual x predicted):
#> Predicted
#> Actual 0 1
#> 0 92 43
#> 1 21 30
#>
#> Observed: ESS = 26.97% Mean PAC = 63.49%
#>
#> 95% CI (2.5% -- 97.5%):
#> ESS (%) Model [ 1.35, 42.54] Chance [-22.14, 20.99] Overlap: TRUE
#> Mean PAC (%) Model [ 50.67, 71.27] Chance [ 38.93, 60.50] Overlap: TRUE
#> Sensitivity (%) Model [ 37.70, 74.07] Chance [ 23.11, 59.26] Overlap: TRUE
#> Specificity (%) Model [ 53.83, 76.33] Chance [ 47.76, 71.22] Overlap: TRUE
#>
#> Novometric significance (ESS CI non-overlap): FALSE
Interpreting the CI: ESS bounds
The $ci element provides fixed 95% CI bounds for each
metric under both model and chance distributions:
ci$ci[ci$ci$metric == "ess", ]
#> metric model_lower model_upper chance_lower chance_upper overlap
#> 4 ess 1.349707 42.54235 -22.13714 20.9901 TRUE
- model_lower and model_upper: 2.5th and 97.5th percentiles of ESS under the model bootstrap (resampling the observed confusion).
- chance_lower and chance_upper: same, under the chance bootstrap (independent resampling of actual and predicted labels).
- overlap: logical; TRUE if the two 95% CIs overlap.
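Novometric significance (Axiom 1):
ci$significant is TRUE when the model 95% CI lower bound
exceeds the chance 95% CI upper bound, that is, when the model interval
lies entirely above the chance interval. This is taken as strong evidence
that the observed effect is not a chance result. In the myeloma example
the ESS model lower bound (1.35) falls below the chance upper bound
(20.99), so the intervals overlap and the result is FALSE.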
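The decision rule can be reproduced from the printed bounds alone. The sketch below copies the rounded 95% bounds from the output above into a data frame; the metric labels other than "ess" are assumed names for illustration, and the overlap formula is the standard interval-overlap test rather than code taken from the package.

```r
# Rounded 95% bounds copied from the printed output above.
bounds <- data.frame(
  metric       = c("ess", "mean_pac", "sensitivity", "specificity"),  # labels assumed
  model_lower  = c(  1.35, 50.67, 37.70, 53.83),
  model_upper  = c( 42.54, 71.27, 74.07, 76.33),
  chance_lower = c(-22.14, 38.93, 23.11, 47.76),
  chance_upper = c( 20.99, 60.50, 59.26, 71.22)
)

# Two intervals overlap when each one starts before the other ends.
bounds$overlap <- bounds$model_lower <= bounds$chance_upper &
                  bounds$chance_lower <= bounds$model_upper

# Axiom 1 check on ESS: model lower bound must exceed chance upper bound.
ess_row     <- bounds[bounds$metric == "ess", ]
significant <- ess_row$model_lower > ess_row$chance_upper
```

For these bounds every metric overlaps and the ESS check is FALSE, matching the printed result.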
cat("Significant (Axiom 1 CI non-overlap):", ci$significant, "\n")
#> Significant (Axiom 1 CI non-overlap): FALSE
The sample_frac parameter
sample_frac = 0.5 (the NOVOboot default) samples half
the observations per replicate. This conservatively widens the CIs,
making the non-overlap criterion harder to satisfy and reducing the
false-positive rate in small samples.
For large n, sample_frac can be increased toward 1.0
(standard bootstrap). For publication analyses use
nboot = 5000L (the default).
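The widening effect of a smaller sample_frac can be seen in a small base R simulation on the myeloma counts. This is a rough sketch, not package code: the helper ess_width() is hypothetical, k is assumed to be round(sample_frac * n), and the exact widths depend on the seed and replicate count.

```r
set.seed(1L)

# Row-per-observation labels for the myeloma matrix (TN, FP, FN, TP).
cell_counts <- c(92, 43, 21, 30)
actual      <- rep(c(0, 0, 1, 1), times = cell_counts)
predicted   <- rep(c(0, 1, 0, 1), times = cell_counts)
n           <- length(actual)  # 186

# Hypothetical helper: width of the 95% percentile interval for ESS (%)
# when k = round(sample_frac * n) rows are drawn per replicate.
ess_width <- function(sample_frac, nboot = 2000L) {
  k <- round(sample_frac * n)
  ess <- replicate(nboot, {
    idx <- sample.int(n, k, replace = TRUE)
    a <- actual[idx]
    p <- predicted[idx]
    100 * (mean(p[a == 1] == 1) + mean(p[a == 0] == 0) - 1)
  })
  unname(diff(quantile(ess, c(0.025, 0.975), na.rm = TRUE)))
}

width_half <- ess_width(0.5)  # k = 93
width_full <- ess_width(1.0)  # k = 186, standard bootstrap size
```

Because the standard error scales roughly with 1 / sqrt(k), halving k should widen the interval by a factor of about 1.4.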
What this CI does not capture
Model-selection variability:
novo_boot_ci() takes the confusion matrix as fixed input.
If the model was chosen by a data-driven search (as it is in
CTA), the reported confusion matrix reflects in-sample performance. The
CI does not adjust for the fact that a different sample could have
selected a different rule.
Refit uncertainty: No refitting occurs. For
out-of-sample performance estimates, use LOO (loo = "on" in
oda_fit() or loo = "stable" in
cta_fit()) or held-out validation before calling
novo_boot_ci() on the out-of-sample confusion.
Multi-class extension: novo_boot_ci()
operates on 2 x 2 confusion matrices only (binary class). Multi-class
extension is not yet implemented.
Further reading
- ?novo_boot_ci - complete function documentation with the myeloma example
- articles/cta-basics - CTA fitting and confusion table
- Yarnold PR (2020). Reformulating the First Axiom of Novometric Theory. Optimal Data Analysis 9, 7-8.
- Yarnold PR, Soltysik RC (2016). Maximizing Predictive Accuracy. ODA Books.