Fixed-Confusion NOVOmetric Bootstrap CIs

novo_boot_ci() estimates the precision of an observed binary classification effect by resampling the confusion matrix. The model is held fixed; only sampling variation in the classification outcome is modelled.

Fixed-confusion bootstrap: concept and scope

Standard bootstrap confidence intervals for classification performance resample observations and refit the model on each resample, so they capture both sampling variation and model-selection variability. The fixed-confusion NOVOmetric bootstrap asks a narrower question: once the model is trained and the confusion matrix is observed, how precisely does that matrix characterise the true classification performance in the population?

The fixed-confusion approach:

Expands the 2 x 2 confusion matrix to a row-per-observation table of (actual, predicted) pairs.
On each bootstrap replicate, resamples k rows with replacement from that table, where k = sample_frac x n (preserving the observed joint distribution).
Computes ESS, sensitivity, specificity, and other metrics for each replicate, yielding a model distribution.
Separately resamples actual and predicted labels independently (breaking their association) to build a chance distribution.
Declares novometric significance (Axiom 1) when the model 95% CI lower bound exceeds the chance 95% CI upper bound.
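
The steps above can be sketched in base R. This is an illustrative sketch only, not the oda implementation: sketch_boot() and ess_from_pairs() are hypothetical names, k is assumed to be round(sample_frac * n), and the binary ESS is computed as 100 * (sensitivity + specificity - 1), which is equivalent to rescaling mean PAC from the 50% to 100% range onto 0% to 100%.

```r
# Hypothetical helper: binary ESS (%) from vectors of 0/1 labels.
ess_from_pairs <- function(actual, predicted) {
  sens <- mean(predicted[actual == 1L] == 1L)
  spec <- mean(predicted[actual == 0L] == 0L)
  100 * (sens + spec - 1)
}

# Hypothetical sketch of a fixed-confusion bootstrap for ESS only.
sketch_boot <- function(x, nboot = 1000L, sample_frac = 0.5, seed = 1L) {
  set.seed(seed)
  # Step 1: expand the 2 x 2 matrix to one (actual, predicted) row per case.
  # as.vector() is column-major, matching the row order of expand.grid().
  cells <- expand.grid(actual = 0:1, predicted = 0:1)
  obs <- cells[rep(seq_len(4L), times = as.vector(x)), ]
  n <- nrow(obs)
  k <- round(sample_frac * n)  # assumption: rounding rule is not documented

  model <- numeric(nboot)
  chance <- numeric(nboot)
  for (b in seq_len(nboot)) {
    # Step 2-3: resample whole rows, keeping actual and predicted paired.
    idx <- sample.int(n, k, replace = TRUE)
    model[b] <- ess_from_pairs(obs$actual[idx], obs$predicted[idx])
    # Step 4: resample the two columns independently to break the association.
    a <- obs$actual[sample.int(n, k, replace = TRUE)]
    p <- obs$predicted[sample.int(n, k, replace = TRUE)]
    chance[b] <- ess_from_pairs(a, p)
  }

  model_ci <- quantile(model, c(0.025, 0.975), na.rm = TRUE, names = FALSE)
  chance_ci <- quantile(chance, c(0.025, 0.975), na.rm = TRUE, names = FALSE)
  # Step 5: non-overlap criterion on the ESS intervals.
  list(model = model_ci, chance = chance_ci,
       significant = model_ci[1] > chance_ci[2])
}

res <- sketch_boot(matrix(c(92, 43, 21, 30), nrow = 2L, byrow = TRUE))
str(res)
```

The numbers from this sketch will not match novo_boot_ci() exactly, because the random number stream and any internal details of the package differ.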

What this CI captures:

Sampling precision of the observed confusion matrix proportions.
Whether the observed effect could plausibly be a chance result.

What it does not capture:

Model-selection variability (different data -> different rule).
Overfitting: if the model was overfit, the confusion matrix used as input understates the true error rate.
Uncertainty in the confusion matrix structure (cells are treated as fixed parameters, not as random variables with their own uncertainty).

novo_boot_ci() call structure

novo_boot_ci(
  x,                       # 2x2 integer matrix: actual (rows) x predicted (cols)
  nboot       = 5000L,     # bootstrap replicates
  seed        = NULL,      # integer seed for reproducibility
  sample_frac = 0.5,       # fraction of n sampled per replicate (NOVOboot default)
  probs       = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
  alternative = "two.sided"
)

The input x must be a 2 x 2 integer matrix constructed with byrow = TRUE using the [actual, predicted] convention:

         Predicted 0   Predicted 1
Actual 0     TN            FP
Actual 1     FN            TP
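
When the starting point is a pair of label vectors rather than cell counts, a matrix in this layout can be built with base R's table(). The vectors below are made-up illustration data, not myeloma results; wrapping them in factor() with explicit levels is a precaution so that a cell with zero cases is still present.

```r
# Made-up labels for illustration only.
actual    <- c(0, 0, 1, 1, 0, 1, 0, 0)
predicted <- c(0, 1, 1, 0, 0, 1, 0, 0)

# Rows = actual, columns = predicted, both ordered 0 then 1.
tab <- table(Actual    = factor(actual,    levels = 0:1),
             Predicted = factor(predicted, levels = 0:1))

# Convert the table to a plain 2 x 2 integer matrix, keeping the dimnames.
x <- matrix(as.integer(tab), nrow = 2L, dimnames = dimnames(tab))
print(x)
```

Here x["0", "0"] is TN, x["0", "1"] is FP, x["1", "0"] is FN, and x["1", "1"] is TP.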

Worked example: myeloma MINDENOM=1 confusion

The myeloma MINDENOM=1 pruned tree produces this confusion matrix (verified against CTA.exe):

library(oda)

# Myeloma MINDENOM=1 pruned tree confusion (actual x predicted, byrow = TRUE)
# Rows: actual class (0 and 1). Columns: predicted class (0 and 1).
conf <- matrix(
  c(92, 43,   # actual 0: 92 correct (TN), 43 wrong (FP)
    21, 30),  # actual 1: 21 wrong (FN), 30 correct (TP)
  nrow = 2L, byrow = TRUE,
  dimnames = list(Actual    = c("0", "1"),
                  Predicted = c("0", "1"))
)
print(conf)
#>       Predicted
#> Actual  0  1
#>      0 92 43
#>      1 21 30

Note: no myeloma fit is needed; the confusion matrix is hardcoded from the known CTA.exe canonical output.

set.seed(42L)
ci <- novo_boot_ci(conf, nboot = 500L, seed = 42L)
print(ci)
#> Novometric fixed-confusion bootstrap
#>   n = 186   k = 93   nboot = 500   sample_frac = 0.50
#> 
#> Confusion matrix (actual x predicted):
#>       Predicted
#> Actual  0  1
#>      0 92 43
#>      1 21 30
#> 
#> Observed:  ESS = 26.97%   Mean PAC = 63.49%
#> 
#> 95% CI (2.5% -- 97.5%):
#>   ESS (%)            Model [  1.35,  42.54]  Chance [-22.14,  20.99]  Overlap: TRUE
#>   Mean PAC (%)       Model [ 50.67,  71.27]  Chance [ 38.93,  60.50]  Overlap: TRUE
#>   Sensitivity (%)    Model [ 37.70,  74.07]  Chance [ 23.11,  59.26]  Overlap: TRUE
#>   Specificity (%)    Model [ 53.83,  76.33]  Chance [ 47.76,  71.22]  Overlap: TRUE
#> 
#> Novometric significance (ESS CI non-overlap): FALSE
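
The "Observed" line in the printout can be checked by hand from the four cell counts. The sketch below uses only base R and assumes the usual binary definitions: mean PAC is the average of sensitivity and specificity, and ESS rescales mean PAC so that 50% maps to 0 and 100% maps to 100.

```r
conf <- matrix(c(92, 43, 21, 30), nrow = 2L, byrow = TRUE)

specificity <- conf[1, 1] / sum(conf[1, ])  # TN / (TN + FP) = 92 / 135
sensitivity <- conf[2, 2] / sum(conf[2, ])  # TP / (TP + FN) = 30 / 51

mean_pac <- 100 * (sensitivity + specificity) / 2
ess      <- 100 * (mean_pac - 50) / 50

# Rounds to mean_pac = 63.49 and ess = 26.97, matching the printout.
round(c(mean_pac = mean_pac, ess = ess), 2)
```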

Interpreting the CI: ESS bounds

The $ci element provides fixed 95% CI bounds for each metric under both model and chance distributions:

ci$ci[ci$ci$metric == "ess", ]
#>   metric model_lower model_upper chance_lower chance_upper overlap
#> 4    ess    1.349707    42.54235    -22.13714      20.9901    TRUE

model_lower and model_upper: 2.5th and 97.5th percentiles of ESS under the model bootstrap (resampling the observed confusion).
chance_lower and chance_upper: same, under the chance bootstrap (independent resampling of actual and predicted labels).
overlap: logical; TRUE if the two 95% CIs overlap.

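Novometric significance (Axiom 1): ci$significant is TRUE when the model 95% CI lower bound for ESS exceeds the chance 95% CI upper bound, that is, when the two intervals do not overlap at all. Non-overlap is strong evidence that the observed effect is not a chance result. In this example the model lower bound (1.35) falls below the chance upper bound (20.99), so the effect is not significant.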

cat("Significant (Axiom 1 CI non-overlap):", ci$significant, "\n")
#> Significant (Axiom 1 CI non-overlap): FALSE
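
The decision rule can be reproduced from the four ESS bounds alone. The sketch below hardcodes the values printed above rather than calling the package; how novo_boot_ci() treats exact ties between bounds is not stated in this document, so the use of strict and non-strict comparisons here is an assumption.

```r
# ESS bounds copied from the ci$ci row shown above.
model_lower  <- 1.349707
model_upper  <- 42.54235
chance_lower <- -22.13714
chance_upper <- 20.9901

# Two intervals overlap when each one starts before the other ends.
overlap <- model_lower <= chance_upper && chance_lower <= model_upper

# Axiom 1: the model interval lies entirely above the chance interval.
significant <- model_lower > chance_upper

cat("overlap:", overlap, " significant:", significant, "\n")
```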

The sample_frac parameter

sample_frac = 0.5 (the NOVOboot default) samples half the observations per replicate. This conservatively widens the CIs, making the non-overlap criterion harder to satisfy and reducing the false-positive rate in small samples.

For large n, sample_frac can be increased toward 1.0 (standard bootstrap). For publication analyses use nboot = 5000L (the default).
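
The widening effect can be seen in a small base-R experiment on the myeloma counts. This is a hedged sketch, not the package code: ci_width() is a hypothetical helper, k is assumed to be round(sample_frac * n), and only the model-side ESS interval is computed.

```r
# Expand the myeloma confusion matrix to per-case labels (TN, FP, FN, TP).
actual    <- rep(c(0L, 0L, 1L, 1L), times = c(92, 43, 21, 30))
predicted <- rep(c(0L, 1L, 0L, 1L), times = c(92, 43, 21, 30))
n <- length(actual)

# Hypothetical helper: width of the 95% percentile interval for ESS (%).
ci_width <- function(sample_frac, nboot = 2000L) {
  k <- round(sample_frac * n)  # assumption: rounding rule is not documented
  ess <- replicate(nboot, {
    idx <- sample.int(n, k, replace = TRUE)
    a <- actual[idx]
    p <- predicted[idx]
    100 * (mean(p[a == 1L] == 1L) + mean(p[a == 0L] == 0L) - 1)
  })
  unname(diff(quantile(ess, c(0.025, 0.975), na.rm = TRUE)))
}

set.seed(1L)
width_half <- ci_width(0.5)  # k = 93
width_full <- ci_width(1.0)  # k = 186, the standard bootstrap
cat("width at 0.5:", round(width_half, 1),
    " width at 1.0:", round(width_full, 1), "\n")
```

Because the standard error of a proportion scales with 1 / sqrt(k), halving k should widen the interval by a factor of roughly sqrt(2).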

What this CI does not capture

Model-selection variability: novo_boot_ci() takes the confusion matrix as fixed input. If the model was chosen based on a data-driven search (as it was in CTA), the reported confusion matrix reflects in-sample performance. The CI does not adjust for the fact that a different sample would have selected a different rule.

Refit uncertainty: No refitting occurs. For out-of-sample performance estimates, use LOO (loo = "on" in oda_fit() or loo = "stable" in cta_fit()) or held-out validation before calling novo_boot_ci() on the out-of-sample confusion.

Multi-class extension: novo_boot_ci() operates on 2 x 2 confusion matrices only (binary class). Multi-class extension is not yet implemented.
