
novo_boot_ci() estimates the precision of an observed binary classification effect by resampling the confusion matrix. The model is held fixed; only sampling variation in the classification outcome is modelled.

Fixed-confusion bootstrap: concept and scope

Standard bootstrap confidence intervals for classification performance resample observations and refit the model on each resample, which captures both sampling variation and model-selection variability. The fixed-confusion NOVOmetric bootstrap asks a narrower question: once the model is trained and the confusion matrix is observed, how precisely does that matrix characterise the true classification performance in the population?

The fixed-confusion approach:

  1. Expands the 2 x 2 confusion matrix to a row-per-observation table of (actual, predicted) pairs.
  2. On each bootstrap replicate, resamples k rows with replacement from that table, where k = sample_frac x n (preserving the observed joint distribution).
  3. Computes ESS, sensitivity, specificity, and other metrics for each replicate, yielding a model distribution.
  4. Separately resamples actual and predicted labels independently (breaking their association) to build a chance distribution.
  5. Declares novometric significance (Axiom 1) when the model 95% CI lower bound exceeds the chance 95% CI upper bound.
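
The five steps above can be sketched in base R. This is an illustrative reconstruction, not the code inside novo_boot_ci(): the helper names ess_from_labels() and fixed_confusion_sketch(), the rounding rule for k, and the use of quantile() with its default type are all assumptions made for the sketch, and it tracks ESS only.

```r
# Illustrative sketch only; NOT the oda package's implementation.

# Binary ESS (%) from paired 0/1 label vectors (hypothetical helper).
ess_from_labels <- function(actual, predicted) {
  sens <- mean(predicted[actual == 1L] == 1L)
  spec <- mean(predicted[actual == 0L] == 0L)
  100 * ((sens + spec) / 2 - 0.5) / 0.5
}

fixed_confusion_sketch <- function(conf, nboot = 1000L,
                                   sample_frac = 0.5, seed = 1L) {
  set.seed(seed)
  # Step 1: expand cell counts to one (actual, predicted) row per observation.
  # expand.grid() varies `actual` fastest, matching as.vector(conf).
  cells <- expand.grid(actual = 0:1, predicted = 0:1)
  rows  <- cells[rep(seq_len(4L), times = as.vector(conf)), ]
  n <- nrow(rows)
  k <- round(sample_frac * n)  # assumed rounding rule
  model  <- numeric(nboot)
  chance <- numeric(nboot)
  for (b in seq_len(nboot)) {
    # Steps 2-3: resample pairs jointly, keeping the observed association.
    i <- sample.int(n, k, replace = TRUE)
    model[b] <- ess_from_labels(rows$actual[i], rows$predicted[i])
    # Step 4: resample actual and predicted independently (chance).
    ia <- sample.int(n, k, replace = TRUE)
    ip <- sample.int(n, k, replace = TRUE)
    chance[b] <- ess_from_labels(rows$actual[ia], rows$predicted[ip])
  }
  model_ci  <- quantile(model,  c(0.025, 0.975), na.rm = TRUE, names = FALSE)
  chance_ci <- quantile(chance, c(0.025, 0.975), na.rm = TRUE, names = FALSE)
  # Step 5: model lower bound must exceed chance upper bound.
  list(n = n, k = k, model_ci = model_ci, chance_ci = chance_ci,
       significant = model_ci[1] > chance_ci[2])
}

conf <- matrix(c(92, 43, 21, 30), nrow = 2L, byrow = TRUE)
res  <- fixed_confusion_sketch(conf)
str(res)
```

Because the random-number stream and quantile details may differ from the package, the bounds from this sketch should be expected to be close to, but not identical with, those reported by novo_boot_ci().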

What this CI captures:

  • Sampling precision of the observed confusion matrix proportions.
  • Whether the observed effect could plausibly be a chance result.

What it does not capture:

  • Model-selection variability (different data -> different rule).
  • Overfitting: if the model was overfit, the confusion matrix used as input understates the true error rate.
  • Uncertainty in the confusion matrix structure (cells are treated as fixed parameters, not as random variables with their own uncertainty).

novo_boot_ci() call structure

novo_boot_ci(
  x,                       # 2x2 integer matrix: actual (rows) x predicted (cols)
  nboot       = 5000L,     # bootstrap replicates
  seed        = NULL,      # integer seed for reproducibility
  sample_frac = 0.5,       # fraction of n sampled per replicate (NOVOboot default)
  probs       = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
  alternative = "two.sided"
)

The input x must be a 2 x 2 integer matrix indexed as [actual, predicted]. When the counts are entered row by row (TN, FP, FN, TP), pass byrow = TRUE to matrix() so that they land in the layout below:

         Predicted 0   Predicted 1
Actual 0     TN            FP
Actual 1     FN            TP
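
When the starting point is a pair of label vectors rather than cell counts, the same layout can be obtained with table(). The following is a minimal sketch using base R only; the vectors actual and predicted are made-up illustrations that reproduce the myeloma counts used later in this article.

```r
# Hypothetical 0/1 label vectors that reproduce the counts 92, 43, 21, 30.
actual    <- c(rep(0L, 135L), rep(1L, 51L))
predicted <- c(rep(0L, 92L), rep(1L, 43L),   # actual 0: 92 TN, 43 FP
               rep(0L, 21L), rep(1L, 30L))   # actual 1: 21 FN, 30 TP

# Fixing the factor levels guarantees a 2 x 2 table even if a class
# happens to be absent from one of the vectors.
tab <- table(Actual    = factor(actual,    levels = 0:1),
             Predicted = factor(predicted, levels = 0:1))

# Convert to a plain integer matrix: rows = actual, columns = predicted.
x <- matrix(as.integer(tab), nrow = 2L, dimnames = dimnames(tab))
stopifnot(is.integer(x), identical(dim(x), c(2L, 2L)))
print(x)
```

Putting the actual labels first in table() is what places them in the rows; swapping the two arguments would transpose FP and FN.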

Worked example: myeloma MINDENOM=1 confusion

The myeloma MINDENOM=1 pruned tree produces this confusion matrix (verified against CTA.exe):

library(oda)

# Myeloma MINDENOM=1 pruned tree confusion (actual x predicted, byrow = TRUE)
# Rows: actual class (0 and 1). Columns: predicted class (0 and 1).
conf <- matrix(
  c(92, 43,   # actual 0: 92 correct (TN), 43 wrong (FP)
    21, 30),  # actual 1: 21 wrong (FN), 30 correct (TP)
  nrow = 2L, byrow = TRUE,
  dimnames = list(Actual    = c("0", "1"),
                  Predicted = c("0", "1"))
)
print(conf)
#>       Predicted
#> Actual  0  1
#>      0 92 43
#>      1 21 30

Note: no myeloma fit is needed - the confusion matrix is hardcoded from the known CTA.exe canonical output.

set.seed(42L)
ci <- novo_boot_ci(conf, nboot = 500L, seed = 42L)
print(ci)
#> Novometric fixed-confusion bootstrap
#>   n = 186   k = 93   nboot = 500   sample_frac = 0.50
#> 
#> Confusion matrix (actual x predicted):
#>       Predicted
#> Actual  0  1
#>      0 92 43
#>      1 21 30
#> 
#> Observed:  ESS = 26.97%   Mean PAC = 63.49%
#> 
#> 95% CI (2.5% -- 97.5%):
#>   ESS (%)            Model [  1.35,  42.54]  Chance [-22.14,  20.99]  Overlap: TRUE
#>   Mean PAC (%)       Model [ 50.67,  71.27]  Chance [ 38.93,  60.50]  Overlap: TRUE
#>   Sensitivity (%)    Model [ 37.70,  74.07]  Chance [ 23.11,  59.26]  Overlap: TRUE
#>   Specificity (%)    Model [ 53.83,  76.33]  Chance [ 47.76,  71.22]  Overlap: TRUE
#> 
#> Novometric significance (ESS CI non-overlap): FALSE
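
The "Observed" line in the printout can be checked by hand from the four cell counts. The sketch below uses base R only and the standard two-class relation between ESS and mean PAC (chance corresponds to a mean PAC of 50%); the variable names are illustrative.

```r
conf <- matrix(c(92L, 43L, 21L, 30L), nrow = 2L, byrow = TRUE)

specificity <- conf[1, 1] / sum(conf[1, ])   # TN / (TN + FP)
sensitivity <- conf[2, 2] / sum(conf[2, ])   # TP / (FN + TP)

# Mean per-class accuracy (%), then rescale so chance = 0 and perfect = 100.
mean_pac <- 100 * (sensitivity + specificity) / 2
ess      <- 100 * (mean_pac - 50) / (100 - 50)

round(c(mean_pac = mean_pac, ess = ess), 2)
#> mean_pac      ess
#>    63.49    26.97
```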

Interpreting the CI: ESS bounds

The $ci element provides fixed 95% CI bounds for each metric under both model and chance distributions:

ci$ci[ci$ci$metric == "ess", ]
#>   metric model_lower model_upper chance_lower chance_upper overlap
#> 4    ess    1.349707    42.54235    -22.13714      20.9901    TRUE

  • model_lower and model_upper: 2.5th and 97.5th percentiles of ESS under the model bootstrap (resampling the observed confusion).
  • chance_lower and chance_upper: same, under the chance bootstrap (independent resampling of actual and predicted labels).
  • overlap: logical; TRUE if the two 95% CIs overlap.

Novometric significance (Axiom 1): ci$significant is TRUE when the model 95% CI lower bound exceeds the chance 95% CI upper bound, that is, when the model interval lies entirely above the chance interval. This provides strong evidence that the observed effect is not a chance result.

cat("Significant (Axiom 1 CI non-overlap):", ci$significant, "\n")
#> Significant (Axiom 1 CI non-overlap): FALSE
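
The decision can be reproduced directly from the ESS bounds printed above. This sketch hardcodes those four numbers in a data frame named ess_row (an illustrative name) rather than calling novo_boot_ci(), so it runs without the package.

```r
# ESS bounds copied from the ci$ci output shown above.
ess_row <- data.frame(
  metric       = "ess",
  model_lower  = 1.349707,
  model_upper  = 42.54235,
  chance_lower = -22.13714,
  chance_upper = 20.9901
)

# Two intervals overlap when each one starts before the other ends.
overlap <- ess_row$model_lower <= ess_row$chance_upper &&
  ess_row$chance_lower <= ess_row$model_upper

# Axiom 1 criterion as described in this article: model lower bound
# must exceed chance upper bound.
significant <- ess_row$model_lower > ess_row$chance_upper

c(overlap = overlap, significant = significant)
#>     overlap significant
#>        TRUE       FALSE
```

Here the model lower bound (1.35) falls well below the chance upper bound (20.99), so the criterion is not met even though the observed ESS is about 27%.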

The sample_frac parameter

sample_frac = 0.5 (the NOVOboot default) draws k = 0.5 x n observations per replicate. Smaller replicates give wider CIs than a full-size bootstrap, so the non-overlap criterion is harder to satisfy. This is a deliberately conservative choice that reduces the false-positive rate in small samples.

For large n, sample_frac can be increased toward 1.0 (standard bootstrap). For publication analyses use nboot = 5000L (the default); the worked example above uses nboot = 500L.
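
The effect of sample_frac on interval width can be illustrated with a small base-R simulation. The helper boot_ess_width() is hypothetical and only mimics the model bootstrap described in this article (the rounding of k and the quantile type are assumptions); it is not part of the oda package.

```r
# Hypothetical helper: width of the 95% model CI for ESS at a given
# sample_frac, using a plain joint resample of (actual, predicted) pairs.
boot_ess_width <- function(conf, sample_frac, nboot = 2000L, seed = 1L) {
  set.seed(seed)
  counts    <- c(conf[1, 1], conf[1, 2], conf[2, 1], conf[2, 2])
  actual    <- rep(c(0L, 0L, 1L, 1L), times = counts)
  predicted <- rep(c(0L, 1L, 0L, 1L), times = counts)
  n <- length(actual)
  k <- round(sample_frac * n)  # assumed rounding rule
  ess <- replicate(nboot, {
    i <- sample.int(n, k, replace = TRUE)
    a <- actual[i]
    p <- predicted[i]
    # ESS (%) = 100 * (sensitivity + specificity - 1) for two classes.
    100 * (mean(p[a == 1L] == 1L) + mean(p[a == 0L] == 0L) - 1)
  })
  diff(quantile(ess, c(0.025, 0.975), na.rm = TRUE, names = FALSE))
}

conf   <- matrix(c(92L, 43L, 21L, 30L), nrow = 2L, byrow = TRUE)
fracs  <- c(0.5, 0.75, 1)
widths <- sapply(fracs, function(f) boot_ess_width(conf, f))
data.frame(sample_frac = fracs, ess_ci_width = round(widths, 1))
```

The width at sample_frac = 0.5 should come out noticeably larger than at 1.0, roughly in line with the usual 1/sqrt(k) scaling of sampling error.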

What this CI does not capture

Model-selection variability: novo_boot_ci() takes the confusion matrix as fixed input. If the model was chosen based on a data-driven search (as it was in CTA), the reported confusion matrix reflects in-sample performance. The CI does not adjust for the fact that a different sample would have selected a different rule.

Refit uncertainty: No refitting occurs. For out-of-sample performance estimates, use LOO (loo = "on" in oda_fit() or loo = "stable" in cta_fit()) or held-out validation before calling novo_boot_ci() on the out-of-sample confusion.

Multi-class extension: Not yet implemented. novo_boot_ci() accepts 2 x 2 (binary class) confusion matrices only.

Further reading

  • ?novo_boot_ci - complete function documentation with the myeloma example
  • articles/cta-basics - CTA fitting and confusion table
  • Yarnold PR (2020). Reformulating the First Axiom of Novometric Theory. Optimal Data Analysis 9, 7-8.
  • Yarnold PR, Soltysik RC (2016). Maximizing Predictive Accuracy. ODA Books.