Novometric bootstrap CI from a fixed 2x2 confusion matrix

Estimates the precision of an observed binary classification effect by comparing resampled (bootstrap) distributions of model and chance performance. Based on the NOVOboot methodology (Yarnold 2020; Yarnold & Soltysik 2016).

Fixed-confusion bootstrap: This function resamples from the observed confusion matrix only. It does not refit ODA or CTA models, so it does not capture model-selection variability. The model distribution is generated by resampling paired (actual, predicted) rows from the expanded confusion table; the chance distribution is generated by resampling actual and predicted labels independently, which breaks their association. Novometric significance (Axiom 1) is declared when the 95% confidence intervals for model and chance ESS do not overlap, with the model interval lying above the chance interval.

Usage

novo_boot_ci(x, ...)

# Default S3 method
novo_boot_ci(x,
             nboot       = 5000L,
             seed        = NULL,
             sample_frac = 0.5,
             probs       = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
             alternative = c("two.sided", "greater", "less"),
             ...)

# S3 method for class 'oda_fit'
novo_boot_ci(x,
             nboot       = 5000L,
             seed        = NULL,
             sample_frac = 0.5,
             probs       = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
             alternative = c("two.sided", "greater", "less"),
             ...)

# S3 method for class 'cta_tree'
novo_boot_ci(x,
             nboot       = 5000L,
             seed        = NULL,
             sample_frac = 0.5,
             probs       = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
             alternative = c("two.sided", "greater", "less"),
             node_id     = NULL,
             weighted    = FALSE,
             ...)

# S3 method for class 'cta_ort'
novo_boot_ci(x,
             nboot       = 5000L,
             seed        = NULL,
             sample_frac = 0.5,
             probs       = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
             alternative = c("two.sided", "greater", "less"),
             stratum_id  = NULL,
             weighted    = FALSE,
             ...)

# S3 method for class 'novo_boot_ci'
print(x, ...)

Arguments

x: For the default method: a 2x2 integer matrix, rows = actual class, columns = predicted class. Same [actual, predicted] convention as training_confusion in a cta_tree and as oda_confusion(). Use byrow = TRUE when constructing with matrix(). For S3 methods: a fitted model object (oda_fit, cta_tree, or cta_ort) from which the training confusion matrix is extracted.
nboot: Number of bootstrap replicates. Default 5000.
seed: Integer seed passed to set.seed, or NULL to use the current RNG state. Use a fixed seed for reproducibility.
sample_frac: Fraction of n sampled per replicate (with replacement). Default 0.5, matching NOVOboot.
probs: Quantile probability levels for the summary table.
alternative: Direction for exact Fisher p-values: "two.sided" (default), "greater", or "less".
node_id: Integer node id of a terminal (leaf) node in a cta_tree. When supplied, the bootstrap uses the class counts for that specific terminal node rather than the full-tree confusion. Only valid for novo_boot_ci.cta_tree.
stratum_id: Integer stratum id from cta_ort$strata. When supplied, the bootstrap uses the class counts for that single terminal LORT stratum rather than the full-LORT confusion. Only valid for novo_boot_ci.cta_ort.
weighted: Logical. When node_id or stratum_id is supplied, weighted = TRUE uses case-weighted class counts and weighted = FALSE (default) uses raw integer counts. Ignored when neither is supplied (full-tree or full-LORT confusion).
...: For the generic and S3 fit methods: additional arguments passed to novo_boot_ci.default. For print.novo_boot_ci: currently ignored.
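
The byrow = TRUE advice for x matters because matrix() fills column by column by default. A minimal base-R sketch, using the counts from the Examples section, of how the two fills differ:

```r
# Counts listed in reading order: actual 1 -> (pred 1, pred 2), actual 2 -> (pred 1, pred 2)
counts <- c(146, 40, 36, 33)

conf_byrow <- matrix(counts, nrow = 2, byrow = TRUE)  # rows = actual, as intended
conf_bycol <- matrix(counts, nrow = 2)                # default column-major fill

# The off-diagonal (misclassification) cells swap places:
conf_byrow[1, 2]  # 40
conf_bycol[1, 2]  # 36
```

Without byrow = TRUE the matrix is the transpose of the intended one, so the actual and predicted roles are exchanged.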

Value

An object of class novo_boot_ci, a list with:

call: The matched call.
confusion: Input confusion matrix (integer, 2x2).
n: Total observations (sum(x)).
k: Observations sampled per replicate (round(sample_frac * n)).
nboot, sample_frac, probs, alternative: Input parameters.
has_zero_cells: Logical; TRUE if any cell of x is zero. Does not stop computation; NA propagates for affected metrics in affected replicates.
observed: Data frame with one row per metric. Columns: metric, value. Reports the observed (not bootstrapped) sensitivity, specificity, mean_pac, ess, odds_ratio, and risk_ratio computed directly from the input confusion matrix.
model: Data frame (nboot rows). Per-replicate model bootstrap distributions: sensitivity, specificity, mean_pac, ess (all in %), odds_ratio, risk_ratio, p_value. NA for undefined OR/RR.
chance: Data frame (nboot rows). Same columns as model. Generated by independently resampling actual and predicted labels (null of no classification association).
quantiles: Data frame (length(probs) rows). Quantiles of each metric for model and chance across all replicates, including p_value_model and p_value_chance.
ci: Data frame (one row per metric). Fixed 95% CI bounds (2.5th and 97.5th percentiles) for model and chance. Columns: metric, model_lower, model_upper, chance_lower, chance_upper, overlap.
significant: Logical scalar. TRUE if the ESS model 95% CI lower bound exceeds the ESS chance 95% CI upper bound (the novometric Axiom 1 CI non-overlap criterion).
source_type: Character. Evidence provenance tag: "matrix", "oda_fit", "cta_tree", "cta_tree_node", "cta_ort", or "cta_ort_stratum".
source_id: Integer or NA. Node or stratum id when evidence came from a specific sub-unit; NA for full-tree paths.
weighted: Logical or NA. TRUE when weighted class counts were used; FALSE for raw counts; NA for the default matrix path.

Details

Model distribution: The input confusion matrix is expanded to n paired (actual, predicted) observation rows. For each replicate, k row indices are drawn with replacement, preserving the observed (actual, predicted) joint distribution. This mirrors the NOVOboot row-resampling approach.
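
The expansion and paired resampling described above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: expand_confusion() is a hypothetical helper, and the seed is arbitrary.

```r
# Hypothetical helper: expand a 2x2 confusion matrix (rows = actual,
# columns = predicted) into one (actual, predicted) row per observation.
expand_confusion <- function(conf) {
  data.frame(
    actual    = rep(as.vector(row(conf)), times = as.vector(conf)),
    predicted = rep(as.vector(col(conf)), times = as.vector(conf))
  )
}

conf <- matrix(c(146, 40,
                  36, 33), nrow = 2, byrow = TRUE)
pairs <- expand_confusion(conf)
n <- nrow(pairs)        # total observations
k <- round(0.5 * n)     # sample_frac = 0.5

set.seed(1)
# Draw k row indices with replacement; each draw keeps its (actual, predicted) pair.
rows <- sample.int(n, size = k, replace = TRUE)
model_rep <- table(
  actual    = factor(pairs$actual[rows],    levels = 1:2),
  predicted = factor(pairs$predicted[rows], levels = 1:2)
)
model_rep
```

Because whole rows are drawn, the replicate confusion matrix inherits the observed association between actual and predicted class.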

Chance distribution: Actual and predicted labels are resampled independently for each replicate, breaking any association between them. This generates the null distribution against which the model effect is compared.
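
A minimal base-R sketch of one chance replicate follows. It assumes that each label vector is drawn with replacement at the same size k as the model replicate; the source states only that the two are resampled independently, so the exact scheme here is an assumption.

```r
conf <- matrix(c(146, 40,
                  36, 33), nrow = 2, byrow = TRUE)
n <- sum(conf)
k <- round(0.5 * n)

# Marginal label vectors implied by the confusion matrix.
actual    <- rep(1:2, times = rowSums(conf))
predicted <- rep(1:2, times = colSums(conf))

set.seed(1)
# Two separate draws: the pairing between actual and predicted is discarded.
a_star <- sample(actual,    size = k, replace = TRUE)
p_star <- sample(predicted, size = k, replace = TRUE)

chance_rep <- table(
  actual    = factor(a_star, levels = 1:2),
  predicted = factor(p_star, levels = 1:2)
)
chance_rep
```

The marginal class frequencies are preserved in expectation, while any agreement between the two labels arises by chance alone.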

p-values: An exact 2x2 Fisher p-value is computed for every replicate confusion matrix for both model and chance distributions. These form precision distributions and complement the CI non-overlap criterion; they are not a substitute for it.
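
For a single replicate, the per-replicate p-value can be sketched with stats::fisher.test(). The replicate matrix below is illustrative (it is not package output), and mapping the alternative argument straight through to fisher.test() is an assumption.

```r
# Illustrative replicate confusion matrix with k = 128 observations.
rep_conf <- matrix(c(73, 20,
                     18, 17), nrow = 2, byrow = TRUE)

p_two     <- fisher.test(rep_conf, alternative = "two.sided")$p.value
p_greater <- fisher.test(rep_conf, alternative = "greater")$p.value

c(two.sided = p_two, greater = p_greater)
```

Repeating this for every model and chance replicate yields the two p-value distributions summarized in the quantiles table.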

Novometric Axiom 1: A statistically significant effect exists when the exact discrete confidence intervals for model and chance performance do not overlap. significant = TRUE indicates the ESS model 95% CI lies entirely above the ESS chance 95% CI.
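
The non-overlap rule reduces to a single comparison of two bounds. The sketch below applies it to the ESS bounds printed in the Examples section of this page; ci95() is a hypothetical helper showing how such bounds could be taken from a vector of replicate values, and the quantile type used by the package is not stated in the source.

```r
# Hypothetical helper: 2.5th and 97.5th percentiles of a replicate vector.
ci95 <- function(x) quantile(x, probs = c(0.025, 0.975), na.rm = TRUE, names = FALSE)

# ESS bounds as printed in the Examples section (nboot = 200, seed = 42).
model_ci  <- c(lower =  15.11, upper = 51.79)
chance_ci <- c(lower = -17.11, upper = 21.60)

# Axiom 1: the model interval must lie entirely above the chance interval.
significant <- model_ci[["lower"]] > chance_ci[["upper"]]
significant  # FALSE: 15.11 does not exceed 21.60
```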

ESS formula: ESS(%) = 100 * (mean_PAC - 0.5) / 0.5, with mean_PAC expressed as a proportion in [0, 1]; consistent with oda_ess_from_meanpac.
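
As a worked check of the formula, the observed values in the Examples section can be reproduced in base R. The sketch assumes mean PAC is the unweighted mean of the two per-class accuracies, which matches the printed output.

```r
conf <- matrix(c(146, 40,
                  36, 33), nrow = 2, byrow = TRUE)

# Per-class accuracy: correct predictions divided by the actual class total.
class_acc <- diag(conf) / rowSums(conf)   # 146/186 and 33/69
mean_pac  <- mean(class_acc)              # proportion in [0, 1]

ess <- 100 * (mean_pac - 0.5) / 0.5

round(100 * mean_pac, 2)  # 63.16
round(ess, 2)             # 26.32
```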

OR: Diagnostic odds ratio, (TP * TN) / (FP * FN). NA when FP = 0 or FN = 0 in a replicate.

RR: Ratio of positive predictive value to false omission rate, [TP / (TP + FP)] / [FN / (FN + TN)]. NA when undefined.
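
Both ratios can be sketched as one small function. or_rr() is a hypothetical helper, the exact conditions under which the package returns NA for RR are an assumption, and the second class is treated as the positive class (an inference from the sensitivity range in the Examples output, not something the source states).

```r
# Hypothetical helper: diagnostic odds ratio and PPV / FOR risk ratio
# from the four cell counts, returning NA where a ratio is undefined.
or_rr <- function(tp, fp, fn, tn) {
  or <- if (fp == 0 || fn == 0) {
    NA_real_
  } else {
    (tp * tn) / (fp * fn)
  }
  rr <- if ((tp + fp) == 0 || (fn + tn) == 0 || fn == 0) {
    NA_real_   # assumed NA rule: zero denominator anywhere in the ratio
  } else {
    (tp / (tp + fp)) / (fn / (fn + tn))
  }
  c(odds_ratio = or, risk_ratio = rr)
}

conf <- matrix(c(146, 40,
                  36, 33), nrow = 2, byrow = TRUE)
# Assumed orientation: class 2 is positive.
or_rr(tp = conf[2, 2], fp = conf[1, 2], fn = conf[2, 1], tn = conf[1, 1])

# A replicate with an empty FP cell gives NA for the odds ratio.
or_rr(tp = 10, fp = 0, fn = 5, tn = 20)
```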

References

Yarnold PR (2020). Reformulating the First Axiom of Novometric Theory: Assessing Minimum Sample Size in Experimental Design. Optimal Data Analysis 9, 7–8.

Yarnold PR, Soltysik RC (2016). Maximizing Predictive Accuracy. ODA Books.

Examples

# Myeloma MINDENOM=1 confusion (actual x predicted, byrow = TRUE)
conf <- matrix(c(146, 40,
                  36, 33), nrow = 2, byrow = TRUE)
ci <- novo_boot_ci(conf, nboot = 200L, seed = 42L)
ci$significant
#> [1] FALSE
print(ci)
#> Novometric fixed-confusion bootstrap
#>   n = 255   k = 128   nboot = 200   sample_frac = 0.50
#> 
#> Confusion matrix (actual x predicted):
#>      [,1] [,2]
#> [1,]  146   40
#> [2,]   36   33
#> 
#> Observed:  ESS = 26.32%   Mean PAC = 63.16%
#> 
#> 95% CI (2.5% -- 97.5%):
#>   ESS (%)            Model [ 15.11,  51.79]  Chance [-17.11,  21.60]  Overlap: TRUE
#>   Mean PAC (%)       Model [ 57.56,  75.89]  Chance [ 41.45,  60.80]  Overlap: TRUE
#>   Sensitivity (%)    Model [ 35.07,  69.23]  Chance [ 15.38,  44.84]  Overlap: TRUE
#>   Specificity (%)    Model [ 73.17,  88.51]  Chance [ 64.70,  82.29]  Overlap: TRUE
#> 
#> Novometric significance (ESS CI non-overlap): FALSE