Optimal Data Analysis (ODA) finds the single classification rule - a cutpoint for ordered attributes, or a partition for categorical attributes - that maximises Mean Percentage of Accurate Classifications (Mean PAC) across all class values. This article walks through the core concepts using a concrete binary example.
The ODA problem
Given a class variable y (the outcome to classify) and an attribute x (the predictor), ODA asks: what single rule on x best separates the classes in y?
For an ordered attribute the rule is a cutpoint c:
- observations with x <= c -> assigned to one class
- observations with x > c -> assigned to the other class
The “best” rule maximises Mean PAC: the average, over all classes, of the per-class accuracy (PAC = percentage of observations in that class correctly classified). Using the per-class average - rather than overall accuracy - ensures that a minority class is not ignored when one class is much larger than the other.
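The difference between Mean PAC and overall accuracy is easiest to see on deliberately imbalanced data. The sketch below uses hypothetical data (90 cases in class 0, 10 in class 1) and a degenerate rule that assigns every case to the majority class; it uses base R only and none of the names come from the oda package.

```r
# Hypothetical imbalanced data: 90 cases in class 0, 10 in class 1.
y <- rep(c(0L, 1L), times = c(90, 10))

# A degenerate rule that assigns every case to the majority class.
pred <- rep(0L, length(y))

overall_acc <- mean(pred == y) * 100          # share of all cases correct
pac <- tapply(pred == y, y, mean) * 100       # per-class accuracy (PAC)
mean_pac <- mean(pac)                         # unweighted average over classes

overall_acc  # 90
pac          # class 0: 100, class 1: 0
mean_pac     # 50
```

Overall accuracy rewards the rule for ignoring the minority class, while Mean PAC leaves it at the 50% chance level.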
ESS (Effect Strength for Sensitivity) rescales Mean PAC against the two-class chance benchmark of 50%:
ESS = (Mean PAC - 50%) / 50% x 100%
A model that does no better than random chance has ESS = 0%; perfect classification has ESS = 100%.
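As a quick check of the two endpoints, the rescaling can be written as a one-line helper. The function name is illustrative and is not part of the oda package; Mean PAC is assumed to be given in percent for a two-class problem.

```r
# ESS for a two-class problem, with Mean PAC given in percent.
ess_from_mean_pac <- function(mean_pac) (mean_pac - 50) / 50 * 100

ess_from_mean_pac(50)     # chance-level rule: 0
ess_from_mean_pac(100)    # perfect classification: 100
ess_from_mean_pac(62.34)  # a Mean PAC of 62.34% gives roughly 24.7
```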
Fitting with oda_fit()
The dataset below reconstructs cell frequencies from Appleton (1995), a clinical trial in which 67 patients with migraine were randomised to two treatments and the number of attacks (0-7) recorded.1 The class variable is treatment arm (0 or 1); the attribute is the ordered attack count.
library(oda)
# Cross-classification: rows = attacks (0-7), cols = treatment arm.
#          T1 (0)  T2 (1)
# 0 att:     13       5
# 1 att:      9      13
# 2 att:      4       6
# 3 att:      2       1
# 4 att:      1       2
# 5 att:      1       3
# 6 att:      3       3
# 7 att:      0       1
treatment <- c(
  rep(0L, 13), rep(1L, 5),   # attacks = 0
  rep(0L, 9),  rep(1L, 13),  # attacks = 1
  rep(0L, 4),  rep(1L, 6),   # attacks = 2
  rep(0L, 2),  rep(1L, 1),   # attacks = 3
  rep(0L, 1),  rep(1L, 2),   # attacks = 4
  rep(0L, 1),  rep(1L, 3),   # attacks = 5
  rep(0L, 3),  rep(1L, 3),   # attacks = 6
  rep(0L, 0),  rep(1L, 1)    # attacks = 7
)
attacks <- c(
  rep(0L, 18), rep(1L, 22), rep(2L, 10),
  rep(3L, 3), rep(4L, 3), rep(5L, 4),
  rep(6L, 6), rep(7L, 1)
)
oda_fit() takes x (attribute), y (class variable), and an attr_type indicating whether x is "ordered", "binary", or "categorical".
fit <- oda_fit(
  x = attacks,
  y = treatment,
  attr_type = "ordered",
  mc_iter = 500L,  # CRAN-safe; use 25000L for publication
  mc_seed = 42L,
  loo = "on"
)
Reading the output: rule, PAC, ESS
print(fit)
#>
#> ODA (binary) attr_type=ordered priors=TRUE n=67
#>
#> Rule: <= 0.5 --> 0 | > 0.5 --> 1
#>
#> CLASS n PAC
#> 0 33 39.4%
#> 1 34 85.3%
#>
#> Mean PAC: 62.34% ESS: 24.69% p(MC): 0.096
#>
#> -- LOO --
#> CLASS n PAC
#> 0 33 39.4%
#> 1 34 85.3%
#>
#> LOO ESS: 24.69% p(LOO): 0.022
ODA identified a cut at 0.5 (zero attacks vs. one or more). The $rule element records the classification rule:
fit$rule$cut_value # cutpoint
#> [1] 0.5
fit$rule$direction # which side maps to which class
#> [1] "0->1"
The direction "0->1" means observations with attacks <= 0.5 (zero attacks) are assigned to class 0 (Treatment 1) and observations with attacks > 0.5 are assigned to class 1 (Treatment 2).
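The reported cutpoint can be cross-checked by brute force in base R. The sketch below rebuilds the data from the cross-classification, scores every midpoint between adjacent attack counts, and keeps the cut with the highest Mean PAC. It is an illustration of the idea only, not the package's search algorithm, and the helper name mean_pac_at is hypothetical. Only the direction "<= cut -> 0, > cut -> 1" is scored; for two classes the reverse direction always scores 100 minus the value shown.

```r
# Rebuild the migraine data from the cell frequencies (attacks 0-7).
t1 <- c(13, 9, 4, 2, 1, 1, 3, 0)   # Treatment 1 (class 0)
t2 <- c(5, 13, 6, 1, 2, 3, 3, 1)   # Treatment 2 (class 1)
attacks   <- c(rep(0:7, times = t1), rep(0:7, times = t2))
treatment <- rep(c(0L, 1L), times = c(sum(t1), sum(t2)))

# Mean PAC (in percent) of the rule "<= cut -> 0, > cut -> 1".
mean_pac_at <- function(cut_value, attr_values, class_values) {
  pred <- as.integer(attr_values > cut_value)
  mean(tapply(pred == class_values, class_values, mean)) * 100
}

# Candidate cuts: midpoints between adjacent observed attribute values.
vals <- sort(unique(attacks))
cuts <- (head(vals, -1) + tail(vals, -1)) / 2

scores <- vapply(cuts, mean_pac_at, numeric(1),
                 attr_values = attacks, class_values = treatment)

best_cut      <- cuts[which.max(scores)]
best_mean_pac <- max(scores)
best_ess      <- (best_mean_pac - 50) / 50 * 100

round(c(cut = best_cut, mean_pac = best_mean_pac, ess = best_ess), 2)
```

On these data the search lands on the same cut (0.5), Mean PAC (62.34%), and ESS (24.69%) as the fitted model.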
The full summary() displays per-class PAC, Mean PAC,
ESS, and the MC p-value:
summary(fit)
#>
#> ODA Summary (binary) status=valid n=67
#> attr_type=ordered priors=TRUE weights=FALSE
#> Rule: <= 0.5 --> 0 | > 0.5 --> 1
#>
#> -- Train --
#> Mean PAC (wt): 62.34% ESS: 24.69%
#> Sensitivity: 0.853 Specificity: 0.394
#> p(MC): 0.096 [MC permutation, two-tailed]
#> -- LOO --
#> CLASS n PAC
#> 0 33 39.4%
#> 1 34 85.3%
#> LOO ESS: 24.69%
#> LOO Mean PAC: 62.34%
#> p(LOO): 0.022 [Fisher exact (2x2), one-tailed]
Interpreting PAC and ESS:
- PAC for class 0 (T1) = percentage of T1 patients correctly classified.
- PAC for class 1 (T2) = percentage of T2 patients correctly classified.
- Mean PAC = their unweighted average.
- ESS = (Mean PAC - 50%) / 50% x 100%. Here ESS ~ 24.69%.
Confusion matrix: The raw integer counts are stored in $confusion (binary form: TN, FP, FN, TP):
cat("TN =", fit$confusion$TN, " FP =", fit$confusion$FP, "\n")
#> TN = 13 FP = 20
cat("FN =", fit$confusion$FN, " TP =", fit$confusion$TP, "\n")
#> FN = 5 TP = 29
Predictive value (PV): When the model predicts a class, how often is it correct?
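Predictive value follows directly from the four confusion counts printed above. The sketch below hard-codes those counts rather than calling the package, and the variable names are illustrative. For contrast it also recomputes the per-class PAC (specificity and sensitivity).

```r
# Confusion counts as printed from fit$confusion.
TN <- 13; FP <- 20; FN <- 5; TP <- 29

# Predictive value: of the cases predicted to be in a class, how many are?
pv_class0 <- TN / (TN + FN) * 100   # predicted class 0 (Treatment 1)
pv_class1 <- TP / (TP + FP) * 100   # predicted class 1 (Treatment 2)

# Per-class PAC: of the cases actually in a class, how many are found?
pac_class0 <- TN / (TN + FP) * 100  # specificity
pac_class1 <- TP / (TP + FN) * 100  # sensitivity

round(c(pv_class0 = pv_class0, pv_class1 = pv_class1), 1)
round(c(pac_class0 = pac_class0, pac_class1 = pac_class1), 1)
```

Here a prediction of class 0 is correct for 13 of 18 cases (72.2%), and a prediction of class 1 is correct for 29 of 49 cases (59.2%).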
Leave-one-out (LOO) validity
loo = "on" runs a true jackknife: each observation is
dropped in turn, the model is refit on the remaining n - 1 cases, and
the held-out observation is predicted by the refitted rule.
cat("Training ESS:", round(fit$ess, 2), "%\n")
#> Training ESS: 24.69 %
cat("LOO ESS: ", round(fit$loo$ess_loo, 2), "%\n")
#> LOO ESS: 24.69 %
loo_stable <- isTRUE(all.equal(fit$ess, fit$loo$ess_loo, tolerance = 1e-4))
cat("LOO status: ", if (loo_stable) "STABLE" else "UNSTABLE", "\n")
#> LOO status: STABLE
LOO status: STABLE means the LOO ESS matches the training ESS (within a small numerical tolerance). Complete LOO stability indicates that no single observation drives the rule, which supports, but does not guarantee, generalisation to new samples.
For binary ordered attributes where the cut falls at a sparse boundary (e.g. attacks = 0 here), LOO is often trivially stable: every fold produces the same cut.
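The jackknife described in this section can be sketched in base R to see why every fold agrees here. The code below is an illustration under stated assumptions, not the package's implementation: the helper best_cut is hypothetical, it searches midpoint cuts only, and it fixes the direction at "<= cut -> 0, > cut -> 1".

```r
# Rebuild the migraine data from the cell frequencies (attacks 0-7).
t1 <- c(13, 9, 4, 2, 1, 1, 3, 0)   # Treatment 1 (class 0)
t2 <- c(5, 13, 6, 1, 2, 3, 3, 1)   # Treatment 2 (class 1)
attacks   <- c(rep(0:7, times = t1), rep(0:7, times = t2))
treatment <- rep(c(0L, 1L), times = c(sum(t1), sum(t2)))

# Hypothetical helper: midpoint cut with the highest Mean PAC.
best_cut <- function(attr_values, class_values) {
  vals <- sort(unique(attr_values))
  cuts <- (head(vals, -1) + tail(vals, -1)) / 2
  score <- vapply(cuts, function(k) {
    mean(tapply(as.integer(attr_values > k) == class_values,
                class_values, mean))
  }, numeric(1))
  cuts[which.max(score)]
}

n <- length(attacks)
loo_cuts <- numeric(n)
loo_pred <- integer(n)
for (i in seq_len(n)) {
  # Refit on the n - 1 remaining cases, then classify the held-out case.
  loo_cuts[i] <- best_cut(attacks[-i], treatment[-i])
  loo_pred[i] <- as.integer(attacks[i] > loo_cuts[i])
}

loo_pac <- tapply(loo_pred == treatment, treatment, mean) * 100
loo_ess <- (mean(loo_pac) - 50) / 50 * 100

unique(loo_cuts)   # 0.5 in every fold
round(loo_ess, 2)  # 24.69
```

Because dropping one case never moves the best cut away from 0.5, the held-out predictions equal the training predictions and the LOO ESS equals the training ESS.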
Notes on MC p-value calibration
The Monte Carlo p-value tests whether the observed ESS could arise by chance. ODA’s chance benchmark is 50% Mean PAC (ESS = 0%), not overall accuracy.
The p(MC) shown reflects mc_iter = 500L. For publication, use mc_iter = 25000L (the canonical MegaODA reference run). The training ESS and confusion matrix do not depend on mc_iter; only the precision of the estimated p-value does, and it is coarse at low iteration counts.
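The logic of a permutation p-value can be sketched in base R: shuffle the class labels, refit the best rule on each shuffle, and count how often the shuffled ESS reaches the observed one. Everything below is an assumption made for illustration, including the helper max_ess, the search over both rule directions, the seed, and the (count + 1) / (iterations + 1) estimator. The package's own procedure may differ, so the resulting value should not be expected to reproduce the p(MC) printed above.

```r
set.seed(42)  # arbitrary seed for this sketch only

# Rebuild the migraine data from the cell frequencies (attacks 0-7).
t1 <- c(13, 9, 4, 2, 1, 1, 3, 0)   # Treatment 1 (class 0)
t2 <- c(5, 13, 6, 1, 2, 3, 3, 1)   # Treatment 2 (class 1)
attacks   <- c(rep(0:7, times = t1), rep(0:7, times = t2))
treatment <- rep(c(0L, 1L), times = c(sum(t1), sum(t2)))

# Hypothetical helper: best ESS over all midpoint cuts and both directions.
max_ess <- function(attr_values, class_values) {
  vals <- sort(unique(attr_values))
  cuts <- (head(vals, -1) + tail(vals, -1)) / 2
  mean_pac <- vapply(cuts, function(k) {
    mean(tapply(as.integer(attr_values > k) == class_values,
                class_values, mean)) * 100
  }, numeric(1))
  # Reversing a two-class rule turns a Mean PAC of m into 100 - m.
  best <- max(pmax(mean_pac, 100 - mean_pac))
  (best - 50) / 50 * 100
}

observed <- max_ess(attacks, treatment)

n_iter <- 500
permuted <- replicate(n_iter, max_ess(attacks, sample(treatment)))

# Share of shuffles doing at least as well as the observed rule.
p_mc <- (sum(permuted >= observed - 1e-9) + 1) / (n_iter + 1)
p_mc
```

With only 500 shuffles the estimate moves in steps of about 0.002 and varies from seed to seed, which is why a larger iteration count is recommended for publication.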
Accessor functions
oda_predictions(), oda_confusion(), and
oda_metrics() provide programmatic access to fit
components:
m <- oda_metrics(fit)
# Mean PAC recovered from ESS: for two classes, Mean PAC = 50 + ESS / 2.
cat("Mean PAC:", round(50 + m$ess / 2, 2), "%\n")
#> Mean PAC: 62.34 %
cat("ESS: ", round(m$ess, 2), "%\n")
#> ESS: 24.69 %
Key ESS thresholds
| ESS range | Conventional interpretation |
|---|---|
| < 25% | Relatively weak effect |
| 25-50% | Moderate effect |
| 50-75% | Relatively strong effect |
| > 75% | Strong effect |
The migraine example (ESS ~ 24.69%) falls marginally below the moderate threshold: a relatively weak association, with p(LOO) = 0.022 but a training p(MC) of 0.096 at 500 iterations.2
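For reporting, the table can be turned into a small lookup. The function name ess_label is illustrative and not part of the oda package. The table does not say which band a value of exactly 25%, 50%, or 75% belongs to; this sketch assumes each boundary value falls in the higher band.

```r
# Map ESS (in percent) to the conventional labels in the table above.
ess_label <- function(ess) {
  cut(ess,
      breaks = c(-Inf, 25, 50, 75, Inf),
      labels = c("Relatively weak effect", "Moderate effect",
                 "Relatively strong effect", "Strong effect"),
      right = FALSE)  # intervals are [lower, upper): an assumption
}

as.character(ess_label(c(24.69, 25, 60, 90)))
# "Relatively weak effect" "Moderate effect"
# "Relatively strong effect" "Strong effect"
```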