Fit an ODA model
Unified entry point for Optimal Data Analysis (ODA). Dispatches to the binary-class engine when the outcome has exactly two distinct values, and to the multiclass engine when it has three or more. CTA tree building calls this function for each candidate split.
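The dispatch rule described above can be sketched as follows. `pick_engine()` is a hypothetical helper, not part of the package API. It assumes that missing labels (`NA`) are dropped before the distinct values are counted, and it only restates the documented rule. It does not show how `oda_fit()` actually implements it.

```r
# Hypothetical helper (not a package function): choose the engine from the
# number of distinct, non-missing class labels, as the description states.
pick_engine <- function(y) {
  # Assumption: NA labels are excluded before counting classes.
  n_classes <- length(unique(y[!is.na(y)]))
  if (n_classes < 2L) {
    stop("y must have at least two distinct values")
  }
  if (n_classes == 2L) "binary" else "multiclass"
}

pick_engine(c(0L, 0L, 1L, 1L))       # "binary"
pick_engine(c(1L, 2L, 3L, 2L, 1L))   # "multiclass"
```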
Usage
oda_fit(x, y, w = NULL,
        attr_type = c("auto", "ordered", "categorical", "binary"),
        priors_on = TRUE, K_segments = NULL, degen = FALSE,
        miss_codes = NULL, missing_code = NULL,
        mcarlo = TRUE, mc_iter = 25000L, mc_target = 0.05,
        mc_stop = 99.9, mc_stopup = NA, mc_seed = NULL,
        loo = "off",
        boundary_mode = c("megaoda_halfopen", "right_closed"),
        eval_order = c("mc_then_loo", "loo_then_mc"),
        mindenom = 1L,
        direction = c("both", "off", "greater", "less", "ascending", "descending"),
        direction_map = NULL)

Arguments
- x
Attribute values (numeric, factor, character, or logical).
- y
Class labels; must have at least two distinct values.
- w
Optional numeric case weights. Default: unit weights. These are economic or importance weights, distinct from prior-odds weighting, which is controlled by priors_on.
- attr_type
Attribute measurement type: "auto" (default), "ordered", "categorical", or "binary".
- priors_on
Logical; if TRUE (default), weight each class by the reciprocal of its sample frequency so that the objective maximizes mean PAC across classes rather than overall PAC.
- K_segments
Number of segments for multiclass ordered models. Default equals the number of classes \(C\).
- degen
Logical; if FALSE (default), require all \(C\) classes to appear among the predicted labels. Set to TRUE to allow solutions that predict fewer than \(C\) classes.
- miss_codes
Numeric vector of values to treat as missing (excluded from analysis).
- missing_code
Scalar alias for miss_codes, accepted for convenience.
- mcarlo
Logical; compute a Monte Carlo Fisher-randomization p-value? Default TRUE.
- mc_iter
Maximum Monte Carlo iterations. Default 25000.
- mc_target
Significance threshold for STOP early stopping. Default 0.05.
- mc_stop
Confidence level (percent) for lower-tail STOP. Default 99.9.
- mc_stopup
Confidence level (percent) for upper-tail STOPUP. Default NA (disabled; matches MegaODA behavior).
- mc_seed
Optional integer RNG seed for reproducibility.
- loo
LOO (leave-one-out) mode:
  - "off" (default): no LOO filter.
  - "on": synonym for "pvalue" when used with the multiclass engine.
  - "stable": binary only; accept when LOO ESS equals training ESS (|WESSL - WESS| <= 0.01 percentage points); the split node reports loo_status = "STABLE".
  - "pvalue": binary only; accept when the LOO Fisher p-value is strictly less than 0.05 (the default threshold); the split node reports loo_status = "PVALUE".
  - Numeric in (0, 1): binary only; accept when the LOO Fisher p-value is strictly less than the supplied value, which must be a single finite number strictly between 0 and 1; the split node reports loo_status = "PVALUE".
  The p-value gates and the "stable" gate are distinct modes and should not be conflated.
- boundary_mode
Boundary convention for multiclass ordered rules: "megaoda_halfopen" (default) or "right_closed". The default matches MegaODA.exe behaviour.
- eval_order
Controls whether Monte Carlo testing runs before LOO validation ("mc_then_loo") or whether LOO stability of eligible ordered-cut candidates is checked before Monte Carlo ("loo_then_mc"). The default, "mc_then_loo", preserves standalone UniODA behaviour. CTA tree building uses "loo_then_mc" internally to reject LOO-unstable ordered-cut candidates before spending Monte Carlo iterations.
- mindenom
Minimum raw observation count required in each child node for a candidate cut to be evaluated. Default 1 (no enforcement).
- direction
Directional hypothesis control:
  - "both" (default, non-directional) or its synonym "off".
  - "greater" / "less": MPE Chapter 2 binary ordered DIRECTIONAL analysis (high or low attribute values, respectively, predict class 1). Binary class only; an error is raised for multiclass outcomes.
  - "ascending" / "descending": MPE Chapter 4 DIRECTIONAL analysis for multiclass ordered attributes (segment s maps to class s or C+1-s, respectively) or for multiclass categorical attributes with L == C (an identity or reverse mapping is created automatically). An error is raised for binary outcomes; use "greater" / "less" for binary ordered attributes.
- direction_map
Named integer vector for categorical fixed-partition DIRECTIONAL analysis (MPE Chapter 4). Names are attribute levels (character); values are predicted class labels. Every attribute level must be covered exactly once, with at least two distinct target classes. When supplied, ODA evaluates only the specified mapping and skips the partition search. For a binary class, values should be the original class labels (recoded to 0/1 internally); for multiclass, values should be class labels 1..C. Compatible with direction = "both" (the default). Default NULL.
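The effect of priors_on can be illustrated with a small sketch. It assumes that prior-odds weighting means giving each observation the weight 1 / (size of its class), as the priors_on entry describes. The helpers `prior_weights()`, `overall_pac()` and `mean_pac()` are hypothetical and are not package functions. The sketch also ignores how such weights would be combined with case weights `w`.

```r
# Hypothetical sketch of reciprocal-frequency (prior-odds) weighting.
prior_weights <- function(y) {
  n_by_class <- table(y)
  # Each observation is weighted by 1 / (number of cases in its class).
  as.numeric(1 / n_by_class[as.character(y)])
}

# Overall PAC: percent of all observations classified correctly.
overall_pac <- function(y, pred) 100 * mean(pred == y)

# Mean PAC: average of the per-class accuracies (sensitivities), in percent.
mean_pac <- function(y, pred) 100 * mean(tapply(pred == y, y, mean))

# Imbalanced toy data: 8 cases of class 0 and 2 of class 1.
y    <- c(rep(0L, 8), rep(1L, 2))
pred <- rep(0L, 10)              # always predict the majority class
overall_pac(y, pred)             # 80
mean_pac(y, pred)                # 50

# Weighted accuracy under prior weights equals mean PAC.
w_prior <- prior_weights(y)
100 * sum(w_prior * (pred == y)) / sum(w_prior)   # 50
```

The majority-class rule scores 80% overall but only 50% mean PAC, which is chance level for two classes. This is why the weighted objective does not reward it.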
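The acceptance rules listed under loo can be restated as small predicates. These are hypothetical helpers, not package functions. They only encode the documented conditions: a numeric threshold must be a single finite value strictly in (0, 1), the p-value gate uses a strict inequality, and the "stable" gate allows at most 0.01 percentage points between LOO ESS (WESSL) and training ESS (WESS).

```r
# Hypothetical helpers restating the documented LOO acceptance rules.
is_valid_loo_threshold <- function(loo) {
  # A numeric `loo` must be one finite number strictly between 0 and 1.
  is.numeric(loo) && length(loo) == 1L && is.finite(loo) && loo > 0 && loo < 1
}

loo_pvalue_accepts <- function(loo_p, threshold = 0.05) {
  # Strict inequality: a p-value equal to the threshold is rejected.
  loo_p < threshold
}

loo_stable_accepts <- function(wessl, wess) {
  # "stable" mode: LOO ESS within 0.01 percentage points of training ESS.
  abs(wessl - wess) <= 0.01
}

is_valid_loo_threshold(0.01)      # TRUE
is_valid_loo_threshold(1)         # FALSE
loo_pvalue_accepts(0.05)          # FALSE
loo_stable_accepts(75, 75)        # TRUE
```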
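The coverage rules for direction_map can be checked before calling oda_fit. `check_direction_map()` is a hypothetical validator, not a package function. It assumes that "attribute levels" means the distinct non-missing values of `x` converted to character. It checks three things: each level is named exactly once, there are no extra names, and at least two distinct target classes appear.

```r
# Hypothetical validator for a direction_map (not part of the package API).
check_direction_map <- function(direction_map, x) {
  # Assumption: levels are the distinct observed, non-missing values of x.
  levels_x  <- unique(as.character(x[!is.na(x)]))
  map_names <- names(direction_map)
  if (is.null(map_names) || anyDuplicated(map_names) > 0L) {
    stop("direction_map must be named, with each level listed once")
  }
  if (!setequal(map_names, levels_x)) {
    stop("direction_map names must cover exactly the attribute levels")
  }
  if (length(unique(direction_map)) < 2L) {
    stop("direction_map must target at least two distinct classes")
  }
  invisible(TRUE)
}

x_cat <- c("low", "mid", "high", "mid", "low")
map   <- c(low = 1L, mid = 2L, high = 3L)
check_direction_map(map, x_cat)   # passes silently
```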
Value
A named list with components:
- ok
Logical; TRUE if a valid model was found.
- reason
Character string giving the reason when ok = FALSE.
- rule
The fitted rule (a list whose structure depends on attr_type and the engine).
- n_eff
Number of observations used (after removal of missing values).
- ess
Effect Strength for Sensitivity (percent), scaled 0–100.
- pac
Percentage Accuracy in Classification (training).
- p_mc
Monte Carlo p-value, or NA if mcarlo = FALSE.
- loo
LOO results list, or NULL if loo = "off".
- engine
Character; "binary" or "multiclass".
- confusion
Confusion table. For the binary engine this is a list with integer counts TP, TN, FP, FN plus sensitivity and specificity as proportions in [0, 1]. For the multiclass engine this is a numeric matrix of (possibly weighted) counts.
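To relate ess to the confusion component, here is a sketch. It assumes the usual ODA definition of ESS: mean per-class accuracy rescaled so that chance (1/C) maps to 0 and perfect classification maps to 100. For the multiclass matrix it also assumes that rows are actual classes and columns are predicted classes. Both functions are hypothetical and may not match the package's internal computation, for example with weighted counts.

```r
# Hypothetical ESS helpers, assuming ESS = 100 * (mean PAC - 1/C) / (1 - 1/C)
# with mean PAC expressed as a proportion.
ess_from_counts <- function(TP, TN, FP, FN) {
  sensitivity <- TP / (TP + FN)
  specificity <- TN / (TN + FP)
  mean_pac <- (sensitivity + specificity) / 2
  100 * (mean_pac - 0.5) / 0.5
}

ess_from_matrix <- function(cm) {
  # Assumption: rows = actual classes, columns = predicted classes.
  chance <- 1 / nrow(cm)
  per_class <- diag(cm) / rowSums(cm)
  100 * (mean(per_class) - chance) / (1 - chance)
}

ess_from_counts(TP = 4, TN = 4, FP = 0, FN = 0)   # 100
ess_from_counts(TP = 3, TN = 2, FP = 2, FN = 1)   # 25
cm <- matrix(c(2, 0, 0,
               1, 1, 0,
               0, 1, 1), nrow = 3, byrow = TRUE)
ess_from_matrix(cm)                               # 50
```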
Examples
## Binary (C = 2)
x <- c(1,2,3,4,5,6,7,8)
y <- c(0L,0L,0L,0L,1L,1L,1L,1L)
fit <- oda_fit(x, y, mcarlo = FALSE)
fit$ok
#> [1] TRUE
fit$rule$cut_value
#> [1] 4.5
## Multiclass (C = 3)
x3 <- c(1,2,3,4,5,6,7,8,9)
y3 <- c(1L,1L,1L,2L,2L,2L,3L,3L,3L)
fit3 <- oda_fit(x3, y3, mcarlo = FALSE)
fit3$rule$cut_values
#> [1] 3.5 6.5
fit3$rule$seg_classes
#> [1] 1 2 3
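The multiclass example's fitted cut_values and seg_classes can be applied to new values with base R's findInterval(). `predict_segments()` is a hypothetical helper. It assumes that segment k is the k-th interval between the cuts and is labelled seg_classes[k]. findInterval() uses left-closed intervals, so a value exactly equal to a cut goes to the upper segment. That convention may differ from boundary_mode.

```r
# Hypothetical helper: map values to segments, then to predicted classes.
predict_segments <- function(x, cut_values, seg_classes) {
  # findInterval() returns 0 below the first cut, so add 1 for segment index.
  seg <- findInterval(x, cut_values) + 1L
  seg_classes[seg]
}

predict_segments(c(2, 5, 8), cut_values = c(3.5, 6.5), seg_classes = c(1L, 2L, 3L))
#> [1] 1 2 3
```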