
Unified entry point for Optimal Data Analysis. Dispatches to the binary-class engine when the outcome has exactly two distinct values, and to the multiclass engine when it has three or more. This is the function CTA nodes call at each split candidate.
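The dispatch rule described above can be sketched in base R. This is an illustration only: choose_engine is a hypothetical helper, not a function exported by the package, and the handling of fewer than two classes is an assumption.

```r
# Hypothetical helper (not part of the package): pick an engine name
# from the number of distinct non-missing class labels.
choose_engine <- function(y) {
  n_classes <- length(unique(y[!is.na(y)]))
  # Assumption: fewer than two classes is treated as an error here;
  # how the package itself reports this case is not stated.
  if (n_classes < 2L) stop("y must have at least two distinct values")
  if (n_classes == 2L) "binary" else "multiclass"
}

engine_2 <- choose_engine(c(0L, 0L, 1L, 1L))      # "binary"
engine_3 <- choose_engine(c(1L, 2L, 3L, 1L, 2L))  # "multiclass"
```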

Usage

oda_fit(x, y, w = NULL,
        attr_type = c("auto","ordered","categorical","binary"),
        priors_on = TRUE, K_segments = NULL, degen = FALSE,
        miss_codes = NULL, missing_code = NULL,
        mcarlo = TRUE, mc_iter = 25000L, mc_target = 0.05,
        mc_stop = 99.9, mc_stopup = NA, mc_seed = NULL,
        loo = "off",
        boundary_mode = c("megaoda_halfopen","right_closed"),
        eval_order = c("mc_then_loo","loo_then_mc"),
        mindenom = 1L,
        direction = c("both","off","greater","less","ascending","descending"),
        direction_map = NULL)

Arguments

x

Attribute values (numeric, factor, character, or logical).

y

Class labels; must have at least two distinct values (exactly two selects the binary engine, three or more the multiclass engine).

w

Optional numeric case weights. Default: unit weights. These are economic or importance weights, distinct from prior-odds weighting which is controlled by priors_on.

attr_type

Attribute measurement type: "auto" (default), "ordered", "categorical", or "binary".

priors_on

Logical; if TRUE (default), weight classes by the reciprocal of their sample frequency so that the objective maximizes mean PAC rather than overall PAC.
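To see why reciprocal-frequency weighting targets mean PAC rather than overall PAC, consider a small base R sketch. The data and the all-zero prediction are made up for illustration; nothing here calls the package.

```r
# Toy data: class 0 is common (8 cases), class 1 is rare (2 cases).
y <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
# A rule that predicts class 0 for every case.
pred <- rep(0, length(y))
correct <- pred == y

# Overall PAC: percent of all cases classified correctly.
overall_pac <- 100 * mean(correct)

# Mean PAC: average of the per-class accuracies.
per_class_pac <- 100 * tapply(correct, y, mean)
mean_pac <- mean(per_class_pac)

# Weighting each case by 1 / (size of its class) reproduces mean PAC.
class_n <- table(y)
w <- 1 / as.numeric(class_n[as.character(y)])
weighted_pac <- 100 * sum(w * correct) / sum(w)

overall_pac   # 80
mean_pac      # 50
weighted_pac  # 50
```

The rule looks good on overall PAC only because it ignores the rare class; under reciprocal-frequency weights it scores no better than chance.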

K_segments

Number of segments for multiclass ordered models. Default equals the number of classes \(C\).

degen

Logical; if FALSE (default) require all \(C\) classes to appear in the predicted labels.

miss_codes

Numeric vector of values to treat as missing (excluded from analysis).

missing_code

Scalar alias for miss_codes; accepted for convenience.
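A minimal sketch of what excluding coded missing values amounts to, in base R. The codes -9 and 999 are illustrative, and dropping NA alongside the listed codes is an assumption about the behaviour rather than something stated above.

```r
x <- c(3, -9, 5, 999, 7, 2)
y <- c(0, 0, 1, 1, 1, 0)
miss_codes <- c(-9, 999)  # illustrative codes only

# Keep cases whose attribute value is neither NA nor a listed code.
keep <- !is.na(x) & !(x %in% miss_codes)
x_used <- x[keep]
y_used <- y[keep]

n_eff <- length(x_used)  # 4 observations remain
```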

mcarlo

Logical; run Monte Carlo Fisher-randomization p-value? Default TRUE.

mc_iter

Maximum Monte Carlo iterations. Default 25000.

mc_target

Target significance level used by the STOP early-stopping rule. Default 0.05.

mc_stop

Confidence level (percent) for lower-tail STOP. Default 99.9.

mc_stopup

Confidence level (percent) for upper-tail STOPUP. Default NA (disabled; matches MegaODA behavior).

mc_seed

Optional integer RNG seed for reproducibility.
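The idea behind a Monte Carlo randomization p-value can be sketched in base R as follows. This is a simplified illustration, not the package's algorithm: best_cut_ess and mc_p_value are hypothetical helpers, the outcome is assumed to be coded 0/1, case weights are ignored, and the STOP / STOPUP early-stopping rules are omitted.

```r
# Best training ESS (percent) over all single cuts on x, either direction.
best_cut_ess <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2L) return(0)
  cuts <- (xs[-length(xs)] + xs[-1]) / 2     # midpoints between values
  best <- 0
  for (cut in cuts) {
    pred <- as.integer(x > cut)              # high values predict class 1
    sens <- mean(pred[y == 1] == 1)
    spec <- mean(pred[y == 0] == 0)
    mean_pac <- 100 * (sens + spec) / 2
    # Reversing the direction mirrors mean PAC around 50, so take abs().
    best <- max(best, abs(mean_pac - 50) / 50 * 100)
  }
  best
}

# Share of label shuffles whose best ESS reaches the observed ESS.
mc_p_value <- function(x, y, iter = 1000L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  observed <- best_cut_ess(x, y)
  hits <- 0L
  for (i in seq_len(iter)) {
    y_perm <- sample(y)                      # shuffle class labels
    if (best_cut_ess(x, y_perm) >= observed) hits <- hits + 1L
  }
  hits / iter
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
p <- mc_p_value(x, y, iter = 2000L, seed = 1)
```

For this perfectly separated toy example only 2 of the 70 possible label arrangements match the observed ESS, so the estimate should land near 2/70 (about 0.029).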

loo

LOO mode. One of:
"off" (default): no LOO filter.
"on": synonym for "pvalue" when used with the multiclass engine.
"stable": binary only. Accept when LOO ESS equals training ESS to within 0.01 percentage points (|WESSL - WESS| <= 0.01 pp); the split node reports loo_status = "STABLE".
"pvalue": binary only. Accept when the LOO Fisher p is strictly less than 0.05 (the default threshold); the split node reports loo_status = "PVALUE".
A single finite numeric value strictly in (0, 1): binary only. Accept when the LOO Fisher p is strictly less than the supplied value; the split node reports loo_status = "PVALUE".
The p-value gate is never reported as "STABLE"; the two modes are distinct.
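The stability check compares leave-one-out ESS with training ESS. A minimal base R sketch of that comparison follows, under loud assumptions: fit_cut, predict_cut and ess_binary are hypothetical helpers, the outcome is coded 0/1, cases are unweighted (so this is ESS rather than the weighted WESS / WESSL), and ties between cuts are broken by taking the first one found.

```r
# Fit the single cut (either direction) with the highest mean PAC.
fit_cut <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (xs[-length(xs)] + xs[-1]) / 2
  best <- list(cut = NA_real_, high_is_1 = TRUE, mean_pac = -Inf)
  for (cut in cuts) {
    for (high_is_1 in c(TRUE, FALSE)) {
      pred <- as.integer(if (high_is_1) x > cut else x <= cut)
      mean_pac <- 50 * (mean(pred[y == 1] == 1) + mean(pred[y == 0] == 0))
      if (mean_pac > best$mean_pac) {
        best <- list(cut = cut, high_is_1 = high_is_1, mean_pac = mean_pac)
      }
    }
  }
  best
}

predict_cut <- function(fit, x) {
  as.integer(if (fit$high_is_1) x > fit$cut else x <= fit$cut)
}

ess_binary <- function(pred, y) {
  mean_pac <- 50 * (mean(pred[y == 1] == 1) + mean(pred[y == 0] == 0))
  (mean_pac - 50) / 50 * 100
}

x <- c(1, 2, 3, 4, 6, 7, 8, 9)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

train_ess <- ess_binary(predict_cut(fit_cut(x, y), x), y)

# Refit without case i, then classify the held-out case.
loo_pred <- vapply(seq_along(x), function(i) {
  predict_cut(fit_cut(x[-i], y[-i]), x[i])
}, integer(1))
loo_ess <- ess_binary(loo_pred, y)

is_stable <- abs(loo_ess - train_ess) <= 0.01  # TRUE for this toy data
```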

boundary_mode

Boundary convention for multiclass ordered rules. Default "megaoda_halfopen" matches MegaODA.exe behaviour.

eval_order

Controls whether Monte Carlo testing is run before LOO validation or whether eligible ordered-cut LOO stability is checked before Monte Carlo. The default "mc_then_loo" preserves standalone UniODA behaviour. CTA tree building uses "loo_then_mc" internally to reject LOO-unstable ordered-cut candidates before spending MC iterations.

mindenom

Minimum raw observation count required in each child node for a candidate cut to be evaluated. Default 1 (no enforcement).
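As a rough illustration of a minimum child-size filter, the base R sketch below keeps only the candidate cuts that leave at least mindenom raw observations on each side. The candidate cuts and the value 3 are made up for the example; how the package enumerates candidates is not shown here.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
mindenom <- 3L
cuts <- c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5)  # illustrative candidates

# Raw (unweighted) counts falling on each side of every cut.
n_left <- vapply(cuts, function(cut) sum(x <= cut), integer(1))
n_right <- length(x) - n_left

eligible <- cuts[n_left >= mindenom & n_right >= mindenom]
eligible  # 3.5 4.5 5.5
```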

direction

Directional hypothesis control. One of:
"both" (default) or its synonym "off": non-directional.
"greater" / "less": MPE Chapter 2 DIRECTIONAL for binary ordered attributes (high / low attribute values predict class 1). Binary class only; an error for multiclass outcomes.
"ascending" / "descending": MPE Chapter 4 DIRECTIONAL for multiclass ordered attributes (segment s maps to class s or C+1-s) or for multiclass categorical attributes with L == C (an identity or reverse mapping is created automatically). An error for binary outcomes; use "greater" / "less" for binary ordered attributes.

direction_map

Named integer vector for categorical fixed-partition DIRECTIONAL (MPE Chapter 4). Names are attribute levels (character); values are predicted class labels. All attribute levels must be covered exactly once with at least two distinct target classes. When supplied, ODA evaluates only the specified mapping and skips the partition search. For binary class, values should be the original class labels (recoded to 0/1 internally). For multiclass, values should be 1..C class labels. Compatible with direction = "both" (default). Default NULL.
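A direction_map of the kind described above can be built and sanity-checked in base R before it is passed on. The attribute levels and class labels below are invented for illustration, and the checks mirror the stated requirements rather than the package's own validation code.

```r
x <- c("low", "mid", "high", "mid", "low", "high")

# Names are attribute levels; values are predicted class labels (1..C).
direction_map <- c(low = 1L, mid = 2L, high = 3L)

# Every observed level is covered, and no level is listed twice.
covers_all <- setequal(names(direction_map), unique(x))
no_dupes <- !anyDuplicated(names(direction_map))
# At least two distinct target classes.
two_targets <- length(unique(direction_map)) >= 2L

# Applying the fixed mapping is a plain lookup by level name.
pred <- unname(direction_map[x])
pred  # 1 2 3 2 1 3
```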

Value

A named list with components:

ok

Logical; TRUE if a valid model was found.

reason

Character reason string if ok = FALSE.

rule

The fitted rule (list; structure depends on attr_type and engine).

n_eff

Number of observations used (after missing removal).

ess

Effect Strength for Sensitivity (percent), scaled 0–100.
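ESS rescales mean PAC so that chance performance maps to 0 and perfect classification to 100. A small base R sketch of that rescaling, using the usual ODA definition, follows; ess_from_sens is a hypothetical helper and takes unweighted per-class accuracies in percent.

```r
# ESS (percent) from per-class accuracies given in percent.
ess_from_sens <- function(sens_pct) {
  n_class <- length(sens_pct)
  chance <- 100 / n_class                  # mean PAC expected by chance
  100 * (mean(sens_pct) - chance) / (100 - chance)
}

ess_two <- ess_from_sens(c(80, 70))        # mean PAC 75, chance 50 -> 50
ess_three <- ess_from_sens(c(90, 60, 60))  # mean PAC 70, chance 33.3 -> 55
```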

pac

Percentage Accuracy in Classification (training).

p_mc

Monte Carlo p-value, or NA if mcarlo = FALSE.

loo

LOO results list, or NULL if loo = "off".

engine

Character; "binary" or "multiclass".

confusion

Confusion table. For the binary engine this is a list with integer counts TP, TN, FP, FN plus sensitivity and specificity as proportions in [0, 1]. For the multiclass engine this is a numeric matrix of (possibly weighted) counts.
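For orientation, the binary components relate to each other as in the base R sketch below. The predictions are made up, class 1 is assumed to be the positive class, and the list is built by hand rather than taken from a fitted object.

```r
y <- c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)
pred <- c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L)

# Assumption: class 1 is treated as the positive class.
TP <- sum(pred == 1L & y == 1L)  # 3
TN <- sum(pred == 0L & y == 0L)  # 3
FP <- sum(pred == 1L & y == 0L)  # 1
FN <- sum(pred == 0L & y == 1L)  # 1

confusion <- list(
  TP = TP, TN = TN, FP = FP, FN = FN,
  sensitivity = TP / (TP + FN),  # 0.75
  specificity = TN / (TN + FP)   # 0.75
)
```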

Examples

## Binary (C = 2)
x <- c(1,2,3,4,5,6,7,8)
y <- c(0L,0L,0L,0L,1L,1L,1L,1L)
fit <- oda_fit(x, y, mcarlo = FALSE)
fit$ok
#> [1] TRUE
fit$rule$cut_value
#> [1] 4.5

## Multiclass (C = 3)
x3 <- c(1,2,3,4,5,6,7,8,9)
y3 <- c(1L,1L,1L,2L,2L,2L,3L,3L,3L)
fit3 <- oda_fit(x3, y3, mcarlo = FALSE)
fit3$rule$cut_values
#> [1] 3.5 6.5
fit3$rule$seg_classes
#> [1] 1 2 3