Internal CTA engine name retained for backward compatibility. Users should prefer cta_fit() as the public entry point.

Builds a classification tree by recursively applying ODA (optimal discriminant analysis) at each node. At each split, all attributes are evaluated, and the attribute with the highest ESS (effect strength for sensitivity) that passes the significance threshold is selected. Matches MegaODA CTA behaviour, including the MINDENOM, PRUNE, ENUMERATE, LOO STABLE, and WEIGHT parameters.

Usage

oda_cta_fit(X, y, w = NULL, priors_on = TRUE, miss_codes = NULL,
  alpha_split = 0.05, mindenom = 5L, prune_alpha = 1.0,
  max_depth = 10L, ess_min = 0,
  mc_iter = 25000L, mc_target = 0.05, mc_stop = 99.9, mc_stopup = NULL,
  mc_seed = NULL, loo = "off", attr_names = NULL, K_segments = NULL,
  verbose = FALSE, diag_env = NULL)

Arguments

X

Data frame or matrix of attribute columns.

y

Class variable vector.

w

Optional numeric case weights (MegaODA WEIGHT). Same length as y.

priors_on

Use prior-odds weighting at each node. Default TRUE.

miss_codes

Numeric vector of missing-value codes (MegaODA MISSING).

alpha_split

Significance threshold to split a node (MegaODA MC CUTOFF). Default 0.05.

mindenom

Minimum weighted node size to attempt a split (MegaODA MINDENOM). Default 5.

prune_alpha

Branches with p >= prune_alpha are not grown (MegaODA PRUNE). Default 1.0 (no pruning).

max_depth

Maximum tree depth. Default 10.

ess_min

Minimum ESS required to split. Default 0.

mc_iter

Maximum MC iterations per node. Default 25000.

mc_target, mc_stop, mc_stopup

MC stopping parameters.

mc_seed

Base RNG seed; each node uses mc_seed + node_id * 1000 + attr_j.

loo

Leave-one-out (LOO) mode per node: "off" (default), "stable" (MegaODA LOO STABLE; accept when |WESSL - WESS| <= 0.01 percentage points; reports loo_status = "STABLE"), "pvalue" (accept when Fisher p is strictly less than 0.05; reports loo_status = "PVALUE"), or a single numeric in (0, 1) (accept when Fisher p is strictly less than the supplied threshold; reports loo_status = "PVALUE").

attr_names

Attribute names. Defaults to column names of X.

K_segments

Segments for multiclass ordered splits. Default = C.

verbose

Logical; if TRUE, emit [CTA] progress messages via message() at each major stage. Default FALSE.

diag_env

Internal diagnostic environment used to collect CTA timing and Monte Carlo instrumentation. Intended for development diagnostics only; leave as NULL for normal use.
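
The per-node seeding rule given for mc_seed can be sketched as below. This is a minimal illustration rather than the package's internal code: the helper name node_attr_seed is hypothetical, and it assumes that node_id and attr_j are 1-based integer indices.

```r
# Hypothetical helper (not part of the package): derive the RNG seed used for
# attribute attr_j at node node_id from the documented formula
#   mc_seed + node_id * 1000 + attr_j
node_attr_seed <- function(mc_seed, node_id, attr_j) {
  as.integer(mc_seed + node_id * 1000L + attr_j)
}

# Seeds for the root node (node_id = 1) with four attributes and mc_seed = 42
seeds <- node_attr_seed(42L, node_id = 1L, attr_j = 1:4)
print(seeds)
#> [1] 1043 1044 1045 1046
```

Under these assumptions, every (node, attribute) pair maps to a distinct seed as long as X has no more than 1000 attributes, so a fixed mc_seed gives each Monte Carlo run its own reproducible stream.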

Value

An object of class cta_tree containing:

nodes

Named list of node objects, each with fields: node_id, parent_id, depth, n_obs, n_weighted, attribute, rule, ess, p_mc, loo_status, loo_ess, confusion, child_ids, split_labels, majority_class, leaf.

root_id

Integer ID of the root node.

n_nodes

Total number of nodes grown.

Use predict.cta_tree to classify new data and cta_node_table to extract the node summary table.
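
The node list can also be inspected directly. The sketch below uses a hand-built stand-in rather than a real fitted tree, so it runs without the package. It assumes that the names of nodes are the node IDs as strings, that leaf is a logical flag, and that the root sits at depth 1; none of these details is confirmed on this page.

```r
# Hand-built stand-in for a fitted tree (assumption: real objects come from
# oda_cta_fit() and carry more fields than the few filled in here).
mock_tree <- list(
  nodes = list(
    "1" = list(node_id = 1L, parent_id = NA_integer_, depth = 1L,
               attribute = "wt", child_ids = c(2L, 3L), leaf = FALSE),
    "2" = list(node_id = 2L, parent_id = 1L, depth = 2L,
               attribute = NA_character_, child_ids = integer(0), leaf = TRUE),
    "3" = list(node_id = 3L, parent_id = 1L, depth = 2L,
               attribute = NA_character_, child_ids = integer(0), leaf = TRUE)
  ),
  root_id = 1L,
  n_nodes = 3L
)

# Flag terminal nodes using the documented `leaf` field
is_leaf <- vapply(mock_tree$nodes, function(nd) isTRUE(nd$leaf), logical(1))

# Collect the IDs of the terminal nodes
leaf_ids <- vapply(mock_tree$nodes[is_leaf], function(nd) nd$node_id, integer(1))
print(unname(leaf_ids))
#> [1] 2 3
```

For routine use, cta_node_table is likely the more convenient route; walking the list by hand is mainly useful when a field is needed that the summary table does not expose.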

Examples

## Binary CTA on mtcars
data(mtcars)
mt <- mtcars
X  <- mt[, c("cyl","disp","hp","wt")]
y  <- as.integer(mt$am)
tree <- oda_cta_fit(X, y, alpha_split = 0.05, mindenom = 5L,
                    mc_iter = 500L, mc_seed = 42L)
print(tree)
#> 
#> CTA Tree  alpha_split=0.050  mindenom=5  prune=1.000  max_depth=10  loo=off
#> 
#> ATTRIBUTE      NODE  LEV    OBS       p      ESS     WESS      LOO  MODEL
#> ---------------------------------------------------------------------------------- 
#> wt                1    1     32    .000   81.78%   81.78%      OFF  <=3.18-->1; >3.18-->0
#>   Node-local split confusion (this rule only, observations at this node)
#>                    0      1 
#>              -------------- 
#>        0  |      17      1 |  94.44%
#>        1  |       2     12 |  85.71%
#>              -------------- 
#>       NP  |      19     13
#> 
#> Nodes: 3 total  (1 split  2 leaf)
#> 
#> Terminal endpoints (*):
#> * endpoint 1  node 2:  path=wt>3.18  n=14  counts=[0:2 1:12]  predicted=1  target_prop=85.7%
#> * endpoint 2  node 3:  path=wt<=3.18  n=18  counts=[0:17 1:1]  predicted=0  target_prop=5.6%
#> ESS: 81.78%  D: 0.4455  strata: 2  min_denom: 14
preds <- predict(tree, X)
mean(preds == y)   # training accuracy
#> [1] 0.90625
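
The ESS printed for the root split can be reproduced by hand from the node-local confusion table. The sketch below assumes the standard ODA definition of ESS (mean per-class sensitivity rescaled so that chance is 0% and perfect classification is 100%) and reads the table with the two sides of the split in rows and the observed class in columns, which is what the NP column totals of 19 and 13 suggest. It does not call the package.

```r
# Node-local confusion table copied from the printed tree.
# Assumption: rows = class assigned by the split, columns = observed class.
conf <- matrix(c(17,  1,
                  2, 12),
               nrow = 2, byrow = TRUE,
               dimnames = list(assigned = c("0", "1"),
                               observed = c("0", "1")))

# Per-class sensitivity: correctly assigned cases / observed cases per class
sens <- diag(conf) / colSums(conf)

# Rescale mean sensitivity against the chance level 1/C for C classes
C   <- ncol(conf)
ess <- 100 * (mean(sens) - 1 / C) / (1 - 1 / C)
round(ess, 2)
#> [1] 81.78
```

The same table gives the training accuracy reported above: (17 + 12) / 32 = 0.90625.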