Fit a Classification Tree Analysis (CTA) model (internal engine)
Internal CTA engine name retained for backward compatibility.
Users should prefer cta_fit() as the public entry point.
Builds a classification tree by recursively applying ODA at each node. At each split, every attribute is evaluated, and the attribute with the highest ESS among those passing the significance threshold is selected. Matches MegaODA CTA behaviour, including the MINDENOM, PRUNE, ENUMERATE, LOO STABLE, and WEIGHT parameters.
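The selection rule described above can be sketched in a few lines of R. This is only an illustration, not the engine's code: the helper `select_split()`, the attribute names `a1` to `a4`, and all ESS and p values are made up, and the strict `p < alpha_split` and `ess >= ess_min` comparisons are assumptions.

```r
# Hypothetical per-attribute results at one node (invented numbers).
candidates <- data.frame(
  attribute = c("a1", "a2", "a3", "a4"),
  ess       = c(60.0, 72.5, 90.0, 81.8),
  p         = c(0.010, 0.002, 0.200, 0.001),
  stringsAsFactors = FALSE
)

# Keep attributes that pass the significance and minimum-ESS filters,
# then return the one with the highest ESS (NULL if none qualifies).
select_split <- function(candidates, alpha_split = 0.05, ess_min = 0) {
  ok <- candidates$p < alpha_split & candidates$ess >= ess_min
  eligible <- candidates[ok, , drop = FALSE]
  if (nrow(eligible) == 0L) return(NULL)  # node becomes a leaf
  eligible$attribute[which.max(eligible$ess)]
}

select_split(candidates)
#> [1] "a4"
```

Here `a3` has the highest ESS but fails the significance filter, so the best eligible attribute, `a4`, is chosen.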
Usage
oda_cta_fit(X, y, w = NULL, priors_on = TRUE, miss_codes = NULL,
            alpha_split = 0.05, mindenom = 5L, prune_alpha = 1.0,
            max_depth = 10L, ess_min = 0,
            mc_iter = 25000L, mc_target = 0.05, mc_stop = 99.9, mc_stopup = NULL,
            mc_seed = NULL, loo = "off", attr_names = NULL, K_segments = NULL,
            verbose = FALSE, diag_env = NULL)
Arguments
- X
Data frame or matrix of attribute columns.
- y
Class variable vector.
- w
Optional numeric case weights (MegaODA WEIGHT). Same length as y.
- priors_on
Use prior-odds weighting at each node. Default TRUE.
- miss_codes
Numeric vector of missing-value codes (MegaODA MISSING).
- alpha_split
Significance threshold to split a node (MegaODA MC CUTOFF). Default 0.05.
- mindenom
Minimum weighted node size to attempt a split (MegaODA MINDENOM). Default 5.
- prune_alpha
Branches with p >= prune_alpha are not grown (MegaODA PRUNE). Default 1.0 = no pruning (unpruned tree).
- max_depth
Maximum tree depth. Default 10.
- ess_min
Minimum ESS required to split. Default 0.
- mc_iter
Maximum MC iterations per node. Default 25000.
- mc_target, mc_stop, mc_stopup
MC stopping parameters.
- mc_seed
Base RNG seed; each node uses mc_seed + node_id * 1000 + attr_j.
- loo
LOO mode per node:
"off" (default), "stable" (MegaODA LOO STABLE; accept when |WESSL - WESS| <= 0.01 pp; reports loo_status = "STABLE"), "pvalue" (Fisher p strictly less than 0.05; reports loo_status = "PVALUE"), or a single numeric in (0, 1) (Fisher p strictly less than the supplied threshold; reports loo_status = "PVALUE").
- attr_names
Attribute names. Defaults to column names of X.
- K_segments
Segments for multiclass ordered splits. Default = C.
- verbose
Logical; if TRUE, emit [CTA] progress messages via message() at each major stage. Default FALSE.
- diag_env
Internal diagnostic environment used to collect CTA timing and Monte Carlo instrumentation. Intended for development diagnostics only; leave as NULL for normal use.
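Two of the arguments above, mc_seed and loo, describe small rules that are easy to state in code. The sketch below restates them as standalone R helpers. `node_seed()` and `loo_accept()` are hypothetical names, not package functions; the NULL handling and the treatment of "off" as "always accept" are assumptions.

```r
# Per-node, per-attribute Monte Carlo seed, following the documented formula
# mc_seed + node_id * 1000 + attr_j. Returning NULL for a NULL base seed is
# an assumption of this sketch.
node_seed <- function(mc_seed, node_id, attr_j) {
  if (is.null(mc_seed)) return(NULL)
  mc_seed + node_id * 1000L + attr_j
}

# LOO acceptance rule as described for the loo argument. `wess` and
# `wess_loo` are percentages, `fisher_p` is a p-value; all three are
# hypothetical inputs supplied by the caller.
loo_accept <- function(loo, wess, wess_loo, fisher_p) {
  if (identical(loo, "off")) return(TRUE)  # assumed: no LOO check at all
  if (identical(loo, "stable")) return(abs(wess_loo - wess) <= 0.01)
  threshold <- if (identical(loo, "pvalue")) 0.05 else loo
  stopifnot(is.numeric(threshold), length(threshold) == 1L,
            threshold > 0, threshold < 1)
  fisher_p < threshold  # strict inequality, as documented
}

node_seed(42L, node_id = 3L, attr_j = 2L)
#> [1] 3044
```

Because the comparison is strict, `loo_accept("pvalue", 80, 80, fisher_p = 0.05)` is FALSE, while the same p-value passes with `loo = 0.10`.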
Value
An object of class cta_tree containing:
- nodes
Named list of node objects, each with fields: node_id, parent_id, depth, n_obs, n_weighted, attribute, rule, ess, p_mc, loo_status, loo_ess, confusion, child_ids, split_labels, majority_class, leaf.
- root_id
Integer ID of the root node.
- n_nodes
Total number of nodes grown.
Use predict.cta_tree to classify new data and cta_node_table to extract the node summary table.
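As a rough illustration of the structure listed above, the following sketch builds a hand-made stand-in for a fitted tree and pulls out its leaves. The object `mock_tree` is not produced by the package. Only a few of the documented fields are filled in, their values are invented, and treating leaf as a logical flag and naming the list by node ID are assumptions.

```r
# Hand-built stand-in with the documented top-level elements
# (nodes, root_id, n_nodes). Values are invented for illustration.
mock_tree <- list(
  nodes = list(
    "1" = list(node_id = 1L, parent_id = NA_integer_, depth = 1L,
               child_ids = c(2L, 3L), leaf = FALSE),
    "2" = list(node_id = 2L, parent_id = 1L, depth = 2L,
               child_ids = integer(0), leaf = TRUE),
    "3" = list(node_id = 3L, parent_id = 1L, depth = 2L,
               child_ids = integer(0), leaf = TRUE)
  ),
  root_id = 1L,
  n_nodes = 3L
)

# Flag leaf nodes, then collect their IDs.
is_leaf  <- vapply(mock_tree$nodes, function(nd) isTRUE(nd$leaf), logical(1))
leaf_ids <- vapply(mock_tree$nodes[is_leaf], function(nd) nd$node_id, integer(1))

sum(is_leaf)
#> [1] 2
```

For real fits, cta_node_table is the documented way to get a node summary; direct list access like this is mainly useful for ad hoc inspection.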
Examples
## Binary CTA on mtcars
data(mtcars)
mt <- mtcars
X <- mt[, c("cyl","disp","hp","wt")]
y <- as.integer(mt$am)
tree <- oda_cta_fit(X, y, alpha_split = 0.05, mindenom = 5L,
                    mc_iter = 500L, mc_seed = 42L)
print(tree)
#>
#> CTA Tree alpha_split=0.050 mindenom=5 prune=1.000 max_depth=10 loo=off
#>
#> ATTRIBUTE NODE LEV OBS p ESS WESS LOO MODEL
#> ----------------------------------------------------------------------------------
#> wt 1 1 32 .000 81.78% 81.78% OFF <=3.18-->1; >3.18-->0
#> Node-local split confusion (this rule only, observations at this node)
#> 0 1
#> --------------
#> 0 | 17 1 | 94.44%
#> 1 | 2 12 | 85.71%
#> --------------
#> NP | 19 13
#>
#> Nodes: 3 total (1 split 2 leaf)
#>
#> Terminal endpoints (*):
#> * endpoint 1 node 2: path=wt>3.18 n=14 counts=[0:2 1:12] predicted=1 target_prop=85.7%
#> * endpoint 2 node 3: path=wt<=3.18 n=18 counts=[0:17 1:1] predicted=0 target_prop=5.6%
#> ESS: 81.78% D: 0.4455 strata: 2 min_denom: 14
preds <- predict(tree, X)
mean(preds == y) # training accuracy
#> [1] 0.90625
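The reported ESS of 81.78% can be reproduced by hand from the printed counts. The sketch below uses the usual ODA definition of ESS as mean per-class accuracy rescaled so that chance is 0 and perfect classification is 100. It assumes the NP row holds the per-class totals (mtcars has 19 cars with am = 0 and 13 with am = 1) and that the diagonal holds the correctly classified cases; `ess_from_counts()` is a hypothetical helper, not a package function.

```r
# Counts read from the printed output: class totals (NP row) and
# correctly classified cases (diagonal of the confusion table).
correct <- c("0" = 17, "1" = 12)
total   <- c("0" = 19, "1" = 13)

# ESS: mean per-class accuracy, rescaled so chance = 0 and perfect = 100.
ess_from_counts <- function(correct, total) {
  n_classes <- length(total)
  mean_acc  <- mean(correct / total)
  chance    <- 1 / n_classes
  100 * (mean_acc - chance) / (1 - chance)
}

round(ess_from_counts(correct, total), 2)
#> [1] 81.78
sum(correct) / sum(total)  # overall accuracy, matches mean(preds == y)
#> [1] 0.90625
```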