Fit a Classification Tree Analysis (CTA) model (public wrapper)
Public entry point for CTA. Currently supports binary (two-class) outcome variables only.
When recursive = FALSE (default), validates the class variable and
delegates to oda_cta_fit. When recursive = TRUE,
runs the Locally Optimal Recursive Tree (LORT) engine: at each endpoint a full MDSA
family scan (cta_descendant_family) is performed, the min-D
member is selected, and recursion continues until no further structure is
found or a compute guard fires. Returns a dual-tagged
cta_ort / cta_tree object.
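Because only two-class outcomes are supported, it can help to check the class variable before fitting. The following is a minimal base-R sketch of such a check. The helper name check_binary_class() is hypothetical and is not part of the package, and ignoring missing values when counting classes is an assumption, not documented package behavior.

```r
# Hypothetical helper (not part of the package): confirm that a class
# variable has exactly two distinct values before fitting.
check_binary_class <- function(y) {
  # Assumption: missing values are ignored when counting classes.
  classes <- sort(unique(y[!is.na(y)]))
  if (length(classes) != 2L) {
    stop(sprintf(
      "class variable must have exactly 2 distinct values; found %d",
      length(classes)
    ))
  }
  invisible(classes)
}

# A two-class vector passes silently.
check_binary_class(c(1L, 1L, 2L, 2L))

# A three-class vector raises an error, caught here for illustration.
ok <- tryCatch({
  check_binary_class(c(0L, 1L, 2L))
  TRUE
}, error = function(e) FALSE)
ok  # FALSE
```

Running a check like this first surfaces a malformed class variable with a short message before any model search starts.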
Usage
cta_fit(X, y, verbose = FALSE,
        recursive = FALSE,
        min_n = 30L,
        max_depth = 8L,
        max_nodes = 31L,
        family_max_steps = 20L,
        ...)
Arguments
- X
Data frame or matrix of attribute columns. For recursive CTA, X is the declared candidate predictor frame; pass only variables eligible for CTA search. Prediction may be performed on a wider newdata as long as the split variable names are present.
- y
Integer class variable vector. Must have exactly two distinct values.
- verbose
Logical; if TRUE, emit [CTA] and [ORT] progress messages. Default FALSE.
- recursive
Logical; if TRUE, run the Locally Optimal Recursive Tree (LORT) engine. Default FALSE. Cannot be combined with an explicit mindenom argument (doing so raises an error).
- min_n
Integer; minimum endpoint n to attempt recursion. Endpoints smaller than min_n become terminal with stop reason "min_n". Default 30L. Only used when recursive = TRUE.
- max_depth
Integer; safety cap on recursion depth. Nodes at depth >= max_depth become terminal with stop reason "max_depth". Default 8L. Only used when recursive = TRUE.
- max_nodes
Integer; safety cap on total ORT nodes allocated. When the node count exceeds max_nodes, the current endpoint becomes terminal with stop reason "max_nodes". Default 31L. Only used when recursive = TRUE.
- family_max_steps
Integer or NULL (default 20L); maximum number of MDSA family members evaluated at each recursive node. This bounds the per-node cta_descendant_family scan (its max_steps argument). The default of 20L preserves cta_descendant_family() behavior. Smaller values reduce the per-node compute budget at the cost of possibly missing the true min-D member if the family is long. Stored in ort_settings$family_max_steps. Only used when recursive = TRUE; supplying it explicitly with recursive = FALSE is an error.
- ...
Additional arguments passed to oda_cta_fit (non-recursive) or used as ORT settings (recursive). Supported ORT settings via ...: w, mc_seed, mc_iter, alpha_split, prune_alpha, loo. When recursive = TRUE, mc_seed initializes the RNG once at ORT start; child-node MDSA scans consume the stream in deterministic right-then-left traversal order without resetting the seed.
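The three recursion guards (min_n, max_depth, max_nodes) each map to a stop reason. The base-R sketch below restates those documented conditions as a single function. It is an illustration only: guard_stop_reason() is a hypothetical name, not a package function, and the order in which the guards are checked is an assumption.

```r
# Hypothetical helper (not part of the package): report which guard,
# if any, would make an endpoint terminal. Defaults mirror cta_fit().
# Assumption: guards are checked in the order min_n, max_depth, max_nodes.
guard_stop_reason <- function(n, depth, node_count,
                              min_n = 30L, max_depth = 8L, max_nodes = 31L) {
  if (n < min_n) return("min_n")                   # endpoint too small to recurse
  if (depth >= max_depth) return("max_depth")      # depth cap reached
  if (node_count > max_nodes) return("max_nodes")  # node budget exceeded
  NA_character_                                    # no guard fired
}

guard_stop_reason(n = 12L,  depth = 2L, node_count = 5L)   # "min_n"
guard_stop_reason(n = 200L, depth = 8L, node_count = 15L)  # "max_depth"
guard_stop_reason(n = 200L, depth = 3L, node_count = 32L)  # "max_nodes"
guard_stop_reason(n = 200L, depth = 3L, node_count = 7L)   # NA
```

An endpoint that passes all three guards can still become terminal for other reasons, such as the family scan finding no further structure.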
Value
Non-recursive: a cta_tree object.
Recursive: a dual-tagged cta_ort / cta_tree object.
All existing cta_tree S3 methods (predict, print,
summary, plot) operate on the root-level model.
cta_ort-aware methods (predict.cta_ort,
print.cta_ort, summary.cta_ort, plot.cta_ort) operate
on the full composite tree. Use predict(obj, newdata, type="all")
to retrieve stratum assignments.
Note
oda_cta_fit() is the internal engine name; cta_fit() is the
preferred public entry point for non-recursive CTA. Both are exported and
functionally equivalent for non-recursive use.
cta_fit(..., recursive = TRUE) is a legacy-compatible interface for
the LORT workflow layer. Prefer lort_fit() for new code.
SORT and GORT are reserved and not implemented.
Examples
# Small synthetic two-class example (non-recursive)
X <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L)
)
y <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)
tree <- cta_fit(X, y,
  priors_on = TRUE,
  mindenom = 1L,
  mc_iter = 500L,
  mc_seed = 42L,
  loo = "off",
  attr_names = c("x1", "x2")
)
print(tree)
#>
#> CTA Tree alpha_split=0.050 mindenom=1 prune=1.000 max_depth=10 loo=off
#>
#> ATTRIBUTE NODE LEV OBS p ESS WESS LOO MODEL
#> ----------------------------------------------------------------------------------
#> x1 1 1 8 0.026 100.00% 100.00% OFF <=4.5-->0; >4.5-->1
#> Node-local split confusion (this rule only, observations at this node)
#> 0 1
#> --------------
#> 0 | 4 0 | 100.00%
#> 1 | 0 4 | 100.00%
#> --------------
#> NP | 4 4
#>
#> Nodes: 3 total (1 split 2 leaf)
#>
#> Terminal endpoints (*):
#> * endpoint 1 node 2: path=x1<=4.5 n=4 counts=[0:4 1:0] predicted=0 target_prop=0.0%
#> * endpoint 2 node 3: path=x1>4.5 n=4 counts=[0:0 1:4] predicted=1 target_prop=100.0%
#> ESS: 100.00% D: 0.0000 strata: 2 min_denom: 4
# Recursive ORT - two-level synthetic dataset
X2 <- data.frame(
  A = c(rep(0, 20), rep(1, 20), rep(1, 20)),
  B = c(rep(0, 20), rep(0, 20), rep(1, 20))
)
y2 <- c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
ort <- cta_fit(X2, y2, recursive = TRUE,
  mc_iter = 100L, mc_seed = 42L, loo = "off",
  min_n = 5L)
print(ort)
#> Locally Optimal Recursive Tree (LORT)
#> selection: greedy local min-D per recursive node
#> global optimization: no
#> SDA anchored: no
#> Strata: 3 terminal strata
#> Guards: min_n=5, max_depth=8, max_nodes=31
#> mc_seed=42, mc_iter=100
#> Strata consistency check: PASSED
#>
#> Terminal strata (ascending class-1 proportion):
#> Stage 1 n=20 prop=1.0000 stop=no_tree A>0.5 AND B>0.5
#> Stage 2 n=20 prop=1.0000 stop=no_tree A>0.5 AND B<=0.5
#> Stage 3 n=20 prop=1.0000 stop=no_tree A<=0.5
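The recursive example passes mc_seed = 42L. As documented for the ... argument, the seed initializes the RNG once and later scans consume the same stream without resetting it. The base-R sketch below illustrates that general seeding pattern with runif(). It is not the package's Monte Carlo code, and the scan_* names are hypothetical stand-ins for per-node draws.

```r
# Seed once, then let two successive "scans" draw from the same stream.
set.seed(42L)
scan_a <- runif(3)
scan_b <- runif(3)  # continues the stream; the seed is not reset

# Repeating the same call sequence with the same seed reproduces both scans.
set.seed(42L)
scan_a2 <- runif(3)
scan_b2 <- runif(3)
identical(scan_a, scan_a2) && identical(scan_b, scan_b2)  # TRUE

# Resetting the seed before the second scan would repeat the first draws.
set.seed(42L)
scan_b_reset <- runif(3)
identical(scan_b_reset, scan_a)  # TRUE
identical(scan_b, scan_a)        # FALSE
```

Under this pattern, results are reproducible only if the scans run in the same order, which is why a deterministic traversal order matters.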