
Public entry point for CTA. Currently supports binary (two-class) outcome variables only.

When recursive = FALSE (the default), cta_fit() validates the class variable and delegates to oda_cta_fit(). When recursive = TRUE, it runs the Locally Optimal Recursive Tree (LORT) engine: at each endpoint it performs a full MDSA family scan (cta_descendant_family()), selects the min-D member, and recurses until no further structure is found or a compute guard (min_n, max_depth, or max_nodes) fires. The recursive call returns a dual-tagged cta_ort / cta_tree object.
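The compute guards that end recursion are the min_n, max_depth, and max_nodes arguments documented under Arguments. As a rough illustration of how their documented conditions map to stop reasons, here is a minimal base-R sketch. The helper guard_stop_reason() is hypothetical and not part of the package, and the order in which the guards are checked is an assumption; only the three conditions and the stop-reason strings come from this page.

```r
# Illustrative only: map an endpoint's state to a documented stop reason.
# guard_stop_reason() is a hypothetical helper, not a package function.
# The check order (min_n, then max_depth, then max_nodes) is an assumption.
guard_stop_reason <- function(n, depth, node_count,
                              min_n = 30L, max_depth = 8L, max_nodes = 31L) {
  if (n < min_n) return("min_n")                   # endpoint smaller than min_n
  if (depth >= max_depth) return("max_depth")      # depth cap reached
  if (node_count > max_nodes) return("max_nodes")  # node count exceeds the cap
  NA_character_                                    # no guard fired
}

guard_stop_reason(n = 12L, depth = 2L, node_count = 5L)
#> [1] "min_n"
guard_stop_reason(n = 40L, depth = 8L, node_count = 5L)
#> [1] "max_depth"
guard_stop_reason(n = 40L, depth = 3L, node_count = 32L)
#> [1] "max_nodes"
guard_stop_reason(n = 40L, depth = 3L, node_count = 7L)
#> [1] NA
```

An NA result here only means that no guard fired; in the real engine an endpoint can still become terminal because no further structure is found.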

Usage

cta_fit(X, y, verbose = FALSE,
        recursive        = FALSE,
        min_n            = 30L,
        max_depth        = 8L,
        max_nodes        = 31L,
        family_max_steps = 20L,
        ...)

Arguments

X

Data frame or matrix of attribute columns. For recursive CTA, X is the declared candidate predictor frame; pass only variables eligible for CTA search. Prediction may be performed on wider newdata as long as the split variable names are present.

y

Integer class variable vector. Must have exactly two distinct values.

verbose

Logical; if TRUE, emit [CTA] and [ORT] progress messages. Default FALSE.

recursive

Logical; if TRUE, run the Locally Optimal Recursive Tree (LORT) engine. Default FALSE. Supplying an explicit mindenom argument together with recursive = TRUE is an error.

min_n

Integer; minimum endpoint n to attempt recursion. Endpoints smaller than min_n become terminal with stop reason "min_n". Default 30L. Only used when recursive = TRUE.

max_depth

Integer; safety cap on recursion depth. Nodes at depth >= max_depth become terminal with stop reason "max_depth". Default 8L. Only used when recursive = TRUE.

max_nodes

Integer; safety cap on total ORT nodes allocated. When the node count exceeds max_nodes the current endpoint becomes terminal with stop reason "max_nodes". Default 31L. Only used when recursive = TRUE.

family_max_steps

Integer or NULL (default 20L); maximum number of MDSA family members evaluated at each recursive node. This bounds the per-node cta_descendant_family() scan (its max_steps argument). The default of 20L preserves cta_descendant_family() behavior. Smaller values reduce the per-node compute budget, at the risk of missing the true min-D member when the family is long. The value is stored in ort_settings$family_max_steps. Only used when recursive = TRUE; supplying it explicitly with recursive = FALSE is an error.

...

Additional arguments passed to oda_cta_fit (non-recursive) or used as ORT settings (recursive). Supported ORT settings via ...: w, mc_seed, mc_iter, alpha_split, prune_alpha, loo. When recursive = TRUE, mc_seed initializes the RNG once at ORT start; child-node MDSA scans consume the stream in deterministic right-then-left traversal order without resetting the seed.
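Several of the argument rules above produce errors: y must have exactly two distinct values, mindenom cannot be combined with recursive = TRUE, and family_max_steps cannot be supplied explicitly with recursive = FALSE. The following base-R sketch shows one way a caller might check these rules before fitting. check_cta_args() is a hypothetical helper, not a package function, and its error messages are invented for illustration. How the package itself treats missing values in y is not documented here; this sketch simply counts NA as a distinct value.

```r
# Hypothetical pre-flight check mirroring the documented argument rules.
# check_cta_args() is not part of the package; messages are illustrative.
check_cta_args <- function(y, recursive = FALSE, ...) {
  dots <- list(...)
  # Binary outcomes only: y needs exactly two distinct values.
  # Assumption: NA is counted as a distinct value in this sketch.
  if (length(unique(y)) != 2L) {
    stop("y must have exactly two distinct values")
  }
  # An explicit mindenom is documented as an error with recursive = TRUE.
  if (recursive && "mindenom" %in% names(dots)) {
    stop("mindenom cannot be combined with recursive = TRUE")
  }
  # An explicit family_max_steps is documented as an error with recursive = FALSE.
  if (!recursive && "family_max_steps" %in% names(dots)) {
    stop("family_max_steps is only valid with recursive = TRUE")
  }
  invisible(TRUE)
}

# Passes silently: binary y, non-recursive, mindenom allowed.
check_cta_args(c(1L, 1L, 2L, 2L), recursive = FALSE, mindenom = 1L)

# Each of these signals an error; tryCatch() captures the message.
tryCatch(check_cta_args(c(1L, 2L, 3L)),
         error = function(e) conditionMessage(e))
tryCatch(check_cta_args(c(0L, 1L), recursive = TRUE, mindenom = 1L),
         error = function(e) conditionMessage(e))
tryCatch(check_cta_args(c(0L, 1L), recursive = FALSE, family_max_steps = 10L),
         error = function(e) conditionMessage(e))
```

Checking in the caller is optional, since cta_fit() is documented to raise these errors itself; a wrapper like this is mainly useful for failing early in a longer pipeline.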

Value

Non-recursive: a cta_tree object.

Recursive: a dual-tagged cta_ort / cta_tree object. The existing cta_tree S3 methods (predict, print, summary, plot) operate on the root-level model only. The cta_ort-aware methods (predict.cta_ort, print.cta_ort, summary.cta_ort, plot.cta_ort) operate on the full composite tree. Use predict(obj, newdata, type = "all") to retrieve stratum assignments.

Note

oda_cta_fit() is the internal engine name; cta_fit() is the preferred public entry point for non-recursive CTA. Both are exported and functionally equivalent for non-recursive use.

cta_fit(..., recursive = TRUE) is a legacy-compatible interface for the LORT workflow layer. Prefer lort_fit() for new code. SORT and GORT are reserved and not implemented.

Examples

# Small synthetic two-class example (non-recursive)
X <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L)
)
y <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)

tree <- cta_fit(X, y,
  priors_on   = TRUE,
  mindenom    = 1L,
  mc_iter     = 500L,
  mc_seed     = 42L,
  loo         = "off",
  attr_names  = c("x1", "x2")
)
print(tree)
#> 
#> CTA Tree  alpha_split=0.050  mindenom=1  prune=1.000  max_depth=10  loo=off
#> 
#> ATTRIBUTE      NODE  LEV    OBS       p      ESS     WESS      LOO  MODEL
#> ---------------------------------------------------------------------------------- 
#> x1                1    1      8   0.026  100.00%  100.00%      OFF  <=4.5-->0; >4.5-->1
#>   Node-local split confusion (this rule only, observations at this node)
#>                    0      1 
#>              -------------- 
#>        0  |       4      0 | 100.00%
#>        1  |       0      4 | 100.00%
#>              -------------- 
#>       NP  |       4      4
#> 
#> Nodes: 3 total  (1 split  2 leaf)
#> 
#> Terminal endpoints (*):
#> * endpoint 1  node 2:  path=x1<=4.5  n=4  counts=[0:4 1:0]  predicted=0  target_prop=0.0%
#> * endpoint 2  node 3:  path=x1>4.5  n=4  counts=[0:0 1:4]  predicted=1  target_prop=100.0%
#> ESS: 100.00%  D: 0.0000  strata: 2  min_denom: 4

# Recursive ORT: two-level synthetic dataset
X2 <- data.frame(
  A = c(rep(0, 20), rep(1, 20), rep(1, 20)),
  B = c(rep(0, 20), rep(0, 20), rep(1, 20))
)
y2 <- c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
ort <- cta_fit(X2, y2, recursive = TRUE,
               mc_iter = 100L, mc_seed = 42L, loo = "off",
               min_n = 5L)
print(ort)
#> Locally Optimal Recursive Tree (LORT)
#>   selection: greedy local min-D per recursive node
#>   global optimization: no
#>   SDA anchored: no
#>   Strata: 3 terminal strata
#>   Guards: min_n=5, max_depth=8, max_nodes=31
#>   mc_seed=42, mc_iter=100
#>   Strata consistency check: PASSED
#> 
#> Terminal strata (ascending class-1 proportion):
#>   Stage 1   n=20      prop=1.0000  stop=no_tree     A>0.5 AND B>0.5
#>   Stage 2   n=20      prop=1.0000  stop=no_tree     A>0.5 AND B<=0.5
#>   Stage 3   n=20      prop=1.0000  stop=no_tree     A<=0.5