Assign observations to CTA terminal endpoints
Traverses the fitted cta_tree for each row of newdata and
returns the terminal leaf reached, expressed as both its stored node
identifier (endpoint_node_id) and its sequential endpoint index
(endpoint_id) matching cta_endpoint_summary.
No endpoint membership is stored at fit time. This function performs the
traversal on demand so the cta_tree object remains lean. The
returned endpoint_id can be joined with the output of
cta_propensity_weights to assign endpoint-level stabilized
weights to individual observations.
Column order requirement: newdata must have the same
attribute column order as the X matrix passed to
oda_cta_fit. Traversal uses the stored integer column
positions (attr_col) from the fit, not column names. If both
names(newdata) and tree$attr_names are non-NULL, a warning is
issued when they disagree at the split attribute positions.
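Because traversal is positional, it can help to reorder newdata by name before calling cta_assign_endpoints. The following is a minimal sketch in base R, assuming tree$attr_names holds the training column names in fit order. align_to_fit is a hypothetical helper, not part of the package, and a plain character vector stands in for tree$attr_names.

```r
# Hypothetical helper (not part of the package): reorder `newdata` so its
# columns follow the attribute order recorded at fit time.
align_to_fit <- function(newdata, attr_names) {
  missing_cols <- setdiff(attr_names, names(newdata))
  if (length(missing_cols) > 0L) {
    stop("newdata is missing columns: ",
         paste(missing_cols, collapse = ", "))
  }
  # Select by name, so extra columns are dropped and fit order is restored.
  newdata[, attr_names, drop = FALSE]
}

# Toy illustration with a stand-in for tree$attr_names.
attr_names <- c("cyl", "disp", "hp", "wt")
shuffled   <- mtcars[, c("wt", "hp", "mpg", "cyl", "disp")]
aligned    <- align_to_fit(shuffled, attr_names)
names(aligned)
```

With a fitted tree whose attr_names is non-NULL, the aligned data frame could then be passed as newdata.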
Missingness:
- "na" (default)
Canonical path-local behaviour: when a split attribute value is NA or a stored miss-code on the observation's actual traversal path, the row returns NA for both output columns. This matches the canonical missing_action = "na" semantics of predict.
- "majority"
Routes the observation to the child subtree with the larger n_obs, then continues traversal to a terminal leaf. Ties are resolved by selecting the first child.
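The two policies can be illustrated on a toy tree. This sketch does not use the package's internal representation: the nested-list node layout, the single numeric cut per split, and the functions make_leaf, make_split, and route_row are all assumptions made for illustration. Stored miss-codes are not modelled; only NA is treated as missing.

```r
# Toy node layout (hypothetical, not the package's internal structure):
# a leaf carries `node_id`; a split carries `attr_col`, `cut`, two children.
make_leaf  <- function(id, n) list(node_id = id, n_obs = n, children = NULL)
make_split <- function(col, cut, left, right) {
  list(attr_col = col, cut = cut, n_obs = left$n_obs + right$n_obs,
       children = list(left, right))
}

route_row <- function(node, x, missing_action = c("na", "majority")) {
  missing_action <- match.arg(missing_action)
  while (!is.null(node$children)) {
    value <- x[[node$attr_col]]
    if (is.na(value)) {
      if (missing_action == "na") return(NA_integer_)
      # "majority": follow the child with the larger n_obs; which.max()
      # returns the first maximum, so ties go to the first child.
      sizes <- vapply(node$children, function(ch) ch$n_obs, numeric(1))
      node  <- node$children[[which.max(sizes)]]
    } else {
      node <- node$children[[if (value <= node$cut) 1L else 2L]]
    }
  }
  node$node_id
}

toy <- make_split(1L, 5,
                  make_leaf(2L, 10),
                  make_split(2L, 100, make_leaf(4L, 30), make_leaf(5L, 30)))

route_row(toy, c(6, NA), "na")        # NA: missing value on the path
route_row(toy, c(6, NA), "majority")  # 4: tie resolved to the first child
route_row(toy, c(3, NA), "na")        # 2: the NA is off the path, so ignored
```

The third call shows the path-local aspect: a missing value only matters if the observation actually reaches a split on that attribute.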
Usage
cta_assign_endpoints(tree, newdata, missing_action = c("na", "majority"))
Arguments
- tree
A cta_tree from oda_cta_fit.
- newdata
A data.frame (or coercible object) with the same column order as the training X supplied to oda_cta_fit.
- missing_action
Character; one of "na" (default) or "majority". See Description.
Value
A data.frame with one row per row of newdata and columns:
- row_id
Integer; positional row index in newdata (1 to nrow(newdata)).
- endpoint_node_id
Integer; node_id of the terminal leaf reached by traversal. NA_integer_ when the observation cannot be routed to a terminal leaf (missing split attribute with missing_action = "na", or no-tree fit).
- endpoint_id
Integer; sequential endpoint index matching cta_endpoint_summary. NA_integer_ under the same conditions as endpoint_node_id.
For no-tree fits all rows have endpoint_node_id = NA_integer_ and
endpoint_id = NA_integer_.
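Since unrouted rows are marked with NA rather than dropped, it can be useful to separate them before any join. A minimal sketch follows, using a hand-built data frame with the documented columns in place of real cta_assign_endpoints output; the values are made up for illustration.

```r
# Stand-in for the return value (illustrative values, not real output).
ep <- data.frame(
  row_id           = 1:5,
  endpoint_node_id = c(3L, 3L, NA, 2L, NA),
  endpoint_id      = c(2L, 2L, NA, 1L, NA)
)

unrouted <- ep$row_id[is.na(ep$endpoint_id)]  # rows with no terminal leaf
routed   <- ep[!is.na(ep$endpoint_id), ]      # rows safe to join on endpoint_id

unrouted
table(routed$endpoint_id)                     # observations per endpoint
```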
Details
Observation-level propensity weights (workflow sketch):
ep <- cta_assign_endpoints(tree, X_train, missing_action = "na")
pw <- cta_propensity_weights(tree, target_class = 1L, adjusted = TRUE)
# One row per classified training observation with its weight:
obs <- merge(
  data.frame(row_id = seq_len(nrow(X_train)),
             class  = as.character(y_train)),
  merge(ep, pw[, c("endpoint_id", "class", "adjusted_propensity_weight")],
        by = "endpoint_id"),
  by = c("row_id", "class")
)
# Rows with NA endpoint_id (missing split attribute on the traversal path)
# drop naturally.
Observation-level propensity weight expansion is intentionally left to the
caller so that the cta_tree object stores no observation indices.
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
ep <- cta_assign_endpoints(tree, X)
head(ep)
#>   row_id endpoint_node_id endpoint_id
#> 1      1                3           2
#> 2      2                3           2
#> 3      3                3           2
#> 4      4                2           1
#> 5      5                2           1
#> 6      6                2           1