Assign per-observation CTA propensity weights
cta_observation_weights.Rd
Convenience wrapper that calls cta_assign_endpoints and
cta_propensity_weights and returns a joined observation-level
data frame. The cta_tree object is not mutated; all computation is
on demand.
Column order requirement: newdata must have the same attribute
column order as the X matrix passed to oda_cta_fit.
Traversal uses the stored integer column positions (attr_col) from the
fit, not column names.
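Because traversal is positional, a newdata frame whose columns are correctly named but differently ordered would be routed on the wrong attributes. A minimal base-R sketch of aligning columns by name before calling the function follows; X_train, newdata, and their values are hypothetical stand-ins, not objects provided by the package.

```r
# Hypothetical training attributes and new data whose columns arrive
# in a different order (all values are made up for illustration).
X_train <- data.frame(cyl = c(4, 6), disp = c(108, 160), hp = c(93, 110))
newdata <- data.frame(hp = c(95, 105), cyl = c(4, 6), disp = c(120, 150))

# Fail early if any training attribute is absent from newdata.
missing_cols <- setdiff(colnames(X_train), colnames(newdata))
stopifnot(length(missing_cols) == 0L)

# Reorder by name so that column positions agree with the training layout.
newdata_aligned <- newdata[, colnames(X_train), drop = FALSE]

# TRUE when the positional layout now matches the training X.
identical(colnames(newdata_aligned), colnames(X_train))
```

The aligned frame, rather than the original newdata, would then be passed as the newdata argument.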
Unroutable observations: Observations with NA endpoint
(missing root split attribute under missing_action = "na") or
NA class label receive assigned = FALSE and NA for all
weight columns. The output always contains nrow(newdata) rows.
Unmatched classified observations: When an observation has a non-NA
endpoint but its class is not present in the propensity weight table (e.g.,
a class unseen at fit time), a warning is issued and assigned = FALSE.
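Since the output keeps every row of newdata, downstream code usually needs to separate rows that received a weight from rows that did not. The sketch below works on a hand-built stand-in for the returned data frame; the values are invented, and only the column names are taken from the Value section.

```r
# Stand-in for the output of cta_observation_weights(); values are made up.
ow <- data.frame(
  row_id = 1:4,
  actual_class = c("1", "0", NA, "2"),
  endpoint_id = c(2L, 1L, NA, 1L),
  adjusted_propensity_weight = c(7.3, 4.2, NA, NA),
  assigned = c(TRUE, TRUE, FALSE, FALSE)
)

# Count rows without a weight, then keep only the matched rows.
n_unassigned <- sum(!ow$assigned)
ow_assigned <- ow[ow$assigned, , drop = FALSE]

# Separate the two reasons a row can be unassigned:
# unroutable (NA endpoint or NA class) versus a class with no weight row.
unroutable <- is.na(ow$endpoint_id) | is.na(ow$actual_class)
unmatched <- !ow$assigned & !unroutable
```

Here row 3 is unroutable and row 4 is routed but unmatched, which mirrors the two cases described above.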
Usage
cta_observation_weights(tree, newdata, y, target_class = NULL,
                        adjusted = TRUE,
                        missing_action = c("na", "majority"))
Arguments
- tree
  A cta_tree from oda_cta_fit.
- newdata
  A data.frame (or coercible object) with the same column order as the training X.
- y
  Class labels for each row of newdata. Any type coercible to character; length must equal nrow(newdata).
- target_class
  Passed to cta_propensity_weights as an annotation parameter. Identifies which class is treated as the design target (high-risk class) for the target_class output column; it does not filter the endpoint \(\times\) class rows used for the join. Each observation is matched to its own actual_class regardless of this value. NULL (default) lets cta_propensity_weights resolve the target class automatically (numerically largest class for binary trees; must be supplied explicitly for trees with three or more classes).
- adjusted
  Logical; passed to cta_propensity_weights. Default TRUE.
- missing_action
  Character; one of "na" (default) or "majority". Passed to cta_assign_endpoints.
Value
A data.frame with nrow(newdata) rows and columns:
- row_id
  Integer; positional row index (1 to nrow(newdata)).
- actual_class
  Character; class label from y, coerced to character.
- endpoint_node_id
  Integer; node ID of the terminal leaf reached by traversal, or NA_integer_ when unroutable.
- endpoint_id
  Integer; sequential endpoint index matching cta_endpoint_summary, or NA_integer_.
- target_class
  Integer; resolved design target class annotation from cta_propensity_weights, or NA_integer_ when unassigned.
- propensity_weight
  Numeric; unadjusted propensity weight for the observation's endpoint–class cell, or NA when unassigned.
- adjusted_propensity_weight
  Numeric; adjusted propensity weight (Yarnold-Linden correction for perfectly predicted endpoints), or NA when unassigned.
- undefined_empirical
  Logical; TRUE when the endpoint–class cell has zero observed frequency, or NA when unassigned.
- perfectly_predicted_endpoint
  Logical; TRUE when all observations at the endpoint belong to one class, or NA when unassigned.
- adjusted
  Logical; TRUE when the adjusted weight was applied at this endpoint, or NA when unassigned.
- assigned
  Logical; TRUE when a propensity weight was successfully matched for this observation.
Details
No observation-level data are stored in the cta_tree object at fit
time. This function performs traversal and weight lookup on demand.
No-tree fits: When the tree has no splits (leaf-only), all rows have
endpoint_id = NA_integer_ and assigned = FALSE.
Join semantics: The join key is
paste(endpoint_id, actual_class). Each observation is matched to the
propensity weight row whose class equals its actual_class.
The target_class parameter annotates all rows with the resolved design
target class but does not affect which rows participate in the join.
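The join described above can be sketched in base R with a pasted key and match(). This is an illustration of the stated semantics, not the package's internal code; the pw and obs tables, the class column name, and all weight values are hypothetical.

```r
# Hypothetical propensity weight table: one row per endpoint x class cell.
pw <- data.frame(
  endpoint_id = c(1L, 1L, 2L, 2L),
  class = c("0", "1", "0", "1"),
  propensity_weight = c(1.5, 3.0, 4.0, 1.2)
)

# Hypothetical observation-level endpoint assignments.
obs <- data.frame(
  endpoint_id = c(2L, 1L, 1L, NA),
  actual_class = c("1", "0", "2", "0")
)

# Build the join key on both sides and look up each observation's cell.
pw_key <- paste(pw$endpoint_id, pw$class)
obs_key <- paste(obs$endpoint_id, obs$actual_class)
idx <- match(obs_key, pw_key)

# Unroutable rows (NA endpoint or NA class) must never match.
idx[is.na(obs$endpoint_id) | is.na(obs$actual_class)] <- NA_integer_

# Attach the weight; rows without a matching cell stay NA and unassigned.
obs$propensity_weight <- pw$propensity_weight[idx]
obs$assigned <- !is.na(idx)
```

In this sketch the third observation has a class ("2") with no row in the weight table and the fourth has no endpoint, so both end up with assigned = FALSE. No target class appears anywhere in the lookup, consistent with target_class being an annotation only.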
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
ow <- cta_observation_weights(tree, X, y)
head(ow)
#> row_id actual_class endpoint_node_id endpoint_id target_class
#> 1 1 1 3 2 1
#> 2 2 1 3 2 1
#> 3 3 1 3 2 1
#> 4 4 0 2 1 1
#> 5 5 0 2 1 1
#> 6 6 0 2 1 1
#> propensity_weight adjusted_propensity_weight undefined_empirical
#> 1 7.31250 7.31250 FALSE
#> 2 7.31250 7.31250 FALSE
#> 3 7.31250 7.31250 FALSE
#> 4 4.15625 4.15625 FALSE
#> 5 4.15625 4.15625 FALSE
#> 6 4.15625 4.15625 FALSE
#> perfectly_predicted_endpoint adjusted assigned
#> 1 FALSE FALSE TRUE
#> 2 FALSE FALSE TRUE
#> 3 FALSE FALSE TRUE
#> 4 FALSE FALSE TRUE
#> 5 FALSE FALSE TRUE
#> 6 FALSE FALSE TRUE