Assign per-observation CTA propensity weights

Convenience wrapper that calls cta_assign_endpoints and cta_propensity_weights and returns a joined observation-level data frame. The cta_tree object is not mutated; all computation is on demand.

Column order requirement: newdata must have the same attribute column order as the X matrix passed to oda_cta_fit. Traversal uses the stored integer column positions (attr_col) from the fit, not column names.
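Because traversal is positional, one defensive habit is to reorder newdata by name before calling the function. The following is a minimal base R sketch of that step, not part of the package. It assumes the training columns can be recovered by name from the original X, and it drops extra columns, which may or may not be required by cta_observation_weights.

```r
# Training attributes, in the column order used at fit time.
X <- mtcars[, c("cyl", "disp", "hp", "wt")]

# New data holding the same attributes in a different order, plus an extra column.
newdata <- mtcars[1:5, c("wt", "mpg", "hp", "cyl", "disp")]

# Fail early if any training attribute is absent from newdata.
missing_cols <- setdiff(colnames(X), colnames(newdata))
if (length(missing_cols) > 0L) {
  stop("newdata is missing columns: ", paste(missing_cols, collapse = ", "))
}

# Select and reorder by name so positions line up with the training X.
newdata_aligned <- newdata[, colnames(X), drop = FALSE]
colnames(newdata_aligned)
```

The aligned data frame can then be passed as newdata in place of the original.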

Unroutable observations: Observations with an NA endpoint (missing root split attribute under missing_action = "na") or an NA class label receive assigned = FALSE and NA in all weight columns. The output always contains nrow(newdata) rows.

Unmatched classified observations: When an observation has a non-NA endpoint but its class is absent from the propensity weight table (e.g., a class unseen at fit time), a warning is issued and the observation receives assigned = FALSE.

Usage

cta_observation_weights(tree, newdata, y, target_class = NULL,
                        adjusted = TRUE,
                        missing_action = c("na", "majority"))

Arguments

tree: A cta_tree from oda_cta_fit.
newdata: A data.frame (or coercible object) with the same column order as the training X.
y: Class labels for each row of newdata. Any type coercible to character; length must equal nrow(newdata).
target_class: Passed to cta_propensity_weights as an annotation parameter. Identifies which class is treated as the design target (high-risk class) for the target_class output column; it does not filter the endpoint × class rows used for the join. Each observation is matched to its own actual_class regardless of this value. NULL (default) lets cta_propensity_weights resolve the target class automatically (numerically largest class for binary trees; must be supplied explicitly for trees with three or more classes).
adjusted: Logical; passed to cta_propensity_weights. Default TRUE.
missing_action: Character; one of "na" (default) or "majority". Passed to cta_assign_endpoints.
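
The default resolution rule described for target_class can be sketched in a few lines of base R. The helper resolve_target_class below is hypothetical and written only for illustration; it works from a vector of class labels, whereas the package presumably resolves the classes from the fitted tree.

```r
# Hypothetical helper; not part of the package API.
resolve_target_class <- function(y, target_class = NULL) {
  # An explicit value always wins.
  if (!is.null(target_class)) {
    return(target_class)
  }
  classes <- sort(unique(y))
  # With three or more classes there is no automatic choice.
  if (length(classes) > 2L) {
    stop("target_class must be supplied for three or more classes")
  }
  # Binary case: take the numerically largest class.
  max(classes)
}

resolve_target_class(c(0L, 1L, 1L, 0L))
resolve_target_class(c(1L, 2L, 3L), target_class = 2L)
```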

Value

A data.frame with nrow(newdata) rows and columns:

row_id: Integer; positional row index (1 to nrow(newdata)).
actual_class: Character; class label from y, coerced to character.
endpoint_node_id: Integer; node ID of the terminal leaf reached by traversal, or NA_integer_ when unroutable.
endpoint_id: Integer; sequential endpoint index matching cta_endpoint_summary, or NA_integer_.
target_class: Integer; resolved design target class annotation from cta_propensity_weights, or NA_integer_ when unassigned.
propensity_weight: Numeric; unadjusted propensity weight for the observation's endpoint–class cell, or NA when unassigned.
adjusted_propensity_weight: Numeric; adjusted propensity weight (Yarnold-Linden correction for perfectly predicted endpoints), or NA when unassigned.
undefined_empirical: Logical; TRUE when the endpoint–class cell has zero observed frequency, or NA when unassigned.
perfectly_predicted_endpoint: Logical; TRUE when all observations at the endpoint belong to one class, or NA when unassigned.
adjusted: Logical; TRUE when the adjusted weight was applied at this endpoint, or NA when unassigned.
assigned: Logical; TRUE when a propensity weight was successfully matched for this observation.
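
Since unassigned rows carry NA weights, downstream code usually needs to separate them first. The sketch below uses a small hand-built data frame with a subset of the columns listed above; the values are invented for illustration and do not come from a real fit.

```r
# Toy stand-in for the returned data frame (values are invented).
ow <- data.frame(
  row_id = 1:4,
  actual_class = c("1", "0", "1", NA),
  endpoint_id = c(2L, 1L, NA, 1L),
  adjusted_propensity_weight = c(7.5, 4.0, NA, NA),
  assigned = c(TRUE, TRUE, FALSE, FALSE)
)

# Rows that could not be weighted, kept aside for inspection.
unassigned <- ow[!ow$assigned, ]
n_unassigned <- sum(!ow$assigned)

# Keep only weighted rows before passing weights to a downstream analysis.
weighted <- ow[ow$assigned, ]
```

Because row_id is the positional index into newdata, the retained rows can be mapped back to the original observations with newdata[weighted$row_id, ].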

Details

No observation-level data are stored in the cta_tree object at fit time. This function performs traversal and weight lookup on demand.

No-tree fits: When the tree has no splits (leaf-only), all rows have endpoint_id = NA_integer_ and assigned = FALSE.

Join semantics: The join key is paste(endpoint_id, actual_class). Each observation is matched to the propensity weight row whose class equals its actual_class. The target_class parameter annotates all rows with the resolved design target class but does not affect which rows participate in the join.
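
The join described above can be illustrated with base R's paste and match. This is a sketch under assumptions, not the package's internal code: the weight table pw, its column names, and all values are invented for the example.

```r
# Toy propensity weight table: one row per endpoint-class cell (values invented).
pw <- data.frame(
  endpoint_id = c(1L, 1L, 2L, 2L),
  class = c("0", "1", "0", "1"),
  propensity_weight = c(1.5, 3.0, 4.0, 1.25)
)

# Toy observations: row 3 is unroutable, row 4 has a class unseen at fit time.
obs <- data.frame(
  endpoint_id = c(2L, 1L, NA, 1L),
  actual_class = c("1", "0", "1", "2")
)

# Build the composite key on both sides.
key_pw <- paste(pw$endpoint_id, pw$class)
key_obs <- paste(obs$endpoint_id, obs$actual_class)

# Look up each observation's own endpoint-class cell; no match gives NA.
idx <- match(key_obs, key_pw)
obs$propensity_weight <- pw$propensity_weight[idx]
obs$assigned <- !is.na(idx)
obs
```

In this sketch the unroutable row and the unseen-class row both end up with an NA weight and assigned = FALSE, and no target class appears anywhere in the lookup.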

Examples

data(mtcars)
X    <- mtcars[, c("cyl", "disp", "hp", "wt")]
y    <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
ow   <- cta_observation_weights(tree, X, y)
head(ow)
#>   row_id actual_class endpoint_node_id endpoint_id target_class
#> 1      1            1                3           2            1
#> 2      2            1                3           2            1
#> 3      3            1                3           2            1
#> 4      4            0                2           1            1
#> 5      5            0                2           1            1
#> 6      6            0                2           1            1
#>   propensity_weight adjusted_propensity_weight undefined_empirical
#> 1           7.31250                    7.31250               FALSE
#> 2           7.31250                    7.31250               FALSE
#> 3           7.31250                    7.31250               FALSE
#> 4           4.15625                    4.15625               FALSE
#> 5           4.15625                    4.15625               FALSE
#> 6           4.15625                    4.15625               FALSE
#>   perfectly_predicted_endpoint adjusted assigned
#> 1                        FALSE    FALSE     TRUE
#> 2                        FALSE    FALSE     TRUE
#> 3                        FALSE    FALSE     TRUE
#> 4                        FALSE    FALSE     TRUE
#> 5                        FALSE    FALSE     TRUE
#> 6                        FALSE    FALSE     TRUE
