Endpoint-level propensity-score weights for a fitted CTA tree

Returns one row per terminal endpoint per actual class, containing the CTA-derived stabilized propensity-style weights described in Yarnold and Linden (2017). All values are computed on demand from the stored leaf class counts; no refitting, no prediction, and no training-data recomputation is performed.

Formula: For endpoint $s$ and actual class $z$, $$w_{s,z} = \frac{n_s \cdot \Pr(Z=z)}{n_{s,z}}$$ where $n_s$ is the endpoint denominator, $n_{s,z}$ is the raw count of class $z$ observations at endpoint $s$, and $\Pr(Z=z)$ is the marginal class probability across the full classified analytic sample.
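As a rough illustration of the formula, the base-R sketch below computes the weights directly from a small table of endpoint-by-class counts. The counts are the ones shown in the mtcars output in the Examples section; the helper name `stabilized_weights` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): stabilized weights from
# a matrix of raw counts with endpoints in rows and classes in columns.
stabilized_weights <- function(counts) {
  n_s <- rowSums(counts)                # endpoint denominators n_s
  p_z <- colSums(counts) / sum(counts)  # marginal Pr(Z = z)
  # w[s, z] = n_s * Pr(Z = z) / n[s, z]; a zero count yields Inf
  outer(n_s, p_z) / counts
}

counts <- matrix(c(2, 12,
                   17, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(endpoint = c("1", "2"),
                                 class = c("0", "1")))
w <- stabilized_weights(counts)
w
```

For endpoint 1 and class 0 this gives 14 * (19/32) / 2 = 4.15625, which matches the propensity_weight value in the example output.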

Perfect endpoints: When $n_{s,z} = 0$ for some class, the empirical weight is undefined and is reported as Inf. When adjusted = TRUE (the default), one hypothetical misclassified observation is added to the absent class at that endpoint, and to the global marginal totals, so that every endpoint-by-class cell yields a finite adjusted weight. This is the canonical remedy from Yarnold and Linden (2017).
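The sketch below shows one reading of this adjustment on made-up counts. It assumes that each empty cell receives exactly one hypothetical observation and that endpoint denominators and marginals are then recomputed from the adjusted counts; the package's exact bookkeeping may differ, and `weights_from` is a hypothetical helper.

```r
# Hypothetical counts: endpoint "1" is perfectly predicted (no class 0).
counts <- matrix(c(0, 10,
                   15, 5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(endpoint = c("1", "2"),
                                 class = c("0", "1")))

# Assumption: add one hypothetical observation to every empty cell, then
# recompute endpoint denominators and marginals from the adjusted counts.
adj_counts <- counts + (counts == 0)

# Hypothetical helper: n_s * Pr(Z = z) / n[s, z] for every cell.
weights_from <- function(m) outer(rowSums(m), colSums(m) / sum(m)) / m

empirical <- weights_from(counts)      # Inf for the empty cell
adjusted  <- weights_from(adj_counts)  # finite in every cell
```

Under these assumptions the empty cell moves from Inf to 11 * (16/31) / 1, and the weights of the other cells shift slightly because the marginal totals change as well.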

Scope: Raw observation counts (n_raw) are used exclusively. The function does not return observation-level weights; those require endpoint membership per training observation, which is not stored on the fitted tree.
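If observation-level weights are needed, one possible approach is to join the endpoint-level table to a membership table that the caller builds separately. The sketch below assumes such a table exists; `membership`, `obs_id`, and its contents are hypothetical, and the weight values are copied from the mtcars example output.

```r
# Endpoint-level weights as the function would return them (only the
# columns needed here; values copied from the mtcars example output).
endpoint_w <- data.frame(
  endpoint_id = c(1L, 1L, 2L, 2L),
  class       = c("0", "1", "0", "1"),
  adjusted_propensity_weight = c(4.15625, 0.4739583, 0.6286765, 7.3125)
)

# Hypothetical membership table: one row per observation, giving the
# endpoint it falls in and its actual class. The caller must build this.
membership <- data.frame(
  obs_id      = 1:5,
  endpoint_id = c(1L, 1L, 2L, 2L, 2L),
  class       = c("1", "0", "0", "0", "1")
)

# Attach each observation's weight by matching on endpoint and class.
obs_w <- merge(membership, endpoint_w,
               by = c("endpoint_id", "class"), all.x = TRUE)
obs_w <- obs_w[order(obs_w$obs_id), ]
```

Matching on both endpoint_id and class matters because each endpoint contributes one row per actual class.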

Usage

cta_propensity_weights(tree, target_class = NULL, adjusted = TRUE)

Arguments

tree: A cta_tree from oda_cta_fit.
target_class: Integer (or coercible to integer). Used only to populate the target_class annotation column; it does not filter output rows. NULL (default) uses the numerically largest class label for binary trees and stops with an error for trees with three or more classes.
adjusted: Logical. TRUE (default) applies the one-hypothetical-misclassification adjustment so that all cells yield finite adjusted weights. FALSE leaves undefined weights as Inf and sets the adjusted_* columns equal to their empirical counterparts.

Value

A data.frame with one row per terminal endpoint per actual class, with columns:

endpoint_id: Integer sequential endpoint index.
endpoint_node_id: Integer tree node identifier.
path: Character; AND-joined branch labels from root.
terminal_prediction: Integer majority-class prediction.
class: Character; actual class label for this row.
target_class: Integer; design-annotation class label.
class_n: Integer; raw count of this class at this endpoint (empirical $n_{s,z}$).
endpoint_n: Integer; total raw observations at this endpoint (empirical $n_s$).
marginal_class_n: Integer; total raw observations of this class across all endpoints (empirical $N_z$).
marginal_total_n: Integer; total classified observations across all endpoints (empirical $N$).
marginal_class_probability: Numeric; empirical marginal class probability $\Pr(Z=z) = N_z / N$.
propensity_weight: Numeric; empirical stabilized weight $n_s \cdot \Pr(Z=z) / n_{s,z}$. Inf when class_n == 0.
undefined_empirical: Logical; TRUE when class_n == 0.
perfectly_predicted_endpoint: Logical; TRUE when any class has class_n == 0 at this endpoint.
adjusted: Logical; TRUE when the one-hypothetical-observation adjustment was applied to this row.
adjusted_class_n: Numeric; class_n + 1 where adjusted, otherwise class_n.
adjusted_endpoint_n: Numeric; endpoint denominator after adjustment.
adjusted_marginal_class_n: Numeric; global class count after all hypothetical additions.
adjusted_marginal_total_n: Numeric; global total after all hypothetical additions.
adjusted_marginal_class_probability: Numeric; adjusted marginal class probability.
adjusted_propensity_weight: Numeric; adjusted weight. Finite whenever adjusted_class_n > 0.

When the fit produced no tree, the returned data frame has zero rows but retains the correct column structure and types.
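To show how the count columns relate to the weight column, the sketch below rebuilds a subset of the columns from the mtcars example output by hand and recomputes propensity_weight from them. It does not call the package; the data frame is typed in from the printed output, and `recomputed`, `consistent`, and `sums_ok` are illustrative names.

```r
# A subset of the columns from the mtcars example output, typed in by hand.
w <- data.frame(
  endpoint_id       = c(1L, 1L, 2L, 2L),
  class             = c("0", "1", "0", "1"),
  class_n           = c(2L, 12L, 17L, 1L),
  endpoint_n        = c(14L, 14L, 18L, 18L),
  marginal_class_n  = c(19L, 13L, 19L, 13L),
  marginal_total_n  = 32L,
  propensity_weight = c(4.15625, 0.4739583, 0.6286765, 7.3125)
)

# Recompute the weight from the count columns and compare with the
# printed values (tolerance allows for rounding in the printed output).
recomputed <- with(w, endpoint_n * (marginal_class_n / marginal_total_n) / class_n)
consistent <- isTRUE(all.equal(recomputed, w$propensity_weight, tolerance = 1e-6))

# Within each endpoint, class_n should sum to endpoint_n.
sums_ok <- all(tapply(w$class_n, w$endpoint_id, sum) ==
               tapply(w$endpoint_n, w$endpoint_id, max))
```

A check along these lines can serve as a quick sanity test on a returned table before the weights are used downstream.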

References

Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.

Examples

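data(mtcars)
# Predictors and a binary outcome (am: 0 = automatic, 1 = manual)
X    <- mtcars[, c("cyl", "disp", "hp", "wt")]
y    <- as.integer(mtcars$am)
# Fit the CTA tree, then tabulate the endpoint-level weights
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_propensity_weights(tree)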
#>   endpoint_id endpoint_node_id     path terminal_prediction class target_class
#> 1           1                2  wt>3.18                   1     0            1
#> 2           1                2  wt>3.18                   1     1            1
#> 3           2                3 wt<=3.18                   0     0            1
#> 4           2                3 wt<=3.18                   0     1            1
#>   class_n endpoint_n marginal_class_n marginal_total_n
#> 1       2         14               19               32
#> 2      12         14               13               32
#> 3      17         18               19               32
#> 4       1         18               13               32
#>   marginal_class_probability propensity_weight undefined_empirical
#> 1                    0.59375         4.1562500               FALSE
#> 2                    0.40625         0.4739583               FALSE
#> 3                    0.59375         0.6286765               FALSE
#> 4                    0.40625         7.3125000               FALSE
#>   perfectly_predicted_endpoint adjusted adjusted_class_n adjusted_endpoint_n
#> 1                        FALSE    FALSE                2                  14
#> 2                        FALSE    FALSE               12                  14
#> 3                        FALSE    FALSE               17                  18
#> 4                        FALSE    FALSE                1                  18
#>   adjusted_marginal_class_n adjusted_marginal_total_n
#> 1                        19                        32
#> 2                        13                        32
#> 3                        19                        32
#> 4                        13                        32
#>   adjusted_marginal_class_probability adjusted_propensity_weight
#> 1                             0.59375                  4.1562500
#> 2                             0.40625                  0.4739583
#> 3                             0.59375                  0.6286765
#> 4                             0.40625                  7.3125000