Endpoint-level propensity-score weights for a fitted CTA tree
Returns one row per terminal endpoint per actual class, containing the CTA-derived stabilized propensity-style weights described in Yarnold and Linden (2017). All values are computed on demand from the stored leaf class counts; no refitting, no prediction, and no training-data recomputation is performed.
Formula: For endpoint \(s\) and actual class \(z\), $$w_{s,z} = \frac{n_s \cdot \Pr(Z=z)}{n_{s,z}}$$ where \(n_s\) is the endpoint denominator, \(n_{s,z}\) is the raw count of class \(z\) observations at endpoint \(s\), and \(\Pr(Z=z)\) is the marginal class probability across the full classified analytic sample.
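As a quick check of the formula, the following base R sketch recomputes the weights by hand from a small endpoint-by-class count table. The counts are the ones shown in the Examples output; the object names (counts, n_s, pr_z, w) are illustrative only and are not part of the package.

```r
# Endpoint-by-class raw counts n_{s,z}; rows are endpoints, columns are classes.
counts <- matrix(c( 2, 12,
                   17,  1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(endpoint = c("1", "2"), class = c("0", "1")))

n_s  <- rowSums(counts)                # endpoint denominators n_s
pr_z <- colSums(counts) / sum(counts)  # marginal class probabilities Pr(Z = z)

# w_{s,z} = n_s * Pr(Z = z) / n_{s,z}, computed for every cell at once.
w <- outer(n_s, pr_z) / counts

# For endpoint 1 and class 0: 14 * 0.59375 / 2 = 4.15625
w["1", "0"]
```

Small cells produce large weights: the single class 1 observation at endpoint 2 receives 18 * 0.40625 / 1 = 7.3125.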
Perfect endpoints: When \(n_{s,z} = 0\) for some class, the
empirical weight is undefined and is reported as Inf. When adjusted = TRUE
(default), one hypothetical misclassified observation is added to the
absent class profile, and to the global marginal totals, so that every
endpoint-by-class cell yields a finite adjusted weight. This is the
canonical remedy from Yarnold and Linden (2017).
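The adjustment can be illustrated with a minimal base R sketch. It reflects one reading of the description above: the hypothetical observation is assumed to be added to the empty cell, to that endpoint's denominator, and to the marginal class and total counts. The counts and object names are invented for illustration, and the package's internal computation may differ in detail.

```r
# Hypothetical counts: endpoint "1" is perfectly predicted (no class "1" cases).
counts <- matrix(c(10, 0,
                    4, 6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(endpoint = c("1", "2"), class = c("0", "1")))

# Empirical weights: the empty cell divides by zero and yields Inf.
empirical <- outer(rowSums(counts), colSums(counts) / sum(counts)) / counts

# Assumed adjustment: add one hypothetical observation to every empty cell.
# Row and column sums taken afterwards carry the addition into the
# endpoint denominators and the marginal totals.
adj_counts <- counts + (counts == 0)
adjusted <- outer(rowSums(adj_counts),
                  colSums(adj_counts) / sum(adj_counts)) / adj_counts

is.infinite(empirical["1", "1"])  # TRUE: undefined empirical weight
all(is.finite(adjusted))          # TRUE: every cell is finite after adjustment
```

Under these assumptions the formerly empty cell gets the weight 11 * (7 / 21) / 1, since the endpoint denominator grows from 10 to 11 and the class 1 margin from 6 of 20 to 7 of 21.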
Scope: Raw observation counts (n_raw) are used exclusively.
The function does not return observation-level weights; those require
endpoint membership per training observation, which is not stored on the
fitted tree.
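If endpoint membership for each observation is available from elsewhere, endpoint-level weights can be attached to observations with an ordinary join. The sketch below uses base R merge(); the obs data frame and its endpoint assignments are hypothetical, and wts is a hand-built stand-in for a subset of the columns this function returns.

```r
# Stand-in for the endpoint-level output (subset of columns; values taken
# from the Examples section).
wts <- data.frame(
  endpoint_id = c(1L, 1L, 2L, 2L),
  class = c("0", "1", "0", "1"),
  propensity_weight = c(14 * 0.59375 / 2, 14 * 0.40625 / 12,
                        18 * 0.59375 / 17, 18 * 0.40625 / 1)
)

# Hypothetical observation-level data with known endpoint membership.
obs <- data.frame(
  id = 1:4,
  endpoint_id = c(1L, 2L, 2L, 1L),
  class = c("1", "0", "1", "0")
)

# Each observation picks up the weight of its (endpoint, actual class) cell.
obs_w <- merge(obs, wts, by = c("endpoint_id", "class"), all.x = TRUE)
obs_w <- obs_w[order(obs_w$id), ]  # merge() sorts by key; restore input order
```

Joining on both endpoint_id and class matters because the weight depends on the observation's actual class, not only on the endpoint it falls into.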
Arguments
- tree
A cta_tree from oda_cta_fit.
- target_class
Integer (or coercible); annotation column only, does not filter output rows. NULL (default) uses the numerically largest class label for binary trees, and stops for trees with three or more classes.
- adjusted
Logical. TRUE (default) applies the one-hypothetical-misclassification adjustment so that all cells yield finite adjusted weights. FALSE leaves undefined weights as Inf and adjusted columns equal to empirical.
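The default rule for target_class can be sketched as follows. The helper resolve_target_class() is hypothetical and written only to illustrate the documented behaviour; it is not the package's internal code, and its error message is invented.

```r
# Hypothetical helper mirroring the documented default for target_class.
resolve_target_class <- function(class_labels, target_class = NULL) {
  # An explicit value is only coerced to integer, never validated here.
  if (!is.null(target_class)) return(as.integer(target_class))
  labels <- sort(unique(as.integer(class_labels)))
  if (length(labels) > 2L) {
    stop("target_class must be supplied for trees with three or more classes")
  }
  max(labels)  # binary case: numerically largest class label
}

resolve_target_class(c(0, 1, 1, 0))  # 1
resolve_target_class(c(0, 1), "0")   # 0
```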
Value
A data.frame with one row per terminal endpoint per actual class,
with columns:
- endpoint_id
Integer sequential endpoint index.
- endpoint_node_id
Integer tree node identifier.
- path
Character; AND-joined branch labels from root.
- terminal_prediction
Integer majority-class prediction.
- class
Character; actual class label for this row.
- target_class
Integer; design-annotation class label.
- class_n
Integer; raw count of this class at this endpoint (empirical \(n_{s,z}\)).
- endpoint_n
Integer; total raw observations at this endpoint (empirical \(n_s\)).
- marginal_class_n
Integer; total raw observations of this class across all endpoints (empirical \(N_z\)).
- marginal_total_n
Integer; total classified observations across all endpoints (empirical \(N\)).
- marginal_class_probability
Numeric; empirical marginal class probability \(\Pr(Z=z) = N_z / N\).
- propensity_weight
Numeric; empirical stabilized weight \(n_s \cdot \Pr(Z=z) / n_{s,z}\). Inf when class_n == 0.
- undefined_empirical
Logical; TRUE when class_n == 0.
- perfectly_predicted_endpoint
Logical; TRUE when any class has class_n == 0 at this endpoint.
- adjusted
Logical; TRUE when the one-hypothetical-observation adjustment was applied to this row.
- adjusted_class_n
Numeric; class_n + 1 where adjusted, otherwise class_n.
- adjusted_endpoint_n
Numeric; endpoint denominator after adjustment.
- adjusted_marginal_class_n
Numeric; global class count after all hypothetical additions.
- adjusted_marginal_total_n
Numeric; global total after all hypothetical additions.
- adjusted_marginal_class_probability
Numeric; adjusted marginal class probability.
- adjusted_propensity_weight
Numeric; adjusted weight. Finite whenever adjusted_class_n > 0.
For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
References
Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_propensity_weights(tree)
#> endpoint_id endpoint_node_id path terminal_prediction class target_class
#> 1 1 2 wt>3.18 1 0 1
#> 2 1 2 wt>3.18 1 1 1
#> 3 2 3 wt<=3.18 0 0 1
#> 4 2 3 wt<=3.18 0 1 1
#> class_n endpoint_n marginal_class_n marginal_total_n
#> 1 2 14 19 32
#> 2 12 14 13 32
#> 3 17 18 19 32
#> 4 1 18 13 32
#> marginal_class_probability propensity_weight undefined_empirical
#> 1 0.59375 4.1562500 FALSE
#> 2 0.40625 0.4739583 FALSE
#> 3 0.59375 0.6286765 FALSE
#> 4 0.40625 7.3125000 FALSE
#> perfectly_predicted_endpoint adjusted adjusted_class_n adjusted_endpoint_n
#> 1 FALSE FALSE 2 14
#> 2 FALSE FALSE 12 14
#> 3 FALSE FALSE 17 18
#> 4 FALSE FALSE 1 18
#> adjusted_marginal_class_n adjusted_marginal_total_n
#> 1 19 32
#> 2 13 32
#> 3 19 32
#> 4 13 32
#> adjusted_marginal_class_probability adjusted_propensity_weight
#> 1 0.59375 4.1562500
#> 2 0.40625 0.4739583
#> 3 0.59375 0.6286765
#> 4 0.40625 7.3125000