Returns one row per terminal endpoint per actual class, containing the CTA-derived stabilized propensity-style weights described in Yarnold and Linden (2017). All values are computed on demand from the stored leaf class counts; no refitting, no prediction, and no training-data recomputation is performed.

Formula: For endpoint \(s\) and actual class \(z\), $$w_{s,z} = \frac{n_s \cdot \Pr(Z=z)}{n_{s,z}}$$ where \(n_s\) is the total raw observation count at endpoint \(s\) (the endpoint denominator), \(n_{s,z}\) is the raw count of class \(z\) observations at endpoint \(s\), and \(\Pr(Z=z)\) is the marginal class probability across the full classified analytic sample.
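The formula can be checked by hand in base R. The sketch below is illustrative only: it re-derives the weights from the endpoint-by-class counts printed in the Examples section and does not call the package. The data frame name `counts` is an assumption made for this illustration.

```r
# Endpoint-by-class counts copied from the mtcars example output.
counts <- data.frame(
  endpoint_id = c(1L, 1L, 2L, 2L),
  class       = c("0", "1", "0", "1"),
  class_n     = c(2L, 12L, 17L, 1L)
)

# n_s: total raw observations at each endpoint.
counts$endpoint_n <- ave(counts$class_n, counts$endpoint_id, FUN = sum)

# Pr(Z = z) = N_z / N: marginal class probability across all endpoints.
marginal_n <- tapply(counts$class_n, counts$class, sum)
counts$marginal_class_probability <-
  as.numeric(marginal_n[counts$class]) / sum(counts$class_n)

# w_{s,z} = n_s * Pr(Z = z) / n_{s,z}
counts$propensity_weight <-
  counts$endpoint_n * counts$marginal_class_probability / counts$class_n

print(counts)
```

The resulting propensity_weight column should agree with the values shown in the Examples output (4.15625, 0.4739583, 0.6286765, 7.3125).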

Perfect endpoints: When \(n_{s,z} = 0\) for some class at an endpoint, the empirical weight is undefined and is reported as Inf. When adjusted = TRUE (default), one hypothetical misclassified observation is added to the absent class at that endpoint, and to the global marginal totals, so that every endpoint-by-class cell yields a finite adjusted weight. This is the canonical remedy from Yarnold and Linden (2017).
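A rough sketch of the adjustment in base R follows. The counts are hypothetical, and the helper `stabilized_weight()` is not part of the package. The sketch assumes that the adjusted weight applies the same formula to the adjusted counts throughout (endpoint total, marginal class count, and grand total all include the hypothetical observation); the package's internal bookkeeping may differ in detail.

```r
# Hypothetical counts: endpoint 1 is perfectly predicted (no class "0" cases).
counts <- data.frame(
  endpoint_id = c(1L, 1L, 2L, 2L),
  class       = c("0", "1", "0", "1"),
  class_n     = c(0, 10, 15, 3)
)

# Hypothetical helper: stabilized weight n_s * Pr(Z = z) / n_{s,z}.
stabilized_weight <- function(d) {
  endpoint_n <- ave(d$class_n, d$endpoint_id, FUN = sum)
  marginal_p <- ave(d$class_n, d$class, FUN = sum) / sum(d$class_n)
  endpoint_n * marginal_p / d$class_n
}

# Empirical weights: the empty cell divides by zero and yields Inf.
empirical <- stabilized_weight(counts)

# Assumed adjustment: add one hypothetical observation to each empty cell.
# Because the totals are recomputed from the cells, the endpoint and
# marginal totals pick up the addition as well.
adjusted_counts <- counts
empty <- adjusted_counts$class_n == 0
adjusted_counts$class_n[empty] <- adjusted_counts$class_n[empty] + 1

adjusted <- stabilized_weight(adjusted_counts)

print(data.frame(counts, empirical, adjusted))
```

Under these assumptions the empty cell moves from Inf to a finite weight, and the other cells shift slightly because the marginal totals now include the hypothetical observation.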

Scope: Raw observation counts (n_raw) are used exclusively. The function does not return observation-level weights; those require endpoint membership per training observation, which is not stored on the fitted tree.
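If endpoint membership is available from some other source, observation-level weights can be obtained with a lookup on endpoint and class. The sketch below is a minimal illustration in base R: `w` copies three columns from the Examples output, while `obs` and its contents are entirely hypothetical and stand in for membership data that the fitted tree does not store.

```r
# Endpoint-by-class weights, copied from the mtcars example output.
w <- data.frame(
  endpoint_id       = c(1L, 1L, 2L, 2L),
  class             = c("0", "1", "0", "1"),
  propensity_weight = c(4.15625, 0.4739583, 0.6286765, 7.3125)
)

# Hypothetical per-observation endpoint membership and actual class.
obs <- data.frame(
  obs_id      = 1:5,
  endpoint_id = c(1L, 2L, 2L, 1L, 2L),
  class       = c("1", "0", "1", "0", "0")
)

# Build a composite endpoint|class key and look up each observation's weight.
key <- function(d) paste(d$endpoint_id, d$class, sep = "|")
obs$weight <- w$propensity_weight[match(key(obs), key(w))]

print(obs)
```

Using `match()` on a composite key preserves the row order of `obs`, which `merge()` does not guarantee.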

Usage

cta_propensity_weights(tree, target_class = NULL, adjusted = TRUE)

Arguments

tree

A cta_tree from oda_cta_fit.

target_class

Integer (or coercible to integer). Used only to populate the target_class annotation column; it does not filter output rows. NULL (default) uses the numerically largest class label for binary trees, and stops with an error for trees with three or more classes.

adjusted

Logical. TRUE (default) applies the one-hypothetical-misclassification adjustment so that all cells yield finite adjusted weights. FALSE leaves undefined weights as Inf and sets the adjusted columns equal to their empirical counterparts.
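Because target_class only annotates the output, keeping just the rows for the target class is left to the caller. The sketch below shows one way to do this in base R on a data frame `w` that copies four columns from the Examples output; the name `target_rows` is an assumption for this illustration. Note that class is character while target_class is integer, so the comparison converts explicitly.

```r
# Subset of the columns returned in the mtcars example.
w <- data.frame(
  endpoint_id       = c(1L, 1L, 2L, 2L),
  class             = c("0", "1", "0", "1"),
  target_class      = c(1L, 1L, 1L, 1L),
  propensity_weight = c(4.15625, 0.4739583, 0.6286765, 7.3125),
  stringsAsFactors  = FALSE
)

# Keep only rows whose actual class matches the annotated target class.
target_rows <- w[w$class == as.character(w$target_class), ]

print(target_rows)
```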

Value

A data.frame with one row per terminal endpoint per actual class, with columns:

endpoint_id

Integer sequential endpoint index.

endpoint_node_id

Integer tree node identifier.

path

Character; AND-joined branch labels from root.

terminal_prediction

Integer majority-class prediction.

class

Character; actual class label for this row.

target_class

Integer; design-annotation class label.

class_n

Integer; raw count of this class at this endpoint (empirical \(n_{s,z}\)).

endpoint_n

Integer; total raw observations at this endpoint (empirical \(n_s\)).

marginal_class_n

Integer; total raw observations of this class across all endpoints (empirical \(N_z\)).

marginal_total_n

Integer; total classified observations across all endpoints (empirical \(N\)).

marginal_class_probability

Numeric; empirical marginal class probability \(\Pr(Z=z) = N_z / N\).

propensity_weight

Numeric; empirical stabilized weight \(n_s \cdot \Pr(Z=z) / n_{s,z}\). Inf when class_n == 0.

undefined_empirical

Logical; TRUE when class_n == 0.

perfectly_predicted_endpoint

Logical; TRUE when any class has class_n == 0 at this endpoint.

adjusted

Logical; TRUE when the one-hypothetical-observation adjustment was applied to this row.

adjusted_class_n

Numeric; class_n + 1 where adjusted, otherwise class_n.

adjusted_endpoint_n

Numeric; endpoint denominator after adjustment.

adjusted_marginal_class_n

Numeric; global class count after all hypothetical additions.

adjusted_marginal_total_n

Numeric; global total after all hypothetical additions.

adjusted_marginal_class_probability

Numeric; adjusted marginal class probability.

adjusted_propensity_weight

Numeric; adjusted weight. Finite whenever adjusted_class_n > 0.

When the fit produced no tree, the returned data frame has zero rows but retains the correct column structure and types.

References

Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.

Examples

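# Fit a CTA tree predicting transmission type (am) from four mtcars predictors.
data(mtcars)
X    <- mtcars[, c("cyl", "disp", "hp", "wt")]
y    <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
# One row per terminal endpoint per actual class; adjusted = TRUE by default.
cta_propensity_weights(tree)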
#>   endpoint_id endpoint_node_id     path terminal_prediction class target_class
#> 1           1                2  wt>3.18                   1     0            1
#> 2           1                2  wt>3.18                   1     1            1
#> 3           2                3 wt<=3.18                   0     0            1
#> 4           2                3 wt<=3.18                   0     1            1
#>   class_n endpoint_n marginal_class_n marginal_total_n
#> 1       2         14               19               32
#> 2      12         14               13               32
#> 3      17         18               19               32
#> 4       1         18               13               32
#>   marginal_class_probability propensity_weight undefined_empirical
#> 1                    0.59375         4.1562500               FALSE
#> 2                    0.40625         0.4739583               FALSE
#> 3                    0.59375         0.6286765               FALSE
#> 4                    0.40625         7.3125000               FALSE
#>   perfectly_predicted_endpoint adjusted adjusted_class_n adjusted_endpoint_n
#> 1                        FALSE    FALSE                2                  14
#> 2                        FALSE    FALSE               12                  14
#> 3                        FALSE    FALSE               17                  18
#> 4                        FALSE    FALSE                1                  18
#>   adjusted_marginal_class_n adjusted_marginal_total_n
#> 1                        19                        32
#> 2                        13                        32
#> 3                        19                        32
#> 4                        13                        32
#>   adjusted_marginal_class_probability adjusted_propensity_weight
#> 1                             0.59375                  4.1562500
#> 2                             0.40625                  0.4739583
#> 3                             0.59375                  0.6286765
#> 4                             0.40625                  7.3125000