Staging table for a fitted CTA tree
Returns one row per terminal endpoint ordered by ascending target-class
propensity (lowest to highest risk stratum). Empirical counts,
proportions, and odds are computed from the stored leaf class counts.
When an endpoint is perfectly predicted (100 percent one class), the
empirical odds and proportion are undefined; the adjust_perfect
option adds one hypothetical misclassified observation to that
endpoint so that all endpoints can be ranked and compared, a remedy
described by Yarnold and Linden (2017).
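As a rough illustration of the adjustment, the sketch below works on plain counts rather than a fitted tree. The helper adjust_perfect_counts is hypothetical and not part of the package. It assumes that the hypothetical observation is added to the empty class, so the denominator grows by one; the package's internal computation may differ in detail.

```r
# Hypothetical helper (not part of the package): apply the
# one-hypothetical-misclassification adjustment to vectors of counts.
# Assumption: the extra observation is added to the empty class.
adjust_perfect_counts <- function(target_n, non_target_n) {
  perfect <- target_n == 0 | non_target_n == 0
  adj_target <- ifelse(target_n == 0, target_n + 1, target_n)
  adj_non_target <- ifelse(non_target_n == 0, non_target_n + 1, non_target_n)
  data.frame(
    perfectly_predicted = perfect,
    adjusted_target_n = adj_target,
    adjusted_non_target_n = adj_non_target,
    adjusted_target_proportion = adj_target / (adj_target + adj_non_target),
    adjusted_odds = adj_target / adj_non_target
  )
}

# Three toy endpoints: all target (10 of 10), all non-target (0 of 8),
# and mixed (3 of 12).
adjusted <- adjust_perfect_counts(
  target_n = c(10, 0, 3),
  non_target_n = c(0, 8, 9)
)
print(adjusted)
```

After the adjustment every endpoint has finite, non-zero odds, which is what allows the endpoints to be ordered on a common scale.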
Scope: The two-class case is handled automatically when
target_class = NULL (defaults to the numerically larger class
label, typically 1). For trees with three or more classes
target_class must be supplied explicitly.
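The default described above can be sketched as follows. The helper resolve_target_class is hypothetical and only mirrors the documented behaviour; it is not the package's own code, and the error message is invented for illustration.

```r
# Hypothetical helper (not part of the package): mimic the documented
# default for target_class.
resolve_target_class <- function(class_labels, target_class = NULL) {
  classes <- sort(unique(class_labels))
  if (is.null(target_class)) {
    if (length(classes) > 2L) {
      # Three or more classes: no safe default, so stop.
      stop("target_class must be supplied for trees with three or more classes")
    }
    # Two-class case: use the numerically larger label.
    return(max(classes))
  }
  as.integer(target_class)
}

binary_default <- resolve_target_class(c(0L, 1L, 1L, 0L))
multi_explicit <- resolve_target_class(c(1L, 2L, 3L), target_class = 2)
multi_default <- tryCatch(
  resolve_target_class(c(1L, 2L, 3L)),
  error = function(e) "error"
)
```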
Arguments
- tree
A cta_tree from oda_cta_fit.
- target_class
Integer (or coercible); the class label treated as the target (positive / high-risk) class. NULL (default) uses the numerically largest class label for binary trees, and stops for trees with three or more classes.
- weighted
Logical. FALSE (default) uses raw observation counts; TRUE uses case-weighted counts.
- adjust_perfect
Logical. TRUE (default) applies the one-hypothetical-misclassification adjustment to perfectly predicted endpoints so that all endpoints can be ordered by propensity.
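To make the difference between raw and case-weighted counts concrete, the sketch below tallies both from a toy set of observations. The data frame and its column names (endpoint, class, weight) are illustrative assumptions; the package reads these counts from the fitted tree rather than from such a table.

```r
# Toy observations: endpoint membership, class label, and case weight
# (illustrative data, not produced by the package).
obs <- data.frame(
  endpoint = c(1, 1, 1, 2, 2),
  class = c(1, 0, 1, 0, 0),
  weight = c(2, 1, 0.5, 1, 3)
)
target_class <- 1
is_target <- obs$class == target_class

# Raw counts: each observation contributes 1.
raw_target_n <- tapply(is_target, obs$endpoint, sum)
raw_denominator <- tapply(rep(1, nrow(obs)), obs$endpoint, sum)

# Case-weighted counts: each observation contributes its weight.
weighted_target_n <- tapply(obs$weight * is_target, obs$endpoint, sum)
weighted_denominator <- tapply(obs$weight, obs$endpoint, sum)
```

With equal weights the two sets of counts coincide, so the distinction only matters when case weights vary across observations.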
Value
A data.frame with one row per terminal endpoint, ordered by
ascending target-class propensity (lowest to highest risk stratum),
with columns:
- stage
Integer rank 1..n, ascending by target proportion.
- endpoint_id
Integer sequential endpoint index, matching cta_endpoint_summary.
- endpoint_node_id
Integer tree node identifier.
- path
Character; AND-joined branch labels from root.
- terminal_prediction
Integer majority-class prediction.
- target_class
Integer; the target class used for this table.
- target_n
Numeric; raw (or weighted) count of target-class observations at this endpoint.
- denominator
Numeric; total raw (or weighted) observations at this endpoint.
- target_proportion
Numeric; empirical target-class proportion (target_n / denominator).
- non_target_n
Numeric; denominator minus target_n.
- odds
Numeric; empirical odds (target_n / non_target_n); NA when perfectly_predicted is TRUE.
- perfectly_predicted
Logical; TRUE when the endpoint is 100 percent one class (target_n == 0 or non_target_n == 0).
- adjusted
Logical; TRUE when the one-hypothetical-misclassification adjustment has been applied. Always FALSE when adjust_perfect = FALSE.
- adjusted_target_n
Numeric; target_n after adjustment. Equal to target_n when adjusted is FALSE.
- adjusted_denominator
Numeric; denominator after adjustment.
- adjusted_target_proportion
Numeric; adjusted proportion.
- adjusted_non_target_n
Numeric; adjusted non-target count.
- adjusted_odds
Numeric; adjusted odds.
- weighted
Logical; the value of the weighted argument.
- n_obs
Integer; raw observation count at this endpoint (from cta_endpoint_summary).
- n_weighted
Numeric; weighted observation count.
For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
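The unadjusted columns follow directly from the leaf counts. The sketch below derives them for a toy set of endpoints and assigns the stage rank; the counts are made up for illustration, and the code is an assumption about how the documented definitions combine, not the package's implementation.

```r
# Toy leaf class counts (illustrative, not from a fitted tree).
leaves <- data.frame(
  endpoint_id = 1:3,
  target_n = c(12, 1, 5),
  denominator = c(14, 18, 5)
)

leaves$non_target_n <- leaves$denominator - leaves$target_n
leaves$target_proportion <- leaves$target_n / leaves$denominator
leaves$perfectly_predicted <- leaves$target_n == 0 | leaves$non_target_n == 0

# Odds are reported as NA for perfectly predicted endpoints.
leaves$odds <- ifelse(
  leaves$perfectly_predicted,
  NA_real_,
  leaves$target_n / leaves$non_target_n
)

# Order by ascending target proportion and assign the stage rank.
staged <- leaves[order(leaves$target_proportion), ]
staged$stage <- seq_len(nrow(staged))
print(staged)
```

The third toy endpoint is perfectly predicted, so its odds are NA here; it is this kind of row that the adjust_perfect option is meant to make comparable with the others.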
References
Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_staging_table(tree)
#>   stage endpoint_id endpoint_node_id     path terminal_prediction target_class
#> 1     1           2                3 wt<=3.18                   0            1
#> 2     2           1                2  wt>3.18                   1            1
#>   target_n denominator target_proportion non_target_n       odds
#> 1        1          18        0.05555556           17 0.05882353
#> 2       12          14        0.85714286            2 6.00000000
#>   perfectly_predicted adjusted adjusted_target_n adjusted_denominator
#> 1               FALSE    FALSE                 1                   18
#> 2               FALSE    FALSE                12                   14
#>   adjusted_target_proportion adjusted_non_target_n adjusted_odds weighted n_obs
#> 1                 0.05555556                    17    0.05882353    FALSE    18
#> 2                 0.85714286                     2    6.00000000    FALSE    14
#>   n_weighted
#> 1         18
#> 2         14