Staging table for a fitted CTA tree

Returns one row per terminal endpoint, ordered by ascending target-class propensity (lowest to highest risk stratum). Empirical counts, proportions, and odds are computed from the stored leaf class counts. When an endpoint is perfectly predicted (100 percent one class), its empirical proportion is exactly 0 or 1 and its empirical odds are reported as NA, so it cannot be compared with the other endpoints on the odds scale. The adjust_perfect option adds one hypothetical misclassified observation to each such endpoint so that all endpoints can be ranked and compared, following the remedy described by Yarnold and Linden (2017).
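The adjustment can be illustrated with a minimal base R sketch. The helper adjust_perfect_endpoint is hypothetical and is not part of the package; it assumes the hypothetical misclassified observation is added to the empty class and that the denominator grows by one accordingly. The package's own arithmetic may differ in detail.

```r
# Hypothetical helper (not the package's implementation): apply a
# one-hypothetical-misclassification adjustment to one endpoint's counts.
adjust_perfect_endpoint <- function(target_n, non_target_n) {
  perfect <- target_n == 0 || non_target_n == 0
  if (perfect) {
    # Assumption: the extra observation goes to whichever class is empty.
    if (target_n == 0) {
      target_n <- 1
    } else {
      non_target_n <- 1
    }
  }
  denominator <- target_n + non_target_n
  list(
    adjusted                   = perfect,
    adjusted_target_n          = target_n,
    adjusted_non_target_n      = non_target_n,
    adjusted_denominator       = denominator,
    adjusted_target_proportion = target_n / denominator,
    adjusted_odds              = target_n / non_target_n
  )
}

# An endpoint with 9 target cases and no non-target cases.
pure_target <- adjust_perfect_endpoint(target_n = 9, non_target_n = 0)
pure_target$adjusted_odds               # 9 / 1 = 9
pure_target$adjusted_target_proportion  # 9 / 10 = 0.9

# An endpoint that is not perfectly predicted is left unchanged.
mixed <- adjust_perfect_endpoint(target_n = 12, non_target_n = 2)
mixed$adjusted                          # FALSE
```

Under these assumptions, a pure endpoint receives finite odds and can be placed on the same scale as the mixed endpoints.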

Scope: The two-class case is handled automatically: when target_class = NULL, the target defaults to the numerically larger class label (typically 1). For trees with three or more classes, target_class must be supplied explicitly.
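The default rule stated above can be sketched in base R as follows. The helper resolve_target_class and its error message are hypothetical, written only to illustrate the documented behaviour; they are not taken from the package.

```r
# Hypothetical helper illustrating the documented default for target_class.
resolve_target_class <- function(class_labels, target_class = NULL) {
  labels <- sort(unique(class_labels))
  if (is.null(target_class)) {
    if (length(labels) > 2L) {
      # Three or more classes: no automatic choice is made.
      stop("target_class must be supplied for trees with three or more classes")
    }
    # Binary case: use the numerically larger label.
    return(max(labels))
  }
  # The argument is documented as integer or coercible to integer.
  as.integer(target_class)
}

resolve_target_class(c(0L, 1L, 1L, 0L))      # 1
resolve_target_class(c(1L, 2L, 3L), 2)       # 2
```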

Usage

cta_staging_table(tree, target_class = NULL, weighted = FALSE,
                   adjust_perfect = TRUE)

Arguments

tree: A cta_tree from oda_cta_fit.
target_class: Integer (or coercible); the class label treated as the target (positive / high-risk) class. NULL (default) uses the numerically largest class label for binary trees, and stops for trees with three or more classes.
weighted: Logical. FALSE (default) uses raw observation counts; TRUE uses case-weighted counts.
adjust_perfect: Logical. TRUE (default) applies the one-hypothetical-misclassification adjustment to perfectly predicted endpoints so that all endpoints can be ordered by propensity.
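To illustrate the difference that the weighted argument makes, the following base R sketch computes raw and case-weighted counts for a single endpoint. The vectors y and w are made-up toy values, and summing case weights is an assumption about what "case-weighted counts" means here.

```r
# Toy data for the cases falling in one endpoint (illustrative values only).
y <- c(1L, 1L, 0L, 1L, 0L)        # observed class labels
w <- c(2.0, 0.5, 1.0, 1.5, 3.0)   # hypothetical case weights
target_class <- 1L

# weighted = FALSE: every case counts once.
raw_target_n    <- sum(y == target_class)      # 3
raw_denominator <- length(y)                   # 5

# weighted = TRUE (assumed): counts are sums of case weights.
wtd_target_n    <- sum(w[y == target_class])   # 2 + 0.5 + 1.5 = 4
wtd_denominator <- sum(w)                      # 8

raw_target_n / raw_denominator   # 0.6
wtd_target_n / wtd_denominator   # 0.5
```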

Value

A data.frame with one row per terminal endpoint, ordered by ascending target-class propensity (lowest to highest risk stratum), with columns:

stage: Integer rank 1..n, ascending by target proportion.
endpoint_id: Integer sequential endpoint index, matching cta_endpoint_summary.
endpoint_node_id: Integer tree node identifier.
path: Character; AND-joined branch labels from root.
terminal_prediction: Integer majority-class prediction.
target_class: Integer; the target class used for this table.
target_n: Numeric; raw (or weighted) count of target-class observations at this endpoint.
denominator: Numeric; total raw (or weighted) observations at this endpoint.
target_proportion: Numeric; empirical target-class proportion (target_n / denominator).
non_target_n: Numeric; denominator minus target_n.
odds: Numeric; empirical odds (target_n / non_target_n); NA when perfectly_predicted is TRUE.
perfectly_predicted: Logical; TRUE when the endpoint is 100 percent one class (target_n == 0 or non_target_n == 0).
adjusted: Logical; TRUE when the one-hypothetical-misclassification adjustment has been applied. Always FALSE when adjust_perfect = FALSE.
adjusted_target_n: Numeric; target_n after adjustment. Equal to target_n when adjusted is FALSE.
adjusted_denominator: Numeric; denominator after adjustment.
adjusted_target_proportion: Numeric; adjusted proportion.
adjusted_non_target_n: Numeric; adjusted non-target count.
adjusted_odds: Numeric; adjusted odds.
weighted: Logical; the value of the weighted argument.
n_obs: Integer; raw observation count at this endpoint (from cta_endpoint_summary).
n_weighted: Numeric; weighted observation count.

For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
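The unadjusted columns follow directly from the leaf counts. The sketch below rebuilds them in base R from the counts shown in the Examples section; it is an illustration of the column definitions, not the package's code, and it assumes stages are assigned by sorting on the unadjusted target_proportion.

```r
# Leaf counts copied from the example output on this page.
leaf <- data.frame(
  endpoint_id = c(1L, 2L),
  target_n    = c(12, 1),
  denominator = c(14, 18)
)

leaf$target_proportion   <- leaf$target_n / leaf$denominator
leaf$non_target_n        <- leaf$denominator - leaf$target_n
leaf$perfectly_predicted <- leaf$target_n == 0 | leaf$non_target_n == 0
# Odds are NA for perfectly predicted endpoints, as documented above.
leaf$odds <- ifelse(leaf$perfectly_predicted,
                    NA_real_,
                    leaf$target_n / leaf$non_target_n)

# Assumption: stage is the rank after sorting by ascending proportion.
leaf <- leaf[order(leaf$target_proportion), ]
leaf$stage <- seq_len(nrow(leaf))

leaf[, c("stage", "endpoint_id", "target_proportion", "odds")]
```

Endpoint 2 (1 of 18 target cases) becomes stage 1 and endpoint 1 (12 of 14) becomes stage 2, matching the ordering in the example output.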

References

Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.

Examples

data(mtcars)
X    <- mtcars[, c("cyl", "disp", "hp", "wt")]
y    <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_staging_table(tree)
#>   stage endpoint_id endpoint_node_id     path terminal_prediction target_class
#> 1     1           2                3 wt<=3.18                   0            1
#> 2     2           1                2  wt>3.18                   1            1
#>   target_n denominator target_proportion non_target_n       odds
#> 1        1          18        0.05555556           17 0.05882353
#> 2       12          14        0.85714286            2 6.00000000
#>   perfectly_predicted adjusted adjusted_target_n adjusted_denominator
#> 1               FALSE    FALSE                 1                   18
#> 2               FALSE    FALSE                12                   14
#>   adjusted_target_proportion adjusted_non_target_n adjusted_odds weighted n_obs
#> 1                 0.05555556                    17    0.05882353    FALSE    18
#> 2                 0.85714286                     2    6.00000000    FALSE    14
#>   n_weighted
#> 1         18
#> 2         14
