Returns one row per terminal endpoint, ordered by ascending target-class propensity (lowest to highest risk stratum). Empirical counts, proportions, and odds are computed from the stored leaf class counts. When an endpoint is perfectly predicted (100 percent one class), the empirical proportion is exactly 0 or 1 and the empirical odds are undefined; the adjust_perfect option adds one hypothetical misclassified observation to such endpoints so that all endpoints can be ranked and compared, a remedy described by Yarnold and Linden (2017).

Scope: The two-class case is handled automatically: when target_class = NULL, the target defaults to the numerically larger class label (typically 1). For trees with three or more classes, target_class must be supplied explicitly.
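The defaulting rule described above can be sketched in base R as follows. This is only an illustration: `resolve_target_class()` is a hypothetical helper, not a function exported by the package, and it assumes the class labels are integers (or coercible to integer).

```r
# Hypothetical helper illustrating the documented defaulting rule;
# it is not part of the package API.
resolve_target_class <- function(class_labels, target_class = NULL) {
  classes <- sort(unique(as.integer(class_labels)))
  if (is.null(target_class)) {
    if (length(classes) > 2L) {
      stop("target_class must be supplied for trees with three or more classes")
    }
    # Binary case: fall back to the numerically larger label.
    return(max(classes))
  }
  as.integer(target_class)
}

# Binary labels: the larger label is chosen.
resolve_target_class(c(0L, 1L, 1L, 0L))

# Three classes: the target must be named explicitly.
resolve_target_class(c(1L, 2L, 3L), target_class = 2L)
```

Calling the helper on three or more classes without target_class raises an error, mirroring the documented behaviour for multiclass trees.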

Usage

cta_staging_table(tree, target_class = NULL, weighted = FALSE,
                   adjust_perfect = TRUE)

Arguments

tree

A cta_tree from oda_cta_fit.

target_class

Integer (or coercible to integer); the class label treated as the target (positive / high-risk) class. NULL (default) uses the numerically larger class label for binary trees and stops with an error for trees with three or more classes.

weighted

Logical. FALSE (default) uses raw observation counts; TRUE uses case-weighted counts.

adjust_perfect

Logical. TRUE (default) applies the one-hypothetical-misclassification adjustment to perfectly predicted endpoints so that all endpoints can be ordered by propensity.
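A minimal sketch of what the adjust_perfect adjustment might look like on raw counts is given below. It is not the package's implementation: `adjust_perfect_endpoint()` is a hypothetical helper, and it assumes that the hypothetical misclassified observation is added to whichever class is empty at the endpoint, so the denominator grows by one.

```r
# Hypothetical sketch of the one-hypothetical-misclassification adjustment.
# Assumption: one observation is added to the empty class of a perfectly
# predicted endpoint; other endpoints are left unchanged.
adjust_perfect_endpoint <- function(target_n, non_target_n) {
  perfect <- target_n == 0 | non_target_n == 0
  adj_target     <- target_n + ifelse(perfect & target_n == 0, 1, 0)
  adj_non_target <- non_target_n + ifelse(perfect & non_target_n == 0, 1, 0)
  data.frame(
    perfectly_predicted   = perfect,
    adjusted_target_n     = adj_target,
    adjusted_non_target_n = adj_non_target,
    adjusted_denominator  = adj_target + adj_non_target,
    adjusted_odds         = adj_target / adj_non_target
  )
}

# Three illustrative endpoints: all non-target, mixed, all target.
adjust_perfect_endpoint(target_n = c(0, 12, 9), non_target_n = c(7, 2, 0))
```

Under this assumption the first and third endpoints receive finite adjusted odds (1/7 and 9), while the mixed endpoint keeps its empirical odds of 6.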

Value

A data.frame with one row per terminal endpoint, ordered by ascending target-class propensity (lowest to highest risk stratum), with columns:

stage

Integer rank 1..n, ascending by target proportion.

endpoint_id

Integer sequential endpoint index, matching cta_endpoint_summary.

endpoint_node_id

Integer tree node identifier.

path

Character; AND-joined branch labels from root.

terminal_prediction

Integer majority-class prediction.

target_class

Integer; the target class used for this table.

target_n

Numeric; raw (or weighted) count of target-class observations at this endpoint.

denominator

Numeric; total raw (or weighted) observations at this endpoint.

target_proportion

Numeric; empirical target-class proportion (target_n / denominator).

non_target_n

Numeric; denominator minus target_n.

odds

Numeric; empirical odds (target_n / non_target_n); NA when perfectly_predicted is TRUE.

perfectly_predicted

Logical; TRUE when the endpoint is 100 percent one class (target_n == 0 or non_target_n == 0).

adjusted

Logical; TRUE when the one-hypothetical-misclassification adjustment has been applied. Always FALSE when adjust_perfect = FALSE.

adjusted_target_n

Numeric; target_n after adjustment. Equal to target_n when adjusted is FALSE.

adjusted_denominator

Numeric; denominator after adjustment.

adjusted_target_proportion

Numeric; adjusted proportion.

adjusted_non_target_n

Numeric; adjusted non-target count.

adjusted_odds

Numeric; adjusted odds.

weighted

Logical; the value of the weighted argument.

n_obs

Integer; raw observation count at this endpoint (from cta_endpoint_summary).

n_weighted

Numeric; weighted observation count.

For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
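The relationships among the empirical columns can be illustrated with a short base R sketch. It does not call the package; it rebuilds target_proportion, non_target_n, odds, perfectly_predicted, and stage from two endpoint counts taken from the example output on this page, following the column definitions above.

```r
# Endpoint counts taken from the example output on this page.
target_n    <- c(12, 1)
denominator <- c(14, 18)

# Derived columns, following the definitions in the Value section.
non_target_n        <- denominator - target_n
target_proportion   <- target_n / denominator
perfectly_predicted <- target_n == 0 | non_target_n == 0
odds <- ifelse(perfectly_predicted, NA_real_, target_n / non_target_n)

staging <- data.frame(target_n, denominator, target_proportion,
                      non_target_n, odds, perfectly_predicted)

# Order by ascending target proportion and assign the stage rank 1..n.
staging <- staging[order(staging$target_proportion), ]
staging$stage <- seq_len(nrow(staging))
rownames(staging) <- NULL
staging
```

The endpoint with 1 of 18 target observations becomes stage 1 and the endpoint with 12 of 14 becomes stage 2, matching the ordering shown in the Examples.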

References

Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.

Examples

data(mtcars)
X    <- mtcars[, c("cyl", "disp", "hp", "wt")]
y    <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_staging_table(tree)
#>   stage endpoint_id endpoint_node_id     path terminal_prediction target_class
#> 1     1           2                3 wt<=3.18                   0            1
#> 2     2           1                2  wt>3.18                   1            1
#>   target_n denominator target_proportion non_target_n       odds
#> 1        1          18        0.05555556           17 0.05882353
#> 2       12          14        0.85714286            2 6.00000000
#>   perfectly_predicted adjusted adjusted_target_n adjusted_denominator
#> 1               FALSE    FALSE                 1                   18
#> 2               FALSE    FALSE                12                   14
#>   adjusted_target_proportion adjusted_non_target_n adjusted_odds weighted n_obs
#> 1                 0.05555556                    17    0.05882353    FALSE    18
#> 2                 0.85714286                     2    6.00000000    FALSE    14
#>   n_weighted
#> 1         18
#> 2         14