Executes staged attribute-set identification on binary class data. Traverses the attribute space by class, selecting the best eligible attribute at each step, removing correctly classified observations, and repeating on the unresolved sample until a stopping condition is met. The result identifies which attributes to pass to downstream CTA or MDSA.
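The staged loop described above can be sketched in base R. This is a simplified illustration under stated assumptions, not this package's implementation. The hypothetical helpers best_cut_rule() and staged_fit_sketch() fit a single-cutpoint rule to each ordered attribute and score it by ESS. They omit the Monte Carlo p-value gate, attribute-type handling, and the novometric/MDSA mode entirely.

```r
# Hypothetical helper (not the package API): exhaustive single-cutpoint search
# on one ordered attribute. ESS rescales mean per-class sensitivity so that
# 0 = chance and 100 = perfect classification. No p-value is computed.
best_cut_rule <- function(x, y) {
  classes <- sort(unique(y))
  best <- list(ess = -Inf, cut = NA, low_class = NA)
  for (cp in sort(unique(x))) {
    for (low_class in classes) {
      high_class <- setdiff(classes, low_class)
      pred <- ifelse(x <= cp, low_class, high_class)
      sens <- vapply(classes, function(k) mean(pred[y == k] == k), numeric(1))
      ess <- (mean(sens) * 100 - 50) / 50 * 100
      if (ess > best$ess) best <- list(ess = ess, cut = cp, low_class = low_class)
    }
  }
  best
}

# Hypothetical staged loop: pick the best attribute, drop the cases it
# classifies correctly, and repeat on the unresolved (working) sample.
staged_fit_sketch <- function(X, y, max_steps = ncol(X)) {
  selected <- character(0)
  candidates <- names(X)
  keep <- rep(TRUE, length(y))  # TRUE = still in the working sample
  while (length(selected) < max_steps && length(candidates) > 0 &&
         length(unique(y[keep])) == 2) {
    fits <- lapply(candidates, function(a) best_cut_rule(X[[a]][keep], y[keep]))
    scores <- vapply(fits, function(f) f$ess, numeric(1))
    if (max(scores) <= 0) break  # no remaining attribute beats chance
    i <- which.max(scores)
    rule <- fits[[i]]
    high_class <- setdiff(unique(y), rule$low_class)
    pred <- ifelse(X[[candidates[i]]] <= rule$cut, rule$low_class, high_class)
    keep <- keep & (pred != y)   # remove correctly classified cases
    selected <- c(selected, candidates[i])
    candidates <- candidates[-i]
  }
  selected
}

X <- data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(3, 1, 4, 1, 5, 9))
y <- c(0L, 0L, 0L, 1L, 1L, 1L)
staged_fit_sketch(X, y)  # "a": one step classifies every case correctly
```

Here attribute a separates the classes perfectly, so the first step empties the working sample and the loop stops without considering b.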
Usage
sda_fit(
X,
y,
mode = c("novometric_min_d", "unioda_max_ess"),
attr_types = NULL,
weights = NULL,
mindenom = NULL,
mc_iter = 5000L,
mc_seed = 42L,
mc_stop = 99.9,
mc_stopup = NA,
alpha = 0.05,
loo = "off",
max_steps = NULL,
min_n = NULL,
min_class_n = NULL,
remove_correct = TRUE,
collinearity = c("skip", "warn", "allow"),
verbose = FALSE
)
Arguments
- X
Data frame of candidate attribute columns.
- y
Integer class vector. Must have exactly two distinct values.
- mode
SDA mode: "novometric_min_d" (MPE-canon; per-attribute MDSA via cta_descendant_family(); requires mindenom) or "unioda_max_ess" (iterative UniODA; default for SDA-1). The mode must be declared explicitly; do not mix modes.
- attr_types
Named character vector of attribute types ("ordered", "categorical", or "binary"), or NULL for auto-detection. Names must match the column names of X.
- weights
Case weights. Must be NULL in SDA-1: weighted SDA is not yet implemented, and a non-NULL value raises an error.
- mindenom
Integer MINDENOM. Required in "novometric_min_d" mode; ignored with a warning in "unioda_max_ess" mode.
- mc_iter
Maximum Monte Carlo iterations per attribute fit. Default 5000L.
- mc_seed
RNG seed set once before the SDA run. Default 42L.
- mc_stop
Lower-tail early-stop confidence (percent). Default 99.9.
- mc_stopup
Upper-tail early-stop confidence (percent). Default NA (disabled; matches MegaODA behavior).
- alpha
Significance threshold for the p-value gate. Default 0.05.
- loo
Leave-one-out (LOO) mode passed to oda_fit(). Default "off".
- max_steps
Maximum number of SDA steps (safety cap). Default NULL (no cap beyond candidate exhaustion).
- min_n
Minimum working-sample size. If the unresolved n drops below this value, the run stops with "min_n". Default NULL.
- min_class_n
Minimum per-class count. The run stops with "min_class_n" if either class falls below this value. Default NULL.
- remove_correct
Logical. If TRUE (canonical SDA), correctly classified observations are removed after each step. If FALSE, the run is a diagnostic dry run: the step logic executes, but the working sample is not modified. Default TRUE.
- collinearity
How to handle duplicate candidate columns: "skip" (silent), "warn", or "allow". Default "skip".
- verbose
Logical. If TRUE, emit [SDA] progress messages. Default FALSE.
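The stopping conditions documented for max_steps, min_n, and min_class_n, together with candidate exhaustion, can be expressed as one check run before each step. The helper sda_stop_reason() below is hypothetical. The order in which conditions are tested and the "max_steps" and "exhausted" labels are assumptions, not documented behavior; only "min_n" and "min_class_n" come from the argument descriptions.

```r
# Hypothetical helper (not the package API): returns the reason the staged
# loop should stop before the next step, or NA_character_ to continue.
# `classes` holds both original class labels, so a class that has vanished
# from the working sample is still counted (as zero).
sda_stop_reason <- function(y_work, classes, steps_done, n_candidates,
                            max_steps = NULL, min_n = NULL, min_class_n = NULL) {
  if (!is.null(max_steps) && steps_done >= max_steps) return("max_steps")  # assumed label
  if (n_candidates == 0) return("exhausted")                               # assumed label
  if (!is.null(min_n) && length(y_work) < min_n) return("min_n")
  class_n <- table(factor(y_work, levels = classes))
  if (!is.null(min_class_n) && any(class_n < min_class_n)) return("min_class_n")
  NA_character_
}

# Five unresolved cases, but class 0 has only one left.
sda_stop_reason(c(1L, 1L, 1L, 1L, 0L), classes = c(0L, 1L), steps_done = 1,
                n_candidates = 3, min_n = 5, min_class_n = 2)  # "min_class_n"
```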
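The three collinearity policies can be illustrated for exact-duplicate columns. This is a sketch under assumptions: the hypothetical helper screen_duplicates() only detects columns that are identical to an earlier column, and it assumes that "warn" drops the duplicates (like "skip") while also emitting a warning. The documentation above does not say whether "warn" keeps or drops them.

```r
# Hypothetical helper (not the package API). Returns the names of the
# columns that remain candidates after duplicate screening.
screen_duplicates <- function(X, collinearity = c("skip", "warn", "allow")) {
  collinearity <- match.arg(collinearity)
  dup <- duplicated(as.list(X))  # TRUE for a later exact copy of an earlier column
  if (collinearity == "allow" || !any(dup)) return(names(X))
  if (collinearity == "warn") {
    # Assumption: "warn" drops duplicates like "skip" but reports them.
    warning("duplicate candidate columns: ", paste(names(X)[dup], collapse = ", "))
  }
  names(X)[!dup]
}

X <- data.frame(a = 1:4, b = c(2, 1, 4, 3), c = 1:4)
screen_duplicates(X)           # "a" "b"  (c is an exact copy of a)
screen_duplicates(X, "allow")  # "a" "b" "c"
```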