Executes staged attribute-set identification on binary class data. Traverses the attribute space by class, selecting the best eligible attribute at each step, removing correctly classified observations, and repeating on the unresolved sample until a stopping condition is met. The result identifies which attributes to pass to downstream CTA or MDSA.
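The staged loop described above can be sketched in base R. This is a minimal illustration, not the package's implementation. The helpers `best_cut()` and `staged_select()` are hypothetical, and the per-attribute fit is a simple single cut-point scored by mean per-class accuracy, standing in for the UniODA or MDSA fit that `sda_fit()` actually uses. There is no Monte Carlo p-value gate here.

```r
# Hypothetical stand-in for a per-attribute fit: the best single cut-point on
# one numeric attribute, scored by mean per-class accuracy (NOT UniODA itself).
best_cut <- function(x, y) {
  vals <- sort(unique(x))
  if (length(vals) < 2) return(NULL)           # attribute is constant: ineligible
  cuts <- (vals[-length(vals)] + vals[-1]) / 2 # midpoints between adjacent values
  classes <- sort(unique(y))
  best <- NULL
  for (cut in cuts) {
    for (flip in c(FALSE, TRUE)) {             # try both rule directions
      pred <- ifelse(xor(x > cut, flip), classes[2], classes[1])
      score <- mean(c(mean(pred[y == classes[1]] == classes[1]),
                      mean(pred[y == classes[2]] == classes[2])))
      if (is.null(best) || score > best$score) {
        best <- list(cut = cut, flip = flip, score = score, correct = pred == y)
      }
    }
  }
  best
}

# Hypothetical staged loop: pick the best remaining attribute, drop the
# observations it classifies correctly, and repeat on the unresolved sample.
staged_select <- function(X, y, min_class_n = 1L) {
  keep <- rep(TRUE, nrow(X))                   # TRUE = still unresolved
  candidates <- names(X)
  selected <- character(0)
  while (length(candidates) > 0) {
    yy <- y[keep]
    # Stop when the unresolved sample lacks two adequately sized classes.
    if (length(unique(yy)) < 2 || min(table(yy)) < min_class_n) break
    fits <- lapply(candidates, function(a) best_cut(X[[a]][keep], yy))
    scores <- vapply(fits, function(f) if (is.null(f)) -Inf else f$score,
                     numeric(1))
    if (all(!is.finite(scores))) break         # no eligible attribute left
    i <- which.max(scores)
    selected <- c(selected, candidates[i])
    idx <- which(keep)
    keep[idx[fits[[i]]$correct]] <- FALSE      # remove correctly classified rows
    candidates <- candidates[-i]
  }
  list(selected = selected, n_unresolved = sum(keep))
}

# Toy data: `a` resolves most observations, `b` resolves the remainder.
X <- data.frame(a = c(1, 2, 3, 9, 1.5, 4, 5, 6),
                b = c(0, 0, 0, 0, 1, 0, 0, 0))
y <- c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)
res <- staged_select(X, y)
print(res$selected)
```

The sketch stops on candidate exhaustion or when the unresolved sample no longer holds both classes, which loosely mirrors the `min_class_n` stopping condition documented below.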

Usage

sda_fit(
  X,
  y,
  mode = c("novometric_min_d", "unioda_max_ess"),
  attr_types = NULL,
  weights = NULL,
  mindenom = NULL,
  mc_iter = 5000L,
  mc_seed = 42L,
  mc_stop = 99.9,
  mc_stopup = NA,
  alpha = 0.05,
  loo = "off",
  max_steps = NULL,
  min_n = NULL,
  min_class_n = NULL,
  remove_correct = TRUE,
  collinearity = c("skip", "warn", "allow"),
  verbose = FALSE
)

Arguments

X

Data frame of candidate attribute columns.

y

Integer class vector. Must have exactly two distinct values.

mode

SDA mode. "novometric_min_d" (MPE-canon; per-attribute MDSA via cta_descendant_family(); requires mindenom) or "unioda_max_ess" (iterative UniODA; default for SDA-1). Must be declared explicitly; do not mix modes.

attr_types

Named character vector of attribute types ("ordered", "categorical", "binary"), or NULL for auto-detection. Names must match column names of X.

weights

Case weights. Must be NULL in SDA-1: weighted SDA is not yet implemented, and a non-NULL value raises an error.

mindenom

Integer MINDENOM value. Required in "novometric_min_d" mode; ignored with a warning in "unioda_max_ess" mode.

mc_iter

Maximum Monte Carlo iterations per attribute fit. Default 5000L.

mc_seed

RNG seed set once before the SDA run. Default 42L.

mc_stop

Lower-tail early-stop confidence (percent). Default 99.9.

mc_stopup

Upper-tail early-stop confidence (percent). Default NA (disabled; matches MegaODA behavior).

alpha

Significance threshold for p-value gate. Default 0.05.

loo

LOO mode passed to oda_fit(). Default "off".

max_steps

Maximum number of SDA steps (safety cap). Default NULL (no cap beyond candidate exhaustion).

min_n

Minimum working-sample size. If the number of unresolved observations drops below this value, the run stops with "min_n". Default NULL.

min_class_n

Minimum per-class count. Stop with "min_class_n" if either class falls below this. Default NULL.

remove_correct

Logical. If TRUE (canonical SDA), correctly classified observations are removed after each step. If FALSE, the call is a diagnostic dry run: the step logic executes but the working sample is not modified. Default TRUE.

collinearity

How to handle duplicate candidate columns: "skip" (silent), "warn", or "allow". Default "skip".

verbose

Logical. Emit [SDA] progress messages. Default FALSE.
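Several of the input constraints above can be checked before calling the function. The following base R sketch is illustrative only: the toy columns are made up, and `duplicated()` on the column list is just one plausible way to spot duplicate candidates, not necessarily how the `collinearity` option detects them.

```r
# Toy candidate attributes; every column name here is illustrative.
X <- data.frame(
  age      = c(34, 51, 47, 29, 62, 45),
  stage    = c("I", "II", "II", "I", "III", "II"),
  smoker   = c(0L, 1L, 1L, 0L, 1L, 0L),
  age_copy = c(34, 51, 47, 29, 62, 45)   # exact duplicate of `age`
)
y <- c(0L, 1L, 1L, 0L, 1L, 0L)

# attr_types: a named character vector whose names match columns of X.
attr_types <- c(age = "ordered", stage = "categorical",
                smoker = "binary", age_copy = "ordered")
stopifnot(all(names(attr_types) %in% names(X)))
stopifnot(all(attr_types %in% c("ordered", "categorical", "binary")))

# y: integer class vector with exactly two distinct values.
stopifnot(is.integer(y), length(unique(y)) == 2, length(y) == nrow(X))

# Columns that exactly repeat an earlier column (assumed notion of "duplicate").
duplicate_cols <- names(X)[duplicated(as.list(X))]
print(duplicate_cols)
```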

Value

Object of class c("sda_fit", "odacore_sda").
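A call might look like the sketch below. It uses only arguments from the signature above, with made-up data, and assumes the package that provides `sda_fit()` is already attached. The call is guarded so the snippet still runs when the function is unavailable. Only the documented class is inspected, since the structure of the returned object is not described here.

```r
# Illustrative data: two numeric candidate attributes and a binary class.
X <- data.frame(
  x1 = c(2.1, 3.4, 1.8, 5.6, 6.2, 4.9, 7.3, 5.1),
  x2 = c(10, 12, 9, 15, 14, 11, 16, 13)
)
y <- c(0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L)

# Guarded call: runs only if sda_fit() is available in the session.
if (exists("sda_fit", mode = "function")) {
  fit <- sda_fit(
    X, y,
    mode    = "unioda_max_ess",  # declared explicitly, as the docs require
    mc_iter = 5000L,
    mc_seed = 42L,
    alpha   = 0.05,
    verbose = TRUE
  )
  # The documented return class is c("sda_fit", "odacore_sda").
  print(inherits(fit, "sda_fit"))
}
```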