ODA can incorporate a directional hypothesis stated a priori. Specifying a direction constrains both the training search and the Monte Carlo permutation test, yielding a one-tailed p-value consistent with the stated hypothesis. When the hypothesised direction matches the rule a nondirectional search would select, as in all three examples below, the training ESS and confusion matrix are unchanged and only the p-value changes.

This article covers three directional modes:

Attribute type                    | Mode                  | Argument
Binary / ordered                  | Chapter 2 directional | direction = "greater" or "less"
Categorical (fixed partition)     | Chapter 4 directional | direction_map = c(...)
Categorical (identity map, k = C) | Chapter 4 directional | direction = "ascending" or "descending"

Why directional hypotheses?

ODA’s nondirectional default searches all possible rules and reports the best one, using a two-tailed MC p-value. When theory or prior evidence specifies the direction of the effect before data collection, a directional hypothesis is appropriate:

  • It restricts the search (or the MC evaluation) to the hypothesised direction.
  • The resulting p-value is one-tailed, which gives more statistical power when the true effect lies in the hypothesised direction.
  • Using a directional test after seeing the data is not valid - the direction must be specified a priori.
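
The difference between the two tails can be sketched in base R without the oda package. This is an illustration under stated assumptions, not the package's MC routine: ess_greater() is a hypothetical helper, the toy data are invented for the demonstration, and the p-values are plain proportions of permutations.

```r
# Hypothetical helper (not part of oda): ESS (%) of the rule
# "x = 1 predicts class 1" for a 0/1 attribute x and a 0/1 class y.
ess_greater <- function(x, y) {
  pac0 <- mean(x[y == 0] == 0)   # share of class 0 cases predicted 0
  pac1 <- mean(x[y == 1] == 1)   # share of class 1 cases predicted 1
  100 * (pac0 + pac1 - 1)        # same as (mean PAC - 50) / 50 * 100
}

# Toy data: 12 cases per attribute value, modest positive association
x <- rep(c(0L, 1L), each = 12)
y <- c(rep(0L, 8), rep(1L, 4),   # x = 0: mostly class 0
       rep(0L, 4), rep(1L, 8))   # x = 1: mostly class 1

obs <- ess_greater(x, y)

# Permute the class labels and recompute the directional ESS each time
set.seed(1)
perm <- replicate(2000, ess_greater(x, sample(y)))

# One-tailed: only permutations at least as good in the stated direction count
p_one <- mean(perm >= obs)
# Two-tailed: for a 0/1 attribute the reversed rule has ESS = -perm,
# so the best rule in either direction has ESS = abs(perm)
p_two <- mean(abs(perm) >= abs(obs))

c(ess = obs, one_tailed = p_one, two_tailed = p_two)
```

Because every permutation counted in the one-tailed tally is also counted in the two-tailed tally, p_one cannot exceed p_two when the observed effect lies in the hypothesised direction.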

Chapter 2: binary ordered directional

For a binary or ordered attribute, direction = "greater" asserts that larger attribute values predict class 1; direction = "less" asserts that smaller values predict class 1.

The Refugee Act example

The Refugee Act of 1980 was sponsored by Democrats. The directional hypothesis is that Democratic affiliation (party = 1) predicts a Pro vote (vote = 1). [1]

library(oda)

# party: 0 = Republican, 1 = Democrat
# vote:  0 = Con, 1 = Pro
vote  <- c(rep(0L, 118), rep(0L,  78), rep(1L,  34), rep(1L, 177))
party <- c(rep(0L, 118), rep(1L,  78), rep(0L,  34), rep(1L, 177))
fit_dir <- oda_fit(
  x         = party,
  y         = vote,
  attr_type = "ordered",
  direction = "greater",   # larger party (Democrat) predicts vote = 1 (Pro)
  mc_iter   = 500L,
  mc_seed   = 42L,
  loo       = "on"
)
print(fit_dir)
#> 
#> ODA (binary)  attr_type=ordered  priors=TRUE  n=407
#> 
#> Rule: <= 0.5 --> 0   |   > 0.5 --> 1
#> 
#>   CLASS       n     PAC
#>       0     196   60.2%
#>       1     211   83.9%
#> 
#>   Mean PAC: 72.05%   ESS: 44.09%  p(MC): < .001
#> 
#>   -- LOO --
#>   CLASS       n     PAC
#>       0     196   60.2%
#>       1     211   83.9%
#> 
#>   LOO ESS: 44.09%  p(LOO): < .001
# For comparison: nondirectional (two-tailed MC p)
fit_nd <- oda_fit(
  x = party, y = vote, attr_type = "ordered",
  mc_iter = 500L, mc_seed = 42L, loo = "on"
)
# ESS is identical; only p(MC) changes (the one-tailed directional p should be no larger than the two-tailed p)

The training rule (party <= 0.5 -> Con; party > 0.5 -> Pro) and ESS = 44.09% are the same as in the nondirectional analysis. The directional p-value is one-tailed, so it is no larger than the two-tailed p.

Notes:

  • The attribute here is binary (0/1). For a binary attribute, attr_type = "ordered" is correct - ODA treats it as ordered and finds the cut at 0.5, which separates the two values.
  • direction = "greater" means higher attribute values predict class 1. For a binary attribute, this is equivalent to “attribute = 1 predicts class 1.”
  • LOO is fully supported for binary ordered directional fits.
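
As a cross-check, the PAC and ESS values in the printed fit can be reproduced by hand in base R. The sketch below applies the reported cut at 0.5 directly; it does not call oda_fit(), and the ESS formula (mean PAC rescaled so that 50% is chance for two classes) is inferred from the printed output rather than taken from the package source.

```r
# Same data as the Refugee Act example
vote  <- c(rep(0L, 118), rep(0L,  78), rep(1L,  34), rep(1L, 177))
party <- c(rep(0L, 118), rep(1L,  78), rep(0L,  34), rep(1L, 177))

# Apply the reported rule: <= 0.5 --> 0, > 0.5 --> 1
pred <- as.integer(party > 0.5)

# Confusion matrix with actual class in rows, predicted class in columns
conf <- table(actual = vote, predicted = pred)

# Per-class predictive accuracy (PAC): correct predictions / class size
pac <- diag(conf) / rowSums(conf)

# Binary ESS: mean PAC rescaled so 0.5 (chance) maps to 0 and 1 maps to 1
ess <- (mean(pac) - 0.5) / 0.5

round(100 * pac, 1)   # 60.2 and 83.9, as in the printed fit
round(100 * ess, 2)   # 44.09
```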

Chapter 4: categorical fixed-partition directional

For a nominal attribute, a fixed-partition directional hypothesis assigns each attribute category to a class a priori via direction_map. ODA evaluates only this mapping - no partition search is performed. The MC test permutes the y labels while holding the fixed mapping constant.

The gully erosion example

In a study of gully erosion in southeast Nigeria, four adjustment types were classified according to whether they require organised community effort. [2] The hypothesis is that Ridges (1) and Shifting Habitation (2) predict Community motivation, while Relocation (3) and Intensified Cultivation (4) predict Individual motivation.

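# adjustment: 1 = Ridges, 2 = Shifting Habitation, 3 = Relocation, 4 = Intensified Cultivation
# motivation: 0 = Individual, 1 = Community
motivation  <- c(rep(0L,  85), rep(1L, 173),   # adjustment = 1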
                 rep(0L,  65), rep(1L, 170),   # adjustment = 2
                 rep(0L, 172), rep(1L,  10),   # adjustment = 3
                 rep(0L,  45), rep(1L,   0))   # adjustment = 4
adjustment  <- c(rep(1L, 258), rep(2L, 235),
                 rep(3L, 182), rep(4L,  45))
fit_gully <- oda_fit(
  x             = adjustment,
  y             = motivation,
  attr_type     = "categorical",
  direction_map = c("1" = 1L, "2" = 1L, "3" = 0L, "4" = 0L),
  mc_iter       = 500L,
  mc_seed       = 42L,
  loo           = "on"
)
print(fit_gully)
#> 
#> ODA (binary)  attr_type=categorical  priors=TRUE  n=720
#> 
#> Rule: {3, 4} --> 0   |   {1, 2} --> 1
#> 
#>   CLASS       n     PAC
#>       0     367   59.1%
#>       1     353   97.2%
#> 
#>   Mean PAC: 78.15%   ESS: 56.30%  p(MC): < .001
#> 
#>   -- LOO --
#>   CLASS       n     PAC
#>       0     367   59.1%
#>       1     353   97.2%
#> 
#>   LOO ESS: 56.30%  p(LOO): < .001

Because the prediction rule is fixed a priori, every LOO fold applies the same mapping to the same x values - LOO ESS equals training ESS exactly, and the fit is trivially stable. The MC p-value is directional (one-tailed) because each permutation evaluates the same fixed partition.

Notes:

  • direction_map is a named integer vector. Names are attribute levels (as character), values are class assignments.
  • All attribute levels present in x must appear in direction_map.
  • The training ESS and confusion matrix are identical to the nondirectional result only when the fixed partition happens to be the global optimum, which it is here.
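
The fixed-partition logic can be sketched in base R: look up each case's predicted class in the map, score it, then permute y while the mapping stays put. binary_ess() is a hypothetical helper, and the permutation p-value here is a plain proportion; the package's MC routine may differ in detail.

```r
# Data and map copied from the gully example
motivation <- c(rep(0L,  85), rep(1L, 173), rep(0L,  65), rep(1L, 170),
                rep(0L, 172), rep(1L,  10), rep(0L,  45))
adjustment <- c(rep(1L, 258), rep(2L, 235), rep(3L, 182), rep(4L, 45))
direction_map <- c("1" = 1L, "2" = 1L, "3" = 0L, "4" = 0L)

# Every attribute level present in x needs an entry in the map
stopifnot(all(as.character(unique(adjustment)) %in% names(direction_map)))

# Fixed rule: look up each case's predicted class by level name
pred <- unname(direction_map[as.character(adjustment)])

# Hypothetical helper: binary ESS (%) of fixed predictions against labels y
binary_ess <- function(pred, y) {
  100 * (mean(pred[y == 0] == 0) + mean(pred[y == 1] == 1) - 1)
}

ess_fixed <- binary_ess(pred, motivation)
round(ess_fixed, 2)   # 56.3, matching the printed training ESS

# MC sketch: shuffle y, keep the mapping (and therefore pred) fixed
set.seed(42)
perm_ess <- replicate(500, binary_ess(pred, sample(motivation)))
p_mc <- mean(perm_ess >= ess_fixed)
p_mc
```

No partition search happens inside the permutation loop, which is what makes the resulting p-value specific to the a priori mapping.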

LOO reporting for categorical directional fits:

With loo = "on", binary fixed-map categorical ODA reports the held-out confusion, LOO ESS, and a binary one-tailed Fisher LOO p-value. The one-tailed Fisher LOO p comes from the 2x2 held-out classification table (MPE p. 34); it is not the MC p-value. MC p and LOO p are separate calculations - MC p permutes y labels under the fixed mapping; LOO p comes from the hold-out table. They are directionally consistent but are not numerically identical.
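
A one-tailed Fisher test on a 2x2 hold-out table can be sketched with fisher.test() from base R's stats package. The cell counts below are reconstructed from the printed LOO PAC values for the gully example (217 of 367 class 0 cases and 343 of 353 class 1 cases correct). Treating alternative = "greater" as the counterpart of the package's one-tailed Fisher LOO p is an assumption; the package's own calculation may differ.

```r
# Hold-out classification table: actual class in rows, predicted in columns
loo_table <- matrix(
  c(217, 150,    # actual 0: predicted 0, predicted 1
     10, 343),   # actual 1: predicted 0, predicted 1
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("0", "1"), predicted = c("0", "1"))
)

# One-tailed test: is the odds ratio greater than 1, i.e. do predictions
# agree with actual class more often than chance would allow?
ft <- fisher.test(loo_table, alternative = "greater")
ft$p.value
```

This p-value depends only on the four cell counts, whereas the MC p-value depends on the permutation distribution of ESS, which is why the two need not agree numerically.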

For multicategorical fits (direction = "ascending" or nondirectional with C > 2 classes), loo = "on" reports held-out confusion and ESS but no LOO p-value. There is no canon-aligned C x C Fisher LOO p for multiclass ODA.

Chapter 4: categorical identity map (direction = "ascending")

When the attribute has exactly k = C categories matching C class values, direction = "ascending" imposes the identity mapping (attribute value i predicts class i) without requiring an explicit direction_map.

The protein type example

biological_type <- c(
  rep(1L, 98), rep(2L, 13), rep(3L,  6), rep(4L,  7),
  rep(1L, 16), rep(2L, 50), rep(3L,  4), rep(4L, 19),
  rep(1L,  5), rep(2L,  2), rep(3L, 23), rep(4L, 14),
  rep(1L,  3), rep(2L,  8), rep(3L, 12), rep(4L, 45)
)
amino_acid_type <- c(rep(1L, 124), rep(2L, 89), rep(3L, 44), rep(4L, 68))
fit_protein <- oda_fit(
  x         = amino_acid_type,
  y         = biological_type,
  attr_type = "categorical",
  direction = "ascending",   # identity map: amino acid i -> biological type i
  mc_iter   = 500L,
  mc_seed   = 42L,
  loo       = "off"
)
summary(fit_protein)
#> 
#> ODA Summary (multiclass)  status=valid  n=325
#>   attr_type=categorical  priors=TRUE  weights=FALSE
#>   Rule: 1 --> 1   |   2 --> 2   |   3 --> 3   |   4 --> 4
#> 
#>   -- Train --
#>     Mean PAC: 63.22%   ESS: 50.96%
#>     p(MC): < .001  [MC permutation, two-tailed]

With direction = "ascending" and k = C = 4, the engine does not search alternative partitions. The MC test permutes y labels while holding the identity mapping fixed. Because the identity map is also the globally optimal categorical mapping for this dataset, training ESS = 50.96% is the same as a nondirectional analysis; the p-value is smaller (one-tailed).

Notes:

  • direction = "ascending" requires that k (the number of unique attribute values) equal C (the number of class values). If they differ, an error is raised.
  • direction = "descending" reverses the mapping (attribute value k -> class 1, attribute value 1 -> class C).
  • Multiclass categorical LOO is supported (loo = "on"). LOO reports held-out confusion and ESS but no Fisher LOO p-value (undefined for C > 2). LOO folds search nondirectionally regardless of the direction setting; for this dataset every fold recovers the identity map, so LOO ESS equals training ESS. This example omits LOO for brevity; see protein-type-multiclass-oda for the full LOO analysis.
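
The identity and reversed mappings can be scored by hand in base R. In this sketch multiclass_ess() is a hypothetical helper whose formula (mean PAC rescaled so that 1/C is chance) is inferred from the printed output, and the descending map is written as (k + 1) - x on the assumption that attribute levels are coded 1..k.

```r
# Data copied from the protein type example
biological_type <- c(
  rep(1L, 98), rep(2L, 13), rep(3L,  6), rep(4L,  7),
  rep(1L, 16), rep(2L, 50), rep(3L,  4), rep(4L, 19),
  rep(1L,  5), rep(2L,  2), rep(3L, 23), rep(4L, 14),
  rep(1L,  3), rep(2L,  8), rep(3L, 12), rep(4L, 45)
)
amino_acid_type <- c(rep(1L, 124), rep(2L, 89), rep(3L, 44), rep(4L, 68))

# The identity map is only defined when k equals C
k <- length(unique(amino_acid_type))
C <- length(unique(biological_type))
stopifnot(k == C)

# Hypothetical helper: multiclass ESS (%) of fixed predictions against y
multiclass_ess <- function(pred, y) {
  classes <- sort(unique(y))
  pac <- vapply(classes, function(cl) mean(pred[y == cl] == cl), numeric(1))
  chance <- 1 / length(classes)
  100 * (mean(pac) - chance) / (1 - chance)
}

pred_asc  <- amino_acid_type               # ascending: value i -> class i
pred_desc <- (k + 1L) - amino_acid_type    # descending: value k -> class 1

ess_asc  <- multiclass_ess(pred_asc,  biological_type)
ess_desc <- multiclass_ess(pred_desc, biological_type)

round(ess_asc, 2)   # 50.96, matching the printed training ESS
ess_desc            # negative: the reversed map does worse than chance here
```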

When NOT to use directional: nondirectional default

The nondirectional default (direction = "both" for ordered, or no direction_map for categorical) is appropriate when:

  • No prior hypothesis specifies the direction.
  • The direction is chosen after seeing the data.
  • The analysis is exploratory.

Choosing a directional constraint because it matches the observed direction produces a p-value that looks one-tailed but is not a valid one-tailed test: the direction must be declared before the data are seen.
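
A small simulation shows why the timing matters. This is a generic sketch, not ODA-specific: it uses a standard normal test statistic as a stand-in for any symmetric null distribution and compares false-positive rates at the 0.05 level when no true effect exists.

```r
set.seed(42)
n_sim <- 20000
z <- rnorm(n_sim)   # test statistics simulated under the null

p_two      <- 2 * pnorm(-abs(z))   # two-tailed
p_a_priori <- pnorm(-z)            # one-tailed, direction fixed in advance
p_post_hoc <- pnorm(-abs(z))       # one-tailed, direction picked to match the data

# Share of null simulations declared significant at 0.05
rates <- c(
  two_tailed = mean(p_two < 0.05),
  a_priori   = mean(p_a_priori < 0.05),
  post_hoc   = mean(p_post_hoc < 0.05)
)
round(rates, 3)
```

The first two rates should sit near the nominal 0.05, while the post hoc rate should come out at roughly twice that, because the "one-tailed" test is in effect being run in both directions.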

Further reading

  • vignettes/refugee-act-oda.Rmd - full CRAN vignette for the binary ordered directional example
  • vignettes/gully-adjustment-oda.Rmd - nondirectional MegaODA parity run for the gully example; this article demonstrates the directional fixed-map alternative
  • vignettes/protein-type-multiclass-oda.Rmd - nondirectional MegaODA parity run for the protein example; this article demonstrates the directional identity-map alternative
  • articles/oda-basics - ODA foundations (nondirectional)
  • articles/multiclass-oda - multiclass ODA