
A data frame with 256 observations and 19 variables, formatted for use with cta_fit and oda_fit. Derived from the publicly available myeloma gene-expression dataset (GEO accession GSE4581), as distributed in the survminer package.

Format

A data frame with 256 rows and 19 columns:

V1

Survival event indicator (0 = censored, 1 = event). Used as the class variable y in CTA/ODA.

V2

Case weight (observation time in months). Use as w in cta_fit; rows with V2 == 0 should be excluded.

V3

CCND1 gene expression.

V4

CRIM1 gene expression.

V5

DEPDC1 gene expression.

V6

IRF4 gene expression.

V7

TP53 expression / mutation burden.

V8

WHSC1 gene expression.

V9

Molecular group: Cyclin D-1 (binary).

V10

Molecular group: Cyclin D-2 (binary).

V11

Molecular group: Hyperdiploid (binary).

V12

Molecular group: Low bone disease (binary).

V13

Molecular group: MAF (binary).

V14

Molecular group: MMSET (binary).

V15

Molecular group: Proliferation (binary).

V16

Chr1q21 status: 2 copies (binary).

V17

Chr1q21 status: 3 copies (binary).

V18

Chr1q21 status: 4+ copies (binary).

V19

Chr1q21 status: NA-coded (binary). Missing values are coded as -9; pass miss_codes = -9 when calling cta_fit.
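The V2 and missing-code notes above imply a small preparation step before fitting. The following base R is a minimal sketch of it: it drops zero-weight rows and counts -9 codes per column. The `toy` data frame is a made-up stand-in for a few columns of myeloma (its values are invented, not taken from the dataset), and the names `kept` and `n_missing` are illustrative only.

```r
# Toy stand-in for a few myeloma columns (values invented for illustration).
toy <- data.frame(
  V1  = c(0, 1, 1, 0, 1),            # event indicator (class variable)
  V2  = c(12.5, 0, 30.1, 8.0, 4.2),  # case weight (observation time in months)
  V3  = c(5.1, 6.3, -9, 7.7, 4.9),   # a predictor containing one -9 missing code
  V19 = c(0, 0, 1, 0, 0)             # binary indicator column
)

# Exclude rows whose case weight is exactly zero.
kept <- toy[toy$V2 != 0, , drop = FALSE]

# Count -9 missing codes per column among the retained rows.
n_missing <- colSums(kept == -9)

# Class variable and case weights for the retained rows.
y <- kept$V1
w <- kept$V2
```

With the real data, replace `toy` with myeloma; the filtering and counting lines stay the same.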

Source

Derived from the myeloma dataset in the survminer package. Original data: NCBI GEO accession GSE4581. Contains no protected health information (PHI) and no institutional data. See tests/testthat/fixtures/myeloma/README.md in the source tree.

Details

This dataset is used throughout the oda documentation and vignettes to illustrate weighted CTA, MINDENOM constraints, LOO STABLE validation, and missing-code handling. Golden outputs from the reference CTA.exe for MINDENOM = 1, 30, and 56 serve as regression-test anchors.

Use miss_codes = -9 and w = myeloma$V2 when calling cta_fit. With mindenom = 1, the enumerated CTA tree roots at V14 with a V15 child (OVERALL ESS = 26.32%, WEIGHTED ESS = 27.69%). With mindenom = 30, the selected tree is a V17 stump (WEIGHTED ESS = 16.51%). With mindenom = 56, no admissible tree exists.
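The call described above might look like the sketch below. The argument names y, w, miss_codes, and mindenom come from this page. The predictor argument name `x`, the `oda::` namespace prefix, and loading the data with `data("myeloma", package = "oda")` are assumptions, so check the cta_fit help page for the actual signature. The fit is skipped when the package is not installed.

```r
# Settings named on this page: -9 marks missing values, mindenom = 1.
fit_args <- list(miss_codes = -9, mindenom = 1)

if (requireNamespace("oda", quietly = TRUE)) {
  # Assumption: the dataset ships with the package under the name "myeloma".
  data("myeloma", package = "oda")

  # Exclude rows with zero case weight, as the V2 description recommends.
  d <- myeloma[myeloma$V2 != 0, , drop = FALSE]

  # Assumption: predictors are passed as `x`; y and w are named on this page.
  fit <- do.call(
    oda::cta_fit,
    c(list(x = d[, paste0("V", 3:19)], y = d$V1, w = d$V2), fit_args)
  )
  print(fit)
}
```

Setting `fit_args$mindenom` to 30 or 56 would correspond to the other two cases reported above.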