A data frame with 256 observations and 19 variables, formatted for use
with cta_fit and oda_fit. Derived from the
publicly available myeloma gene-expression dataset (GEO accession GSE4581),
as distributed in the survminer package.
Format
A data frame with 256 rows and 19 columns:
- V1
Survival event indicator (0 = censored, 1 = event). Used as the class variable
y in CTA/ODA.
- V2
Case weight (observation time in months). Use as
w in cta_fit; rows with V2 == 0 should be excluded.
- V3
CCND1 gene expression.
- V4
CRIM1 gene expression.
- V5
DEPDC1 gene expression.
- V6
IRF4 gene expression.
- V7
TP53 expression / mutation burden.
- V8
WHSC1 gene expression.
- V9
Molecular group: Cyclin D-1 (binary).
- V10
Molecular group: Cyclin D-2 (binary).
- V11
Molecular group: Hyperdiploid (binary).
- V12
Molecular group: Low bone disease (binary).
- V13
Molecular group: MAF (binary).
- V14
Molecular group: MMSET (binary).
- V15
Molecular group: Proliferation (binary).
- V16
Chr1q21 status: 2 copies (binary).
- V17
Chr1q21 status: 3 copies (binary).
- V18
Chr1q21 status: 4+ copies (binary).
- V19
Chr1q21 status: NA-coded (binary). Missing values are coded as -9
(miss_codes = -9).
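The exclusion rule for V2 and the -9 missing code can be sketched in base R. The toy data frame below is a hypothetical stand-in with invented values, not rows from the real dataset; only the column names follow the format above, and placing -9 in V3 is an assumption made for illustration.

```r
# Toy stand-in for the myeloma data frame (values are invented for illustration).
toy <- data.frame(
  V1 = c(1, 0, 1, 0, 1),          # event indicator (0 = censored, 1 = event)
  V2 = c(12.5, 0, 30.1, 8.0, 0),  # case weight (observation time in months)
  V3 = c(2.1, -9, 3.4, -9, 1.7)   # a predictor using -9 as the missing code
)

# Drop rows whose case weight is zero, as recommended for V2.
kept <- toy[toy$V2 != 0, ]

# Count missing-coded values per column without recoding them,
# since the -9 code is meant to be declared via miss_codes = -9.
n_missing <- colSums(kept == -9)

nrow(kept)
n_missing
```

With the package loaded, the same two steps would presumably apply to myeloma itself in place of toy.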
Source
Derived from the myeloma dataset in the survminer package.
Original data: NCBI GEO accession GSE4581. No PHI; no institutional data.
See tests/testthat/fixtures/myeloma/README.md in the source tree.
Details
This dataset is used throughout the oda documentation and vignettes to illustrate weighted CTA, MINDENOM constraints, LOO STABLE validation, and missing-code handling. Reference CTA.exe golden outputs for MINDENOM = 1, 30, and 56 are used as regression anchors.
Use miss_codes = -9 and w = myeloma$V2 when calling
cta_fit. With mindenom = 1, the enumerated CTA tree roots
at V14 with a V15 child (OVERALL ESS = 26.32%, WEIGHTED ESS = 27.69%).
With mindenom = 30, the selected tree is a V17 stump
(WEIGHTED ESS = 16.51%). With mindenom = 56, no admissible
tree exists.
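As a rough sketch, the shared settings for the three MINDENOM runs described above could be collected as follows. The argument names y, w, miss_codes, and mindenom are taken from this page; the data frame dat is a hypothetical stand-in with invented values, and how cta_fit expects the predictors to be passed is not stated here, so the call itself is left out.

```r
# Hypothetical stand-in; with the package loaded, use the real `myeloma` instead.
dat <- data.frame(V1 = c(1, 0, 1), V2 = c(12.5, 30.1, 8.0))

# Arguments shared by every run (names as used on this page).
base_args <- list(y = dat$V1, w = dat$V2, miss_codes = -9)

# One argument list per MINDENOM setting used as a regression anchor.
mindenoms <- c(1, 30, 56)
runs <- lapply(mindenoms, function(m) c(base_args, list(mindenom = m)))
names(runs) <- paste0("mindenom_", mindenoms)

# Each element could then be combined with the predictors and handed to
# cta_fit; that step is omitted because its exact signature is an assumption.
str(runs, max.level = 1)
```

Keeping the weight and missing-code settings in one place makes it harder for the three runs to drift apart when only mindenom is meant to change.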