Note. All code chunks in this vignette are set to eval = FALSE to keep CRAN check times within limits, as the bootstrap and permutation procedures are computationally intensive. All code is fully executable in an interactive R session. Precomputed results for all three pipelines are stored in inst/extdata/ and can be loaded with readRDS(system.file("extdata", "results_bin.rds", package = "SEPA")) etc. Full output and figures are reported in the accompanying manuscript (Kim and Grochowalski, 2019, doi:10.1007/s00357-018-9277-7).
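As a minimal sketch, the binary-pipeline results named in the note can be loaded as follows. The guard for an empty path and the object names path_bin and results_bin are additions for illustration; only the file name results_bin.rds comes from this vignette.

```r
# Locate the precomputed binary-pipeline results shipped with the package.
# system.file() returns "" when the package or file cannot be found.
path_bin <- system.file("extdata", "results_bin.rds", package = "SEPA")

if (nzchar(path_bin)) {
  results_bin <- readRDS(path_bin)   # illustrative object name
  str(results_bin, max.level = 1)    # show top-level components only
} else {
  message("SEPA is not installed or results_bin.rds was not found.")
}
```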
The SEPA package implements the Segment Profile Extraction via Pattern Analysis method for row-mean-centered multivariate data. The three automated workflow functions are:
alsi_workflow(): binary data via multiple correspondence analysis (MCA)
alsi_workflow_ordinal(): ordinal Likert-type data via homals alternating least squares (ALS) optimal scaling
calsi_workflow(): continuous multivariate data via ipsatized singular value decomposition (SVD)

All three pipelines share a common structure:
This example illustrates the alsi_workflow() pipeline
using binary diagnostic data from N = 1,261 individuals assessed for
eating disorders.
data("ANR2", package = "SEPA")
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
head(ANR2[, vars])

Diagnostic prevalence varies substantially: MDD is the most common diagnosis (44.3%), followed by DEP and ANX, while DYS is the least prevalent (4.7%).
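Prevalence figures of this kind can be obtained from column means when diagnoses are coded 0/1. The sketch below uses a small simulated data frame (hypothetical values, not the ANR2 data) and assumes that 1 marks the presence of a diagnosis.

```r
# Toy 0/1 diagnosis indicators (hypothetical values, not the ANR2 data).
set.seed(1)
toy <- data.frame(
  MDD = rbinom(200, 1, 0.45),
  DYS = rbinom(200, 1, 0.05),
  ANX = rbinom(200, 1, 0.30)
)

# For 0/1 indicators, the column mean is the proportion of positive cases.
prevalence <- round(100 * colMeans(toy), 1)
sort(prevalence, decreasing = TRUE)
```

If ANR2 uses the same 0/1 coding, the same two lines applied to ANR2[, vars] should give the corresponding percentages.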
The following chunk shows the exact call used to generate the
precomputed results stored in
inst/extdata/results_bin.rds.
The first three observed eigenvalues exceed their permutation-based 95th-percentile reference values, supporting retention of a K* = 3-dimensional MCA subspace. These three dimensions account for approximately 48% of total inertia.
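The logic of comparing observed eigenvalues with permutation-based reference values can be sketched in base R. For brevity, this sketch uses eigenvalues of a correlation matrix rather than MCA, so it illustrates the permutation step only and is not the internal code of alsi_workflow(). The function perm_parallel() and its arguments are hypothetical names, and stopping at the first non-exceeding eigenvalue is an assumption.

```r
# Permutation-based parallel analysis, sketched on a correlation matrix.
# NOTE: this stands in for the MCA eigenvalues used by the package.
perm_parallel <- function(X, n_perm = 200, prob = 0.95) {
  X <- as.matrix(X)
  obs <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
  # Shuffle each column independently to break associations between variables.
  perm <- replicate(n_perm, {
    Xp <- apply(X, 2, sample)
    eigen(cor(Xp), symmetric = TRUE, only.values = TRUE)$values
  })
  ref <- apply(perm, 1, quantile, probs = prob)  # reference value per dimension
  exceeds <- obs > ref
  # Assumption: retain leading dimensions up to the first one that fails.
  k <- if (all(exceeds)) length(obs) else which(!exceeds)[1] - 1
  list(observed = obs, reference = ref, k = k,
       prop_inertia = obs / sum(obs))
}

set.seed(42)
n <- 300
f <- rnorm(n)  # one shared latent factor (toy data)
X <- cbind(f + rnorm(n), f + rnorm(n), f + rnorm(n), rnorm(n), rnorm(n))
pa <- perm_parallel(X, n_perm = 100)
pa$k
round(cumsum(pa$prop_inertia), 3)
```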
Median principal angles are 2.77°, 6.94°, and 15.46° for Dimensions 1–3, all well below the 20° threshold. Tucker congruence coefficients range from phi = 0.978 to phi = 0.992. All three dimensions pass the dual criterion, yielding K* = 3.
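Both stability measures have standard definitions that can be sketched in a few lines. Tucker's congruence is the cosine between two loading vectors, and principal angles between two subspaces follow from the singular values of the product of their orthonormal bases. The function names tucker_phi() and principal_angles() are hypothetical, the data are simulated, and how the package assigns an angle to each individual dimension is not shown in this section.

```r
# Tucker's congruence coefficient between two loading vectors.
tucker_phi <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Principal angles (in degrees) between the column spaces of A and B.
principal_angles <- function(A, B) {
  Qa <- qr.Q(qr(A))                 # orthonormal basis for span(A)
  Qb <- qr.Q(qr(B))                 # orthonormal basis for span(B)
  s <- svd(crossprod(Qa, Qb))$d     # cosines of the principal angles
  s <- pmin(pmax(s, 0), 1)          # guard against rounding outside [0, 1]
  acos(s) * 180 / pi
}

set.seed(7)
A <- matrix(rnorm(9 * 3), nrow = 9)                    # reference loadings (toy)
B <- A + matrix(rnorm(9 * 3, sd = 0.05), nrow = 9)     # perturbed loadings (toy)

angles <- principal_angles(A, B)
phis <- vapply(1:3, function(k) tucker_phi(A[, k], B[, k]), numeric(1))

# Compare against the thresholds quoted in this vignette.
# abs() allows for sign flips of a dimension across resamples.
all(angles < 20)
all(abs(phis) >= 0.95)
```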
Variance weights are 0.4345, 0.2979, and 0.2676 for Dimensions 1–3. ALSI values range from 0.040 to 1.625 (M = 0.373, Mdn = 0.368).
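One way to read the variance weights is as each retained dimension's share of the retained variance, so that they sum to one. The sketch below assumes that reading and uses hypothetical eigenvalues and coordinates. The index formula is not restated in this section, so the weighted norm shown here is an assumed form for illustration only, not the package's definition.

```r
# Hypothetical eigenvalues for three retained dimensions (not SEPA output).
lambda <- c(0.30, 0.21, 0.19)
w <- lambda / sum(lambda)   # variance weights; sum to 1 by construction

# Toy person coordinates on the retained dimensions (rows = individuals).
set.seed(3)
coords <- matrix(rnorm(5 * 3), nrow = 5)

# ASSUMED index form: variance-weighted Euclidean norm of each row.
index <- sqrt(as.vector(coords^2 %*% w))

# The weights reported above for Dimensions 1-3 sum to one.
w_reported <- c(0.4345, 0.2979, 0.2676)
isTRUE(all.equal(sum(w_reported), 1))
```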
This example illustrates the alsi_workflow_ordinal()
pipeline using the ten Extraversion items (E1–E10) from the Big Five
Inventory (BFI; N = 500).
BFI <- read.csv(system.file("extdata",
                            "BFI_Original_Ordinal_N500.csv",
                            package = "SEPA"))
items <- paste0("E", 1:10)
reversed_items <- c("E2", "E4", "E6", "E8", "E10")
head(BFI[, items])

freq_table <- sapply(BFI[, items], function(x) table(factor(x, 1:5)))
round(100 * freq_table / nrow(BFI), 1)

Response frequencies are well distributed across the 1–5 scale for all ten items, with no category falling below the 2% rare-category threshold.
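A rare-category check against the 2% threshold can be sketched as follows. The data are simulated (hypothetical, not the BFI responses), with category 5 made deliberately uncommon so that the check usually has something to flag.

```r
# Toy 1-5 Likert responses (hypothetical, not the BFI data).
set.seed(11)
toy <- as.data.frame(replicate(
  4, sample(1:5, 300, replace = TRUE, prob = c(0.10, 0.20, 0.40, 0.29, 0.01))
))
names(toy) <- paste0("E", 1:4)

# Percentage of responses in each category (rows) for each item (columns).
pct <- 100 * sapply(toy, function(x) table(factor(x, levels = 1:5))) / nrow(toy)

# Flag item/category cells that fall below the 2% threshold.
rare <- which(pct < 2, arr.ind = TRUE)
data.frame(item     = colnames(pct)[rare[, "col"]],
           category = rownames(pct)[rare[, "row"]],
           percent  = round(pct[rare], 1))
```

An empty data frame means that no category falls below the threshold.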
The first four observed eigenvalues exceed their 95th-percentile reference values, supporting an initial K_PA = 4-dimensional solution.
Dimensions 1–3 satisfy both stability thresholds simultaneously. Dimension 4 fails the angle criterion (median theta = 24.39° > 20°), yielding K* = 3. All 1,000 bootstrap resamples converged successfully (skipped = 0).
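The dual-criterion decision can be written as a small helper. The thresholds (angle below 20°, congruence of at least 0.95) are those quoted in this vignette. The function name retain_stable() is hypothetical, and stopping at the first failing dimension is an assumption of this sketch. Only the Dimension 4 angle is a reported value; the other inputs are placeholders.

```r
# Count the leading dimensions that pass both stability checks.
# Assumption: retention stops at the first dimension that fails either check.
retain_stable <- function(angle_deg, phi, max_angle = 20, min_phi = 0.95) {
  pass <- angle_deg < max_angle & phi >= min_phi
  if (all(pass)) length(pass) else which(!pass)[1] - 1
}

# Dimension 4's angle (24.39) is the value reported above; all other inputs
# are hypothetical placeholders chosen only to pass the thresholds.
angle_deg <- c(5, 10, 15, 24.39)
phi       <- c(0.99, 0.98, 0.97, 0.96)
retain_stable(angle_deg, phi)  # 3
```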
print(results_ord)
cat("oALSI summary:\n")
print(summary(results_ord$ALSI_index))
cat("\noALSI (z-scored) summary:\n")
print(summary(results_ord$ALSI_z))

Variance weights for K* = 3 are 0.4815, 0.3307, and 0.1878. The ordinal ALSI distribution is slightly negatively skewed, ranging from -0.014 to 0.025 (Mdn = -0.001, M = 0.000).
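How ALSI_z is derived from ALSI_index is not shown here. The sketch below assumes a conventional z-score using the sample standard deviation and adds a simple moment-based skewness value for checking the direction of skew. All values are simulated (hypothetical, not SEPA output).

```r
# Toy index values (hypothetical, not SEPA output).
set.seed(5)
index <- rnorm(100, mean = 0, sd = 0.006)

# Standardize to mean 0 and standard deviation 1 (assumed definition).
index_z <- (index - mean(index)) / sd(index)

# Moment-based skewness: positive values indicate a longer right tail,
# negative values a longer left tail.
skew <- mean(index_z^3)

summary(index)
summary(index_z)
skew
```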
This example illustrates the calsi_workflow() pipeline
using N = 900 individuals assessed on p = 9 domain scores from the
WAIS-IV and WMS-IV cognitive batteries.
wawm4 <- read.csv(system.file("extdata", "wawm4.csv", package = "SEPA"))
domains <- c("VC", "PR", "WO", "PS", "IM", "DM", "VWM", "VM", "AM")
X <- wawm4[, domains]
cat("N =", nrow(X), " p =", ncol(X), "\n")

Domain means ranged from approximately 99 to 101 and standard
deviations from approximately 14 to 16, consistent with the standard
score metric (normative M = 100, SD = 15). Row-mean-centering is applied
internally by calsi_workflow().
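Row-mean-centering (ipsatization) subtracts each individual's own mean across variables, so that the remaining values describe profile shape rather than overall level. A minimal base R sketch on simulated scores (hypothetical, not the wawm4 data):

```r
# Toy standard scores (hypothetical, not the wawm4 data).
set.seed(9)
scores <- matrix(round(rnorm(4 * 3, mean = 100, sd = 15)), nrow = 4,
                 dimnames = list(NULL, c("VC", "PR", "PS")))

# Subtract each person's own mean from that person's scores.
centered <- sweep(scores, 1, rowMeans(scores), "-")

rowMeans(centered)  # all zero up to floating-point error
```

Since calsi_workflow() applies this step internally, the raw scores should be passed to it without centering them first.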
Horn’s parallel analysis supports retention of four dimensions, which together account for 78.28% of total variance in the row-mean-centered solution.
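The share of variance carried by the leading dimensions can be read from the squared singular values of the row-mean-centered matrix. The sketch below uses simulated data (hypothetical, not the wawm4 data) and assumes no further column centering or scaling, which may differ from what calsi_workflow() does internally.

```r
# Toy data: 50 individuals on 6 scores (hypothetical, not the wawm4 data).
set.seed(21)
X <- matrix(rnorm(50 * 6, mean = 100, sd = 15), nrow = 50)

# Row-mean-center, then take the singular value decomposition.
Xc <- sweep(X, 1, rowMeans(X), "-")
d <- svd(Xc)$d

# Squared singular values partition the total sum of squares.
prop_var <- d^2 / sum(d^2)
cum_var <- cumsum(prop_var)
round(100 * cum_var, 2)
```

Because every row sums to zero after centering, the matrix has rank at most p - 1 and the last singular value is numerically zero.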
All four dimensions satisfy both stability thresholds (median principal angles 0.13°-10.37°, all < 20°; Tucker congruence 0.987-0.999, all >= 0.95), yielding K* = 4.
Variance weights for K* = 4 are 0.3833, 0.2481, 0.2222, and 0.1465. cALSI values range from 1.58 to 32.53 (M = 11.81, Mdn = 10.96, SD = 5.09). Processing Speed (PS, 21.5%) contributes most to the retained profile subspace.