Introduction to easyLSEA

easyLSEA authors

2026-06-08

Overview

easyLSEA provides a complete pipeline for Lipid Set Enrichment Analysis (LSEA) in R. Starting from a table of differential lipid abundances, it annotates lipids into biological groups, tests whether those groups are systematically shifted between conditions, and produces publication-ready bubble plots, distribution plots, and fatty acid chain visualizations.

The package runs two complementary enrichment engines:

Running both engines and comparing their results (convergence analysis) gives a more complete picture of lipid remodeling than either method alone.


Installation

Install the stable version from CRAN:

install.packages("easyLSEA")

Or the development version from GitHub:

# install.packages("remotes")
remotes::install_github("DavidGO464/easyLSEA")

To enable the fgsea engine, install the optional Bioconductor dependency:

BiocManager::install("fgsea")

Input data

easyLSEA expects a data.frame with at least:

Column Description
LipidName Lipid identifier in standard shorthand notation (e.g. PC 36:4, TG 54:3)
logFC log2 fold-change (case vs reference)
P.Value Raw (unadjusted) p-value — used for the fgsea pi-value rank metric

Additional columns are used when present:

Column Used for
adj.P.Val Counting significantly changed lipids
Confidence_rank Annotation confidence filtering
Shorthand Alternative lipid name fallback

Column names are configurable via lipid_col, fc_col, and pval_col arguments — only the defaults are shown above.


Quick start

The entire pipeline runs in a single call to easyLSEA():

library(easyLSEA)

result <- easyLSEA(
  data      = my_lipid_data,
  lipid_col = "LipidName",
  fc_col    = "logFC",
  pval_col  = "P.Value",
  case_lbl  = "NASH",
  ref_lbl   = "Control",
  engine    = "both",   # run KS and fgsea
  min_rank  = "E"       # include all confidence ranks except P and NA (default)
)

This returns an easyLSEA_result object with five slots described in the next section.


Understanding the output

result$meta

A named list with run metadata: date, comparison labels, engine used, number of lipids, and the original function call.

result$meta$date
result$meta$case_lbl
result$meta$n_lipids

result$lsea

Contains the enrichment statistics. The key sub-elements are:

# KS results — one row per lipid set per grouping level
head(result$lsea$ks)

# fgsea results
head(result$lsea$fgsea)

# Combined table with Convergence column
head(result$lsea$combined)

Key columns in the KS results:

Column Description
Group Lipid set name (e.g. PC, Glycerolipids)
Level Grouping level (LipidClass, LipidCategory_LMAPS, LipidCategory_functional)
N_group Number of lipids in the set
DirectionalScore Standardized mean difference (Cohen’s d analog). Positive = up in case.
KS_pval Two-sided KS p-value
FDR_LSEA BH-adjusted FDR
DS_perm_pval Permutation p-value for the DirectionalScore
ContributingLipids_KS Lipids on the enriched side of the CDF divergence point

Key columns in the fgsea results:

Column Description
NES Normalized Enrichment Score. Positive = enriched toward top of ranked list (up in case).
FDR_fgsea BH-adjusted FDR from fgsea
N_leading Leading edge size
LeadingEdge Lipids in the leading edge
rank_metric Rank metric used (pi_value, logFC, or t_stat)

The Convergence column in the combined table classifies each set as:

Value Meaning
KS+fgsea [strongest] Significant by both engines — highest confidence
KS only [distributed effect] Moderate shift across many lipids
fgsea only [extreme-driven] A few strongly regulated lipids drive the signal
Neither Not significant by either engine

result$chains

Fatty acid chain analysis results, available when run_chains = TRUE:

# Long format — one row per acyl chain per lipid
head(result$chains$parsed)

# Parsing status — one row per lipid
head(result$chains$summary)

# Wide format — one row per lipid with sn positions and totals
head(result$chains$wide)

The wide table is the most convenient for reporting. Each row is one lipid, with columns sn1, sn2, sn3, sn4 containing the individual acyl chain positions (e.g. "18:1"), and total_carbons / total_unsat with the summed totals. The chain_type column clarifies how to interpret the sn columns:

chain_type sn1 sn2 sn3 sn4
sn2 (PC, PE, PS…) sn-1 chain sn-2 chain NA NA
nacyl (Cer, SM…) sphingoid base N-acyl chain NA NA
long_format (TG) chain 1 chain 2 chain 3 NA
long_format (CL) chain 1 chain 2 chain 3 chain 4
single (CAR, LPC) the chain NA NA NA

result$plots

A named list of ggplot2 objects, available when plots = TRUE:

# List all available plots
names(result$plots$lsea)
names(result$plots$chains)

Plot naming convention for LSEA bubble plots:

Name pattern Description
bubble_ks_01_Class KS bubble plot — lipid class level
bubble_ks_sig_01_Class KS bubble plot — significant sets only
bubble_fgsea_01_Class fgsea bubble plot — lipid class level
bubble_fgsea_sig_01_Class fgsea bubble plot — significant sets only
dist_01_Class Distribution (boxplot) — lipid class level

Levels: 01_Class (lipid class), 02_LMAPS (LIPID MAPS category), 03_Functional (functional category).

result$input

The annotated input data and the grouping columns used:

# Annotated data with LipidClass, LipidCategory_LMAPS, etc.
head(result$input$data)

# Grouping columns tested
result$input$group_cols

Viewing plots

Individual plots can be displayed directly:

# KS bubble plot — all lipid classes
result$plots$lsea$bubble_ks_01_Class

# fgsea bubble plot — significant sets only
result$plots$lsea$bubble_fgsea_sig_01_Class

# Distribution plot — lipid class level
result$plots$lsea$dist_01_Class

To customize bubble labels:

# Regenerate plots showing only FDR and n
plots <- plot_lsea(
  result$lsea,
  case_lbl     = "NASH",
  ref_lbl      = "Control",
  bubble_label = c("FDR", "n")
)

Exporting results

export_lsea() saves all results to a timestamped folder. Supply the output directory explicitly via dir (here a temporary directory; for a real analysis use a folder of your choice):

export_lsea(
  result,
  dir    = tempdir(),
  format = c("csv", "excel", "pdf")
)

This creates:

easyLSEA_NASH_vs_Control_2024-01-15_1430/
  tables/
    lsea_results_ks.csv
    lsea_results_fgsea.csv
    lsea_combined.csv
    chain_results.csv
  plots/
    lsea/
      01_Class/
        bubble_ks_01_Class.pdf
        bubble_ks_sig_01_Class.pdf
        bubble_fgsea_01_Class.pdf
        bubble_fgsea_sig_01_Class.pdf
        dist_01_Class.pdf
      02_LMAPS/  ...
      03_Functional/  ...
    chains/
      tile/  ...
      trend/  ...
  results.xlsx

Advanced usage

Running engines separately

# Step 1: annotate
annotated <- annotate_lipids(my_lipid_data, lipid_col = "LipidName")

# Step 2: run enrichment
lsea_res <- run_lsea(
  data      = annotated,
  fc_col    = "logFC",
  engine    = "both",
  case_lbl  = "NASH",
  ref_lbl   = "Control"
)

# Step 3: generate plots manually
plots <- plot_lsea(
  lsea_res,
  case_lbl     = "NASH",
  ref_lbl      = "Control",
  fdr_thresh   = 0.05,
  bubble_label = c("FDR", "DS", "NES", "n")
)

# Step 4: distribution plot for a specific level
p_dist <- plot_distribution(
  data        = annotated,
  lsea_result = lsea_res,
  group_col   = "LipidClass",
  case_lbl    = "NASH",
  ref_lbl     = "Control"
)

Changing the fgsea rank metric

# Default: pi-value = sign(logFC) * -log10(P.Value)
# Combines effect size and statistical evidence

# Alternative: logFC only
result_fc <- easyLSEA(
  data        = my_lipid_data,
  engine      = "fgsea",
  fgsea_rank  = "logFC"
)

# Alternative: LIMMA t-statistic (requires a 't' column)
result_t <- easyLSEA(
  data        = my_lipid_data,
  engine      = "fgsea",
  fgsea_rank  = "t_stat"
)

Adjusting significance thresholds

result <- easyLSEA(
  data      = my_lipid_data,
  min_n     = 5L,      # require at least 5 lipids per set
  n_perm    = 5000L,   # more permutations for DS_perm_pval
  fgsea_nperm = 20000L # more permutations for fgsea
)

Filtering by annotation confidence rank

When lipid annotations include a confidence rank (A > B > C > D > E > P), min_rank controls which lipids enter the chain analysis:

# Default: include all except P and NA
result_all <- easyLSEA(data = my_lipid_data, min_rank = "E")

# Strict: include only high-confidence annotations (A and B)
result_strict <- easyLSEA(data = my_lipid_data, min_rank = "B")

# Or apply directly to parse_lipid_chains
chains_strict <- parse_lipid_chains(annotated, min_rank = "B")
table(chains_strict$summary$status)

Lipids excluded by min_rank appear in result$chains$summary with status excluded_rank_below_X (where X is the chosen threshold), making it easy to audit which lipids were filtered.


Interpreting results

KS vs fgsea — which to trust?

Both engines test the same null hypothesis (no systematic enrichment) but are sensitive to different signal patterns:

Convergent results (significant by both) provide the strongest evidence for coordinated lipid class remodeling.

The DirectionalScore

The DirectionalScore is a standardized mean difference (Cohen’s d analog):

DS = (mean_logFC_set - mean_logFC_background) / SD_pooled

It quantifies the direction and magnitude of the shift, independent of the KS p-value. A set can have a significant KS p-value (distributional difference) with a near-zero DirectionalScore (no net direction) — this pattern suggests heterogeneous within-set regulation.

The pi-value rank metric

By default, fgsea ranks lipids by:

pi-value = sign(logFC) × −log10(P.Value)

This combines the direction of change with statistical confidence, giving more weight to lipids that are both strongly and significantly regulated. It is preferred over logFC alone when p-values are available.


Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Tahoe 26.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     R6_2.6.1          fastmap_1.2.0     xfun_0.57        
#>  [5] cachem_1.1.0      knitr_1.51        htmltools_0.5.9   rmarkdown_2.31   
#>  [9] lifecycle_1.0.5   cli_3.6.6         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.5.1    rstudioapi_0.18.0 tools_4.5.1       evaluate_1.0.5   
#> [17] bslib_0.11.0      yaml_2.3.12       otel_0.2.0        jsonlite_2.0.0   
#> [21] rlang_1.2.0