MeSH Tables

puremoe provides three MeSH reference tables: a thesaurus of descriptors and entry terms, a tree of hierarchical classifications, and a frequency table of descriptor occurrence across PubMed. The first two are downloaded on demand; the third is bundled with the package.

library(puremoe)
library(dplyr)
library(DT)

MeSH thesaurus

data_mesh_thesaurus() downloads and combines the MeSH Descriptor Thesaurus and Supplementary Concept Records (SCR). The result has one row per term, including the synonyms and entry terms of each descriptor.

thesaurus <- puremoe::data_mesh_thesaurus()
thesaurus |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
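
A one-row-per-term table lends itself to normalizing free-text terms to their descriptor. The sketch below runs on a small toy data frame rather than the downloaded table, and the column names DescriptorName and TermName are assumptions made for illustration; check names(thesaurus) for the real ones before adapting it.

```r
# Toy stand-in for the thesaurus. The column names DescriptorName and
# TermName are assumptions; check names(thesaurus) for the real ones.
toy_thesaurus <- data.frame(
  DescriptorName = c("Neoplasms", "Neoplasms", "Neoplasms",
                     "Myocardial Infarction", "Myocardial Infarction"),
  TermName = c("Neoplasms", "Cancer", "Tumors",
               "Myocardial Infarction", "Heart Attack"),
  stringsAsFactors = FALSE
)

# Map free-text terms to their descriptor, ignoring case.
# Terms without a match come back as NA.
queries <- c("heart attack", "cancer", "not a mesh term")
idx <- match(tolower(queries), tolower(toy_thesaurus$TermName))

normalized <- data.frame(
  query = queries,
  descriptor = toy_thesaurus$DescriptorName[idx],
  stringsAsFactors = FALSE
)
normalized
```

match() returns only the first hit per query, which is enough when each term maps to a single descriptor. A join would be the safer choice if a term can belong to more than one.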

MeSH trees

data_mesh_trees() provides the hierarchical classification structure. Each descriptor can appear in multiple branches; tree_location encodes the full path (e.g., I01.880.604 = Social Sciences > Political Science > Political Systems).

trees <- puremoe::data_mesh_trees()
trees |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
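
Because a tree_location is a dot-delimited path, the ancestors of any node can be recovered by truncating the code one level at a time. A minimal base-R sketch, using the example code from the paragraph above; the helper tree_ancestors() is hypothetical and not part of puremoe.

```r
# Hypothetical helper (not part of puremoe): expand a MeSH tree number
# into the tree numbers of all its ancestors plus itself, root first.
tree_ancestors <- function(tree_location) {
  parts <- strsplit(tree_location, ".", fixed = TRUE)[[1]]
  vapply(
    seq_along(parts),
    function(i) paste(parts[seq_len(i)], collapse = "."),
    character(1)
  )
}

path <- tree_ancestors("I01.880.604")
path
length(path)  # depth of the node within its branch
```

Filtering the trees table to rows whose tree_location is in path should then give the descriptor at each level, assuming the table holds one row per tree number.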

MeSH descriptor frequencies

data_mesh_frequencies is a bundled dataset giving the frequency of each MeSH descriptor across the full PubMed corpus (39.7 M PMIDs, April 2026). Proportions use the total corpus as denominator, making them suitable as a baseline for enrichment analyses against arbitrary PubMed subsets.

puremoe::data_mesh_frequencies |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
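
As a rough illustration of the enrichment use case, the sketch below divides each descriptor's proportion in a subset by its corpus-wide proportion. All values are invented placeholders, and the column names (descriptor, n, prop) are assumptions; none of them are taken from data_mesh_frequencies.

```r
# Toy baseline. Descriptor labels, proportions, and column names are
# invented for illustration and do not come from data_mesh_frequencies.
baseline <- data.frame(
  descriptor = c("Descriptor A", "Descriptor B", "Descriptor C"),
  prop       = c(0.10, 0.02, 0.005),
  stringsAsFactors = FALSE
)

# Descriptor counts in a hypothetical subset of 1,000 PMIDs.
subset_n <- 1000
subset_counts <- data.frame(
  descriptor = c("Descriptor A", "Descriptor B", "Descriptor C"),
  n          = c(100, 80, 5),
  stringsAsFactors = FALSE
)

# Enrichment ratio: share of the subset divided by share of the corpus.
enrich <- merge(subset_counts, baseline, by = "descriptor")
enrich$subset_prop <- enrich$n / subset_n
enrich$ratio <- enrich$subset_prop / enrich$prop

# Most enriched descriptors first.
enrich[order(-enrich$ratio), c("descriptor", "subset_prop", "prop", "ratio")]
```

A ratio near 1 means the descriptor is about as common in the subset as in PubMed overall. Values well above 1 suggest enrichment, though small counts would call for a proper statistical test rather than a raw ratio.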

Persistent storage

The thesaurus and trees tables are both ~10 MB and are fetched from GitHub on each call by default. To avoid re-downloading every session, set use_persistent_storage = TRUE; the files are then cached to a system data directory and reused on subsequent calls.

thesaurus <- puremoe::data_mesh_thesaurus(use_persistent_storage = TRUE)
trees     <- puremoe::data_mesh_trees(use_persistent_storage = TRUE)
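
The general download-once pattern behind this kind of option can be sketched as follows. This is an illustration only, not puremoe's implementation: cached_file() and fake_fetch() are hypothetical names, and the fetch step is a stand-in so that the example runs without network access.

```r
# Generic download-once pattern; an illustration only, not puremoe's code.
# `fetch` is passed in so the example runs without network access.
cached_file <- function(name, cache_dir, fetch) {
  path <- file.path(cache_dir, name)
  if (!file.exists(path)) {
    dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
    fetch(path)  # only runs on a cache miss
  }
  path
}

# Stand-in for a real download; counts how often it is invoked.
calls <- 0
fake_fetch <- function(path) {
  calls <<- calls + 1
  writeLines("placeholder", path)
}

cache_dir <- tempfile("mesh-cache-")  # fresh directory for the demo
p1 <- cached_file("thesaurus.rds", cache_dir, fake_fetch)
p2 <- cached_file("thesaurus.rds", cache_dir, fake_fetch)
calls  # the second call reuses the cached file
```

For a cache that survives across sessions, the directory would need to live outside tempdir(). tools::R_user_dir() (R >= 4.0) is one conventional way to pick such a location; whether puremoe uses it is not stated here.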