amadeus is a mechanism for data, environments, and user setup for common environmental and weather datasets in R. amadeus has been developed to improve access to, and the utility of, large-scale, publicly available environmental data in R.
See the peer-reviewed publication, Amadeus: Accessing and analyzing large scale environmental data in R, for a full description and details.

Cite amadeus as:

> Manware, M., Song, I., Marques, E. S., Kassien, M. A., Clark, L. P., & Messier, K. P. (2025). Amadeus: Accessing and analyzing large scale environmental data in R. Environmental Modelling & Software, 186, 106352.
amadeus can be installed from CRAN with
install.packages or from GitHub with pak.

install.packages("amadeus")
pak::pak("NIEHS/amadeus")

download_data accesses and downloads raw geospatial data
from a variety of open source data repositories. The function is a
wrapper that calls source-specific download functions, each of which
accounts for the source’s unique combination of URL, file naming
conventions, and data types. Download functions cover the following
sources:

| Data Source | File Type | Data Genre | Spatial Extent | Function Suffix |
|---|---|---|---|---|
| Climatology Lab TerraClimate | netCDF | Meteorology | Global | _terraclimate |
| Climatology Lab GridMet | netCDF | Climate, Water | Contiguous United States | _gridmet |
| Köppen-Geiger Climate Classification | GeoTIFF | Climate Classification | Global | _koppen_geiger |
| MRLC¹ Consortium National Land Cover Database (NLCD) | GeoTIFF | Land Use | United States | _nlcd |
| USDA CropScape Cropland Data Layer (CDL) | GeoTIFF | Land Use, Agriculture | United States | _cropscape |
| NASA² Moderate Resolution Imaging Spectroradiometer (MODIS) | HDF | Atmosphere, Meteorology, Land Use, Satellite | Global | _modis |
| NASA Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) | netCDF | Atmosphere, Meteorology | Global | _merra2 |
| NASA SEDAC³ UN WPP-Adjusted Population Density | GeoTIFF, netCDF | Population | Global | _population |
| NASA SEDAC Global Roads Open Access Data Set | Shapefile, Geodatabase | Roadways | Global | _groads |
| USGS⁴ Hydrologic Unit Codes (HUC) | Geodatabase, Shapefile | Hydrology | United States | _huc |
| NASA Goddard Earth Observing System Composition Forecasting (GEOS-CF) | netCDF | Atmosphere, Meteorology | Global | _geos |
| EDGAR Emissions Database for Global Atmospheric Research | netCDF, TXT | Emissions | Global | _edgar |
| NOAA Hazard Mapping System Fire and Smoke Product | Shapefile, KML | Wildfire Smoke | North America | _hms |
| NOAA GOES Aerosol Detection Product (ADP) | netCDF | Atmosphere, Satellite | Americas, Pacific | _goes |
| NOAA NCEP⁵ North American Regional Reanalysis (NARR) | netCDF | Atmosphere, Meteorology | North America | _narr |
| PRISM Climate Group | netCDF, ASCII Grid, GRIB2 | Meteorology, Climate | Contiguous United States | _prism |
| Drought indices ([SPEI](https://spei.csic.es), [EDDI](https://downloads.psl.noaa.gov/Projects/EDDI/CONUS_archive/data/), [USDM](https://droughtmonitor.unl.edu)) | netCDF, ASCII Grid, Shapefile | Drought | Global, Contiguous United States | _drought |
| US EPA⁶ Air Data Pre-Generated Data Files | CSV | Air Pollution | United States | _aqs |
| IMPROVE aerosol monitoring program | TXT (pipe-delimited) | Air Pollution, Aerosols | United States | _improve |
| US EPA Ecoregions | Shapefile | Climate Regions | North America | _ecoregions |
| US EPA National Emissions Inventory (NEI) | CSV | Emissions | United States | _nei |
| US EPA Toxic Release Inventory (TRI) Program | CSV | Chemicals, Pollution | United States | _tri |
| USGS⁷ Global Multi-resolution Terrain Elevation Data (GMTED2010) | ESRI ASCII Grid | Elevation | Global | _gmted |

See the “download_data” vignette for a detailed description of source-specific download functions.
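
The wrapper idea described above, one entry point that routes a dataset name to a source-specific function, can be sketched in base R. This is only an illustration of the pattern and not the package's implementation: `fetch_narr_stub()`, `fetch_hms_stub()`, and `dispatch_download()` are hypothetical names, and the stubs return a string instead of downloading anything.

```r
# Hypothetical stand-ins for source-specific download functions.
# They return a message instead of downloading anything.
fetch_narr_stub <- function(year, ...) paste("narr handler, year", year)
fetch_hms_stub <- function(year, ...) paste("hms handler, year", year)

# Hypothetical wrapper: look up the handler by dataset name and
# forward the remaining arguments to it.
dispatch_download <- function(dataset_name, ...) {
  handlers <- list(narr = fetch_narr_stub, hms = fetch_hms_stub)
  if (!dataset_name %in% names(handlers)) {
    stop("Unsupported dataset_name: ", dataset_name)
  }
  handlers[[dataset_name]](...)
}

result <- dispatch_download("narr", year = 2022)
print(result)
```

Keeping the lookup in one place means an unsupported name fails early with a clear message, before any source-specific code runs.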
For TRI, download_tri() can retrieve EPA annual basic
data files for the nationwide dataset
(jurisdiction = "US"), individual states or territories
(jurisdiction = "AZ", "NC", etc.), and the
tribal file (jurisdiction = "tbl").
setup_nasa_token()

Many NASA-hosted datasets require an Earthdata Login bearer token. In
amadeus, this includes modis,
merra2, geos, and population
(NASA SEDAC). Use setup_nasa_token() to store the token
before calling the corresponding download_*() functions.
See vignette("protected_datasets", package = "amadeus") for
more detail.
setup_nasa_token() supports three storage methods:
method = "renviron" writes
NASA_EARTHDATA_TOKEN to ~/.Renviron for
persistent personal use; method = "file" writes a local
token file such as ~/.nasa_earthdata_token; and
method = "session" uses Sys.setenv() for the
current R session only.

setup_nasa_token() # prompts interactively
setup_nasa_token(method = "renviron", token = "<your_token>")

Never commit Earthdata tokens to git or include them in shared
scripts. Prefer method = "renviron" on personal machines,
and method = "session" for shared systems or CI jobs where
the token is supplied from a CI secret.
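
Before starting a long download job, it can help to confirm that a token is visible to the current R session. The sketch below uses only base R. It assumes the token is exposed through the NASA_EARTHDATA_TOKEN environment variable named above; whether method = "file" or method = "session" populate that same variable is an assumption. `has_token()` is a hypothetical helper, not an amadeus function, and the demonstration uses a throwaway variable name so that a real token is never overwritten.

```r
# Hypothetical helper: TRUE when the named environment variable is
# set to a non-empty string in the current session.
has_token <- function(var = "NASA_EARTHDATA_TOKEN") {
  nzchar(Sys.getenv(var, unset = ""))
}

# Demonstrate on a throwaway variable (name is illustrative only).
demo_var <- "AMADEUS_DEMO_TOKEN"
Sys.unsetenv(demo_var)
before <- has_token(demo_var)

# Placeholder value; never hard-code a real token in a script.
do.call(Sys.setenv, setNames(list("placeholder-not-a-real-token"), demo_var))
after <- has_token(demo_var)
Sys.unsetenv(demo_var)

# In a real session, check the actual variable before downloading.
if (!has_token()) {
  message("No Earthdata token found; NASA-hosted downloads may fail.")
}
```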

Example use of download_data with the “weasd” (Daily Accumulated Snow at
Surface) variable from the NOAA NCEP North American Regional Reanalysis
(NARR).

directory <- "/ EXAMPLE / FILE / PATH /"
download_data(
  dataset_name = "narr",
  year = 2022,
  variable = "weasd",
  directory_to_save = directory,
  acknowledgement = TRUE,
  download = TRUE,
  hash = TRUE
)

Downloading requested files...
Requested files have been downloaded.
[1] "5655d4281b76f4d4d5bee234c2938f720cfec879"

list.files(file.path(directory, "weasd"))

[1] "weasd.2022.nc"

process_covariates imports and cleans raw geospatial
data (downloaded with download_data), and returns a single
SpatRaster or SpatVector into the user’s R
environment. process_covariates “cleans” the data by
defining interpretable layer names, ensuring a coordinate reference
system is present, and managing time data (if applicable).
To avoid errors when using process_covariates,
do not edit the raw downloaded data files. Passing
user-generated or edited data into process_covariates may
result in errors, as the underlying functions are adapted to each
source’s raw data file type.
Example use of process_covariates using the downloaded
“weasd” data.

weasd_process <- process_covariates(
  covariate = "narr",
  date = c("2022-01-01", "2022-01-05"),
  variable = "weasd",
  path = file.path(directory, "weasd"),
  extent = NULL
)

Detected monolevel data...
Cleaning weasd data for 2022...
Returning daily weasd data from 2022-01-01 to 2022-01-05.

weasd_process

class : SpatRaster
dimensions : 277, 349, 5 (nrow, ncol, nlyr)
resolution : 32462.99, 32463 (x, y)
extent : -16231.49, 11313351, -16231.5, 8976020 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs
source : weasd.2022.nc:weasd
varname : weasd (Daily Accumulated Snow at Surface)
names : weasd_20220101, weasd_20220102, weasd_20220103, weasd_20220104, weasd_20220105
unit : kg/m^2, kg/m^2, kg/m^2, kg/m^2, kg/m^2
time : 2022-01-01 to 2022-01-05 UTC
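
The layer names in the printed SpatRaster follow a `<variable>_<YYYYMMDD>` pattern. Assuming that pattern holds for other daily outputs (an inference from this one example, not a documented guarantee), the expected names for a date range can be rebuilt in base R, which may be handy for selecting layers by date. `layer_names_for()` is a hypothetical helper, not part of amadeus.

```r
# Hypothetical helper: build daily layer names of the form
# <variable>_<YYYYMMDD> for an inclusive date range.
layer_names_for <- function(variable, start, end) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  paste0(variable, "_", format(days, "%Y%m%d"))
}

expected <- layer_names_for("weasd", "2022-01-01", "2022-01-05")
print(expected)

# With the processed raster in memory, a subset of layers could then
# be selected by name, for example: weasd_process[[expected[1:2]]]
```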
calculate_covariates stems from the beethoven
project’s need for various types of data extracted at precise locations.
calculate_covariates, therefore, extracts data from the
“cleaned” SpatRaster or SpatVector object at
user-defined locations. Users can choose to buffer the locations. The
function returns a data.frame, sf, or
SpatVector with data extracted at all locations for each
layer or row in the SpatRaster or SpatVector
object, respectively.
Example of calculate_covariates using processed “weasd”
data.
locs <- data.frame(id = "001", lon = -78.8277, lat = 35.95013)
weasd_covar <- calculate_covariates(
  covariate = "narr",
  from = weasd_process,
  locs = locs,
  locs_id = "id",
  radius = 0,
  geom = "sf"
)

Detected `data.frame` extraction locations...
Calculating weasd covariates for 2022-01-01...
Calculating weasd covariates for 2022-01-02...
Calculating weasd covariates for 2022-01-03...
Calculating weasd covariates for 2022-01-04...
Calculating weasd covariates for 2022-01-05...
Returning extracted covariates.

weasd_covar

Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 8184606 ymin: 3523283 xmax: 8184606 ymax: 3523283
Projected CRS: unnamed
id time weasd_0 geometry
1 001 2022-01-01 0.000000000 POINT (8184606 3523283)
2 001 2022-01-02 0.000000000 POINT (8184606 3523283)
3 001 2022-01-03 0.000000000 POINT (8184606 3523283)
4 001 2022-01-04 0.000000000 POINT (8184606 3523283)
5 001 2022-01-05 0.001953125 POINT (8184606 3523283)
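
In the example above, extraction locations are passed as a table with an identifier column plus `lon` and `lat`. A minimal base R sketch of checking such a table before extraction follows. `check_locs()` is a hypothetical helper, not an amadeus function, and the second site is an arbitrary illustrative coordinate rather than one taken from the package documentation.

```r
# Hypothetical helper: verify that a locations table has the expected
# columns, unique identifiers, and plausible longitude/latitude values.
check_locs <- function(locs, locs_id = "id") {
  required <- c(locs_id, "lon", "lat")
  missing_cols <- setdiff(required, names(locs))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(
    !anyDuplicated(locs[[locs_id]]),
    all(locs$lon >= -180 & locs$lon <= 180),
    all(locs$lat >= -90 & locs$lat <= 90)
  )
  invisible(locs)
}

locs <- data.frame(
  id = c("001", "002"),
  lon = c(-78.8277, -80.0), # second site is illustrative only
  lat = c(35.95013, 36.0)
)
checked <- check_locs(locs)
```

Catching a duplicated identifier or swapped longitude and latitude at this stage is cheaper than discovering it after a long extraction run.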
amadeus builds on terra and
exactextractr, which are C++-backed and efficient for
individual raster, vector, and extraction operations. For large spatial
or temporal domains, however, the cumulative wall-clock cost of many
process_*() or calculate_*() calls can still
be significant.
These workloads are often embarrassingly parallel across dates,
variables, or location chunks. See
vignette("computational_considerations", package = "amadeus")
for examples using sequential baselines, process-level parallelism, and
reproducible pipeline tools.
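
As a rough illustration of splitting work across dates, the sketch below chunks a month of dates into weekly groups and runs the same function over the chunks sequentially and then on a small local cluster, using only the `parallel` package that ships with R. `summarise_chunk()` is a hypothetical stand-in for a per-chunk process_*() and calculate_*() step, and the two-worker cluster is an arbitrary choice; the vignette remains the reference for the approaches the authors recommend.

```r
library(parallel)

# One month of daily dates, split into chunks of up to seven days.
dates <- seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "day")
date_chunks <- split(dates, ceiling(seq_along(dates) / 7))

# Hypothetical stand-in for the per-chunk work. It only summarises
# the chunk so the example runs without any downloaded data.
summarise_chunk <- function(chunk) {
  data.frame(start = min(chunk), end = max(chunk), n_days = length(chunk))
}

# Sequential baseline.
sequential <- lapply(date_chunks, summarise_chunk)

# Process-level parallelism on a local two-worker cluster.
cl <- makeCluster(2)
parallel_res <- parLapply(cl, date_chunks, summarise_chunk)
stopCluster(cl)

# Combine the per-chunk results into one table.
combined <- do.call(rbind, parallel_res)
print(combined)
```

Because each chunk is independent, the same function can be timed sequentially first and only moved to parallel workers if the wall-clock saving justifies the extra memory per worker.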
The amadeus package has been developed as part of the
National Institute of Environmental Health Sciences (NIEHS) Connecting
Health Outcomes Research Data Systems (CHORDS) program. CHORDS aims to
“build and strengthen data infrastructure for patient-centered outcomes
research on environment and health” by providing curated data, analysis
tools, and educational resources. As the CHORDS project comes to an end
in FY26, it is being absorbed into the larger NIH Health and Extreme
Weather program and the NIH Accelerator program
(https://www.niehs.nih.gov/research/programs/chords/hew-data).
amadeus is being actively developed and maintained by
the SET group at NIEHS. Future development will focus on expanding the
number of data sources and datasets covered, improving the efficiency of
download and processing functions, and adding new functionality for
calculating covariates and analyzing data.

We will be adding functions to access and process
datasets created by individual researchers. If you are an environmental
health researcher with a dataset that you would like to see added to
amadeus, please reach out via the issues tab
on GitHub and add a tag new dataset to your issue.

To suggest a new covariate calculation for amadeus, please reach out via the
issues tab on GitHub and add a tag
new covariate calculation to your issue.

We monitor the issues tab on GitHub for bug reports and will work to fix
any reported bugs in a timely manner. If you encounter a bug
while using amadeus, please report it via the
issues tab on GitHub and add a tag bug to your
issue.

The following R packages can also be used to access environmental and
weather data in R, but each differs from amadeus in the
data sources covered or type of functionality provided.

| Package | Source |
|---|---|
| dataRetrieval | USGS Hydrological Data and EPA Water Quality Data |
| daymetr | Daymet |
| ecmwfr | ECMWF Reanalysis v5 (ERA5) |
| rNOMADS | NOAA Operational Model Archive and Distribution System |
| sen2r⁸ | Sentinel-2 |
| eddi | EDDI |
| heat | [Harmonized Environmental Exposure Aggregation Tools](https://github.com/echolab-stanford) |

The long-term sustainability and continued improvement of
amadeus rely on contributions from
agentic AI products. GitHub Copilot is currently used to assist
with code development, documentation, and testing. To ensure the quality
and reliability of the package, all contributions are reviewed and
extensively tested by the maintainers before being merged into the main
branch.

To add or edit functionality for new data sources or datasets, open a
pull request into
the main branch with a detailed description of the proposed changes.
Pull requests must pass all status checks before they are approved or
rejected by amadeus’s authors.

Use Issues to notify the authors of bugs, questions, or recommendations. Label each issue appropriately to help ensure a timely response.

1. Multi-Resolution Land Characteristics
2. National Aeronautics and Space Administration
3. Socioeconomic Data and Applications Center
4. United States Geological Survey
5. National Centers for Environmental Prediction
6. United States Environmental Protection Agency
7. United States Geological Survey
8. Archived; no longer maintained.