Complete survey workflow: Example Bay

Overview

This vignette walks through a complete oystermapR workflow using a simulated survey of Example Bay — a fictional sheltered coastal inlet with realistic physical characteristics. The full pipeline is covered:

Loading acoustic Doppler (ADCP) current data
Loading bathymetric soundings
Loading CTD water-quality data (temperature, salinity, chlorophyll_a)
Merging all sensor datasets onto a common grid
Running quality control
Predicting suitability for Ostrea edulis (European Flat Oyster)
Scoring wave exposure and HAB risk
Exporting a GeoTIFF for QGIS

The example data files (example_bay_adcp.csv and example_bay_soundings.xyz) are included in the package and represent a simulated mid-summer survey.

library(oystermapR)

Survey context

Example Bay is a sheltered inlet approximately 6 km × 4 km in extent, with depths ranging from 2 m at the shoreline to around 22 m in the central channel. A six-hour ADCP transect was run on a flood tide during June. Bathymetric soundings were collected concurrently using a single-beam echosounder.

The target species is Ostrea edulis, the European Flat Oyster, whose optimal conditions include: depths of 1–20 m, current speeds of 0.05–0.4 m/s, temperatures of 5–25°C, and salinities of 25–35 PSU.

Step 1: Load ADCP data

read_nortek_adcp() reads the Nortek Signature 500 merged CSV export. It auto-detects velocity bins, spatially averages ensembles onto a grid, and derives bed shear stress from the near-bed velocity profile. The spatial_res argument controls the decimal places used for lat/lon grid binning; 2 gives approximately 1 km cells, suitable for a bay-scale survey.

adcp_file <- system.file("extdata", "example_bay_adcp.csv", package = "oystermapR")

adcp <- read_nortek_adcp(
  file        = adcp_file,
  spatial_res = 2,
  verbose     = TRUE
)

head(adcp[, c("lat", "lon", "current_velocity", "shear_stress")])

The output contains one row per grid cell. current_velocity is the mean near-bed flow magnitude; shear_stress is the estimated bed shear stress in N/m², used by predict_oyster() for suspension-feeder scoring.

Step 2: Load bathymetric soundings

read_soundings_xyz() reads a space-delimited XYZ point cloud, grids and averages the depths, and derives slope and rugosity via finite differences.

xyz_file <- system.file("extdata", "example_bay_soundings.xyz", package = "oystermapR")

bathy <- read_soundings_xyz(
  file          = xyz_file,
  spatial_res   = 2,
  min_soundings = 5,
  verbose       = TRUE
)

head(bathy[, c("lat", "lon", "depth", "slope", "roughness")])

slope is the maximum downslope gradient in degrees at each grid cell; values above ~15° typically exclude oyster settlement in the scoring model. roughness is a dimensionless rugosity index (1.0 = flat; higher = more complex substrate).

Step 3: Load CTD water-quality data

The ADCP and echosounder provide hydrodynamic and bathymetric variables, but predict_oyster() can also use water-quality data when available. The Example Bay survey included a 5 × 4 grid of CTD casts recording temperature, salinity, and chlorophyll_a, stored as a plain CSV.

read_generic_csv() handles any tabular sensor export with flexible column matching. Column names are recognised automatically when they follow standard conventions (lat/lon, temperature, salinity, chlorophyll_a). For instruments that export non-standard headers, supply an explicit col_map:

# Example with non-standard headers from a YSI EXO sonde export:
ctd <- read_generic_csv(
  "ysi_export.csv",
  col_map = c(lat = "GPS_Lat", lon = "GPS_Lon",
              temperature = "Temp_C", salinity = "Sal_PSU")
)

For Example Bay the column names match the oystermapR conventions directly:

ctd_file <- system.file("extdata", "example_bay_ctd.csv", package = "oystermapR")

ctd <- read_generic_csv(
  file        = ctd_file,
  spatial_res = 2,
  verbose     = TRUE
)

head(ctd[, c("lat", "lon", "temperature", "salinity", "chlorophyll_a")])

Spatially averaging at spatial_res = 2 collapses the four replicate casts per station to a single value per ~1 km grid cell.

Step 4: Merge all sensor datasets

merge_sensor_data() performs a full outer join of any number of sensor dataframes on rounded lat/lon grid keys. Cells present in only one source are retained with NA for the missing variables, so no data is discarded. All three layers — ADCP, bathymetry, and CTD — are combined in a single call:

survey <- merge_sensor_data(adcp = adcp, bathy = bathy, ctd = ctd)

cat("Merged survey:", nrow(survey), "grid cells\n")
cat("Columns:", paste(names(survey), collapse = ", "), "\n")

A substrate_hardness column would typically come from a sidescan mosaic processed via read_sonar_tif(). Here we add a representative value for a mixed shell-gravel substrate, typical of productive inshore oyster ground.

set.seed(101)
n <- nrow(survey)
# 0 = soft mud, 1 = hard rock; 0.3--0.7 = shell/gravel mix
survey$substrate_hardness <- round(runif(n, 0.30, 0.70), 2)

Step 5: Quality control

qc_survey_data() applies three complementary checks to every numeric column:

Range check — values outside physically plausible bounds for that variable (species-agnostic; e.g. salinity above 42 PSU) are flagged "range_fail".
Statistical outlier — values beyond iqr_k × IQR from the median are flagged "outlier" (default k = 3, Tukey’s outer fence).
Cross-variable sanity — physically implausible combinations (e.g. dissolved oxygen above 20 mg/L with temperature below 5°C) are flagged.

Setting apply_flags = TRUE replaces flagged values with NA so they are silently skipped downstream rather than causing erroneous scores.

survey_clean <- qc_survey_data(
  df          = survey,
  apply_flags = TRUE,
  verbose     = TRUE
)

# Count any flags raised across all columns
flag_cols <- grep("^qc_flag_", names(survey_clean), value = TRUE)
n_flagged <- sum(sapply(survey_clean[flag_cols], function(x) sum(!is.na(x) & x != "pass")))
cat("Total flagged values replaced with NA:", n_flagged, "\n")

The QC step is non-destructive by default (apply_flags = FALSE) — it adds qc_flag_<variable> columns so you can inspect which cells were problematic before deciding whether to exclude them.

Step 6: Predict suitability

predict_oyster() applies AHP-weighted scoring rules to each available variable, combines them into a suitability score in [0, 1], and classifies locations as High / Moderate / Low / Very Low / Excluded.

result <- predict_oyster(
  data    = survey_clean,
  species = "ostrea_edulis",
  verbose = TRUE
)

# Summary of suitability classes
table(result$suitability_class)

# Mean score and range
cat(sprintf(
  "Suitability: mean = %.2f, range = %.2f -- %.2f\n",
  mean(result$suitability, na.rm = TRUE),
  min(result$suitability,  na.rm = TRUE),
  max(result$suitability,  na.rm = TRUE)
))

The result dataframe retains all input columns plus suitability, suitability_class, and per-variable component scores (score_depth, score_current_velocity, etc.).

Step 7: Risk and disturbance scoring

oystermapR includes optional risk modules that can be appended to the result. Here we add wave exposure (derived from the current speed data and fetch geometry) and a simple HAB risk score.

# Wave exposure: uses current_velocity and depth as proxies for fetch exposure
result <- score_wave_exposure(result, verbose = FALSE)

# HAB risk: without live ICES data, scores from chlorophyll_a alone
result <- score_hab_risk(result, verbose = FALSE)

cat("Wave exposure range:", round(range(result$wave_exposure, na.rm=TRUE), 3), "\n")
cat("HAB risk range:     ", round(range(result$hab_risk,      na.rm=TRUE), 3), "\n")

Step 8: Export GeoTIFF for QGIS

export_geotiff() interpolates the suitability scores onto a regular raster and writes a GeoTIFF. export_qml_style() writes a matching QGIS colour-ramp style file (.qml) so the layer renders immediately with the standard oystermapR colour scheme (red = excluded, green = high suitability).

# Write GeoTIFF and companion QGIS style file
export_geotiff(
  df         = result,
  path       = "example_bay_suitability.tif",
  resolution = 0.001,
  contours   = TRUE
)
export_qml_style("example_bay_suitability.tif")

Load example_bay_suitability.tif into QGIS via Layer → Add Layer → Add Raster Layer, then right-click the layer and choose Load Layer Style to apply the .qml file.

Inspecting component scores

The result dataframe includes a score_<variable> column for every variable that was scored. Comparing these helps diagnose which environmental factor is the main limiting constraint at a site.

score_cols <- grep("^score_", names(result), value = TRUE)
# Mean component score per variable (higher = more suitable)
col_means <- sort(colMeans(result[score_cols], na.rm = TRUE))
print(round(col_means, 3))

Variables scoring consistently below 0.5 are the main limiting factors for Ostrea edulis at this site. Scores near 1.0 indicate that variable is not constraining growth.

Next steps

Validation — if you have historical presence/absence records for the site, pass them to validate_against_records() to calculate AUC, TSS, and F1 against model predictions.
Bayesian tolerance updating — update_species_tolerances() can refine the O. edulis scoring parameters from field observations at this site.
Multi-species comparison — run the same workflow for "magallana_gigas" and call compare_species() to see which species is better suited to each grid cell.
Live data — fetch_live_environmental_data() can replace the simulated temperature and salinity with real-time CMEMS model output or ICES observational data; see ?oystermapR_live_config for credential setup.