| Title: | Predict and Map Oyster Growth Suitability from Environmental Data |
| Version: | 1.4.0 |
| Description: | Predicts spatial suitability for oyster growth from environmental survey data using Analytic Hierarchy Process (AHP) weighted scoring. Users supply sensor data from Acoustic Doppler Current Profilers (ADCP), Conductivity-Temperature-Depth (CTD) sensors, bathymetric sonar, and sidescan sonar, specify a target species, and receive per-location suitability scores, a 'GeoTIFF' heatmap for 'QGIS', contour lines, and a formatted PDF or HTML report. Supports fourteen species across global aquaculture regions, including Ostrea edulis, Magallana gigas, Crassostrea virginica, Crassostrea hongkongensis, and ten further species; see list_species(). Includes season-aware scoring, tidal height correction, Bayesian tolerance parameter updating from field observations, spatial block cross-validation (Roberts et al., 2017, <doi:10.1111/ecog.02881>), permutation variable importance, wave exposure and sediment stability modules, Harmful Algal Bloom (HAB) risk and anthropogenic disturbance scoring with optional live International Council for the Exploration of the Sea (ICES) data integration, hybrid larval dispersal connectivity scoring (union-find Gaussian kernel plus optional 'OpenDrift' or Finite Volume Community Ocean Model ('FVCOM') connectivity matrix), and batch multi-species comparison. |
| License: | GPL (≥ 3) |
| URL: | https://github.com/trissyboats/oystermapR |
| BugReports: | https://github.com/trissyboats/oystermapR/issues |
| Encoding: | UTF-8 |
| Language: | en-US |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.3.3 |
| Imports: | dplyr (≥ 1.0.0), terra (≥ 1.7.18), sf (≥ 1.0.0), rlang (≥ 1.0.0), cli (≥ 3.0.0), stats, utils |
| Suggests: | ggplot2 (≥ 3.4.0), testthat (≥ 3.0.0), knitr, rmarkdown (≥ 2.20), tinytex, httr (≥ 1.4.0), jsonlite (≥ 1.8.0), ncdf4 (≥ 1.19), gstat (≥ 2.1.0), sp (≥ 1.6.0), lubridate (≥ 1.9.0), suncalc (≥ 0.5.0), oce (≥ 1.7.0) |
| Depends: | R (≥ 4.1.0) |
| NeedsCompilation: | no |
| Packaged: | 2026-05-11 09:40:42 UTC; home |
| Author: | T Tucker |
| Maintainer: | T Tucker <tristantucker48@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-15 19:30:02 UTC |
oystermapR: Predict and Map Oyster Growth Suitability
Description
oystermapR provides a rule-based, AHP-weighted suitability scoring
pipeline for predicting where conditions are most likely to support oyster
growth. Users supply a tabular dataset of environmental and physical
measurements (collected in the field or from remote sensing), specify a
target species, and receive:
Per-location suitability scores [0, 1]
Categorical suitability classes (High / Moderate / Low / Very Low / Excluded)
Per-variable component scores for diagnosis and reporting
A GeoTIFF raster file for heatmap visualisation in QGIS
A QGIS colour-ramp style file (.qml)
Contour lines for overlay mapping
A formatted PDF or HTML report
Supported species
| Key | Latin name | Common name | Region |
| ostrea_edulis | Ostrea edulis | European flat oyster | NE Atlantic, North Sea, Mediterranean |
| magallana_gigas | Magallana gigas | Pacific oyster | Cosmopolitan (introduced) |
| crassostrea_angulata | Crassostrea angulata | Portuguese oyster | Iberian Peninsula, NE Atlantic |
| ostrea_stentina | Ostrea stentina | Denticulate flat oyster | Mediterranean, Mar Menor |
| ostrea_lurida | Ostrea lurida | Olympia oyster | NE Pacific: Alaska to Baja California |
Use list_species() to see all species with their data quality ratings.
Typical workflow
library(oystermapR)
# Load your survey data
df <- read.csv("my_survey.csv")
# Run the prediction
result <- predict_oyster(
data = df,
species = "ostrea_edulis",
output_geotiff = "oyster_suitability.tif",
verbose = TRUE
)
# Inspect high-suitability sites
subset(result, suitability_class == "High")
# Export matching QGIS colour style
export_qml_style("oyster_suitability.tif")
Sensor compatibility
The package is designed around data collectable with:
- Nortek Signature 500 ADCP – current velocity, shear stress, turbidity proxy (read with read_nortek_adcp())
- Ping 3DSS iDX450 PRO Bathymetric Sonar – depth, slope, roughness (read with read_sonar_tif())
- Lowrance + BioBase – depth, bottom hardness, vegetation, sidescan
- Salinity/temperature probes – temperature, salinity, dissolved oxygen
Validation and model diagnostics
Validation functions for research-grade use:
- validate_against_records(): AUC, TSS, Brier score, F1, sensitivity, specificity vs known presence/absence records.
- spatial_block_cv(): spatial block cross-validation (Roberts et al. 2017) – avoids inflated AUC from spatially autocorrelated data.
- permutation_importance(): variable importance by AUC drop after permuting each scored variable.
- sensitivity_analysis(): partial dependence curve – suitability vs a single variable while others are held at their median.
Bayesian tolerance updating
Tolerance parameters (optimal ranges) can be updated from field observations:
- update_species_tolerances(): MAP + Laplace approximation (fast) or RWMH MCMC (full posterior). Sequential – the posterior becomes the next prior.
- get_tolerance_posteriors(): retrieve updated parameters with 95% CIs.
- save_tolerance_update() / load_tolerance_update(): persist across sessions.
- reset_tolerance_update(): revert to built-in prior parameters.
Risk and disturbance modules
Optional scored overlays (all fetch_live = FALSE by default):
- score_predation_risk(): starfish, crab, and snail predation pressure.
- score_hab_risk(): harmful algal bloom risk (PSP/ASP/DSP/AZP).
- score_anthropogenic_disturbance(): bottom trawling (ICES VMS SAR), anchor damage, dredging/extraction.
- score_wave_exposure(): JONSWAP wave height from fetch and wind speed.
- score_sediment_stability(): Shields parameter mobility analysis.
- add_shellfish_classification(): UK/EU harvesting area classification (Class A/B/C/Prohibited; EC Regulation 854/2004).
Live data integration
Set credentials once; all live fetches are off by default:
oystermapR_live_config(cmems_user = "...", cmems_password = "...")
result_live <- fetch_live_environmental_data(survey)
Multi-species comparison
- compare_species(): compare suitability across multiple species at the same locations.
- compare_surveys(): compare two survey datasets for the same species.
- composite_seasonal(): merge summer/winter surveys into a composite score.
Author(s)
Maintainer: T Tucker tristantucker48@gmail.com (ORCID)
References
Roberts D.R. et al. (2017) Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40:913-929. doi:10.1111/ecog.02881
Breiman L. (2001) Random Forests. Machine Learning 45:5-32.
Gelman A. et al. (2013) Bayesian Data Analysis, 3rd ed. CRC Press.
Eigaard O.R. et al. (2017) The footprint of bottom trawling in European waters. ICES J Mar Sci 74:1700-1710.
Hasselmann K. et al. (1973) Measurements of wind-wave growth and swell decay during the Joint North Sea Wave Project (JONSWAP). Dtsch Hydrogr Z Suppl A8.
See Also
Useful links:
https://github.com/trissyboats/oystermapR
Report bugs at https://github.com/trissyboats/oystermapR/issues
Apply any cached Bayesian updates to a tolerance list
Description
Called internally by get_species_tolerances() and predict_oyster() to
transparently substitute updated parameters when a Bayesian update exists
for a species in the session cache. End users rarely need to call this
directly.
Usage
.apply_bayesian_update(tols, species)
Arguments
tols: Tolerance list for the species (built-in parameters, before any cached update).
species: Character. Species key.
Value
Updated tolerance list (or unchanged if no cache entry found).
Print a simple ASCII ROC curve to the console
Description
Print a simple ASCII ROC curve to the console
Usage
.ascii_roc(fpr, tpr, auc, width = 50, height = 20)
Automatic column name matching for common sensor exports
Description
Automatic column name matching for common sensor exports
Usage
.auto_map_columns(df, verbose = TRUE)
Internal: detect sidelobe-contaminated bins
Description
Internal: detect sidelobe-contaminated bins
Usage
.detect_sidelobe_bins(speed_matrix, max_plausible_speed, sidelobe_threshold)
Arguments
speed_matrix: Numeric matrix, rows = ensembles, cols = bins (shallow to deep).
max_plausible_speed: Numeric. Speed threshold in m/s.
sidelobe_threshold: Numeric. Contamination fraction threshold [0, 1].
Value
Logical vector, length = n_bins. TRUE = contaminated.
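The contamination test can be sketched in a few lines of base R. This is an illustrative reimplementation of the rule implied above (flag a bin when the fraction of ensembles exceeding a plausible speed is too high), not the package's internal code; the exact criterion may differ.

```r
# Flag bins where the fraction of ensembles exceeding max_plausible_speed
# is greater than sidelobe_threshold. TRUE = contaminated bin.
detect_sidelobe_bins <- function(speed_matrix, max_plausible_speed,
                                 sidelobe_threshold) {
  frac_bad <- colMeans(speed_matrix > max_plausible_speed, na.rm = TRUE)
  frac_bad > sidelobe_threshold
}

m <- rbind(c(0.2, 3.0), c(0.3, 2.5), c(0.1, 0.4))  # 3 ensembles x 2 bins
detect_sidelobe_bins(m, max_plausible_speed = 2, sidelobe_threshold = 0.5)
```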
Internal: distance (metres) from each raster cell to the nearest survey point
Description
For each cell centre, finds the minimum Euclidean distance to any survey point and converts to approximate metres. Used as the uncertainty band. Chunked to stay within memory budget.
Usage
.distance_to_nearest_m(pts_lon, pts_lat, rast)
Find first matching column (case-insensitive)
Description
Find first matching column (case-insensitive)
Usage
.find_col_any(df, candidates)
Retrieve cached tolerance list (or NULL if not updated)
Description
Retrieve cached tolerance list (or NULL if not updated)
Usage
.get_cached_tolerances(species)
Haversine distance in km between two WGS84 points
Description
Haversine distance in km between two WGS84 points
Usage
.haversine_km(lat1, lon1, lat2, lon2)
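The great-circle computation can be sketched in base R. The function below mirrors the signature above but is an illustrative reimplementation of the standard haversine formula, not the package source.

```r
# Haversine great-circle distance between two WGS84 points, in km.
haversine_km <- function(lat1, lon1, lat2, lon2) {
  R <- 6371                      # mean Earth radius in km
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))          # central angle -> arc length
}

haversine_km(0, 0, 0, 1)  # one degree of longitude at the equator, ~111.2 km
```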
Internal IDW interpolation (no external dependencies)
Description
Vectorised Inverse Distance Weighting using base-R matrix operations.
Computes a weighted mean of point values for every cell centre in a raster
template, using only points within max_dist degrees of each cell.
Usage
.idw_interpolate(pts_lon, pts_lat, values, rast, power = 2, max_dist = 0.02)
Arguments
pts_lon, pts_lat: Numeric vectors of point coordinates.
values: Numeric vector of values to interpolate (same length as pts_*).
rast: A terra SpatRaster template.
power: Numeric. Distance decay exponent.
max_dist: Numeric. Search radius in the same units as the coordinates.
Value
Numeric vector of interpolated values, one per raster cell.
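For a single target point, the weighting scheme reduces to a few lines. This is a sketch of the technique described above (the real function vectorises it over every raster cell centre); it is not the package's internal code.

```r
# Inverse-distance-weighted estimate at one target point (px, py).
idw_point <- function(px, py, pts_x, pts_y, values, power = 2, max_dist = 0.02) {
  d <- sqrt((pts_x - px)^2 + (pts_y - py)^2)
  keep <- d <= max_dist
  if (!any(keep)) return(NA_real_)        # no points within the search radius
  d <- d[keep]; v <- values[keep]
  if (any(d == 0)) return(v[d == 0][1])   # exact hit: return that value
  w <- 1 / d^power                        # inverse-distance weights
  sum(w * v) / sum(w)                     # weighted mean
}

# Two equidistant points average to their mean
idw_point(0, 0, c(-0.01, 0.01), c(0, 0), c(1, 3))  # 2
```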
Mode helper (most common value in a vector)
Description
Mode helper (most common value in a vector)
Usage
.mode(x)
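A minimal sketch of such a helper in base R (illustrative; ties resolve to the first level returned by table(), and the package's tie-breaking may differ):

```r
# Most common value in a vector; NAs dropped. Returns the value as character.
mode_of <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  tab <- table(x)
  names(tab)[which.max(tab)]
}

mode_of(c("sand", "mud", "sand"))  # "sand"
```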
Compute Schureman nodal amplitude (f) and phase (u) corrections
Description
Computes the 18.6-year lunar nodal modulation factors f (amplitude scale) and u (phase correction in degrees) for M2, S2, N2, K1, O1 at a given epoch.
Formulae from Schureman (1940) Tables 2 & 3; same as those used by UKHO, NOAA, and most modern tidal prediction software.
Usage
.nodal_factors(t_mean)
Arguments
t_mean: POSIXct. Representative timestamp for the survey (typically the midpoint of the survey period). Nodal factors vary over months, so a single midpoint evaluation is accurate to well within the constituent uncertainty budget.
Value
Named list with elements f and u, each a named numeric vector
indexed by constituent name (M2, S2, N2, K1, O1).
Compute numerical Hessian by finite differences (central differences)
Description
Compute numerical Hessian by finite differences (central differences)
Usage
.numerical_hessian(f, theta, h = 1e-04)
Predict tidal heights using embedded harmonic constituents (with nodal corrections)
Description
Predict tidal heights using embedded harmonic constituents (with nodal corrections)
Usage
.predict_harmonic_heights(t_posix, port)
Arguments
t_posix: POSIXct vector of survey timestamps (UTC).
port: Named list from .harmonic_ports.
Value
Numeric vector of predicted heights above Chart Datum (metres).
Random-Walk Metropolis-Hastings MCMC sampler
Description
Random-Walk Metropolis-Hastings MCMC sampler
Usage
.rwmh(
log_post,
theta_init,
n_iter,
n_warmup,
proposal_sd,
lower,
upper,
verbose = TRUE
)
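A one-dimensional version of the algorithm can be sketched as follows. Argument names follow the Usage block, but the body is illustrative and omits the lower/upper bound handling; it is not the package source.

```r
# Minimal random-walk Metropolis-Hastings sampler in one dimension.
rwmh_1d <- function(log_post, theta_init, n_iter, n_warmup, proposal_sd) {
  theta <- theta_init
  draws <- numeric(n_iter)
  for (i in seq_len(n_iter + n_warmup)) {
    prop <- theta + rnorm(1, sd = proposal_sd)     # symmetric Gaussian proposal
    log_alpha <- log_post(prop) - log_post(theta)  # acceptance log-ratio
    if (log(runif(1)) < log_alpha) theta <- prop   # accept, else keep current
    if (i > n_warmup) draws[i - n_warmup] <- theta # record post-warmup draws
  }
  draws
}

# Sample a standard normal target
set.seed(1)
d <- rwmh_1d(function(x) -x^2 / 2, 0, 5000, 500, 1)
```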
Score a categorical variable
Description
Score a categorical variable
Usage
.score_categorical(x, params)
Score a single numeric variable against an optimal-range definition
Description
Score a single numeric variable against an optimal-range definition
Usage
.score_numeric(x, params)
Arguments
x: Numeric value to score.
params: Named list of scoring parameters.
Value
Numeric in [0, 1]. Returns NA if x is NA.
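The manual does not document the exact response shape, but a common optimal-range score is trapezoidal: 1 inside the optimal range, tapering linearly to 0 at the tolerance limits. The parameter names below (tol_min, opt_min, opt_max, tol_max) are illustrative assumptions, not the package's actual parameterisation:

```r
# Trapezoidal optimal-range score in [0, 1]; NA in -> NA out.
score_numeric_trap <- function(x, p) {
  if (is.na(x)) return(NA_real_)
  if (x >= p$opt_min && x <= p$opt_max) return(1)          # optimal plateau
  if (x <= p$tol_min || x >= p$tol_max) return(0)          # outside tolerance
  if (x < p$opt_min) (x - p$tol_min) / (p$opt_min - p$tol_min)  # lower ramp
  else               (p$tol_max - x) / (p$tol_max - p$opt_max)  # upper ramp
}

p <- list(tol_min = 5, opt_min = 10, opt_max = 20, tol_max = 28)
score_numeric_trap(15, p)   # 1
score_numeric_trap(7.5, p)  # 0.5 (halfway up the lower ramp)
```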
Score all weighted factors for a single row (season-aware)
Description
Computes a weighted mean suitability score. Variables of type "seasonal"
have their parameters swapped from tolerances$seasonal_overrides when a
season value is present in the row. Missing columns are skipped and their
weight is excluded from the denominator.
Usage
.score_row(row, tolerances)
Arguments
row: Named list or single-row dataframe slice.
tolerances: Species tolerance list from get_species_tolerances().
Value
List: score, variable_scores, variable_weights, limiting_factors.
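The weighted-mean rule with missing-variable skipping can be sketched as follows (illustrative; the real function also swaps seasonal parameters and tracks limiting factors):

```r
# Weighted mean of per-variable scores; missing variables are skipped
# and their weight is excluded from the denominator.
weighted_score <- function(scores, weights) {
  ok <- !is.na(scores)
  if (!any(ok)) return(NA_real_)
  sum(scores[ok] * weights[ok]) / sum(weights[ok])
}

weighted_score(c(temp = 1, sal = 0.5, do = NA), c(0.5, 0.3, 0.2))
# (1*0.5 + 0.5*0.3) / 0.8 = 0.8125
```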
Species Tolerance Data for oystermapR
Description
Internal lookup tables defining exclusion thresholds, weighted scoring parameters, and seasonal scoring overrides for each supported oyster species.
Structure per species entry:
- exclusions – hard-stop limits; a location is excluded if breached
- scored – weighted variables contributing to the 0-1 score
- seasonal_overrides – season-specific parameter swaps for scored variables
Adding a new species: append a named list entry following the same
structure. Use "seasonal_overrides" = list() if no seasonal data exists.
Usage
.species_tolerances
Format
A named list, one entry per species keyed by lowercase Latin name
with underscores (e.g. "ostrea_edulis").
Standardise coordinate column names to lat/lon
Description
Standardise coordinate column names to lat/lon
Usage
.standardise_coords(df)
Find top N spatially distinct introduction sites with patch radius
Description
Identifies the highest-scoring, spatially spread-out locations for oyster
introduction. Uses a greedy selection that prevents clustering – once a site
is chosen, no other site within min_spacing_deg is considered, so the
result always represents distinct areas of the survey.
Patch radius is the equivalent circular radius of all suitable cells
(suitability >= min_suitability) within the search neighbourhood. This
gives a practical sense of how large the suitable area is around each site.
Usage
.top_introduction_sites(
df,
n = 5L,
min_suitability = 0.4,
min_spacing_deg = 0.002,
patch_search_deg = 0.005,
spatial_res = 4L
)
Arguments
df: Scored dataframe from predict_oyster().
n: Integer. Number of sites to return (default 5).
min_suitability: Numeric. Minimum score to consider suitable [0, 1].
min_spacing_deg: Numeric. Minimum separation between selected sites in decimal degrees (default 0.002, approx. 220 m). Prevents adjacent cells all being reported as separate "sites".
patch_search_deg: Numeric. Radius in degrees to count nearby suitable cells when computing patch size (default 0.005, approx. 550 m).
spatial_res: Integer. Survey grid resolution in decimal places (default 4).
Value
Dataframe of top sites, or NULL if none qualify.
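The greedy spacing strategy described above can be sketched in base R. Column names lat, lon, and suitability are assumed for illustration, and simple Euclidean distance in degrees is used; this is not the package's internal code.

```r
# Greedy spatially-spaced selection: take the best-scoring site, drop
# everything within min_spacing_deg of it, repeat until n sites are chosen.
top_spaced_sites <- function(df, n = 5, min_spacing_deg = 0.002) {
  df <- df[order(-df$suitability), ]
  chosen <- df[0, ]
  while (nrow(df) > 0 && nrow(chosen) < n) {
    best <- df[1, ]
    chosen <- rbind(chosen, best)
    far <- sqrt((df$lon - best$lon)^2 + (df$lat - best$lat)^2) >= min_spacing_deg
    df <- df[far, ]    # remove the cluster around the pick (including itself)
  }
  chosen
}
```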
Add an intertidal zone flag to a survey dataframe
Description
Classifies each survey location as intertidal, subtidal, or supratidal based on its depth below Chart Datum (CD) and the tidal range at the nearest harmonic reference port. Chart Datum is approximately Lowest Astronomical Tide (LAT), so:
- Subtidal: depth > 0 (always submerged)
- Intertidal: -MHWS_above_CD <= depth <= 0 (exposed at some states of the tide; depth is at or above CD)
- Supratidal: depth < -MHWS_above_CD (above the highest astronomical tide; excluded from scoring)
MHWS above CD is approximated from the harmonic constituents of the
nearest port as: Z0 + 1.1 * (M2_H + S2_H) – the 10% factor accounts
for minor constituents not included in the 5-constituent model.
This flag is used internally by score_locations() to give Magallana
gigas (Pacific oyster) full depth scores for intertidal cells, reflecting
its strong intertidal ecology in NW European waters.
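The MHWS approximation and the resulting zone classification can be sketched as follows. The port constituent values below are made-up illustrative numbers, and the sketch is not the package's internal code.

```r
# MHWS above Chart Datum from the 5-constituent model, as described above:
# Z0 + 1.1 * (M2 amplitude + S2 amplitude), then classify depth below CD.
classify_tide_zone <- function(depth, Z0, M2_H, S2_H) {
  mhws <- Z0 + 1.1 * (M2_H + S2_H)
  ifelse(depth > 0, "subtidal",
         ifelse(depth >= -mhws, "intertidal", "supratidal"))
}

# Illustrative port: Z0 = 3.2 m, M2 = 2.1 m, S2 = 0.7 m -> MHWS ~= 6.28 m
classify_tide_zone(c(5, -1, -9), Z0 = 3.2, M2_H = 2.1, S2_H = 0.7)
```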
Usage
add_intertidal_flag(
df,
max_port_dist_km = 75,
depth_col = "depth",
verbose = TRUE
)
Arguments
df: Dataframe with lat, lon, and a chart-datum depth column.
max_port_dist_km: Numeric. Maximum distance to the nearest harmonic port in km (default 75). If the survey centroid is further than this from all ports, the function falls back to a conservative intertidal window of 6 m above CD and issues a warning.
depth_col: Character. Name of the chart-datum depth column (default "depth").
verbose: Logical. Print the nearest port name and derived MHWS (default TRUE).
Value
The input dataframe with an added intertidal_zone column:
- "subtidal": depth > 0 m CD
- "intertidal": 0 m CD >= depth >= -MHWS_above_CD
- "supratidal": depth < -MHWS_above_CD
Examples
survey <- auto_tidal_correct(survey, datetime_col = "date")
survey <- add_intertidal_flag(survey)
# Check intertidal coverage
table(survey$intertidal_zone)
Detect season for each row in a dataframe
Description
Vectorised wrapper around detect_season() for use on dataframes.
Usage
add_season_column(df, date_col = NULL, lat_col = "lat")
Arguments
df: A dataframe containing at minimum a date column and a latitude column.
date_col: Name of the date column (default NULL).
lat_col: Name of the latitude column (default "lat").
Value
The input dataframe with an additional column season.
Add shellfish water quality classification to scored result
Description
Attaches a regulatory shellfish harvesting classification (A/B/C/Prohibited) to each row of a scored result dataframe. This classification affects the economic viability score for aquaculture applications: Prohibited sites receive viability = 0; Class C sites receive a 0.60 multiplier; Class B a 0.80 multiplier; Class A is unpenalised.
Classification can be supplied three ways (in priority order):
- class_col – name of an existing column in result already containing classification values ("A", "B", "C", "Prohibited", or NA).
- classified_areas – a dataframe with lat, lon, and shellfish_class columns, spatially matched to result rows within match_radius_deg degrees.
- fetch_live = TRUE – queries the FSA England/Wales open data API (requires internet and the httr package). Works only for sites within the FSA/DAERA coverage area (England, Wales, Northern Ireland).
Unclassified or unmatched sites receive shellfish_class = "Unclassified"
and a penalty of 0.70 (precautionary principle).
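The penalty scheme amounts to a simple lookup, with multipliers taken from the description above (an illustrative sketch, not the package's internal code):

```r
# Economic-viability multiplier per shellfish classification.
class_penalty <- function(cls) {
  lut <- c(A = 1.0, B = 0.80, C = 0.60,
           Prohibited = 0.0, Unclassified = 0.70)
  unname(lut[cls])
}

class_penalty(c("A", "C", "Prohibited"))  # 1.0 0.6 0.0
```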
Usage
add_shellfish_classification(
result,
class_col = NULL,
classified_areas = NULL,
fetch_live = FALSE,
match_radius_deg = 0.05,
verbose = TRUE
)
Arguments
result: Dataframe from predict_oyster().
class_col: Character. Name of an existing column in result containing classification values (default NULL).
classified_areas: Dataframe with lat, lon, and shellfish_class columns (default NULL).
fetch_live: Logical. Query the live FSA/DAERA API (default FALSE). Requires internet access and the httr package.
match_radius_deg: Numeric. Spatial matching tolerance in decimal degrees when using classified_areas (default 0.05).
verbose: Logical. Print matching summary (default TRUE).
Value
result with two additional columns:
- shellfish_class: "A", "B", "C", "Prohibited", or "Unclassified"
- shellfish_class_penalty: multiplier applied to economic viability (1.0 for A, 0.80 for B, 0.60 for C, 0.0 for Prohibited, 0.70 for Unclassified)
Examples
# Option 1: manual column already in data
result <- add_shellfish_classification(result, class_col = "water_class")
# Option 2: separate classified areas dataframe
areas <- data.frame(lat = c(51.5), lon = c(-4.2), shellfish_class = c("B"))
result <- add_shellfish_classification(result, classified_areas = areas)
# Option 3: live fetch (internet required)
result <- add_shellfish_classification(result, fetch_live = TRUE)
Add bootstrap confidence intervals to suitability scores
Description
Re-scores each row of a survey dataframe n_boot times, each time adding
Gaussian noise to every numeric variable proportional to its measurement
uncertainty, and returns the 5th and 95th percentile suitability scores as
suit_ci_lower and suit_ci_upper columns (90% interval).
Measurement uncertainty is specified via the uncertainty argument: a named
numeric vector where each name matches a variable name in tolerances$scored_factors
and the value is the 1-sigma absolute error (same units as the variable).
Any variable not listed defaults to 2% of its observed value.
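The resampling scheme can be sketched as follows. Here score_fun is a hypothetical stand-in for the package's per-row scorer; the sketch illustrates the perturb-and-rescore loop only.

```r
# Bootstrap a 90% interval for one score by perturbing the input variables
# with Gaussian measurement noise and re-scoring n_boot times.
boot_ci <- function(x, sigma, score_fun, n_boot = 200, seed = 42) {
  set.seed(seed)
  scores <- replicate(n_boot, {
    noisy <- x + rnorm(length(x), sd = sigma)   # perturb each variable
    score_fun(noisy)
  })
  quantile(scores, c(0.05, 0.95))               # 90% interval
}

ci <- boot_ci(x = c(temperature = 12, salinity = 30),
              sigma = c(0.3, 0.5),
              score_fun = function(v) mean(v) / 50)  # toy placeholder scorer
```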
Usage
add_suitability_ci(
result,
tolerances,
n_boot = 200L,
uncertainty = NULL,
seed = 42L,
verbose = FALSE
)
Arguments
result: Dataframe returned by predict_oyster().
tolerances: Species tolerance list from get_species_tolerances().
n_boot: Integer. Number of bootstrap resamples (default 200).
uncertainty: Named numeric vector. 1-sigma measurement errors by variable name (e.g. c(temperature = 0.3, salinity = 0.5)).
seed: Integer or NULL. Random seed for reproducibility (default 42).
verbose: Logical. Print progress (default FALSE).
Value
The input dataframe with additional columns:
suit_ci_lower, suit_ci_upper, suit_ci_width.
Examples
result <- predict_oyster(survey, "ostrea_edulis")
tol <- get_species_tolerances("ostrea_edulis")
result <- add_suitability_ci(result, tol,
uncertainty = c(temperature = 0.3,
salinity = 0.5))
# Columns suit_ci_lower, suit_ci_upper, suit_ci_width now present
generate_report(result, "report.html") # map rings scale with CI width
Analyse spatial connectivity of suitable habitat cells
Description
Builds a spatial graph of cells above a suitability threshold and identifies connected components – contiguous patches of suitable habitat. Isolated cells or small patches are flagged because they are at higher risk of:
- Recruitment failure – larvae from isolated populations may be flushed away before settling back in the patch.
- Genetic bottlenecks – small isolated populations have lower allelic diversity and reduced adaptive capacity.
- Local extinction without recolonisation – if a patch is lost, there are no connected source populations to recolonise it.
Two cells are considered connected if they are within gap_m metres of
each other (default 500 m – approximately the scale of larval dispersal
during a tidal cycle at moderate current speeds). Connectivity is computed
using a fast union-find (disjoint set) algorithm – no external graph
packages required.
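A union-find (disjoint set) structure with path compression, as named above, can be sketched in base R (illustrative; the package's implementation may differ in detail):

```r
# Closure-based union-find: find() returns a patch's root cell index,
# union() merges the patches containing two cells.
make_union_find <- function(n) {
  parent <- seq_len(n)
  find <- function(i) {
    root <- i
    while (parent[root] != root) root <- parent[root]
    while (parent[i] != root) {          # path compression
      nxt <- parent[i]
      parent[i] <<- root
      i <- nxt
    }
    root
  }
  union <- function(a, b) parent[find(a)] <<- find(b)
  list(find = find, union = union)
}

# Cells 1-2 connected, cell 3 isolated -> two patches
uf <- make_union_find(3)
uf$union(1, 2)
uf$find(1) == uf$find(2)  # TRUE
```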
Usage
analyse_connectivity(
result,
min_suitability = 0.5,
gap_m = 500,
min_patch_cells = 3L,
verbose = TRUE
)
Arguments
result: Dataframe from predict_oyster().
min_suitability: Numeric. Minimum suitability score for a cell to be included in the connectivity analysis (default 0.5, i.e. Moderate+).
gap_m: Numeric. Maximum distance in metres between two suitable cells for them to be considered connected (default 500 m).
min_patch_cells: Integer. Patches smaller than this are flagged as isolated (default 3 cells).
verbose: Logical. Print connectivity summary (default TRUE).
Value
Input dataframe with additional columns:
patch_id (integer – unique patch identifier; NA for non-suitable cells),
patch_size (number of cells in the patch),
patch_area_km2 (approximate area based on cell density),
connectivity_class ("isolated", "small", "moderate", "large"),
is_hub (logical – TRUE for the highest-scoring cell in each patch,
useful as candidate introduction points).
Examples
result <- predict_oyster(survey, "ostrea_edulis")
result <- analyse_connectivity(result, gap_m = 500)
# Large well-connected patches are the best restoration targets
good_patches <- subset(result,
connectivity_class == "large" & suitability_class == "High")
# Count patches
table(result$connectivity_class)
# Visualise -- patch_id maps to distinct colours in QGIS
export_geotiff(result, "connectivity.tif")
Assess gear deployment feasibility at survey locations
Description
Evaluates which aquaculture and restoration gear types are physically viable at each survey location based on depth, current velocity, substrate, and (optionally) wave height. Returns a feasibility score and binary flag for each gear type.
This function is part of the planning module aimed at commercial
operators. For pure restoration work, the restoration_reef gear type is
always included and is the most relevant output.
Usage
assess_gear_feasibility(
result,
gear_types = names(.gear_specs),
depth_col = "depth",
current_col = "current_velocity",
substrate_col = "substrate_hardness_class",
wave_col = NULL,
verbose = TRUE
)
Arguments
result: Dataframe from predict_oyster().
gear_types: Character vector of gear types to assess. Defaults to all gear types defined in .gear_specs (restoration_reef is always among them).
depth_col: Character. Column name for depth below CD in metres (default "depth").
current_col: Character. Column name for current velocity in m/s (default "current_velocity").
substrate_col: Character. Column for substrate class (default "substrate_hardness_class").
wave_col: Character or NULL. Column for significant wave height Hs in metres (default NULL – wave check skipped if absent).
verbose: Logical. Print feasibility summary (default TRUE).
Value
Input dataframe with added columns per gear type:
gear_<type>_feasible (logical),
gear_<type>_score [0-1],
plus best_gear (most feasible option) and n_gear_options (count).
Examples
result <- predict_oyster(survey, "ostrea_edulis")
result <- assess_gear_feasibility(result)
# Sites where longline is feasible AND suitability is High
longline_sites <- subset(result,
gear_longline_feasible & suitability_class == "High")
# Restoration reef sites (no farming licence needed)
reef_sites <- subset(result, gear_restoration_reef_feasible)
Automatically predict and apply tidal correction from harmonic constituents
Description
Finds the nearest standard port to the survey area, checks whether it is
within a configurable distance threshold, predicts tidal heights for every
survey timestamp using embedded 5-constituent harmonic models
(M2, S2, N2, K1, O1), and applies correct_to_chart_datum() automatically.
No external data or internet connection is required. Harmonic constituent data (amplitude and Greenwich phase lag) for 31 UK and European ports are embedded in the package.
Accuracy: 5-constituent predictions are typically accurate to +/-0.15–0.35 m
at the standard port. Accuracy degrades with distance from the port and in
areas with complex tidal dynamics (e.g. the Bristol Channel, inner fjords).
For safety-critical work, use official UKHO EasyTide predictions and supply
them manually via correct_to_chart_datum().
Requirements: The dataframe must contain a datetime column. Values must
be parseable by as.POSIXct() (e.g. "2024-06-15 10:30:00" or
"2024-06-15"). Date-only strings default to noon UTC.
Usage
auto_tidal_correct(
df,
datetime_col = "date",
depth_col = "depth",
max_port_dist_km = 75,
keep_raw = TRUE,
verbose = TRUE
)
Arguments
df: A dataframe with at minimum a depth and a datetime column, typically the output of one of the read_*() functions.
datetime_col: Character. Name of the datetime column (default "date").
depth_col: Character. Name of the depth column to correct (default "depth").
max_port_dist_km: Numeric. Maximum distance in km from the survey centroid to the nearest standard port. If the nearest port is further than this, no correction is applied and a warning is issued (default 75).
keep_raw: Logical. Retain the original depths in an additional column (default TRUE).
verbose: Logical. Print port selection, distance, and prediction summary (default TRUE).
Value
The input dataframe with depths corrected to Chart Datum, plus
a tidal_height_pred_m column containing the predicted tidal height
used for each row.
See Also
correct_to_chart_datum() for manual correction,
tidal_port_info() for port datum reference data.
Examples
# Automatic correction -- finds nearest port, predicts heights, corrects depths
survey_corrected <- auto_tidal_correct(survey, datetime_col = "date")
# Tighten the distance threshold (only trust ports within 40 km)
survey_corrected <- auto_tidal_correct(survey, max_port_dist_km = 40)
# See which port was selected and inspect predicted heights
survey_corrected$tidal_height_pred_m
Check exclusion criteria for all rows in a dataset
Description
Applies hard-stop exclusion criteria for a given species tolerance profile. Any location that violates one or more exclusion rules receives a flag and is removed from the suitability scoring pipeline.
Exclusion factors checked (where data columns are present):
- General temperature range
- Season-specific temperature floors/ceilings (requires a season column)
- Salinity (with temperature-dependent minimum)
- Dissolved oxygen (hypoxia threshold)
Usage
check_exclusions(df, tolerances)
Arguments
df: A dataframe. Must contain at minimum a subset of the standard column names (see Column Naming below). Any exclusion factor for which no column is present is skipped with a warning.
tolerances: A species tolerance list as returned by get_species_tolerances().
Value
The input dataframe with two additional columns:
- excluded: logical – TRUE if the location fails any exclusion criterion.
- exclusion_reason: character – semicolon-separated list of failed criteria, or NA if not excluded.
Column Naming
The function recognises these column names (case-insensitive):
| Variable | Expected column name(s) |
| Temperature | temperature, temp |
| Salinity | salinity, sal |
| Dissolved oxygen | dissolved_oxygen, do, do_mgl |
| Season | season (auto-added by add_season_column()) |
Examples
tol <- get_species_tolerances("ostrea_edulis")
df <- data.frame(
lat = 51.5, lon = -2.5,
temperature = 8, salinity = 28, dissolved_oxygen = 7
)
check_exclusions(df, tol)
Classify seabed substrate hardness from near-seabed ADCP backscatter
Description
Uses the intensity of acoustic backscatter from the near-seabed bin(s) to classify substrate type. Hard substrates (gravel, rock, shell hash) scatter more sound energy than soft substrates (mud, fine sand). The function converts raw backscatter (dB re 1 m^-1 or instrument counts) to a categorical substrate class and a continuous hardness index [0, 1].
Classification thresholds (default, adjustable):
| Class | Hardness index | Typical substrate |
| Hard | 0.8 -- 1.0 | Rock, cobble, shell hash |
| Mixed | 0.5 -- 0.8 | Gravel/sand mix, maerl beds |
| Soft | 0.2 -- 0.5 | Sand, sandy mud |
| Very Soft | 0.0 -- 0.2 | Mud, fine silt |
The hardness index is a min-max normalisation of the mean near-seabed
backscatter across the bottom n_bottom_bins cells. If the column
contains calibrated volume backscatter (Sv in dB), set is_sv = TRUE
and a sign-flip is applied (less negative Sv = stronger return = harder).
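The normalisation and class cuts described above amount to the following sketch, using the default thresholds (boundary handling in the package may differ):

```r
# Min-max normalise backscatter to a [0, 1] hardness index, clamped at the
# percentile anchors, then cut into the four classes from the table above.
hardness_index <- function(bs, bs_min = quantile(bs, 0.05),
                           bs_max = quantile(bs, 0.95)) {
  pmin(pmax((bs - bs_min) / (bs_max - bs_min), 0), 1)
}
hardness_class <- function(h) {
  cut(h, breaks = c(-Inf, 0.2, 0.5, 0.8, Inf),
      labels = c("Very Soft", "Soft", "Mixed", "Hard"))
}

h <- hardness_index(c(10, 40, 80, 95), bs_min = 10, bs_max = 100)
as.character(hardness_class(h))  # "Very Soft" "Soft" "Mixed" "Hard"
```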
Usage
classify_substrate_from_backscatter(
df,
backscatter_col,
is_sv = FALSE,
bs_min = NULL,
bs_max = NULL,
hard_thresh = 0.8,
mixed_thresh = 0.5,
soft_thresh = 0.2,
output_col = "substrate_hardness",
verbose = TRUE
)
Arguments
df: Dataframe with at least one column of seabed backscatter values.
backscatter_col: Character. Name of the backscatter column.
is_sv: Logical. If TRUE, treats values as Sv (dB, typically negative); hardness is derived from abs(Sv) after sign convention correction. Default FALSE (raw instrument counts, higher = harder).
bs_min: Numeric. Backscatter value mapped to hardness = 0 (softest). Defaults to the 5th percentile of observed data.
bs_max: Numeric. Backscatter value mapped to hardness = 1 (hardest). Defaults to the 95th percentile of observed data.
hard_thresh: Numeric. Hardness index above which substrate is "Hard" (default 0.8).
mixed_thresh: Numeric. Hardness index above which substrate is "Mixed" (default 0.5).
soft_thresh: Numeric. Hardness index above which substrate is "Soft" (default 0.2); below is "Very Soft".
output_col: Character. Base name for the output columns (default "substrate_hardness"); <output_col>_index and <output_col>_class are added.
verbose: Logical. Print classification summary (default TRUE).
Value
Input dataframe with substrate_hardness_index and
substrate_hardness_class columns appended. These map directly to the
substrate_hardness scored variable in the tolerance spec.
Examples
# ADCP data with near-seabed backscatter column
survey <- classify_substrate_from_backscatter(survey,
backscatter_col = "bottom_backscatter_db",
is_sv = TRUE)
# The substrate_hardness_index column feeds directly into predict_oyster()
# as the 'substrate_hardness' variable
names(survey)[names(survey) == "substrate_hardness_index"] <- "substrate_hardness"
result <- predict_oyster(survey, "ostrea_edulis")
Compare suitability across multiple oyster species
Description
Runs predict_oyster() for every specified species against the same survey
dataset and returns a combined comparison table – one row per survey
location, one suitability column per species. An optional summary prints
which species is best-suited to each location.
This is the tool to use when you haven't yet decided which species to stock, or when you want to identify areas where multiple species overlap in suitability (robust site selection).
Usage
compare_species(
data,
species = NULL,
output_dir = NULL,
min_data_quality = "low",
verbose = TRUE
)
Arguments
data |
A dataframe or CSV path accepted by predict_oyster(). |
species |
Character vector of species keys to compare. Defaults to all
species currently in the oystermapR database (see list_species()). |
output_dir |
Character or NULL. Directory to which per-species GeoTIFF heatmaps are exported; NULL (default) skips map export. |
min_data_quality |
Character. Minimum data quality tier a species must meet to be included (default "low", which includes all species). |
verbose |
Logical. Print a comparison summary per species (default TRUE). |
Value
A dataframe with all original survey columns plus:
- suit_<species_key>: suitability score [0, 1] per species
- class_<species_key>: suitability class per species
- best_species: key of the species with the highest suitability at each location (NA if all excluded)
- best_suitability: the highest suitability score across all species
- n_species_high: number of species scoring "High" at each location (useful for identifying robust multi-species sites)
Examples
# Compare all supported species
comparison <- compare_species(survey)
# Compare only well-characterised species
comparison <- compare_species(survey, min_data_quality = "high")
# Compare specific species and export per-species heatmaps
comparison <- compare_species(
survey,
species = c("ostrea_edulis", "magallana_gigas"),
output_dir = "comparison_maps/"
)
# Find sites suitable for both O. edulis AND M. gigas
both_high <- subset(comparison, n_species_high >= 2)
Compare suitability scores across multiple surveys or monitoring years
Description
Takes a named list of predict_oyster() result dataframes and produces a
single comparative summary suitable for trend monitoring, multi-site
selection, or regulatory reporting. Common spatial cells are matched across
surveys and a change analysis is run between consecutive surveys if ordered
chronologically.
The function returns a list with three elements:
- summary – one row per survey with mean suitability, class breakdown, and data completeness stats.
- spatial – all surveys joined on shared lat/lon cells, with suitability columns per survey and trend_slope (linear regression of suitability over survey index for each cell).
- change – pairwise change table (only if surveys are in chronological order; each consecutive pair shows the mean score difference and the fraction of cells that improved, degraded, or were stable).
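The trend_slope described above can be reproduced for a single cell with a plain linear fit of suitability against survey index. A minimal base-R sketch (values hypothetical, not package internals):

```r
# Suitability for one spatial cell across three ordered surveys
suit <- c(0.42, 0.48, 0.55)   # hypothetical per-survey scores
idx  <- seq_along(suit)       # survey index 1, 2, 3

# trend_slope is the slope of a linear regression of suitability on index
fit <- lm(suit ~ idx)
trend_slope <- unname(coef(fit)[2])
trend_slope                   # 0.065 suitability units per survey
```

A positive slope indicates improving conditions at that cell across the survey series.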
Usage
compare_surveys(
surveys,
cell_deg = 0.001,
stable_threshold = 0.05,
verbose = TRUE
)
Arguments
surveys |
Named list of dataframes from predict_oyster(), ideally named by year or survey date. |
cell_deg |
Numeric. Spatial rounding tolerance in decimal degrees for matching cells across surveys (default 0.001 approx. 111 m). |
stable_threshold |
Numeric. Absolute suitability change below which a cell is considered "stable" rather than improved/degraded (default 0.05). |
verbose |
Logical. Print comparison summary (default TRUE). |
Value
Named list: summary (dataframe), spatial (dataframe),
change (dataframe or NULL if only one survey).
Examples
r2022 <- predict_oyster(survey_2022, "ostrea_edulis")
r2023 <- predict_oyster(survey_2023, "ostrea_edulis")
r2024 <- predict_oyster(survey_2024, "ostrea_edulis")
comp <- compare_surveys(
surveys = list("2022" = r2022, "2023" = r2023, "2024" = r2024)
)
comp$summary # mean scores by year
comp$change # year-on-year change
# Pass to generate_report() for a full comparative HTML report
generate_report(r2024, "monitoring_report.html",
title = "Kames Bay 3-Year Monitoring")
Combine multi-season survey results into a composite suitability score
Description
Oysters are permanent residents; a location is only truly suitable if
conditions are adequate year-round. composite_seasonal() takes a named
list of predict_oyster() result dataframes (one per season or survey
visit) and produces a single composite dataframe with one row per spatial
cell, summarising suitability across all input surveys.
Composite methods:
- "min" (default) – most conservative; a location must be suitable in every season to score well. Best for regulatory site selection.
- "mean" – average across seasons. Good for monitoring trend summaries.
- "prob" – fraction of seasons with suitability >= prob_threshold. Useful when surveys are irregularly spaced or some seasons are missing.
Spatial matching uses a grid cell tolerance of cell_deg decimal degrees
(default 0.001 approx. 111 m). Cells not present in all seasons are flagged in
n_seasons_present.
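The grid-cell matching can be sketched in base R; this illustrates the rounding approach, not the package's exact implementation:

```r
cell_deg <- 0.001   # ~111 m in latitude

# Two observations of the same location from different seasons,
# differing only by GPS jitter smaller than the cell size
lat <- c(56.41523, 56.41487)
lon <- c(-5.49012, -5.49035)

# Snap coordinates to the cell_deg grid: rows sharing a key
# are treated as the same spatial cell across seasons
cell_key <- paste(round(lat / cell_deg) * cell_deg,
                  round(lon / cell_deg) * cell_deg)
cell_key[1] == cell_key[2]   # TRUE: matched despite jitter
```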
Usage
composite_seasonal(
surveys,
method = c("min", "mean", "prob"),
prob_threshold = 0.5,
cell_deg = 0.001,
verbose = TRUE
)
Arguments
surveys |
Named list of dataframes from predict_oyster(), one per season or survey visit. |
method |
Character. One of "min", "mean", or "prob" (default "min"). |
prob_threshold |
Numeric [0,1]. Suitability threshold used when method = "prob" (default 0.5). |
cell_deg |
Numeric. Spatial tolerance in decimal degrees for matching cells across surveys (default 0.001). |
verbose |
Logical. Print summary (default TRUE). |
Value
A dataframe with columns: lat, lon, suitability_composite,
suitability_class, n_seasons_present, one suit_<season> column per
input survey, and suit_range (max - min across seasons).
Examples
spring <- predict_oyster(survey_spring, "ostrea_edulis")
summer <- predict_oyster(survey_summer, "ostrea_edulis")
autumn <- predict_oyster(survey_autumn, "ostrea_edulis")
composite <- composite_seasonal(
surveys = list(spring = spring, summer = summer, autumn = autumn),
method = "min"
)
generate_report(composite, "composite_report.html",
title = "Year-round Suitability Assessment")
Correct survey depths to Chart Datum (LAT)
Description
Converts water depths recorded during a survey to depths below Chart Datum (approximately Lowest Astronomical Tide, LAT) by adding the tidal height above Chart Datum at the time of survey. This removes bias introduced by surveying at different states of the tide.
Formula: depth_CD = depth_surveyed + tidal_height_m
Finding your tidal height:
- UK: UKHO EasyTide (easytide.ukho.gov.uk) – free 7-day predictions for any standard port. Download the predicted heights for your survey period.
- Ireland: Marine Institute tide gauges (data.marine.ie)
- France/EU: SHOM (shom.fr) or Copernicus Marine (marine.copernicus.eu)
- Any port: apps such as PocketTides, Tides Near Me, or TidePod give height-above-CD predictions for named ports.
For a typical loch or sheltered bay survey, reading the tidal height from a tide table for the nearest standard port at the mid-point of the survey is sufficient. For multi-day surveys or those spanning a full tidal cycle, supply per-row heights matched to the survey timestamp.
Usage
correct_to_chart_datum(
df,
tidal_height_m = NULL,
depth_col = "depth",
datetime_col = NULL,
keep_raw = TRUE,
verbose = TRUE
)
Arguments
df |
A dataframe containing at minimum a depth column. Typically the
output of merge_sensor_data() or a sensor reader such as read_nortek_adcp(). |
tidal_height_m |
Numeric scalar or vector. Tidal height above Chart
Datum at the time of survey, in metres. If a scalar, the same offset
is applied to all rows. If a vector, must be the same length as nrow(df). |
depth_col |
Character. Name of the depth column to correct (default "depth"). |
datetime_col |
Character or NULL. Optional name of a timestamp column (default NULL). |
keep_raw |
Logical. If TRUE (default), the original depth values are preserved in a depth_raw_m column. |
verbose |
Logical. Print correction summary (default TRUE). |
Value
The input dataframe with the depth column corrected to Chart Datum.
If keep_raw = TRUE, the original values are preserved as depth_raw_m.
Examples
# Survey conducted around high water at Oban -- tidal height ~3.1 m above CD
survey_corrected <- correct_to_chart_datum(survey, tidal_height_m = 3.1)
# Per-row heights from a tide gauge CSV matched to survey timestamps
tide_series <- read.csv("oban_tide_gauge.csv")
# (match to survey rows by timestamp -- user responsibility)
survey_corrected <- correct_to_chart_datum(
survey,
tidal_height_m = tide_series$height_m
)
Detect meteorological season from date and latitude
Description
Returns the meteorological season ("winter", "spring", "summer", "autumn") for a given date and latitude. Hemisphere is inferred from the sign of latitude (positive = Northern Hemisphere, negative = Southern Hemisphere).
Usage
detect_season(date, lat)
Arguments
date |
A |
lat |
Numeric. Latitude in decimal degrees. Positive = N hemisphere. |
Value
A character string: one of "winter", "spring", "summer",
"autumn".
Examples
detect_season("2024-01-15", lat = 51.5) # "winter" (UK)
detect_season("2024-07-15", lat = 51.5) # "summer"
detect_season("2024-01-15", lat = -33.9) # "summer" (Sydney)
Estimate chlorophyll-a concentration from ADCP acoustic backscatter
Description
Uses the empirical log-linear relationship between volume backscatter
strength (Sv) and chlorophyll-a concentration to add an estimated
chlorophyll_a column to a survey dataframe. Intended for surveys where
a fluorometer was not deployed and the ADCP backscatter intensity is
available.
The conversion is: Chl_a (µg/L) = 10^(a * Sv + b)
Default a and b constants are derived from Deines (1999) for three
common ADCP frequencies (300, 600, 1200 kHz). For best results, calibrate
against concurrent water samples from your survey using the calibration
argument.
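As a quick numeric check of the formula (the constants below are illustrative, not the Deines values):

```r
# Illustrative constants -- calibrate against water samples in practice
a  <- 0.04    # slope (per dB)
b  <- -0.5    # intercept
Sv <- -65     # volume backscatter strength, dB

chl_raw <- 10^(a * Sv + b)            # ~0.0008 µg/L: implausibly low
chl     <- pmin(pmax(chl_raw, 0.05), 50)  # min_chl / max_chl clamp
chl                                    # 0.05 (floored at min_chl)
```

The clamp mirrors the min_chl and max_chl arguments: values outside the plausible range are floored or capped rather than propagated.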
Usage
estimate_chlorophyll_from_backscatter(
df,
backscatter_col,
frequency_khz = 600,
calibration = NULL,
min_chl = 0.05,
max_chl = 50,
keep_raw = TRUE,
verbose = TRUE
)
Arguments
df |
Dataframe containing a backscatter column and optional temperature and depth columns. |
backscatter_col |
Character. Name of the column containing volume
backscatter strength values (Sv in dB re 1 m^-1). |
frequency_khz |
Numeric. ADCP operating frequency in kHz. Used to
select default calibration constants if calibration is NULL (default 600). |
calibration |
Numeric vector of length 2: c(a, b) slope and intercept; overrides the frequency-based defaults. |
min_chl |
Numeric. Floor for estimated chlorophyll-a in µg/L (default 0.05). |
max_chl |
Numeric. Ceiling for estimated chlorophyll-a in µg/L (default 50). |
keep_raw |
Logical. If TRUE (default), the input backscatter column is retained in the output. |
verbose |
Logical. Print conversion summary and warnings (default TRUE). |
Value
The input dataframe with an additional chlorophyll_a column
(µg/L). If a chlorophyll_a column already exists, a warning is issued
and the existing column is overwritten.
Improving accuracy with water samples
Collect 5–15 grab/Niskin water samples distributed across your survey area and depth range, then:
# Fit calibration from samples
fit <- lm(log10(sample_chl) ~ backscatter, data = samples_df)
a <- coef(fit)[["backscatter"]]
b <- coef(fit)[["(Intercept)"]]
df <- estimate_chlorophyll_from_backscatter(
df, "amp_mean_dB", frequency_khz = 600,
calibration = c(a, b))
Examples
# Default calibration (300 kHz Nortek Signature)
adcp <- read_nortek_adcp("survey.csv")
adcp <- estimate_chlorophyll_from_backscatter(
adcp, backscatter_col = "amp_mean_dB",
frequency_khz = 300)
# Custom calibration from water samples
adcp <- estimate_chlorophyll_from_backscatter(
adcp, "amp_mean_dB", frequency_khz = 300,
calibration = c(a = 0.041, b = -0.52))
Estimate effective fetch from a set of lat/lon points and a bearing
Description
Simple great-circle ray-tracing to estimate how far open water extends
from a point in a given direction, up to max_fetch_km. This is a
lightweight GIS-free approximation – for precise fetch, use QGIS or
the fetchR R package.
Usage
estimate_fetch(
lat,
lon,
bearing_deg,
land_polygons = NULL,
max_fetch_km = 200,
step_km = 1
)
Arguments
lat |
Numeric. Site latitude (decimal degrees). |
lon |
Numeric. Site longitude (decimal degrees). |
bearing_deg |
Numeric. Prevailing wind direction to check (0 = North). |
land_polygons |
SpatialPolygons or sf POLYGON object representing land. If NULL, returns max_fetch_km (fully open ocean assumed). |
max_fetch_km |
Numeric. Maximum fetch to search (default 200 km). |
step_km |
Numeric. Ray step size (default 1 km). |
Value
Numeric. Estimated fetch in km.
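The destination-point step underlying the ray trace can be sketched in base R (spherical Earth; the land intersection test is omitted):

```r
# One great-circle step of step_km along bearing_deg from (lat, lon)
step_point <- function(lat, lon, bearing_deg, step_km, R = 6371) {
  d  <- step_km / R                      # angular distance (radians)
  br <- bearing_deg * pi / 180
  la <- lat * pi / 180; lo <- lon * pi / 180
  la2 <- asin(sin(la) * cos(d) + cos(la) * sin(d) * cos(br))
  lo2 <- lo + atan2(sin(br) * sin(d) * cos(la),
                    cos(d) - sin(la) * sin(la2))
  c(lat = la2 * 180 / pi, lon = lo2 * 180 / pi)
}

# 1 km due north from 56 N raises latitude by ~0.009 degrees
step_point(56, -5, 0, 1)
```

estimate_fetch() repeats this step until the ray intersects land_polygons or max_fetch_km is reached.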
Export optional contour lines as a GeoPackage for QGIS
Description
Derives contour lines from the IDW suitability raster and saves them as a
vector GeoPackage (.gpkg). Load alongside the heatmap in QGIS for the
depth-contour look shown in BioBase outputs.
Usage
export_contours(tif_path, interval = 0.1, gpkg_path = NULL)
Arguments
tif_path |
Character. Path to the GeoTIFF produced by export_geotiff(). |
interval |
Numeric. Contour interval in suitability units (default 0.1). |
gpkg_path |
Character or NULL. Output .gpkg path; if NULL (default), derived from tif_path. |
Value
The .gpkg file path (invisibly).
Examples
export_geotiff(result, "oyster_heatmap.tif")
export_contours("oyster_heatmap.tif")
Export suitability scores as a smooth interpolated GeoTIFF heatmap for QGIS
Description
Takes the scored point dataset from predict_oyster() and produces a
smooth, continuous raster heatmap using Inverse Distance Weighting (IDW)
interpolation – the same technique used by systems like Lowrance BioBase.
The result looks like a fluid heat surface over your survey area rather than
isolated point squares.
The output GeoTIFF contains three bands:
- suitability – IDW-interpolated score [0, 1]. Use this for the heatmap.
- excluded_mask – 1 where a survey point was hard-excluded, 0 otherwise.
- n_observations – how many survey points fell within each raster cell (data density diagnostic).
Usage
export_geotiff(
df,
path,
resolution = 2e-04,
idw_power = 2,
idw_max_dist = NULL,
buffer = 0.001,
contours = TRUE,
crs = "EPSG:4326",
overwrite = TRUE
)
Arguments
df |
A dataframe processed by predict_oyster(). |
path |
Character. Output .tif path. |
resolution |
Numeric. Raster cell size in decimal degrees. Default
2e-04 (approx. 22 m). |
idw_power |
Numeric. IDW distance decay exponent (default 2). |
idw_max_dist |
Numeric or NULL. Maximum IDW search radius in degrees; NULL (default) selects a radius automatically from point density. |
buffer |
Numeric. Padding in degrees added around the survey extent
before building the raster grid (default 0.001). |
contours |
Logical. If TRUE (default), contour lines are also derived and exported. |
crs |
Character. CRS string (default "EPSG:4326"). |
overwrite |
Logical. Overwrite existing file (default TRUE). |
Value
The output file path (invisibly). Also writes a .qml QGIS style
file alongside the .tif automatically.
Examples
result <- predict_oyster("survey.csv", "ostrea_edulis")
# Standard export -- smooth heatmap with auto radius and contours
export_geotiff(result, "oyster_heatmap.tif")
# Finer resolution for a small harbour survey
export_geotiff(result, "oyster_heatmap.tif", resolution = 0.0001)
# Override IDW radius explicitly (e.g. sparse transect data)
export_geotiff(result, "oyster_heatmap.tif", idw_max_dist = 0.005)
Export a QGIS colour style (.qml) matching the BioBase orange/red heatmap
Description
Writes a QGIS Layer Style file (.qml) using a yellow -> orange -> red ->
dark red colour ramp, closely matching the BioBase/Lowrance heatmap style.
The file is automatically applied when its name matches the .tif – no
manual styling needed in QGIS.
Usage
export_qml_style(tif_path)
Arguments
tif_path |
Character. Path to the GeoTIFF the style file should accompany. |
Value
The .qml file path (invisibly).
Fetch live environmental data for a survey extent
Description
Convenience wrapper that fetches data from one or more live sources and returns a named list of dataframes that can be passed to the relevant scoring functions. Each source can be fetched independently or all at once.
This function requires internet access and registered API credentials where
applicable (see oystermapR_live_config()). If a source fails, it is
silently skipped and not included in the output - the function never aborts
due to a single failed source.
Usage
fetch_live_environmental_data(
survey,
sources = "all",
date_range = NULL,
verbose = TRUE
)
Arguments
survey |
Dataframe with lat and lon columns defining the survey extent. |
sources |
Character vector. Which data sources to fetch: "all" (default) or any of
"cmems", "ices_hab", "ices_vms", "emodnet_predators", "fsa_shellfish". |
date_range |
Character vector of length 2: start and end dates ("YYYY-MM-DD"). |
verbose |
Logical. Print progress (default TRUE). |
Value
Named list. Each successfully fetched source produces one element:
- $cmems: SST, salinity, chlorophyll grid
- $ices_hab: HAB event records within extent
- $ices_vms: Swept-area ratio per ICES c-square
- $emodnet_predators: Predator occurrence records
- $fsa_shellfish: Shellfish classification polygons / area descriptions
Examples
## Not run:
oystermapR_live_config(cmems_user = "u", cmems_password = "p")
live <- fetch_live_environmental_data(survey, sources = c("ices_hab", "ices_vms"))
result <- score_hab_risk(result, hab_data = live$ices_hab, species = "ostrea_edulis")
result <- score_anthropogenic_disturbance(result, trawling_data = live$ices_vms)
## End(Not run)
Generate a formatted survey report from predict_oyster() results
Description
Renders a structured assessment report from the output of predict_oyster().
The report includes an executive summary, suitability class breakdown,
per-variable scoring table, most common limiting factors, top introduction
sites, and a methodology section.
Output format is PDF by default (requires a LaTeX installation – see Note below). HTML is also supported and requires no additional system software.
Usage
generate_report(
result,
output = "oyster_report.html",
title = "Oyster Suitability Assessment",
author = "",
species = "ostrea_edulis",
top_n = 5L,
validation = NULL,
comparison = NULL,
open = TRUE
)
Arguments
result |
A dataframe returned by predict_oyster(). |
output |
Character. Output file path. Extension determines format:
.html or .pdf (default "oyster_report.html"). |
title |
Character. Report title. Default "Oyster Suitability Assessment". |
author |
Character. Author name(s) to appear on the title page.
Default "" (none). |
species |
Character. Species key used in predict_oyster() (default "ostrea_edulis"). |
top_n |
Integer. Number of top introduction sites to include in the
report (default 5). |
validation |
Named list or NULL. Validation results from validate_against_records() to embed (default NULL). |
comparison |
Dataframe or NULL. Output of compare_species() to embed (default NULL). |
open |
Logical. Open the report in your default viewer after rendering
(default TRUE). |
Value
The output file path (invisibly).
Note
HTML reports (recommended) require plotly and scales:
install.packages(c("plotly","scales")).
PDF reports additionally require a LaTeX installation; see below.
PDF output requires LaTeX. If you don't have LaTeX installed,
either use output = "report.html" (no extra software needed), or install
a minimal LaTeX distribution with:
install.packages("tinytex")
tinytex::install_tinytex()
Examples
result <- predict_oyster(survey, "ostrea_edulis",
output_geotiff = "kames_bay.tif")
# Standard HTML report (recommended -- Plotly charts, no LaTeX needed)
generate_report(result, "kames_bay_report.html",
title = "Kames Bay Oyster Habitat Assessment",
author = "T. Tucker")
# With validation results embedded
val <- validate_against_records(result, nbn_records)
generate_report(result, "kames_bay_validated.html",
validation = val)
# With competition adjustment (when comparing species)
comp <- compare_species(survey,
species = c("ostrea_edulis", "magallana_gigas"))
generate_report(result, "kames_bay_competition.html",
comparison = comp)
# PDF report (requires LaTeX / tinytex)
generate_report(result, "kames_bay_report.pdf",
title = "Kames Bay Oyster Habitat Assessment")
Generate a compact printable PDF summary of oystermapR results
Description
Produces a self-contained multi-page A4 PDF directly from the output of
predict_oyster() without requiring LaTeX, pandoc, or Rmd rendering.
The PDF is designed for printing and contains four pages:
- Page 1 – Title header, executive summary table, class breakdown bar
- Page 2 – Suitability heatmap (survey points, continuous colour ramp)
- Page 3 – Per-variable mean score chart (horizontal bar)
- Page 4 – Score distribution histogram + class count bar chart
Usage
generate_summary_pdf(
result,
output = "oyster_summary.pdf",
species = "ostrea_edulis",
title = "Oyster Suitability Summary",
author = "",
open = TRUE,
verbose = TRUE
)
Arguments
result |
A dataframe returned by predict_oyster(). |
output |
Character. Output .pdf path (default "oyster_summary.pdf"). |
species |
Character. Species key used in predict_oyster() (default "ostrea_edulis"). |
title |
Character. Report title shown in the header.
Default "Oyster Suitability Summary". |
author |
Character. Author name (optional, shown in header). |
open |
Logical. Open the PDF after writing (default TRUE). |
verbose |
Logical. Print progress messages (default TRUE). |
Value
The output file path (invisibly).
Examples
result <- predict_oyster(survey_df, "ostrea_edulis")
generate_summary_pdf(result, "kames_bay_summary.pdf",
title = "Kames Bay -- Spring Survey",
author = "T. Tucker")
Get species tolerance parameters
Description
Get species tolerance parameters
Usage
get_species_tolerances(species)
Arguments
species |
Character. Species key (e.g. "ostrea_edulis"). |
Value
Named list of tolerance parameters for the species.
Examples
tol <- get_species_tolerances("ostrea_edulis")
tol$exclusions$temperature
Retrieve current posterior tolerance parameters for a species
Description
Returns the tolerance parameter list currently held in the session cache,
after any update_species_tolerances() calls. If no update has been run,
returns the built-in prior parameters unchanged.
Usage
get_tolerance_posteriors(species, var = NULL)
Arguments
species |
Character. Species key (e.g. "ostrea_edulis"). |
var |
Character or NULL. If specified, returns parameters for that
variable only; otherwise returns the full scored parameter list. |
Value
Named list of tolerance parameters (same structure as
get_species_tolerances()$scored).
Examples
# After calling update_species_tolerances()
post <- get_tolerance_posteriors("ostrea_edulis")
post$temperature # updated optimal_min, optimal_max with CIs
Identify climate-resilient locations
Description
From a project_suitability() output, identifies locations that maintain
at least min_class suitability across all projected scenarios. These are
the most robust sites for long-term restoration or aquaculture investment.
Usage
identify_resilient_sites(projections, baseline, min_class = "Moderate")
Arguments
projections |
Named list returned by project_suitability(). |
baseline |
Dataframe from predict_oyster() (the unprojected baseline). |
min_class |
Character. Minimum suitability class required in all
scenarios (default "Moderate"). |
Value
Subset of baseline for locations that meet min_class in every
scenario, with an added n_scenarios_qualifying column.
Examples
proj <- project_suitability(result, tol)
resilient <- identify_resilient_sites(proj, result, min_class = "Moderate")
nrow(resilient) # sites that stay Moderate+ even under RCP8.5
Interpolate survey measurements to a regular grid before scoring
Description
Converts sparse point observations (CTD casts, ADCP profiles) to a regular
grid by spatial interpolation. The interpolated grid is returned in the same
dataframe format as the raw survey, and can be passed directly to
predict_oyster().
Method selection (applied per variable independently):
- Ordinary kriging is used when >= 50 non-NA observations are available for a variable, the gstat and sp packages are installed, and the variogram can be fitted without error. Kriging is the best linear unbiased estimator (BLUE) under the assumption of spatial stationarity – it minimises mean squared prediction error and returns a kriging variance for each cell.
- IDW (Inverse Distance Weighting, p = 2) is used as a fallback when kriging cannot be applied. It is fast, deterministic, and requires no model fitting, but produces no uncertainty estimate and can create bull's-eye artefacts around isolated observations.
Kriging variograms are fitted automatically using gstat::fit.variogram()
with a Matérn model (default) – a flexible and statistically
justified model for environmental data. If automatic fitting fails, a
spherical model is tried; if that also fails, IDW is used.
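The IDW fallback reduces to a distance-weighted mean; a minimal base-R sketch of the p = 2 weighting at a single target cell (not the package's neighbour search):

```r
# Inverse distance weighting (p = 2) at one target location
idw_at <- function(d, z, p = 2) {
  w <- 1 / d^p              # closer observations dominate
  sum(w * z) / sum(w)
}

# Three observations at 100 m, 200 m, and 400 m from the target:
# the estimate sits close to the nearest value
idw_at(d = c(100, 200, 400), z = c(8.0, 9.0, 10.0))   # ~8.29
```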
Usage
interpolate_survey(
survey,
vars = NULL,
resolution_m = 100,
method = c("auto", "kriging", "idw"),
kriging_min_n = 50L,
idw_power = 2,
idw_max_radius_m = NULL,
verbose = TRUE
)
Arguments
survey |
Dataframe with lat, lon, and measurement columns. |
vars |
Character vector of column names to interpolate. Default: all
numeric columns except lat and lon. |
resolution_m |
Numeric. Grid cell size in metres (default 100 m). Coarsen for sparse surveys (> 500 m) or refine for dense ADCP grids (50 m). |
method |
Character. "auto" (default), "kriging", or "idw". |
kriging_min_n |
Integer. Minimum observations required to attempt kriging (default 50). |
idw_power |
Numeric. IDW distance decay exponent (default 2). |
idw_max_radius_m |
Numeric. Maximum search radius for IDW neighbours in metres (default 5 * resolution_m). |
verbose |
Logical. Print per-variable method used and diagnostics (default TRUE). |
Value
A dataframe on a regular grid with interpolated values, a method
column per variable (interp_method_<var>), and kriging variance columns
(krige_var_<var>) where kriging was used.
Examples
# Auto method -- kriging for dense variables, IDW for sparse
grid <- interpolate_survey(survey, resolution_m = 100)
result <- predict_oyster(grid, "ostrea_edulis")
# Force IDW (fast, no gstat needed)
grid <- interpolate_survey(survey, resolution_m = 200, method = "idw")
# Inspect which method was used per variable
unique(grid$interp_method_temperature)
# Kriging variance -- high values = uncertain interpolation
hist(grid$krige_var_temperature)
List all supported species
Description
List all supported species
Usage
list_species()
Value
Named character vector: names = latin names, values = species keys.
Examples
list_species()
Load saved Bayesian tolerance updates into session cache
Description
Reads a previously saved .rds file (from save_tolerance_update()) and
restores it into the session cache. Subsequent calls to predict_oyster()
and score_locations() will automatically use the loaded parameters.
Usage
load_tolerance_update(
species = "all",
path = tools::R_user_dir("oystermapR", which = "data"),
verbose = TRUE
)
Arguments
species |
Character. Species key or "all" (default) to load every saved species. |
path |
Character. Directory to load from (default: user-specific data
directory from tools::R_user_dir()). |
verbose |
Logical. Default TRUE. |
Value
Invisibly: the loaded tolerance list.
Examples
load_tolerance_update("ostrea_edulis")
# Now predict_oyster() uses the updated tolerances automatically
result <- predict_oyster(survey, "ostrea_edulis")
Merge multiple sensor datasets into a single survey table
Description
Takes any number of sensor dataframes (from read_nortek_adcp(),
read_generic_csv(), or custom sources) and merges them spatially into a
single table ready for predict_oyster().
Each dataset is first rounded to a common spatial resolution grid. Datasets
are then joined using the grid cell key – cells that exist in multiple
datasets are merged; cells that exist in only one dataset carry NA for
variables from other sensors.
Usage
merge_sensor_data(..., spatial_res = 4L, date_source = NULL, verbose = TRUE)
Arguments
... |
Two or more dataframes to merge. Each must contain lat and lon columns.
Multiple runs of the same sensor: if you have several surveys from the
same sensor type, pass them as a list and they will be automatically
stacked, e.g. merge_sensor_data(adcp = list(adcp_run1, adcp_run2, adcp_run3), bathy = bathy_df, biobase = biobase_df) |
spatial_res |
Integer. Decimal places for the common spatial grid
(default 4, approx. 11 m cells). |
date_source |
Character. Name of the dataset to take the date column from. |
verbose |
Logical. Print a merge summary (default TRUE). |
Value
A single dataframe with all oystermapR-compatible columns populated
from their respective sensor sources. Columns that weren't available from
any sensor will be absent from the result – predict_oyster() handles
missing columns gracefully.
Examples
# Single run per sensor
adcp <- read_nortek_adcp("adcp.csv")
biobase <- read_generic_csv("biobase.csv")
survey <- merge_sensor_data(adcp = adcp, biobase = biobase)
# Multiple ADCP runs (different survey days) -- automatically stacked
adcp1 <- read_nortek_adcp("survey_jan.csv")
adcp2 <- read_nortek_adcp("survey_feb.csv")
survey <- merge_sensor_data(adcp = list(adcp1, adcp2), biobase = biobase)
result <- predict_oyster(survey, "ostrea_edulis", output_geotiff = "map.tif")
Print an interactive getting-started guide for oystermapR
Description
Prints a step-by-step workflow guide to the console, covering sensor data ingestion, spatial merging, suitability prediction, and QGIS export. Run this any time you need a reminder of the available functions and typical usage patterns.
Usage
oystermapR_help(topic = NULL)
Arguments
topic |
Character or NULL. Optionally focus on a specific topic,
e.g. "sensors" or "qgis". |
Value
Called for its side-effect (console output). Returns invisible(NULL).
Examples
oystermapR_help() # full guide
oystermapR_help("sensors") # sensor ingestion only
oystermapR_help("qgis") # QGIS output only
Configure live data API credentials for oystermapR
Description
Stores API credentials in R session options so live data fetching can authenticate with external services. Credentials are not written to disk and are lost when the R session ends.
All parameters are optional - only set credentials for the services you intend to use. Functions that require a missing credential will warn and fall back to manual data rather than aborting.
Usage
oystermapR_live_config(
cmems_user = NULL,
cmems_password = NULL,
ices_key = NULL,
show_config = FALSE
)
Arguments
cmems_user |
Character. CMEMS username. |
cmems_password |
Character. CMEMS password. |
ices_key |
Character. ICES API key (currently optional; ICES APIs are largely open). |
show_config |
Logical. Print current configuration (credentials masked). Default FALSE. |
Value
Invisibly returns a named list of the options set.
Free registrations
-
CMEMS: Register at marine.copernicus.eu
-
ICES: API access is open (no key required for HAB/VMS endpoints)
-
EMODnet Biology: Open WFS, no key required
-
FSA/DAERA: Open REST API, no key required
Examples
## Not run:
oystermapR_live_config(
cmems_user = "myusername",
cmems_password = "mypassword"
)
# Then use fetch_live = TRUE in any supporting function
score_hab_risk(result, species = "ostrea_edulis", fetch_live = TRUE)
## End(Not run)
Parse an OpenDrift particle tracking output into a connectivity matrix
Description
Converts the output of an OpenDrift settlement simulation (a CSV file with
particle origin and final position) into the connectivity matrix format
expected by score_larval_connectivity(connectivity_matrix = ...).
OpenDrift setup:
Run OpenDrift (OceanDrift or OpenOil adapted for larvae) seeding particles
at each source reef location. Export a CSV with at minimum:
- origin_lat, origin_lon – seed position
- final_lat, final_lon – settlement position (stranded or settled)
- status – filter to "stranded" or "active" as appropriate
The function bins particles into a grid and computes the fraction of particles from each source bin that arrive in each destination bin.
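The binning and weighting step can be sketched in base R with hypothetical particle positions:

```r
bin_deg <- 0.05
snap <- function(x) floor(x / bin_deg) * bin_deg   # snap to bin origin

# Four particles from one source bin; three settle in one destination
# bin and one in another
origin_lat <- rep(56.40, 4); origin_lon <- rep(-5.50, 4)
final_lat  <- c(56.46, 56.46, 56.47, 56.61)
final_lon  <- rep(-5.40, 4)

# Source->destination pair key per particle
key <- paste(snap(origin_lat), snap(origin_lon),
             snap(final_lat), snap(final_lon))

# Weight = fraction of particles per source->dest pair (here 0.75 and 0.25)
w <- table(key) / length(key)
```

Pairs whose weight falls below min_weight would then be dropped.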
Usage
parse_opendrift_connectivity(
opendrift_csv,
bin_deg = 0.05,
status_col = NULL,
status_keep = c("stranded", "settled"),
min_weight = 0.005,
verbose = TRUE
)
Arguments
opendrift_csv |
Character. Path to OpenDrift settlement CSV file. Must
contain columns origin_lat, origin_lon, final_lat, final_lon. |
bin_deg |
Numeric. Grid cell size in decimal degrees for binning (default 0.05 approx. 5 km at mid-latitudes). |
status_col |
Character or NULL. Name of a status column to filter on. If NULL (default), all rows are used. |
status_keep |
Character. Value(s) of status_col to retain (default c("stranded", "settled")). |
min_weight |
Numeric. Minimum connectivity weight to include in output (default 0.005 = 0.5 percent). Filters very weak connections. |
verbose |
Logical. Default TRUE. |
Value
A dataframe with columns source_lat, source_lon, dest_lat,
dest_lon, weight – ready for passing to
score_larval_connectivity().
Examples
## Not run:
# After running OpenDrift simulation and exporting CSV:
cm <- parse_opendrift_connectivity("opendrift_output.csv", bin_deg = 0.05)
result <- predict_oyster(survey, "ostrea_edulis")
result <- score_larval_connectivity(result,
species = "ostrea_edulis",
connectivity_matrix = cm)
## End(Not run)
Permutation variable importance for suitability model
Description
Estimates the importance of each scored environmental variable by randomly permuting its values and measuring the resulting drop in AUC against known presence/absence records. A large drop indicates the model strongly depends on that variable; a small drop indicates it could be removed with minimal impact.
This approach (Breiman 2001) is model-agnostic, requires no model refit, and is directly interpretable: "how much does AUC fall if I destroy the information in variable X?"
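The permutation step can be illustrated with a rank-based AUC in base R; a toy example, not the package's implementation:

```r
# AUC as the probability a random presence outranks a random absence
auc <- function(score, presence) {
  r  <- rank(score)
  n1 <- sum(presence); n0 <- sum(!presence)
  (sum(r[presence]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(42)
x        <- runif(200)      # an informative variable
presence <- x > 0.5         # presence driven entirely by x

baseline   <- auc(x, presence)          # 1: perfect discrimination
permuted   <- auc(sample(x), presence)  # ~0.5: information destroyed
importance <- baseline - permuted       # large drop => important variable
```

permutation_importance() repeats the shuffle n_permutations times per variable and reports the mean and SD of the drop.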
Usage
permutation_importance(
predicted,
records,
presence_col = "presence",
n_permutations = 50L,
match_radius_deg = 0.002,
seed = 42L,
verbose = TRUE
)
Arguments
predicted |
Dataframe from predict_oyster(). |
records |
Dataframe of presence/absence records (same format as
validate_against_records()). |
presence_col |
Character. Name of presence/absence column (default
"presence"). |
n_permutations |
Integer. Number of permutation replicates per variable (default 50). More = more stable importance estimates; >= 100 recommended for publication. |
match_radius_deg |
Numeric. Matching radius (default 0.002 degrees). |
seed |
Integer. Random seed (default 42). |
verbose |
Logical. Default TRUE. |
Value
A dataframe (sorted by importance, descending) with columns:
- variable: scored variable name
- baseline_auc: AUC of the unperturbed model
- mean_permuted_auc: mean AUC after permuting this variable
- importance: baseline_auc - mean_permuted_auc (the drop in AUC)
- importance_sd: SD of the AUC drop across permutation replicates
- importance_pct: importance as a percentage of baseline AUC
- rank: rank by importance (1 = most important)
Interpretation
importance > 0.10 (> 10% AUC drop): high importance – variable is load-bearing for model discrimination.
importance 0.02–0.10: moderate importance.
importance < 0.02: low importance – consider whether this variable adds value relative to its data collection cost.
References
Breiman (2001) Machine Learning 45:5-32. Zurell et al. (2020) Ecography 43:1261-1277.
Examples
## Not run:
records <- read.csv("nbn_ostrea_edulis.csv")
result <- predict_oyster(survey, "ostrea_edulis")
imp <- permutation_importance(result, records, n_permutations = 100)
print(imp)
## End(Not run)
# Variable Importance Rank
# temperature 0.183 1
# salinity 0.091 2
# depth 0.045 3
# ...
Predict oyster growth suitability from environmental survey data
Description
The primary user-facing function of oystermapR. Accepts a tabular dataset
(CSV file path or dataframe) containing spatial coordinates and environmental
measurements, applies species-specific exclusion and weighted scoring rules,
and returns a scored dataframe alongside an optional GeoTIFF heatmap for
QGIS visualisation.
Usage
predict_oyster(
data,
species,
output_geotiff = FALSE,
resolution = 2e-04,
contours = TRUE,
verbose = FALSE
)
Arguments
data |
A dataframe or a file path to a CSV file. Must contain at minimum:
|
species |
Character string identifying the target oyster species.
Accepts the key ( |
output_geotiff |
Logical or character. If |
resolution |
Numeric. Raster cell size in degrees for GeoTIFF output
(default |
contours |
Logical. If |
verbose |
Logical. If |
Value
A dataframe (invisibly) with all original columns plus:
- season: detected season at each location/date
- excluded: logical, TRUE if the location fails a hard exclusion
- exclusion_reason: character, reason(s) for exclusion or NA
- suitability: numeric [0, 1], overall weighted suitability score
- suitability_class: factor, one of "High", "Moderate", "Low", "Very Low", "Excluded"
- score_<variable>: per-variable component scores
If output_geotiff is set, a GeoTIFF is written to disk and its path is
printed to the console.
Column Naming
Column names are matched case-insensitively. Recognised synonyms:
| Variable | Recognised column names |
| Longitude | lon, lng, longitude, x |
| Latitude | lat, latitude, y |
| Date | date, datetime, timestamp |
| Temperature (degrees C) | temperature, temp, temp_c |
| Salinity (PSU) | salinity, sal, salinity_psu |
| Dissolved oxygen (mg/L) | dissolved_oxygen, do, do_mgl, oxygen |
| Depth (m) | depth, depth_m |
| Current velocity (m/s) | current_velocity, velocity, current, u_mean |
| Shear stress (N/m^2) | shear_stress, tau, bed_shear, shear |
| Chlorophyll-a (ug/L) | chlorophyll_a, chla, chl_a, chlorophyll |
| Turbidity (NTU) | turbidity, ntu, turb |
| Slope (degrees) | slope, slope_deg |
| Roughness / rugosity | roughness, rugosity |
| Substrate hardness | substrate_hardness, hardness, bottom_hardness |
| Sediment type | sediment_type, sediment, substrate_type |
| Benthic communities | benthic_communities, benthic, community |
| Biotope | biotope, biotopes, habitat |
| Fishing intensity | fishing_intensity, fishing, fishing_observed |
Examples
# Minimal runnable example using the bundled Example Bay survey data
adcp_f <- system.file("extdata", "example_bay_adcp.csv", package = "oystermapR")
bathy_f <- system.file("extdata", "example_bay_soundings.xyz", package = "oystermapR")
ctd_f <- system.file("extdata", "example_bay_ctd.csv", package = "oystermapR")
adcp <- read_nortek_adcp(adcp_f, verbose = FALSE)
bathy <- read_soundings_xyz(bathy_f, verbose = FALSE)
ctd <- read_generic_csv(ctd_f, verbose = FALSE)
survey <- merge_sensor_data(adcp = adcp, bathy = bathy, ctd = ctd)
result <- predict_oyster(survey, "ostrea_edulis", verbose = FALSE)
table(result$suitability_class)
# Using your own CSV file
## Not run:
result <- predict_oyster(
data = "my_survey.csv",
species = "ostrea_edulis",
output_geotiff = "oyster_suitability.tif"
)
## End(Not run)
Project suitability under future climate scenarios
Description
Re-scores survey locations under user-specified climate perturbations to assess how habitat suitability may change under warming and related oceanographic shifts. Perturbations are applied as additive deltas to temperature and salinity, and multiplicative factors to other variables.
Built-in scenarios align with UKCP18 marine projections for NW European shelf seas:
| Scenario | SST delta | Salinity delta | Notes |
| RCP2.6/SSP1 | +0.5 degrees C | 0.0 PSU | Low emissions, best case |
| RCP4.5/SSP2 | +1.2 degrees C | -0.2 PSU | Intermediate emissions |
| RCP8.5/SSP5 | +2.5 degrees C | -0.5 PSU | High emissions, worst case |
| Custom | user | user | Supply your own deltas |
The function runs score_locations() for each scenario and returns a list
with one result dataframe per scenario, plus a comparison summary and a
suitability_delta column (projected minus baseline) for each.
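A minimal sketch of how the additive deltas are applied to survey columns, assumed from the description above (the internals of project_suitability() may differ, e.g. for multiplicatively perturbed variables):

```r
# Add each named delta to the matching column, leaving other columns untouched.
apply_deltas <- function(df, deltas) {
  for (v in intersect(names(deltas), names(df))) df[[v]] <- df[[v]] + deltas[[v]]
  df
}
survey <- data.frame(temperature = c(11.2, 12.8), salinity = c(33.5, 34.0))
rcp85  <- c(temperature = 2.5, salinity = -0.5)  # deltas from the table above
apply_deltas(survey, rcp85)
```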
Usage
project_suitability(
result,
tolerances,
scenarios = c("rcp26", "rcp45", "rcp85"),
custom_deltas = NULL,
horizon = "2050s",
verbose = TRUE
)
Arguments
result |
Dataframe from |
tolerances |
Species tolerance list from |
scenarios |
Character vector of built-in scenario names, or NULL to
use |
custom_deltas |
Named list of custom scenarios. Each element is a
named numeric vector of deltas keyed to column names, e.g.
|
horizon |
Character. Label for the time horizon, used in output names
only (default |
verbose |
Logical. Print scenario comparison table (default TRUE). |
Value
Named list:
- One dataframe per scenario (named by scenario key) with suitability, suitability_class, and suitability_delta columns.
- "summary" – a dataframe comparing mean suitability, class breakdown, and net change vs baseline across all scenarios.
Examples
result <- predict_oyster(survey, "ostrea_edulis")
tol <- get_species_tolerances("ostrea_edulis")
# Project under all three UKCP18-aligned scenarios
proj <- project_suitability(result, tol)
# Mean suitability change by scenario
proj$summary
# High-risk cells: currently High but drops to Low under RCP8.5
vulnerable <- subset(proj$rcp85,
result$suitability_class == "High" & suitability_class == "Low")
# Custom scenario (e.g. local downscaled projection)
proj2 <- project_suitability(result, tol, scenarios = NULL,
custom_deltas = list(
ukcp18_p95 = c(temperature = 3.2, salinity = -0.8)))
Run automated quality control on raw survey data
Description
Detects and flags suspect sensor readings before they enter the suitability model. Three complementary checks are applied to each numeric column:
- Range check – values outside the physically plausible range for that variable (e.g. temperature above 35 degrees C in temperate coastal water, salinity above 42 PSU) are flagged as "range_fail".
- Statistical outlier – values beyond iqr_k x IQR from the median are flagged as "outlier" (default k = 3, Tukey's outer fence).
- Temporal gradient – when a datetime column is present and the data are sorted chronologically, sequential differences exceeding the max_gradients threshold for that variable are flagged as "gradient_fail" (instrument spike or data entry error).
Additionally, cross-variable sanity checks flag physically implausible combinations:
- Dissolved oxygen > 20 mg/L with temperature < 5 degrees C (likely sensor error)
- Salinity < 5 PSU with temperature > 25 degrees C (likely freshwater intrusion instrument error, not a real coastal reading)
- Chlorophyll-a > 50 ug/L (extreme bloom or sensor fouling)
Flagged values are not removed – they are marked in a qc_flag_<col>
column so the user can decide whether to replace with NA, correct, or
accept them. A summary qc_n_flags column counts total flags per row.
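The statistical outlier check above can be sketched in isolation. This is illustrative only (qc_survey_data() applies it per numeric column with k = iqr_k):

```r
# Flag values more than k * IQR away from the median ("outlier"), else "ok".
flag_outliers <- function(x, k = 3) {
  med   <- median(x, na.rm = TRUE)
  fence <- k * IQR(x, na.rm = TRUE)
  ifelse(abs(x - med) > fence, "outlier", "ok")
}
temp <- c(10.1, 10.4, 10.2, 10.3, 35.0)  # one obvious sensor spike
flag_outliers(temp)
```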
Usage
qc_survey_data(
df,
datetime_col = "datetime",
iqr_k = 3,
max_gradients = NULL,
apply_flags = FALSE,
verbose = TRUE
)
Arguments
df |
Dataframe of raw survey measurements. |
datetime_col |
Character or NULL. Name of the datetime column for
temporal gradient checks (default |
iqr_k |
Numeric. IQR multiplier for outlier detection (default 3.0; use 1.5 for stricter QC of high-precision CTD data). |
max_gradients |
Named numeric vector. Maximum allowed absolute change per unit time (seconds) per variable. Defaults are conservative for typical towed or lowered deployments. |
apply_flags |
Logical. If TRUE, replace flagged values with NA in the returned dataframe. If FALSE (default), flags are added but values preserved for manual review. |
verbose |
Logical. Print QC summary (default TRUE). |
Value
Input dataframe with additional columns:
qc_flag_<varname> for each checked variable ("ok", "range_fail",
"outlier", "gradient_fail", "cross_fail"),
qc_n_flags (integer count of flags per row),
qc_status ("pass", "review", "fail" – fail = 3+ flags on one row).
Examples
# Run QC before modelling
survey_qc <- qc_survey_data(survey, datetime_col = "timestamp")
# Inspect flagged rows
flagged <- subset(survey_qc, qc_status %in% c("review","fail"))
View(flagged)
# Apply flags (replace flagged values with NA) before scoring
survey_clean <- qc_survey_data(survey, apply_flags = TRUE)
result <- predict_oyster(survey_clean, "ostrea_edulis")
Read an Aanderaa RCM current meter CSV export
Description
Reads the CSV file exported by Aanderaa Data Studio from an Aanderaa Recording Current Meter (RCM Blue 5450, SeaGuard RCM, or compatible model). The export typically contains time, current speed, current direction, and optionally temperature, pressure/depth, salinity, and dissolved oxygen.
Because Aanderaa instruments are moored at a fixed location, latitude and
longitude must be supplied by the user via lat and lon.
Current speed is returned as current_speed (m/s). If the export uses
cm/s (common in older instrument firmware), set speed_unit = "cm/s" or
leave as "auto" and the function will detect units from the column header
or by range-checking the data.
Usage
read_aanderaa_csv(
file,
lat = NULL,
lon = NULL,
speed_unit = c("auto", "m/s", "cm/s"),
depth_from_pressure = FALSE,
verbose = TRUE
)
Arguments
file |
Character. Path to the Aanderaa Data Studio CSV export. |
lat |
Numeric. Site latitude in decimal degrees (required). |
lon |
Numeric. Site longitude in decimal degrees (required). |
speed_unit |
Character. Units of the current speed column.
|
depth_from_pressure |
Logical. If |
verbose |
Logical. Print progress messages (default |
Value
A one-row dataframe (site-averaged) with columns lat, lon,
current_speed, and any additional variables present in the file
(temperature, salinity, dissolved_oxygen, pressure_dbar, depth)
– ready for merge_sensor_data().
See Also
read_rdi_adcp(), read_nortek_aquadopp(), merge_sensor_data()
Examples
# Moored RCM Blue at Strangford Lough narrows
rcm <- read_aanderaa_csv(
"strangford_rcm_2024.csv",
lat = 54.398,
lon = -5.558
)
survey <- merge_sensor_data(adcp = rcm, bathy = bathy)
Read a generic sensor CSV with flexible column mapping
Description
Reads any sensor CSV (Lowrance BioBase, CTD/probe logger, bathymetric sonar, sidescan export, etc.) and maps its columns to the standard oystermapR variable names. Performs spatial averaging at a specified resolution.
Usage
read_generic_csv(
file,
col_map = NULL,
spatial_res = 4L,
min_obs = 3L,
categorical_cols = c("sediment_type", "benthic_communities", "biotope",
"fishing_intensity"),
verbose = TRUE
)
Arguments
file |
Character. Path to the CSV file. |
col_map |
Named character vector mapping oystermapR variable names to column names in the CSV. Only variables you want to extract need to be listed. Example: col_map = c( lat = "Latitude", lon = "Longitude", depth = "Depth_m", substrate_hardness = "Hardness", temperature = "Temp_C" ) If |
spatial_res |
Integer. Decimal places for spatial binning (default |
min_obs |
Integer. Minimum observations per cell (default |
categorical_cols |
Character vector. Column names (after mapping) that
should be treated as categorical – mode (most common value) is returned
instead of mean. Default: |
verbose |
Logical. Print a summary after loading (default |
Value
A spatially averaged dataframe using oystermapR standard column names.
Examples
# Lowrance BioBase export (automatic column matching)
biobase <- read_generic_csv("biobase_export.csv")
# With explicit column mapping
probe <- read_generic_csv(
"ctd_data.csv",
col_map = c(lat="Lat", lon="Lon", temperature="Temp", salinity="Sal",
dissolved_oxygen="DO_mgl")
)
Read and process a Nortek Signature 500 ADCP CSV file
Description
Reads a raw merged Nortek Signature 500 ADCP CSV, automatically detects
and excludes depth bins contaminated by acoustic sidelobe interference,
computes current velocity and estimated bed shear stress from the deepest
clean bin, and returns a spatially averaged dataframe ready to merge with
other sensor data via merge_sensor_data().
Sidelobe detection: Each bin is tested independently. A bin is flagged
as contaminated if more than sidelobe_threshold fraction of its speed
readings exceed max_plausible_speed. Once a bin is flagged, all deeper
bins are also flagged (contamination propagates downward).
Shear stress accuracy: Bed shear stress (tau = rho * Cd * U_bed^2)
is most accurate when computed from near-bed velocity. If near-bed bins are
contaminated and only near-surface bins remain, a warning is issued and the
shear_stress_quality column is set to "low". If two or more clean bins
exist, the deepest clean bin is used and quality is "moderate" or
"good".
Usage
read_nortek_adcp(
file,
spatial_res = 4L,
min_obs = 5L,
max_plausible_speed = 1.5,
sidelobe_threshold = 0.1,
rho = 1025,
Cd = 0.002,
verbose = TRUE
)
Arguments
file |
Character. Path to the Nortek merged CSV file. |
spatial_res |
Integer. Decimal places for lat/lon binning (default |
min_obs |
Integer. Minimum number of raw ensembles per spatial cell
to include in output (default |
max_plausible_speed |
Numeric. Speed in m/s above which a reading is
considered a sidelobe spike (default |
sidelobe_threshold |
Numeric. Fraction of readings in a bin that must
exceed |
rho |
Numeric. Seawater density in kg/m^3 (default |
Cd |
Numeric. Drag coefficient for bed shear stress calculation
(default |
verbose |
Logical. Print processing summary (default |
Value
A dataframe with one row per spatial cell containing:
- lat, lon – cell centroid coordinates
- date – earliest observation date in cell (character, "YYYY-MM-DD")
- current_velocity – mean speed from the deepest clean bin (m/s)
- current_velocity_sd – standard deviation within the cell (m/s)
- current_velocity_p95 – 95th percentile speed (m/s)
- shear_stress – estimated bed shear stress (N/m^2)
- shear_stress_quality – "good", "moderate", or "low"
- n_ensembles – raw ensembles averaged into this cell
- bins_used – which velocity bins contributed (e.g. "bin1")
- bins_excluded – which bins were removed as contaminated
Examples
adcp <- read_nortek_adcp("S104456A008_AWE_Melfort_merged.csv")
print(adcp)
Read a Nortek Aquadopp Profiler ASCII export
Description
Reads the ASCII velocity export produced by Nortek's AquaPro software from an Aquadopp Profiler deployment. The function accepts two common export layouts:
- ENU CSV – a single file with named columns for east, north, and up velocity per depth bin (e.g. East_1, North_1, East_2, ...) plus optional time, temperature, and pressure columns.
- Separate beam files – the companion .sen (sensor) file providing time, temperature, and pressure, and the .v1/.v2/.v3 files providing beam velocities. Supply the .sen path to file and set beam_files = c("survey.v1", "survey.v2", "survey.v3").
Because Aquadopp instruments are often moored at a single location rather
than vessel-mounted, GPS coordinates may not be embedded in the export. In
that case supply lat and lon directly; all output rows receive that
fixed position.
Usage
read_nortek_aquadopp(
file,
beam_files = NULL,
lat = NULL,
lon = NULL,
spatial_res = 4L,
min_obs = 5L,
max_plausible_speed = 2.5,
rho = 1025,
Cd = 0.002,
verbose = TRUE
)
Arguments
file |
Character. Path to the AquaPro ENU CSV export, or path to the
|
beam_files |
Character vector. Paths to |
lat |
Numeric. Fixed latitude for moored deployments (decimal degrees). Required when GPS columns are absent from the export. |
lon |
Numeric. Fixed longitude for moored deployments (decimal degrees). Required when GPS columns are absent from the export. |
spatial_res |
Integer. Decimal places for lat/lon binning (default |
min_obs |
Integer. Minimum ensembles per grid cell (default |
max_plausible_speed |
Numeric. Upper speed threshold for outlier
removal in m/s (default |
rho |
Numeric. Water density in kg/m^3 (default |
Cd |
Numeric. Drag coefficient for bed shear stress (default |
verbose |
Logical. Print progress messages (default |
Value
A dataframe with columns lat, lon, current_speed,
bed_shear_stress, shear_stress_quality, and optionally temperature,
pressure_dbar, and date – ready for merge_sensor_data().
Examples
# Moored deployment (fixed position)
adcp <- read_nortek_aquadopp(
"harbour_mouth_aqd.csv",
lat = 56.512, lon = -5.403
)
# Vessel-mounted with GPS columns in the file
adcp <- read_nortek_aquadopp("transect_enu.csv")
survey <- merge_sensor_data(adcp = adcp, bathy = bathy)
Read a Teledyne RDI ADCP binary file (PD0 format)
Description
Reads the binary PD0 output file produced by Teledyne RDI instruments
(WorkHorse II, Sentinel V, StreamPro, Ocean Surveyor, Long Ranger) via
the oce package's read.adp.rdi() parser. The function
extracts east/north velocity profiles, applies sidelobe contamination
detection, and derives current speed and bed shear stress – producing
output in the same format as read_nortek_adcp().
Optional dependency: reading RDI binary (PD0) files requires the oce
package (install.packages("oce")). If oce is absent the function will
stop with an informative message. ASCII WinRiver II exports (.txt/.csv)
do not require oce.
GPS: vessel-mounted deployments embed GPS in the binary file and are
handled automatically. For moored deployments supply lat and lon.
Usage
read_rdi_adcp(
file,
lat = NULL,
lon = NULL,
spatial_res = 4L,
min_obs = 5L,
max_plausible_speed = 2,
sidelobe_threshold = 0.1,
rho = 1025,
Cd = 0.002,
verbose = TRUE
)
Arguments
file |
Character. Path to the RDI binary |
lat |
Numeric. Fixed latitude for moored deployments (decimal degrees). |
lon |
Numeric. Fixed longitude for moored deployments (decimal degrees). |
spatial_res |
Integer. Decimal places for lat/lon binning (default |
min_obs |
Integer. Minimum ensembles per grid cell (default |
max_plausible_speed |
Numeric. Upper speed threshold in m/s for
sidelobe / spike removal (default |
sidelobe_threshold |
Numeric. Fraction of ensembles exceeding
|
rho |
Numeric. Water density in kg/m^3 (default |
Cd |
Numeric. Drag coefficient for bed shear stress (default |
verbose |
Logical. Print progress messages (default |
Value
A dataframe with columns lat, lon, current_speed,
bed_shear_stress, shear_stress_quality, and n_ensembles – ready
for merge_sensor_data().
See Also
read_nortek_adcp(), read_nortek_aquadopp(), merge_sensor_data()
Examples
# Vessel-mounted WorkHorse II (GPS embedded in binary)
adcp <- read_rdi_adcp("survey_2024.000")
# Moored Sentinel V (fixed position)
adcp <- read_rdi_adcp("mooring_2024.000", lat = 52.41, lon = -9.20)
survey <- merge_sensor_data(adcp = adcp, bathy = bathy)
Read a bathymetric or sidescan GeoTIFF and convert to oystermapR variables
Description
Reads a single-band GeoTIFF raster using terra and returns a spatially
averaged dataframe suitable for merge_sensor_data().
Two modes are supported, selected via the type argument:
"bathy" – Bathymetric DEM (e.g. from Ping 3DSS or post-processed
soundings). Extracts:
- depth – raster cell value (metres)
- slope – computed using terra::terrain()
- roughness – Terrain Ruggedness Index via terra::terrain()
"sidescan" – Sidescan backscatter mosaic (normalised 0–1 or raw DN).
Extracts:
- substrate_hardness – backscatter intensity normalised to [0, 1]. High backscatter -> hard/coarse substrate (rock, gravel, shell); low backscatter -> soft substrate (mud, fine sand).
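A sketch of the normalisation step, assuming a simple min-max rescale of raw DN values (the package may instead clip percentiles or use another scheme):

```r
# Rescale raw backscatter digital numbers (DN) to [0, 1].
normalise_backscatter <- function(dn) {
  rng <- range(dn, na.rm = TRUE)
  (dn - rng[1]) / (rng[2] - rng[1])
}
normalise_backscatter(c(20, 110, 200))  # raw DN -> 0.0 0.5 1.0
```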
Usage
read_sonar_tif(
file,
type = c("bathy", "sidescan"),
spatial_res = 4L,
band = 1L,
nodata_val = NA,
depth_positive = TRUE,
verbose = TRUE
)
Arguments
file |
Character. Path to the GeoTIFF file ( |
type |
Character. Either |
spatial_res |
Integer. Decimal places for output lat/lon grid
(default |
band |
Integer. Raster band to read (default |
nodata_val |
Numeric. Value to treat as no-data (default |
depth_positive |
Logical. For |
verbose |
Logical (default |
Value
A dataframe with lat, lon, and the derived variable columns
ready for merge_sensor_data().
Examples
bathy_df <- read_sonar_tif("kames_bay_bathy.tif", type = "bathy")
sidescan_df <- read_sonar_tif("kames_bay_sidescan.tif", type = "sidescan")
survey <- merge_sensor_data(bathy = bathy_df, sidescan = sidescan_df, adcp = adcp_df)
Read a bathymetric XYZ point cloud and derive depth, slope and roughness
Description
Reads a plain-text XYZ soundings file (comma or space delimited, optional
#-prefixed header), spatially averages onto a regular grid, and derives:
- depth – mean depth per cell (metres, positive downward)
- roughness – rugosity index per cell, derived from intra-cell depth variance using the surface-area approximation rugosity = sqrt(1 + (depth_sd / cell_size_m)^2). Values near 1.0 are flat; higher values indicate more complex relief.
- slope – maximum downslope gradient in degrees, computed by finite differences between neighbouring grid cells (Horn 1981 method).
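The rugosity approximation quoted above, compared for a flat and a rough cell (the numeric values here are illustrative; cell_size_m is the approximate cell width implied by spatial_res):

```r
# Surface-area rugosity approximation from intra-cell depth SD.
rugosity <- function(depth_sd, cell_size_m) sqrt(1 + (depth_sd / cell_size_m)^2)
rugosity(0.05, 11)  # near-flat cell -> ~1.00
rugosity(4.0, 11)   # complex relief -> ~1.06
```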
Usage
read_soundings_xyz(
file,
lon_col = NULL,
lat_col = NULL,
depth_col = NULL,
spatial_res = 4L,
min_soundings = 10L,
depth_positive = TRUE,
verbose = TRUE
)
Arguments
file |
Character. Path to the |
lon_col, lat_col, depth_col |
Character or integer. Column names or
positions for longitude, latitude, and depth. If |
spatial_res |
Integer. Decimal places for lat/lon binning (default |
min_soundings |
Integer. Minimum soundings per cell to include in output
(default |
depth_positive |
Logical. If |
verbose |
Logical. Print processing summary (default |
Value
A dataframe with columns lat, lon, depth, slope,
roughness, and n_soundings – ready for merge_sensor_data().
Examples
bathy <- read_soundings_xyz("kames_bay_soundings.xyz")
survey <- merge_sensor_data(adcp = adcp_data, bathy = bathy)
Reset Bayesian tolerance updates for a species
Description
Clears the session cache for a species, reverting to the built-in prior parameters. Does not affect saved files (use file deletion for those).
Usage
reset_tolerance_update(species = "all", verbose = TRUE)
Arguments
species |
Character. Species key, or |
verbose |
Logical. Default TRUE. |
Value
Invisibly NULL.
Examples
reset_tolerance_update("ostrea_edulis")
reset_tolerance_update("all")
Save Bayesian tolerance updates to disk
Description
Serialises the current session cache for a species to an .rds file so
that updated parameters persist across R sessions. Load again with
load_tolerance_update().
Usage
save_tolerance_update(
species = "all",
path = tools::R_user_dir("oystermapR", which = "data"),
verbose = TRUE
)
Arguments
species |
Character. Species key or |
path |
Character. Directory to save into (default: user-specific data
directory from |
verbose |
Logical. Default TRUE. |
Value
Invisibly: path(s) to saved file(s).
Examples
save_tolerance_update("ostrea_edulis")
save_tolerance_update("all", path = "data/bayes_updates/")
Score anthropogenic disturbance at survey locations
Description
Calculates a disturbance index [0,1] from bottom trawling intensity, dredging activity, and anchor damage. The primary data input is the ICES swept-area ratio (SAR), which can be supplied manually or fetched live from the ICES VMS data portal.
This output is primarily relevant to the gear feasibility and economic viability modules. Very high trawling intensity (SAR > 0.5/yr) constitutes a hard gate for bottom culture and restoration reef deployment – physical gear deployment is not viable on actively trawled ground regardless of ecological suitability.
Input options:
- trawling_data – ICES VMS SAR data, manually downloaded as a CSV/dataframe.
- fetch_live = TRUE – queries the ICES VMS GeoServer endpoint.
- Neither – returns zero disturbance with a warning.
Usage
score_anthropogenic_disturbance(
result,
trawling_data = NULL,
anchor_data = NULL,
dredge_areas = NULL,
fetch_live = FALSE,
match_radius_m = 5000,
verbose = TRUE
)
Arguments
result |
Dataframe from |
trawling_data |
Dataframe of ICES VMS SAR data (see Details), or NULL. |
anchor_data |
Dataframe of anchoring density data, or NULL. |
dredge_areas |
Dataframe of dredging/extraction area centroids, or NULL. |
fetch_live |
Logical. Query ICES VMS GeoServer (default FALSE). |
match_radius_m |
Numeric. Spatial matching radius in metres (default 5000 m, consistent with ICES c-square ~0.05 degrees resolution). |
verbose |
Logical. Default TRUE. |
Value
result with additional columns:
- disturbance_sar: swept-area ratio at or near each site (NA if no data)
- disturbance_score [0,1]: composite anthropogenic disturbance index
- disturbance_class: "Negligible" / "Low" / "Moderate" / "High" / "Active"
- disturbance_gear_gate: logical – TRUE if SAR exceeds the gear deployment threshold (used by assess_gear_feasibility())
- disturbance_note: management flag
trawling_data format
A dataframe with columns:
- lat, lon – centroid of c-square or fishing location
- sar – swept-area ratio (numeric; dimensionless ratio per year)
- gear_type (optional) – "otter", "beam", "dredge", "seine", etc. Dredge gear has a higher impact per SAR unit.
Additional disturbance inputs
- anchor_data: dataframe with lat, lon, density (vessel anchoring density index) for recreational/commercial anchorage areas.
- dredge_areas: dataframe with lat, lon identifying licensed aggregate extraction areas (all sites within the polygon receive score 1.0).
Examples
# Manual ICES VMS data (downloaded from ICES data portal)
vms <- read.csv("ices_vms_sar_2022.csv") # columns: lat, lon, sar, gear_type
result <- score_anthropogenic_disturbance(result, trawling_data = vms)
# Live ICES VMS fetch
result <- score_anthropogenic_disturbance(result, fetch_live = TRUE)
Score locations for disease and parasite risk
Description
Assesses the environmental risk of two key bivalve pathogens at each survey location:
- Bonamia ostreae (Ostrea edulis only) – an intracellular haplosporidian parasite that is the primary constraint on flat oyster restoration in northern Europe. Risk is temperature-driven: transmission peaks at 15-20 degrees C and is negligible below 8 degrees C. The function also accepts a known_sites dataframe of confirmed infected locations to apply a proximity multiplier.
- OsHV-1 microvariant (Magallana gigas) – a herpesvirus causing Pacific Oyster Mortality Syndrome (POMS). Mortality events occur when seawater exceeds 16 degrees C for sustained periods. High turbidity amplifies risk by increasing larval stress.
Risk scores are returned on a 0-1 scale (0 = negligible, 1 = highest environmental risk) and classified as Low / Moderate / High / Critical. They are deliberately kept separate from the suitability score – a High suitability site with High disease risk is a meaningful regulatory red flag.
Usage
score_disease_risk(
result,
species = "ostrea_edulis",
known_sites = NULL,
temp_col = "temperature",
salinity_col = "salinity",
turbidity_col = "turbidity",
verbose = TRUE
)
Arguments
result |
Dataframe from |
species |
Character. |
known_sites |
Dataframe or NULL. Known infected/positive sites with
|
temp_col |
Character. Name of temperature column in |
salinity_col |
Character. Name of salinity column (default |
turbidity_col |
Character. Name of turbidity column (default
|
verbose |
Logical. Print risk summary (default TRUE). |
Value
Input dataframe with additional columns:
disease_risk_score [0-1], disease_risk_class (Low/Moderate/High/Critical),
disease_agent (pathogen name), and disease_risk_note (plain-language
interpretation).
Examples
result <- predict_oyster(survey, "ostrea_edulis")
# Basic risk scoring (temperature-driven only)
result <- score_disease_risk(result, "ostrea_edulis")
# With known Bonamia-positive site locations
infected <- data.frame(lat = c(55.8, 56.1), lon = c(-5.2, -5.4))
result <- score_disease_risk(result, "ostrea_edulis",
known_sites = infected)
# Sites with high ecological suitability but also high disease risk
flagged <- subset(result,
suitability_class == "High" & disease_risk_class %in% c("High","Critical"))
Score economic viability for aquaculture or restoration
Description
Combines ecological suitability, gear feasibility, site accessibility, and patch size into a composite economic viability index [0-1]. This is a heuristic score – not a financial model – but it ranks sites consistently and can be used to shortlist candidates for detailed business case development.
The index weighs four components:
- Ecological suitability (40%) – from predict_oyster()
- Gear feasibility (25%) – best gear score from assess_gear_feasibility()
- Accessible patch area (20%) – log-scaled area of connected suitable habitat (from analyse_connectivity() if available)
- Access score (15%) – distance to the nearest harbour/port (if a harbours table is supplied), or a flat 0.5 if unknown
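Under the weights above, the composite reduces to a linear combination of the four component scores. This sketch assumes that simple linear form (score_economic_viability()'s internals may rescale the components first):

```r
# 40/25/20/15 weighted composite of component scores, each already in [0, 1].
viability <- function(suitability, gear, patch_area, access) {
  0.40 * suitability + 0.25 * gear + 0.20 * patch_area + 0.15 * access
}
viability(suitability = 0.8, gear = 0.7, patch_area = 0.6, access = 0.5)  # -> 0.69
```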
Usage
score_economic_viability(
result,
harbours = NULL,
target = c("restoration", "aquaculture"),
verbose = TRUE
)
Arguments
result |
Dataframe from |
harbours |
Dataframe or NULL. Known harbour/landing locations with
|
target |
Character. |
verbose |
Logical. Print viability summary (default TRUE). |
Value
Input dataframe with viability_score [0-1],
viability_class (Poor/Fair/Good/Excellent), and viability_notes.
Examples
result <- predict_oyster(survey, "ostrea_edulis")
result <- assess_gear_feasibility(result)
result <- analyse_connectivity(result)
harbours <- data.frame(
name = c("Tarbert","Portavadie"),
lat = c(55.865, 55.875),
lon = c(-5.425, -5.300)
)
result <- score_economic_viability(result, harbours = harbours,
target = "aquaculture")
# Best aquaculture candidates
subset(result, viability_class %in% c("Good","Excellent"))
Score harmful algal bloom risk at survey locations
Description
Produces a HAB risk index (0 = negligible, 1 = severe) based on historical bloom event frequency and, where available, nutrient and stratification data.
Input options (in priority order):
- hab_data – manually supplied dataframe of historical HAB events (see Details).
- fetch_live = TRUE – queries the ICES HAB database for event records within the survey extent and a specified date range. Requires internet access and the httr package.
- Neither – returns a fixed background-level risk (0.1) with a warning.
Usage
score_hab_risk(
result,
hab_data = NULL,
fetch_live = FALSE,
date_range = NULL,
match_radius_m = 5000,
species = "ostrea_edulis",
verbose = TRUE
)
Arguments
result |
Dataframe from |
hab_data |
Dataframe of historical HAB events (see Details), or NULL. |
fetch_live |
Logical. Query ICES HAB database (default FALSE). |
date_range |
Character vector length 2 |
match_radius_m |
Numeric. Radius in metres for event aggregation (default 5000 m = 5 km, consistent with ICES monitoring station spacing). |
species |
Character. Target species – affects toxin severity weights
( |
verbose |
Logical. Default TRUE. |
Value
result with additional columns:
- hab_risk [0,1]: composite risk index
- hab_risk_class: "Low" / "Moderate" / "High" / "Critical"
- hab_closure_days_per_year: estimated days per year with shellfish closures
- hab_dominant_toxin: most frequently recorded toxin class at the site
- hab_risk_note: management flag
hab_data format
A dataframe with columns:
- lat, lon – event coordinates (decimal degrees)
- date – event date (Date or character "YYYY-MM-DD")
- genus (optional) – bloom genus (e.g. "Alexandrium", "Pseudo-nitzschia")
- toxin (optional) – toxin class (PSP, ASP, DSP, AZP); affects severity weight
- closure_days (optional) – number of days the area was closed; defaults to 1 per event if absent
Examples
# Manual data
hab <- data.frame(lat=52.1, lon=-4.5, date="2022-07-15",
genus="Alexandrium", toxin="PSP", closure_days=14)
result <- score_hab_risk(result, hab_data=hab, species="ostrea_edulis")
# Live ICES fetch
result <- score_hab_risk(result, fetch_live=TRUE,
date_range=c("2015-01-01","2024-12-31"))
Score larval dispersal connectivity at survey locations
Description
Estimates the larval connectivity of each survey location using one or both of two approaches:
Route 1 – Built-in gap-threshold union-find (no external data needed):
Patches above min_suitability that lie within the species' effective
dispersal kernel distance are grouped into dispersal clusters using
union-find. Within each cluster, a connectivity score is derived from the
number, quality, and distance-weighted contribution of potential larval
sources (Gaussian decay kernel, sigma = dispersal_km / 2).
Route 2 – External connectivity matrix (particle tracking / literature):
When connectivity_matrix is supplied (a dataframe of source -> destination
pairs with weights), those weights replace the Gaussian kernel for matched
pairs. Rows not covered by the matrix fall back to Route 1. The matrix can
come from:
- OpenDrift particle tracking simulations (export as source/destination settlement density CSV)
- FVCOM / ROMS model connectivity matrices
- Published empirical matrices (e.g. Robins et al. 2017)
Usage
score_larval_connectivity(
result,
species = NULL,
dispersal_km = NULL,
pld_days = NULL,
tidal_excursion_km = 5,
min_suitability = 0.4,
connectivity_matrix = NULL,
matrix_match_radius_deg = 0.1,
verbose = TRUE
)
Arguments
result |
Dataframe from predict_oyster(). |
species |
Character. Species key (e.g. "ostrea_edulis"); used to look up dispersal parameters. |
dispersal_km |
Numeric. Override for the effective dispersal kernel radius in km. When supplied, pld_days and tidal_excursion_km are ignored. |
pld_days |
Numeric. Override for the mean pelagic larval duration in days. Takes precedence over species lookup; ignored if dispersal_km is supplied. |
tidal_excursion_km |
Numeric. Mean daily tidal dispersal distance in km (default 5 km/day, approx. a mean current of 0.06 m/s over 24 h). Increase for highly tidal or estuarine sites (8–15 km/day). Ignored if dispersal_km is supplied. |
min_suitability |
Numeric. Minimum suitability score for a patch to qualify as a potential larval source or sink (default 0.40). Unsuitable patches still receive a connectivity score reflecting their proximity to sources. |
connectivity_matrix |
Dataframe or NULL. External connectivity weights (Route 2). Must contain columns source_lat, source_lon, dest_lat, dest_lon, and weight. |
matrix_match_radius_deg |
Numeric. Spatial matching tolerance in decimal degrees for linking matrix entries to result rows (default 0.10 approx. 10 km). |
verbose |
Logical. Print dispersal parameters and connectivity summary (default TRUE). |
Value
result with additional columns:
- larval_dispersal_km: effective kernel distance used
- larval_pld_days: PLD used (species default or override)
- larval_type: "lecithotrophic" or "planktotrophic"
- n_larval_sources: suitable patches within dispersal range
- source_quality_score [0,1]: Gaussian-weighted quality of nearby sources
- larval_cluster_id: dispersal network cluster ID (integer; NA if below min_suitability threshold and no sources nearby)
- larval_cluster_size: number of suitable patches in dispersal network
- nearest_source_km: km to nearest suitable source patch
- larval_connectivity_score [0,1]: composite dispersal connectivity score
- larval_connectivity_class: "Highly connected" / "Connected" / "Low connectivity" / "Isolated"
- larval_connectivity_note: plain-language ecological interpretation
- larval_source: "union_find", "matrix", or "matrix+union_find"
Dispersal distance
If dispersal_km is not supplied, the effective kernel distance is derived
from the species' mean pelagic larval duration (PLD) multiplied by
tidal_excursion_km (mean daily tidal dispersal distance):
dispersal_km = PLD_mean_days x tidal_excursion_km
PLD values are literature-derived:
| Species | PLD | Larval type | Default dispersal |
| O. edulis | 1--7 days | lecithotrophic | ~20 km |
| M. gigas | 14--28 days | planktotrophic | ~105 km |
| C. angulata | 14--21 days | planktotrophic | ~85 km |
| O. stentina | 2--7 days | lecithotrophic | ~20 km |
| O. lurida | 3--10 days | lecithotrophic | ~30 km |
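The rule above can be sketched directly. The PLD midpoints and species keys below are illustrative values read off the table, not the package's internal lookup:

```r
# Effective kernel distance = mean PLD (days) x daily tidal excursion (km/day).
# PLD midpoints below are illustrative, taken from the table above.
pld_mean_days <- c(ostrea_edulis = 4, magallana_gigas = 21)

dispersal_km <- function(species, tidal_excursion_km = 5) {
  unname(pld_mean_days[species] * tidal_excursion_km)
}

dispersal_km("ostrea_edulis")    # 4 x 5  = 20 km (matches the ~20 km default)
dispersal_km("magallana_gigas")  # 21 x 5 = 105 km (matches the ~105 km default)
```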
References
Cowen R.K. & Sponaugle S. (2009) Annual Review of Marine Science 1:443–466.
Robins P.E. et al. (2017) J Applied Ecology 54:1699–1710. doi:10.1111/1365-2664.12854
Siegel D.A. et al. (2008) PNAS 105:8974–8979.
Woolmer A.P. et al. (2009) J Shellfish Research 28:107–116.
Trimble A.C. et al. (2009) J Shellfish Research 28:43–53.
Examples
result <- predict_oyster(survey, "ostrea_edulis")
# Route 1 only -- built-in dispersal kernel
result <- score_larval_connectivity(result, species = "ostrea_edulis")
# Route 1 with custom dispersal distance (e.g. strong tidal excursion)
result <- score_larval_connectivity(result, species = "ostrea_edulis",
tidal_excursion_km = 10)
# Route 1 with manual dispersal override
result <- score_larval_connectivity(result, dispersal_km = 8)
# Route 2 -- external connectivity matrix from OpenDrift
cm <- read.csv("opendrift_connectivity.csv")
# Required columns: source_lat, source_lon, dest_lat, dest_lon, weight
result <- score_larval_connectivity(result,
species = "ostrea_edulis",
connectivity_matrix = cm)
# Inspect isolated patches -- need artificial seeding
isolated <- subset(result,
larval_connectivity_class == "Isolated" & suitability_class == "High")
# Strong candidates: high suitability AND high connectivity
targets <- subset(result,
suitability_class == "High" & larval_connectivity_class == "Highly connected")
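Route 1's clustering step can be illustrated with a minimal union-find over a pairwise distance matrix. This is a sketch of the general technique with invented helper names, not the package's internal implementation:

```r
# Group patches whose pairwise distance is within dispersal_km (union-find);
# sigma = dispersal_km / 2 is used for the Gaussian source weighting.
cluster_patches <- function(d_km, dispersal_km) {
  n <- nrow(d_km)
  parent <- seq_len(n)
  find <- function(i) { while (parent[i] != i) i <- parent[i]; i }
  for (i in seq_len(n - 1)) for (j in (i + 1):n)
    if (d_km[i, j] <= dispersal_km) parent[find(j)] <- find(i)
  vapply(seq_len(n), find, integer(1))  # cluster label per patch
}

gauss_weight <- function(d_km, dispersal_km) {
  sigma <- dispersal_km / 2
  exp(-d_km^2 / (2 * sigma^2))  # Gaussian decay kernel
}

# Three patches: 1 and 2 are 3 km apart, patch 3 is 100 km away
d <- matrix(c(0, 3, 100, 3, 0, 100, 100, 100, 0), nrow = 3)
cluster_patches(d, dispersal_km = 20)  # patches 1-2 cluster; patch 3 is isolated
```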
Score all locations in a dataframe
Description
Applies .score_row() across every non-excluded row. Excluded rows
receive suitability = 0. Adds per-variable score columns and a
limiting_factors column identifying the variables most constraining
the score at each location.
Usage
score_locations(df, tolerances, verbose = FALSE)
Arguments
df |
A dataframe processed by merge_sensor_data(). |
tolerances |
Species tolerance list from |
verbose |
Logical. Print per-variable scoring summary (default FALSE). |
Value
The input dataframe with additional columns:
- suitability: weighted score [0, 1]
- suitability_class: "High" / "Moderate" / "Low" / "Very Low" / "Excluded"
- limiting_factors: comma-separated names of the variables most limiting the score (NA if all variables score >= 0.65)
- score_<variable>: one column per scored variable present in the data
Score predation and bioturbation pressure at survey locations
Description
Produces a predation risk index (0 = negligible, 1 = severe) based on predator occurrence data, substrate rugosity, and depth. The output is intended as an overlay on ecological suitability – high predation risk does not reduce the habitat suitability score directly but is flagged in a separate column for management planning (e.g. cage exclusion, deep subtidal placement to avoid intertidal drills).
Input options (in priority order):
- predator_data – manually supplied dataframe with predator records (see Details).
- fetch_live = TRUE – queries EMODnet Biology for Asterias rubens and Carcinus maenas occurrence records within the survey extent. Requires internet access and the httr package.
- Neither – returns a depth-proxy-only risk score with a warning.
Usage
score_predation_risk(
result,
predator_data = NULL,
fetch_live = FALSE,
match_radius_m = 1000,
species = "ostrea_edulis",
verbose = TRUE
)
Arguments
result |
Dataframe from predict_oyster(). |
predator_data |
Dataframe of predator records (see Details), or NULL. |
fetch_live |
Logical. Query EMODnet Biology API (default FALSE). |
match_radius_m |
Numeric. Radius in metres within which predator records are aggregated per survey cell (default 1000 m). |
species |
Character. Target oyster species – affects which predators are most relevant (default "ostrea_edulis"). |
verbose |
Logical. Default TRUE. |
Value
result with additional columns:
- predation_risk [0,1]: composite risk index
- predation_risk_class: "Low" / "Moderate" / "High" / "Severe"
- predation_risk_note: text flag for management planning
- n_predator_records: number of predator records within match_radius_m
predator_data format
A dataframe with columns:
- lat, lon – coordinates
- species – species name or AphiaID (character)
- density (optional) – relative abundance / density index; if absent, each record is treated as presence-only (density = 1)
- date (optional) – survey date (used to filter to relevant season)
Examples
# Manual data
pred <- data.frame(lat=51.5, lon=-4.1, species="Asterias rubens", density=3)
result <- score_predation_risk(result, predator_data=pred, species="ostrea_edulis")
# Live EMODnet fetch
result <- score_predation_risk(result, fetch_live=TRUE)
# Depth-proxy only (no predator data)
result <- score_predation_risk(result)
Score sediment stability and mobility at survey locations
Description
Calculates the Shields parameter and a sediment stability score [0,1] for each survey location. High sediment mobility (theta >> theta_cr) indicates that the seabed is frequently in motion, which is unfavourable for oyster settlement, juvenile survival, and restoration reef persistence.
Usage
score_sediment_stability(
result,
current_col = "current_ms",
wave_hs_col = "wave_hs_m",
depth_col = "depth_m",
substrate_col = "substrate",
d50_col = "d50_mm",
wave_period_s = 8,
drag_coef = 0.003,
verbose = TRUE
)
Arguments
result |
Dataframe from predict_oyster(). |
current_col |
Character. Column name for depth-averaged current speed (m/s). Default "current_ms". |
wave_hs_col |
Character. Column name for significant wave height (m). Default "wave_hs_m". |
depth_col |
Character. Column name for depth (m). Default "depth_m". |
substrate_col |
Character. Column name for substrate type. Default "substrate". |
d50_col |
Character. Column name for measured median grain size (mm). Default "d50_mm". |
wave_period_s |
Numeric. Peak wave period in seconds for orbital velocity calculation. Default 8 s (typical NW European sea state). |
drag_coef |
Numeric. Quadratic bed drag coefficient C_f. Default 0.003 (smooth mixed seabed; increase to 0.005-0.010 for rough rock/reef). |
verbose |
Logical. Default TRUE. |
Value
result with additional columns:
- shields_parameter: Shields number theta at each site
- shields_critical: critical Shields number theta_cr for the substrate d50
- mobility_ratio: theta / theta_cr (> 1.0 = mobile; < 1.0 = stable)
- d50_mm_estimated: grain size used (mm) – measured or substrate-derived
- sediment_stability_score [0,1]: 1 = fully stable, 0 = highly mobile
- sediment_mobility_class: "Stable" / "Marginally stable" / "Mobile" / "Highly mobile"
- sediment_stability_note: management flag
Inputs required
At minimum, the function needs current velocity to estimate bed shear stress. Grain size improves accuracy but can be estimated from the substrate type column if not directly measured.
Examples
# With current velocity and substrate type columns
result <- score_sediment_stability(result, current_col="current_ms", substrate_col="substrate")
# After score_wave_exposure() -- wave_hs_m column already present
result <- score_wave_exposure(result, fetch_km=15)
result <- score_sediment_stability(result)
# With measured grain size
result$d50_mm <- c(0.25, 2.5, 15.0, 0.08)
result <- score_sediment_stability(result)
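For intuition, a back-of-envelope version of the calculation under the standard quadratic-drag formulation. The constants and the theta_cr value here are textbook approximations, not necessarily those used internally:

```r
# Bed shear stress from a quadratic drag law, then the Shields number:
#   tau   = rho * C_f * U^2
#   theta = tau / ((rho_s - rho) * g * d50)
shields_theta <- function(u_ms, d50_mm, rho = 1025, rho_s = 2650,
                          g = 9.81, drag_coef = 0.003) {
  tau <- rho * drag_coef * u_ms^2               # bed shear stress (N/m^2)
  tau / ((rho_s - rho) * g * (d50_mm / 1000))   # dimensionless Shields number
}

# 0.5 m/s depth-averaged current over medium sand (d50 = 0.25 mm)
theta <- shields_theta(0.5, 0.25)
mobility_ratio <- theta / 0.047  # theta_cr ~ 0.047 is a common value for sand
mobility_ratio > 1               # TRUE: this seabed is mobile under the current
```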
Score locations for spat settlement suitability
Description
Assesses whether survey locations meet the conditions required for bivalve larval settlement and metamorphosis. Settlement scoring uses a separate, tighter tolerance specification than adult habitat scoring – larvae are more sensitive to turbidity, require minimum current flow for delivery, and need hard substrate for attachment.
The output can be compared directly against the adult suitability score from
predict_oyster() to distinguish "good adult habitat" from "likely natural
recruitment site."
Usage
score_settlement(survey, species = "ostrea_edulis", verbose = TRUE)
Arguments
survey |
Dataframe of survey measurements. Same format as |
species |
Character. One of |
verbose |
Logical. Print summary (default TRUE). |
Value
Dataframe with settlement_suitability, settlement_class, and
per-variable settle_score_* columns. Combine with adult result via
cbind() or merge on lat/lon.
Examples
adult_result <- predict_oyster(survey, "ostrea_edulis")
settlement_result <- score_settlement(survey, "ostrea_edulis")
# Identify cells suitable for both adult survival AND natural recruitment
combined <- merge(adult_result, settlement_result[, c("lat","lon",
"settlement_suitability","settlement_class")],
by = c("lat","lon"))
combined$dual_suitable <- combined$suitability >= 0.6 &
combined$settlement_suitability >= 0.6
Score wave exposure from fetch or measured wave height
Description
Produces a wave exposure score [0,1] and significant wave height estimate for each survey location. High wave exposure reduces gear feasibility and can destabilise restoration reef structures and spat bags.
Input options:
- fetch_km: effective fetch in kilometres (distance to nearest land or obstruction in the prevailing wind direction). Most useful for estuarine, sea loch, or enclosed bay sites. Can be measured from GIS (e.g. in QGIS using the Fetch tool) or estimated from chart inspection.
- wave_height_m: directly measured or modelled significant wave height (Hs). If supplied, fetch_km is ignored for Hs computation.
- Both absent: wave exposure is estimated from depth alone (coarse proxy; a warning is issued).
Usage
score_wave_exposure(
result,
fetch_col = NULL,
wave_height_col = NULL,
fetch_km = NULL,
wave_height_m = NULL,
wind_speed_ms = 12,
depth_limit = TRUE,
verbose = TRUE
)
Arguments
result |
Dataframe from predict_oyster(). |
fetch_col |
Character. Name of a column in result containing per-row fetch in km, or NULL. |
wave_height_col |
Character. Name of a column in result containing per-row measured Hs in metres, or NULL. |
fetch_km |
Numeric scalar. Uniform fetch in km applied to all rows (overridden by fetch_col). |
wave_height_m |
Numeric scalar. Uniform Hs in metres applied to all rows (overridden by wave_height_col). |
wind_speed_ms |
Numeric. Design wind speed in m/s used for JONSWAP calculation (default 12 m/s = moderate gale, Beaufort 6, typical design condition for coastal aquaculture siting). |
depth_limit |
Logical. Apply depth-limited wave breaking cap (Hs_max = 0.6 * depth_m). Default TRUE. |
verbose |
Logical. Default TRUE. |
Value
result with additional columns:
- wave_hs_m: significant wave height (m) – computed or supplied
- wave_exposure_score [0,1]: 0 = fully sheltered, 1 = severely exposed
- wave_exposure_class: "Sheltered" / "Moderate" / "Exposed" / "Severe"
- wave_exposure_note: gear implication flag
- wave_source: method used ("measured", "jonswap_fetch", "depth_proxy")
Examples
# Fetch column in result
result$fetch_km <- c(2.5, 8.0, 25.0, 45.0)
result <- score_wave_exposure(result, fetch_col = "fetch_km")
# Uniform fetch for a sheltered sea loch
result <- score_wave_exposure(result, fetch_km = 3.5)
# Measured wave height column
result <- score_wave_exposure(result, wave_height_col = "hs_m")
# Depth proxy only (coarse)
result <- score_wave_exposure(result)
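For intuition, a commonly used fetch-limited JONSWAP approximation with a depth-limited breaking cap. The coefficient here is a standard textbook value and may differ from the package's internal formulation:

```r
# Fetch-limited JONSWAP approximation: Hs = 0.0016 * sqrt(g*F/U^2) * U^2 / g,
# optionally capped by depth-limited breaking (Hs_max = 0.6 * depth).
jonswap_hs <- function(fetch_km, wind_speed_ms = 12, depth_m = Inf, g = 9.81) {
  f_hat <- g * (fetch_km * 1000) / wind_speed_ms^2   # dimensionless fetch
  hs <- 0.0016 * sqrt(f_hat) * wind_speed_ms^2 / g   # Hs in metres
  pmin(hs, 0.6 * depth_m)                            # breaking cap
}

jonswap_hs(3.5)               # sheltered sea loch: Hs ~ 0.36 m
jonswap_hs(45)                # open coast: Hs ~ 1.30 m
jonswap_hs(45, depth_m = 1)   # capped at 0.6 m in 1 m of water
```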
Partial dependence / sensitivity analysis for a scored variable
Description
Computes the marginal response curve for a single environmental variable: suitability is predicted at a grid of values for the focal variable, while all other variables are held at their observed median (or a specified background value). This reveals the shape of the scoring function as actually applied to a real dataset.
Partial dependence is a standard diagnostic for SDM publication (Elith et al. 2008; Zurell et al. 2020 ODMAP protocol). Use it to verify that predicted responses match ecological expectations before submission.
Usage
sensitivity_analysis(
predicted,
species,
variable,
n_steps = 100L,
background = NULL,
season = NULL,
verbose = TRUE
)
Arguments
predicted |
Dataframe from predict_oyster(). |
species |
Character. Species key (e.g. "ostrea_edulis"). |
variable |
Character. Name of the focal variable to vary (e.g. "temperature"). |
n_steps |
Integer. Number of evenly spaced values across the variable's biological range (default 100). |
background |
Named list. Fixed values for all other variables. If NULL (default), uses column medians from predicted. |
season |
Character or NULL. Season to apply for seasonal variables (e.g. "summer"). |
verbose |
Logical. Default TRUE. |
Value
A dataframe with columns:
- x: focal variable value
- suitability: predicted suitability at that value
- variable: focal variable name
- species: species key
Suitable for plotting with ggplot2 or base R plot().
References
Elith et al. (2008) J Animal Ecology 77:802-813. Zurell et al. (2020) Ecography 43:1261-1277.
Examples
result <- predict_oyster(survey, "ostrea_edulis")
# Temperature response curve
temp_pd <- sensitivity_analysis(result, "ostrea_edulis", "temperature")
plot(temp_pd$x, temp_pd$suitability, type="l",
xlab="Temperature (degrees C)", ylab="Suitability",
main="Partial dependence: temperature")
# Salinity response (summer)
sal_pd <- sensitivity_analysis(result, "ostrea_edulis", "salinity",
season = "summer")
# ggplot2 version
library(ggplot2)
ggplot(temp_pd, aes(x, suitability)) +
geom_line(colour="steelblue", linewidth=1.2) +
geom_ribbon(aes(ymin=0, ymax=suitability), alpha=0.2, fill="steelblue") +
labs(x="Temperature (degrees C)", y="Suitability [0-1]") +
theme_minimal()
Apply Gaussian kernel smoothing to suitability scores
Description
Individual CTD or ADCP casts are point measurements subject to instrument
noise, micro-scale patchiness, and tidal state variability. smooth_suitability()
applies a Gaussian distance-weighted kernel to the suitability scores, replacing
each cell's score with a weighted mean of all cells within bandwidth_m metres.
The result is a spatially coherent surface that better reflects broad habitat
quality rather than single-cast noise.
The smoothed score is computed as:
\hat{s}_i = \frac{\sum_j w_{ij} \cdot s_j}{\sum_j w_{ij}}
where w_{ij} = \exp(-d_{ij}^2 / (2\sigma^2)) and \sigma is the
Gaussian bandwidth in metres. By default, excluded cells (suitability = 0,
excluded flag) do not contribute to their neighbours' smoothed scores, so
unsuitable barriers are preserved (see smooth_excluded).
Usage
smooth_suitability(
result,
bandwidth_m = 500,
max_radius_m = NULL,
smooth_excluded = FALSE,
verbose = TRUE
)
Arguments
result |
Dataframe from predict_oyster(). |
bandwidth_m |
Numeric. Gaussian kernel bandwidth (1 sigma) in metres. Default 500 m. Increase for sparser surveys; decrease to preserve fine detail. |
max_radius_m |
Numeric. Maximum search radius for neighbours in metres. Cells further than this are ignored (default NULL). |
smooth_excluded |
Logical. If FALSE (default), excluded cells do not contribute to neighbours' smoothed scores (hard barriers are preserved). |
verbose |
Logical. Print summary (default TRUE). |
Value
Input dataframe with suitability_raw (original) and
suitability (smoothed, overwrites original) columns, plus a
suitability_class column reclassified from the smoothed score.
Examples
result <- predict_oyster(survey, "ostrea_edulis")
# Smooth with 300 m bandwidth (good for dense ADCP surveys)
result_smooth <- smooth_suitability(result, bandwidth_m = 300)
# Compare raw vs smoothed
plot(result$suitability_raw, result$suitability,
xlab = "Raw", ylab = "Smoothed", pch = 20)
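The weighted-mean formula above fits in a few lines. The smooth_scores() helper below is an illustration using planar metre coordinates, not the package function (which works on lat/lon):

```r
# Gaussian distance-weighted mean: s_hat_i = sum_j(w_ij * s_j) / sum_j(w_ij),
# with w_ij = exp(-d_ij^2 / (2 * bandwidth^2)).
smooth_scores <- function(x_m, y_m, s, bandwidth_m = 500) {
  vapply(seq_along(s), function(i) {
    d2 <- (x_m - x_m[i])^2 + (y_m - y_m[i])^2
    w <- exp(-d2 / (2 * bandwidth_m^2))
    sum(w * s) / sum(w)
  }, numeric(1))
}

# Three casts 500 m apart; the noisy middle score is pulled towards its
# neighbours while the two ends remain symmetric.
out <- smooth_scores(c(0, 500, 1000), c(0, 0, 0), c(1, 0, 1))
round(out, 3)
```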
Spatial block cross-validation for suitability model evaluation
Description
Evaluates model performance using spatial block cross-validation (Roberts et al. 2017), which avoids inflated performance estimates that arise when nearby train and test records share similar environments.
Records are partitioned into n_blocks contiguous geographic blocks. In
each fold, one block is held out as test data and the remaining blocks
provide training data to re-score the suitability model. AUC, TSS, and
Brier score are computed per fold and summarised across folds.
This function is purely evaluative – it does not refit a new model. Instead,
it uses the spatial heterogeneity in the existing predicted suitability
surface to estimate out-of-block performance.
Usage
spatial_block_cv(
predicted,
records,
presence_col = "presence",
n_blocks = 5L,
match_radius_deg = 0.002,
seed = 42L,
plot = TRUE,
verbose = TRUE
)
Arguments
predicted |
Dataframe from predict_oyster(). |
records |
Dataframe of known presence/absence records. Must contain lat, lon, and the presence/absence column. |
presence_col |
Character. Name of presence/absence column (default "presence"). |
n_blocks |
Integer. Number of spatial blocks (default 5). More blocks = finer spatial resolution but smaller test sets per fold. |
match_radius_deg |
Numeric. Matching radius in degrees (default 0.002 approx. 220 m). Passed to inner matching step. |
seed |
Integer. Random seed for reproducibility (default 42). |
plot |
Logical. Reserved for future visualisation output; currently unused (default TRUE). |
verbose |
Logical. Print fold-level diagnostics (default TRUE). |
Value
A named list:
- mean_auc: mean AUC across blocks (primary reporting metric)
- auc_sd: standard deviation of per-block AUC
- mean_tss, tss_sd: mean / SD of True Skill Statistic
- brier_mean, brier_sd: mean / SD of Brier score
- n_blocks_actual: number of blocks with >= 5 records (used in CV)
- fold_results: dataframe with per-fold metrics
- spatial_autocorr_range_deg: estimated autocorrelation range (degrees)
- spatial_bias_warning: logical, TRUE if random CV would be misleading
Interpreting results
Compare mean_auc from spatial CV with standard AUC from
validate_against_records(). A large drop (> 0.10) indicates strong
spatial autocorrelation and means the model is less transferable than
the standard validation suggested. This is common for coastal surveys
with spatially clustered records.
References
Roberts et al. (2017) Ecography 40:913-929. doi:10.1111/ecog.02881
Examples
records <- read.csv("nbn_ostrea_edulis.csv")
result <- predict_oyster(survey, "ostrea_edulis")
# Standard validation
std_val <- validate_against_records(result, records)
std_val$auc # may be optimistically high
# Spatial block CV
sp_cv <- spatial_block_cv(result, records, n_blocks = 5)
sp_cv$mean_auc # more honest estimate of transferability
# Compare -- if sp_cv$mean_auc << std_val$auc, model is spatially
# autocorrelated and transfers less well than initially apparent.
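One simple way to form contiguous blocks, shown for intuition only (the package's internal blocking scheme is not reproduced here): cut longitude into quantile strips so each fold holds out one geographic band.

```r
# Partition records into n_blocks contiguous longitude strips; quantile
# breaks give roughly equal block sizes even for clustered surveys.
assign_blocks <- function(lon, n_blocks = 5) {
  brks <- quantile(lon, probs = seq(0, 1, length.out = n_blocks + 1))
  as.integer(cut(lon, breaks = unique(brks), include.lowest = TRUE))
}

set.seed(42)
lon <- runif(100, -5.5, -5.0)   # simulated record longitudes
blocks <- assign_blocks(lon, 5)
table(blocks)                   # ~20 records per strip
```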
Stack multiple survey runs from the same sensor type
Description
Combines two or more dataframes from the same sensor (e.g. an ADCP run on Monday and again on Thursday) by row-binding them and re-averaging overlapping cells onto a common spatial grid. Non-overlapping cells are kept as-is.
This is the right tool when you have repeated surveys of the same area – rather than picking one run, overlapping cells are averaged across all runs, which reduces noise. Non-overlapping cells (from differing survey extents) are preserved.
After stacking, pass the result into merge_sensor_data() alongside your
other sensor datasets as usual.
Usage
stack_surveys(..., spatial_res = 4L, verbose = TRUE)
Arguments
... |
Two or more dataframes of the same sensor type. Each must contain lat and lon columns. |
spatial_res |
Integer. Decimal places for the common spatial grid (default 4). |
verbose |
Logical. Print a summary (default TRUE). |
Value
A single spatially averaged dataframe combining all input surveys.
Examples
# Stack two ADCP runs before merging with other sensors
adcp_mon <- read_nortek_adcp("survey_monday.csv")
adcp_thu <- read_nortek_adcp("survey_thursday.csv")
adcp_all <- stack_surveys(adcp_mon, adcp_thu)
survey <- merge_sensor_data(adcp = adcp_all, bathy = bathy_df)
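The stacking logic can be sketched in base R. The stack_sketch() helper below is an assumed illustration of "snap to grid, then average overlaps", not the package source:

```r
# Snap coordinates to the spatial_res grid, then average overlapping cells.
stack_sketch <- function(..., spatial_res = 4L) {
  combined <- do.call(rbind, list(...))
  combined$lat <- round(combined$lat, spatial_res)
  combined$lon <- round(combined$lon, spatial_res)
  aggregate(. ~ lat + lon, data = combined, FUN = mean)
}

run1 <- data.frame(lat = c(55.86511, 55.86622), lon = c(-5.42511, -5.42622),
                   current_ms = c(0.30, 0.50))
run2 <- data.frame(lat = c(55.86512, 55.87000), lon = c(-5.42512, -5.43000),
                   current_ms = c(0.40, 0.20))
out <- stack_sketch(run1, run2)
# The two nearly identical casts land in the same 4-dp cell and average to
# 0.35 m/s; the two non-overlapping cells pass through unchanged.
```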
Look up approximate tidal range for major European survey ports
Description
Returns the mean spring tidal range (MHWS - MLWS) and typical Chart Datum
offset from Mean Sea Level for a named port. Useful for a quick sanity check
on tidal height values before applying correct_to_chart_datum().
This is a reference table only – always use official tide table predictions for actual corrections.
Usage
tidal_port_info(port)
Arguments
port |
Character. Port name (partial, case-insensitive match). |
Value
A one-row dataframe with columns port, country,
mhws_m, mlws_m, spring_range_m, cd_below_msl_m.
Examples
tidal_port_info("oban")
tidal_port_info("brest")
Update species tolerance parameters from field observations (Bayesian)
Description
Fits a Bayesian logistic regression linking AHP suitability scores to observed presence/absence, then updates the optimal-range tolerance parameters for one or more scored variables.
Default method (MAP + Laplace): finds the Maximum A Posteriori estimate
via optim() (L-BFGS-B) and approximates the posterior as a multivariate
normal using the numerical Hessian. Fast; suitable for routine use.
MCMC method: Random-Walk Metropolis-Hastings sampler for full posterior samples. Recommended when n < 50 or when credible intervals are needed for publication. May take several minutes for large parameter sets.
Posteriors are stored in a package-level cache so that repeated calls
accumulate evidence (sequential Bayesian updating). Use
save_tolerance_update() to persist between sessions.
Usage
update_species_tolerances(
records,
species,
update_vars = NULL,
presence_col = "presence",
method = c("map", "mcmc"),
n_iter = 5000L,
n_warmup = 1000L,
prior_sd_fraction = 0.2,
min_records = 20L,
verbose = TRUE
)
Arguments
records |
Dataframe of field observations. Must contain columns lat, lon, the presence/absence column, and the environmental variables to be updated. |
species |
Character. Species key (e.g. "ostrea_edulis"). |
update_vars |
Character vector. Variables to update parameters for. Defaults to all numeric scored variables present in records. |
presence_col |
Character. Name of presence/absence column (default "presence"). |
method |
Character. "map" (default) or "mcmc". |
n_iter |
Integer. MCMC iterations (default 5000; ignored for MAP). |
n_warmup |
Integer. MCMC warm-up / burn-in (default 1000; ignored for MAP). |
prior_sd_fraction |
Numeric. Prior SD as fraction of parameter range (default 0.20 = 20 percent). Wider = weaker / more data-driven prior. |
min_records |
Integer. Minimum matched records required (default 20). |
verbose |
Logical. Print progress and diagnostics (default TRUE). |
Value
Invisibly: a named list with elements:
- species: species key
- method: "map" or "mcmc"
- n_records: number of matched records used
- loglik_null: log-likelihood of intercept-only model
- loglik_fit: log-likelihood at MAP/posterior mean
- mcfadden_r2: 1 - loglik_fit / loglik_null (pseudo-R^2)
- updated_params: named list of updated parameter values per variable
- posterior_sd: named list of posterior SD per parameter (from Laplace/MCMC)
- prior_params: the parameter values before updating (for comparison)
- convergence: optim() convergence code (0 = success; MAP only)
Side effect: updates the in-session cache accessed by score_locations().
What gets updated
For each variable in update_vars, the function estimates shifts in
optimal_min and optimal_max (the sweet-spot bounds), plus
acceptable_min / poor_min and acceptable_max / poor_max (shoulder
bounds, updated proportionally to maintain curve shape). Parameters are
constrained to biologically plausible ranges via box constraints in the
optimiser.
See Also
get_tolerance_posteriors(), reset_tolerance_update(),
save_tolerance_update(), load_tolerance_update()
Examples
# Field records with presence/absence + environmental measurements
records <- data.frame(
lat = c(51.5, 51.6, 51.7, 51.8, 51.4),
lon = c(-4.1, -4.2, -4.0, -4.3, -4.0),
presence = c(1, 1, 0, 0, 1),
temperature = c(16.2, 17.1, 12.3, 11.0, 15.8),
salinity = c(32, 33, 31, 30, 34),
depth = c(5, 8, 15, 20, 3)
)
fit <- update_species_tolerances(
records = records,
species = "ostrea_edulis",
update_vars = c("temperature", "salinity", "depth")
)
# See updated parameters
get_tolerance_posteriors("ostrea_edulis")
# Re-run predict_oyster -- it will automatically use updated parameters
result <- predict_oyster(survey, "ostrea_edulis")
# Save to disk for next session
save_tolerance_update("ostrea_edulis")
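The MAP + Laplace step can be illustrated with a toy logistic model. This is a self-contained simulation; the prior, parameterisation, and helper name are invented for illustration and are not the package's internals:

```r
# Negative log-posterior of a logistic regression with a Gaussian prior.
neg_log_post <- function(beta, x, y, prior_sd = 2) {
  p <- plogis(beta[1] + beta[2] * x)
  -sum(dbinom(y, 1, p, log = TRUE)) + sum(beta^2) / (2 * prior_sd^2)
}

set.seed(1)
x <- runif(40)                           # suitability-like scores
y <- rbinom(40, 1, plogis(-2 + 5 * x))   # presence more likely at high scores

# MAP via L-BFGS-B; the numerical Hessian gives the Laplace approximation
# of the posterior covariance.
fit <- optim(c(0, 0), neg_log_post, x = x, y = y,
             method = "L-BFGS-B", hessian = TRUE)
fit$par                          # MAP estimates (positive slope expected)
sqrt(diag(solve(fit$hessian)))   # Laplace posterior SDs
```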
Validate suitability predictions against known presence/absence records
Description
Compares the suitability scores from predict_oyster() against a
presence/absence dataset to quantify how well the model discriminates
between occupied and unoccupied habitat. Useful for assessing whether the
AHP-weighted scoring produces ecologically defensible outputs before using
them for site selection.
Metrics computed:
- AUC (Area Under the ROC Curve): probability that a random presence location scores higher than a random absence location. 0.5 = random; 1.0 = perfect discrimination.
- Optimal threshold: Youden's J statistic (maximises sensitivity + specificity - 1). Used for the confusion-matrix-derived metrics below.
- Sensitivity (true positive rate): fraction of presences correctly predicted above threshold.
- Specificity (true negative rate): fraction of absences correctly predicted below threshold.
- F1 score: harmonic mean of precision and sensitivity.
- Brier score: mean squared error between predicted probability and observed presence/absence; 0 = perfect, 0.25 = uninformative.
- TSS (True Skill Statistic = sensitivity + specificity - 1): values above 0.4 generally indicate useful predictive skill.
Usage
validate_against_records(
predicted,
records,
presence_col = "presence",
match_radius_deg = 0.002,
plot = TRUE,
verbose = TRUE
)
Arguments
predicted |
Dataframe from predict_oyster(). |
records |
Dataframe of known presence/absence records. Must contain lat, lon, and the presence/absence column. |
presence_col |
Character. Name of the presence/absence column in records (default "presence"). |
match_radius_deg |
Numeric. Radius in decimal degrees within which a prediction cell is matched to a record (default 0.002, approx. 220 m). |
plot |
Logical. Print an ASCII ROC curve to the console (default TRUE). |
verbose |
Logical. Print full metric summary (default TRUE). |
Value
A named list:
- auc: numeric [0,1]
- optimal_threshold: numeric [0,1] – Youden's J
- sensitivity: numeric [0,1] at optimal threshold
- specificity: numeric [0,1] at optimal threshold
- f1: numeric [0,1] at optimal threshold
- tss: numeric [-1,1] at optimal threshold
- brier_score: numeric [0,1]
- n_presences: integer
- n_absences: integer
- n_unmatched: integer – records dropped due to no nearby prediction
- roc_df: dataframe with columns threshold, sensitivity, specificity (for custom plotting)
Examples
# Load known flat oyster records (e.g. from NBN Atlas CSV export)
records <- read.csv("nbn_ostrea_edulis.csv")
# Run prediction on same area
result <- predict_oyster(survey, "ostrea_edulis")
# Validate
val <- validate_against_records(result, records,
presence_col = "presence")
val$auc # overall discrimination power
val$tss # skill score
val$roc_df # data for custom ggplot
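Both headline metrics follow standard definitions that fit in a few lines of base R. The helper names below are illustrative, not the package's implementation:

```r
# Rank-based (Mann-Whitney) AUC: probability a random presence outscores
# a random absence.
auc_rank <- function(score, presence) {
  r <- rank(score)
  n1 <- sum(presence == 1); n0 <- sum(presence == 0)
  (sum(r[presence == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Youden's J: the threshold maximising sensitivity + specificity - 1.
youden_threshold <- function(score, presence) {
  ths <- sort(unique(score))
  j <- sapply(ths, function(t) {
    sens <- mean(score[presence == 1] >= t)
    spec <- mean(score[presence == 0] < t)
    sens + spec - 1
  })
  ths[which.max(j)]
}

score    <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2)
presence <- c(1,   1,   1,   0,   0,   0)
auc_rank(score, presence)         # 1: perfect separation
youden_threshold(score, presence) # 0.7 splits the classes cleanly
```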