---
title: "Complete survey workflow: Example Bay"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Complete survey workflow: Example Bay}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 7,
  fig.height = 4,
  eval      = FALSE
)
```

## Overview

This vignette walks through a complete oystermapR workflow using a simulated
survey of **Example Bay** — a fictional sheltered coastal inlet with realistic
physical characteristics. The full pipeline is covered:

1. Loading acoustic Doppler (ADCP) current data
2. Loading bathymetric soundings
3. Loading CTD water-quality data (temperature, salinity, chlorophyll_a)
4. Merging all sensor datasets onto a common grid
5. Running quality control
6. Predicting suitability for *Ostrea edulis* (European Flat Oyster)
7. Scoring wave exposure and HAB risk
8. Exporting a GeoTIFF for QGIS

The example data files (`example_bay_adcp.csv` and `example_bay_soundings.xyz`)
are included in the package and represent a simulated mid-summer survey.

```{r library}
library(oystermapR)
```

---

## Survey context

Example Bay is a sheltered inlet approximately 6 km × 4 km in extent, with
depths ranging from 2 m at the shoreline to around 22 m in the central channel.
A six-hour ADCP transect was run on a flood tide during June. Bathymetric
soundings were collected concurrently using a single-beam echosounder.

The target species is *Ostrea edulis*, the European Flat Oyster, whose optimal
conditions include: depths of 1–20 m, current speeds of 0.05–0.4 m/s,
temperatures of 5–25°C, and salinities of 25–35 PSU.

---

## Step 1: Load ADCP data

`read_nortek_adcp()` reads the Nortek Signature 500 merged CSV export. It
auto-detects velocity bins, spatially averages ensembles onto a grid, and
derives bed shear stress from the near-bed velocity profile. The `spatial_res`
argument controls the decimal places used for lat/lon grid binning; `2` gives
approximately 1 km cells, suitable for a bay-scale survey.

```{r load-adcp}
adcp_file <- system.file("extdata", "example_bay_adcp.csv", package = "oystermapR")

adcp <- read_nortek_adcp(
  file        = adcp_file,
  spatial_res = 2,
  verbose     = TRUE
)

head(adcp[, c("lat", "lon", "current_velocity", "shear_stress")])
```

The output contains one row per grid cell. `current_velocity` is the mean
near-bed flow magnitude; `shear_stress` is the estimated bed shear stress
in N/m², used by `predict_oyster()` for suspension-feeder scoring.

---

## Step 2: Load bathymetric soundings

`read_soundings_xyz()` reads a space-delimited XYZ point cloud, grids and
averages the depths, and derives slope and rugosity via finite differences.

```{r load-bathy}
xyz_file <- system.file("extdata", "example_bay_soundings.xyz", package = "oystermapR")

bathy <- read_soundings_xyz(
  file          = xyz_file,
  spatial_res   = 2,
  min_soundings = 5,
  verbose       = TRUE
)

head(bathy[, c("lat", "lon", "depth", "slope", "roughness")])
```

`slope` is the maximum downslope gradient in degrees at each grid cell; values
above ~15° typically exclude oyster settlement in the scoring model. `roughness`
is a dimensionless rugosity index (1.0 = flat; higher = more complex substrate).

---

## Step 3: Load CTD water-quality data

The ADCP and echosounder provide hydrodynamic and bathymetric variables, but
`predict_oyster()` can also use water-quality data when available. The Example
Bay survey included a 5 × 4 grid of CTD casts recording temperature, salinity,
and chlorophyll_a, stored as a plain CSV.

`read_generic_csv()` handles any tabular sensor export with flexible column
matching. Column names are recognised automatically when they follow standard
conventions (`lat`/`lon`, `temperature`, `salinity`, `chlorophyll_a`). For
instruments that export non-standard headers, supply an explicit `col_map`:

```r
# Example with non-standard headers from a YSI EXO sonde export:
ctd <- read_generic_csv(
  "ysi_export.csv",
  col_map = c(lat = "GPS_Lat", lon = "GPS_Lon",
              temperature = "Temp_C", salinity = "Sal_PSU")
)
```

For Example Bay the column names match the oystermapR conventions directly:

```{r load-ctd}
ctd_file <- system.file("extdata", "example_bay_ctd.csv", package = "oystermapR")

ctd <- read_generic_csv(
  file        = ctd_file,
  spatial_res = 2,
  verbose     = TRUE
)

head(ctd[, c("lat", "lon", "temperature", "salinity", "chlorophyll_a")])
```

Spatially averaging at `spatial_res = 2` collapses the four replicate casts
per station to a single value per ~1 km grid cell.

---

## Step 4: Merge all sensor datasets

`merge_sensor_data()` performs a full outer join of any number of sensor
dataframes on rounded lat/lon grid keys. Cells present in only one source are
retained with `NA` for the missing variables, so no data is discarded. All
three layers — ADCP, bathymetry, and CTD — are combined in a single call:

```{r merge}
survey <- merge_sensor_data(adcp = adcp, bathy = bathy, ctd = ctd)

cat("Merged survey:", nrow(survey), "grid cells\n")
cat("Columns:", paste(names(survey), collapse = ", "), "\n")
```

A `substrate_hardness` column would typically come from a sidescan mosaic
processed via `read_sonar_tif()`. Here we add a representative value for a
mixed shell-gravel substrate, typical of productive inshore oyster ground.

```{r add-substrate}
set.seed(101)
n <- nrow(survey)
# 0 = soft mud, 1 = hard rock; 0.3--0.7 = shell/gravel mix
survey$substrate_hardness <- round(runif(n, 0.30, 0.70), 2)
```

---

## Step 5: Quality control

`qc_survey_data()` applies three complementary checks to every numeric column:

- **Range check** — values outside physically plausible bounds for that
  variable (species-agnostic; e.g. salinity above 42 PSU) are flagged
  `"range_fail"`.
- **Statistical outlier** — values beyond `iqr_k` × IQR from the median are
  flagged `"outlier"` (default k = 3, Tukey's outer fence).
- **Cross-variable sanity** — physically implausible combinations (e.g.
  dissolved oxygen above 20 mg/L with temperature below 5°C) are flagged.

Setting `apply_flags = TRUE` replaces flagged values with `NA` so they are
silently skipped downstream rather than causing erroneous scores.

```{r qc}
survey_clean <- qc_survey_data(
  df          = survey,
  apply_flags = TRUE,
  verbose     = TRUE
)

# Count any flags raised across all columns
flag_cols <- grep("^qc_flag_", names(survey_clean), value = TRUE)
n_flagged <- sum(sapply(survey_clean[flag_cols], function(x) sum(!is.na(x) & x != "pass")))
cat("Total flagged values replaced with NA:", n_flagged, "\n")
```

The QC step is non-destructive by default (`apply_flags = FALSE`) — it adds
`qc_flag_<variable>` columns so you can inspect which cells were problematic
before deciding whether to exclude them.

---

## Step 6: Predict suitability

`predict_oyster()` applies AHP-weighted scoring rules to each available
variable, combines them into a suitability score in [0, 1], and classifies
locations as High / Moderate / Low / Very Low / Excluded.

```{r predict, warning = FALSE}
result <- predict_oyster(
  data    = survey_clean,
  species = "ostrea_edulis",
  verbose = TRUE
)

# Summary of suitability classes
table(result$suitability_class)
```

```{r suitability-summary}
# Mean score and range
cat(sprintf(
  "Suitability: mean = %.2f, range = %.2f -- %.2f\n",
  mean(result$suitability, na.rm = TRUE),
  min(result$suitability,  na.rm = TRUE),
  max(result$suitability,  na.rm = TRUE)
))
```

The result dataframe retains all input columns plus `suitability`,
`suitability_class`, and per-variable component scores (`score_depth`,
`score_current_velocity`, etc.).

---

## Step 7: Risk and disturbance scoring

oystermapR includes optional risk modules that can be appended to the result.
Here we add wave exposure (derived from the current speed data and fetch
geometry) and a simple HAB risk score.

```{r risk, warning = FALSE}
# Wave exposure: uses current_velocity and depth as proxies for fetch exposure
result <- score_wave_exposure(result, verbose = FALSE)

# HAB risk: without live ICES data, scores from chlorophyll_a alone
result <- score_hab_risk(result, verbose = FALSE)

cat("Wave exposure range:", round(range(result$wave_exposure, na.rm=TRUE), 3), "\n")
cat("HAB risk range:     ", round(range(result$hab_risk,      na.rm=TRUE), 3), "\n")
```

---

## Step 8: Export GeoTIFF for QGIS

`export_geotiff()` interpolates the suitability scores onto a regular raster
and writes a GeoTIFF. `export_qml_style()` writes a matching QGIS colour-ramp
style file (`.qml`) so the layer renders immediately with the standard
oystermapR colour scheme (red = excluded, green = high suitability).

```{r export, eval = FALSE}
# Write GeoTIFF and companion QGIS style file
export_geotiff(
  df         = result,
  path       = "example_bay_suitability.tif",
  resolution = 0.001,
  contours   = TRUE
)
export_qml_style("example_bay_suitability.tif")
```

Load `example_bay_suitability.tif` into QGIS via **Layer → Add Layer → Add
Raster Layer**, then right-click the layer and choose **Load Layer Style** to
apply the `.qml` file.

---

## Inspecting component scores

The result dataframe includes a `score_<variable>` column for every variable
that was scored. Comparing these helps diagnose which environmental factor is
the main limiting constraint at a site.

```{r component-scores}
score_cols <- grep("^score_", names(result), value = TRUE)
# Mean component score per variable (higher = more suitable)
col_means <- sort(colMeans(result[score_cols], na.rm = TRUE))
print(round(col_means, 3))
```

Variables scoring consistently below 0.5 are the main limiting factors for
*Ostrea edulis* at this site. Scores near 1.0 indicate that variable is not
constraining growth.

---

## Next steps

- **Validation** — if you have historical presence/absence records for the
  site, pass them to `validate_against_records()` to calculate AUC, TSS, and
  F1 against model predictions.
- **Bayesian tolerance updating** — `update_species_tolerances()` can refine
  the *O. edulis* scoring parameters from field observations at this site.
- **Multi-species comparison** — run the same workflow for `"magallana_gigas"`
  and call `compare_species()` to see which species is better suited to each
  grid cell.
- **Live data** — `fetch_live_environmental_data()` can replace the simulated
  temperature and salinity with real-time CMEMS model output or ICES
  observational data; see `?oystermapR_live_config` for credential setup.

---

## Session info

```{r session-info}
sessionInfo()
```