Type: | Package |
Title: | Biomonitoring and Bioassessment Calculations |
Version: | 1.2.4 |
Maintainer: | Erik W. Leppo <Erik.Leppo@tetratech.com> |
Description: | An aid for manipulating data associated with biomonitoring and bioassessment. Calculations include metric calculation, marking of excluded taxa, subsampling, and multimetric index calculation. Targeted communities are benthic macroinvertebrates, fish, periphyton, and coral. As described in the Revised Rapid Bioassessment Protocols (Barbour et al. 1999) https://archive.epa.gov/water/archive/web/html/index-14.html. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 3.5) |
Imports: | dplyr, maps, rlang, stats, tidyselect, tidyr |
Suggests: | DataExplorer, DT, ggplot2, knitr, lazyeval, readxl, reshape2, rmarkdown, testthat, shiny, shinydashboard, shinydashboardPlus, shinyjs, shinyWidgets, utils, writexl, shinyalert |
URL: | https://github.com/leppott/BioMonTools |
BugReports: | https://github.com/leppott/BioMonTools/issues |
VignetteBuilder: | knitr |
RoxygenNote: | 7.3.3 |
NeedsCompilation: | no |
Packaged: | 2025-10-06 18:17:29 UTC; Erik.Leppo |
Author: | Erik W. Leppo |
Repository: | CRAN |
Date/Publication: | 2025-10-09 12:40:02 UTC |
Taxa Observation Maps
Description
Map taxonomic observations from a data frame. Input a dataframe with SampID, TaxaID, TaxaCount, Latitude, and Longitude. Other arguments are format (jpg vs. pdf), file name prefix, and output directory. Files are saved with the prefix "map.taxa." by default.
Usage
MapTaxaObs(
df_obs,
SampID,
TaxaID,
TaxaCount,
Lat,
Long,
output_dir = NULL,
output_prefix = "maps.taxa",
output_type = "pdf",
database,
regions,
map_grp = NULL,
leg_loc = "right",
verbose = FALSE,
...
)
Arguments
df_obs |
Observation data frame |
SampID |
df_obs column name, Unique Sample identifier |
TaxaID |
df_obs column name, Unique Taxa identifier |
TaxaCount |
df_obs column name, Number of individuals for TaxonID and SampID |
Lat |
df_obs column name, Latitude |
Long |
df_obs column name, Longitude |
output_dir |
Directory to save output. Default is working directory. |
output_prefix |
Prefix to TaxaID for each file. Default = "map.taxa." |
output_type |
File format for output; jpg or pdf. |
database |
maps::map function database; world, usa, state, county |
regions |
maps::map function regions. Names pertinent to map_db. |
map_grp |
Map grouping variable from df_obs. Will generate legend and color code the points on the map. Default = NULL |
leg_loc |
Legend location text. Default = "right" Other values may not work properly. |
verbose |
Boolean value for if status messages are output to the console. Default = FALSE |
... |
Optional arguments to be passed to methods. |
Details
The user will pass arguments for maps::map function that is used for the map. For example, 'database' and 'regions'. Without these arguments no map will be created.
The map will have all points and colored points for each taxon. In addition the map will include the number of samples by taxon.
The example data is fish but can be used for benthic macroinvertebrates as well.
If use grouping variable colors are from grDevices::rainbow()
Jpg file names replace all non-alphanumeric characters with "_".
The R package maps is required for this function.
Value
Taxa maps to user defined directory as jpg or pdf.
Examples
df_obs <- data_Taxa_MA
SampID <- "estuary"
TaxaID <- "TaxaName"
TaxaCount <- "Count"
Lat <- "Latitude"
Long <- "Longitude"
output_dir <- tempdir()
output_prefix <- "maps.taxa."
output_type <- "pdf"
myDB <- "state"
myRegion <- "massachusetts"
myXlim <- c(-(73+(30/60)), -(69+(56/60)))
myYlim <- c((41+(14/60)),(42+(53/60)))
# Run function with extra arguments for map
MapTaxaObs(df_obs[1:500, ],
SampID,
TaxaID,
TaxaCount,
Lat,
Long,
output_dir,
output_prefix,
output_type,
database = "state",
regions = "massachusetts",
map_grp = "estuary",
leg_loc = "bottomleft",
xlim = myXlim,
ylim = myYlim,
verbose = FALSE)
TaxaMaster_Ben_BCG_PacNW
Description
Example data
Usage
TaxaMaster_Ben_BCG_PacNW
Format
A data frame with 684 observations on the following 20 variables.
TaxaID
a character vector
Phylum
a character vector
SubPhylum
a character vector
Class
a character vector
SubClass
a character vector
Order
a character vector
SuperFamily
a character vector
Family
a character vector
Tribe
a character vector
Genus
a character vector
SubGenus
a character vector
Species
a character vector
BCG_Attr
a character vector
NonTarget
a logical vector
Thermal_Indicator
a character vector
Long_Lived
a character vector
FFG
a character vector
Habit
a character vector
Life_Cycle
a character vector
TolVal
a numeric vector
Source
example master taxa from BCG Pacific Northwest
Assign Index_Class
Description
Assign Index_Class for based on user input fields. If use the same name of an existing field the information will be overwritten.
Multiple criteria are treated as "AND" so all must be met to be assigned to a particular Index_Class.
Internally uses 'tidyr' and 'dplyr'
If Index_Class is included in data then it is renamed Index_Class_Orig and returned in the output data frame.
Usage
assign_IndexClass(
data,
criteria,
name_indexclass = "INDEX_CLASS",
name_indexname = "INDEX_NAME",
name_siteid = "SITEID",
data_shape = "WIDE"
)
Arguments
data |
Data frame (wide format) with metric values to be evaluated. |
criteria |
Data frame of metric thresholds to check. |
name_indexclass |
Name for new Index_Class column. Default = INDEX_CLASS |
name_indexname |
Name for Index Name column. Default = INDEX_NAME |
name_siteid |
Name for Site ID column. Default = SITEID |
data_shape |
Shape of data; wide or long. Default is 'wide' |
Details
Requires use of reference file with criteria.
Value
Returns a data frame with new column added.
Examples
# Packages
library(readxl)
# EXAMPLE 1
# Create Example Data
df_data <- data.frame(SITEID = paste0("Site_", LETTERS[1:10]),
INDEX_NAME = "BCG_MariNW_Bugs500ct",
GRADIENT = round(stats::runif(10, 0.5, 1.5), 1),
ELEVATION = round(stats::runif(10, 700, 800), 1))
# Import Checks
df_criteria <- read_excel(system.file("extdata/IndexClass.xlsx",
package = "BioMonTools"),
sheet = "Index_Class")
# Run Function
df_results <- assign_IndexClass(df_data, df_criteria, "INDEX_CLASS")
# Results
df_results
Estuary taxa data
Description
A dataset with example fish taxa data and locations for mapping.
Usage
data_Taxa_MA
Format
A data frame with 2,675 observations on the following 15 variables.
estuary
a factor with levels
BOSTON HARBOR
BUZZARDS BAY
CAPE COD BAY
MASSACHUSETTS BAY
WAQUOIT BAY
CommonName
a factor with levels
ALEWIFE
AMERICAN EEL
AMERICAN LOBSTER
AMERICAN PLAICE
AMERICAN SAND LANCE
AMERICAN SHAD
ATLANTIC COD
ATLANTIC CROAKER
ATLANTIC HERRING
ATLANTIC MACKEREL
ATLANTIC MENHADEN
ATLANTIC ROCK CRAB
ATLANTIC SALMON
ATLANTIC STINGRAY
ATLANTIC STURGEON
ATLANTIC TOMCOD
BAY ANCHOVY
BAY SCALLOP
BLACK DRUM
BLACK SEA BASS
BLUE CRAB
BLUE MUSSEL
BLUEBACK HERRING
BLUEFISH
BROWN SHRIMP
BUTTERFISH
CHANNEL CATFISH
COWNOSE RAY
CUNNER
DAGGERBLADE GRASS SHRIMP
EASTERN OYSTER
FOURSPINE STICKLEBACK
GOBIES
GREEN CRAB
GREEN SEA URCHIN
GRUBBY
HADDOCK
HOGCHOKER
JONAH CRAB
KILLIFISHES
LONGHORN SCULPIN
MULLETS
MUMMICHOG
NINESPINE STICKLEBACK
NORTHERN KINGFISH
NORTHERN PIPEFISH
NORTHERN SEAROBIN
NORTHERN SHRIMP
OCEAN POUT
OYSTER TOADFISH
PINFISH
POLLOCK
QUAHOG
RAINBOW SMELT
RED DRUM
RED HAKE
ROCK GUNNEL
SCUP
SEA SCALLOP
SEVENSPINE BAY SHRIMP
SHEEPSHEAD MINNOW
SHORTHORN SCULPIN
SHORTNOSE STURGEON
SILVER HAKE
SILVERSIDES
SKATES
SMOOTH FLOUNDER
SOFTSHELL CLAM
SPINY DOGFISH
SPOT
SPOTTED SEATROUT
STRIPED BASS
SUMMER FLOUNDER
TAUTOG
THREESPINE STICKLEBACK
WEAKFISH
WHITE HAKE
WHITE PERCH
WINDOWPANE FLOUNDER
WINTER FLOUNDER
YELLOW PERCH
YELLOWTAIL FLOUNDER
LifeStage
a factor with levels
ADULTS
EGGS
JUVENILES
LARVAE
MATING
PARTURITION
SPAWNING
SalZone
a factor with levels
>25 ppt
0.5-25 ppt
Winter
a numeric vector
Spring
a numeric vector
Summer
a numeric vector
Fall
a numeric vector
All
a numeric vector
TaxaName
Taxa Names for mapping
State
a factor with levels
MA
Latitude
a numeric vector
Longitude
a numeric vector
Count
a numeric vector
PctDensity
a numeric vector
Source
example data
Benthic macroinvertebrate taxa data; MBSS
Description
A data set with example benthic macroinvertebrate data. Calculate metrics then statistics. Data from MBSS.
Usage
data_benthos_MBSS
Format
A data frame with 5,666 observations on the following 40 variables.
INDEX_NAME
a character vector
SAMPLEID
a character vector
DATE
a character vector
TAXAID
a character vector
N_TAXA
a numeric vector, count
N_GRIDS
a numeric vector, number of grids in subsample (max = 30)
EXCLUDE
a character vector, whether taxon should be excluded from taxa richness metrics
INDEX_CLASS
a character vector, index region
Phylum
a character vector
Class
a character vector
Order
a character vector
Family
a character vector
Genus
a character vector
Other_Taxa
a character vector
Tribe
a character vector
FFG
a character vector
FAM_TV
a numeric vector
Habit
a character vector
TOLVAL
a numeric vector
TOLVAL2
a numeric vector
UFC
a numeric vector
UFC_Comment
a character vector
SUBPHYLUM
a character vector
SUBCLASS
a character vector
INFRAORDER
a character vector
SUBFAMILY
a character vector
LIFE_CYCLE
a character vector
BCG_ATTR
a character vector
THERMAL_INDICATOR
a character vector
LONGLIVED
a character vector
NOTEWORTHY
a character vector
FFG2
a character vector
HABITAT
a character vector
ELEVATION_ATTR
a character vector
GRADIENT_ATTR
a character vector
WSAREA_ATTR
a character vector
HABSTRUCT
a character vector
BCG_ATTR2
a character vector
NONTARGET
a logical vector
AIRBREATHER
a logical vector
Source
example data from MBSS
Benthic macroinvertebrate taxa data; Pacific Northwest
Description
A dataset with example (demonstration only) taxa data and attributes for calculating metric values.
This dataset is an example only. DO NOT USE for any analyses.
Usage
data_benthos_PacNW
Format
A data frame with 598 observations on the following 38 variables.
INDEX_NAME
a character vector
INDEX_CLASS
a character vector
SampleID
a character vector
TaxaID
a character vector
N_TAXA
a numeric vector
Exclude
a logical vector
NonTarget
a logical vector
Phylum
a character vector
Class
a character vector
Order
a character vector
Family
a character vector
Subfamily
a character vector
Tribe
a character vector
Genus
a character vector
BCG_Attr
a numeric vector
Thermal_Indicator
a character vector
FFG
a character vector
Clinger
a character vector
LongLived
a logical vector
Noteworthy
a logical vector
Habitat
a character vector
SubPhylum
a character vector
InfraOrder
a character vector
Habit
a logical vector
Life_Cycle
a logical vector
TolVal
a logical vector
FFG2
a logical vector
TolVal2
a logical vector
UFC
a character vector
UFC_Comment
a numeric vector
SubClass
a character vector
Elevation_Attr
a character vector
Gradient_Attr
a character vector
WSArea_Attr
a character vector
HabStruct
a character vector
BCG_Attr2
a character vector
AirBreather
a logical vector
Source
example data
rarify example data
Description
A dataset with example benthic macroinvertebrate data (600 count) to be used with the rarify function. Includes 12 samples.
Usage
data_bio2rarify
Format
A data frame with 223 rows and 28 variables:
- SampleID
Sample ID
- TaxaID
unique taxonomic identifier
- N_Taxa
number of individuals in sample
Source
example data
Coral taxa data; Florida BCG.
Description
A data set with example coral data. Calculate metrics. Data from Florida BCG providers.
Usage
data_coral_bcg_metric_dev
Format
A data frame with 2138 observations on the following 25 variables.
DataSource
a character vector
SampleID
a character vector
TotTranLngth_m
a numeric vector
SampDate
a Date
TAXAID
a character vector
CommonName
a character vector
Juvenile
a logical vector
DiamMax_cm
a numeric vector
DiamPerp_cm
a numeric vector
Height_cm
a numeric vector
TotMort_pct
a numeric vector
BCG_ATTR
a character vector
Weedy
a character vector
LRBC
a logical vector
MorphConvFact
a numeric vector
Phylum
a character vector
Class
a character vector
SubClass
a character vector
Order
a character vector
Family
a character vector
Genus
a character vector
SubGenus
a character vector
Species
a character vector
INDEX_NAME
a character vector
INDEX_CLASS
a character vector
Source
example coral data from Florida BCG
Coral metric value data; Florida BCG.
Description
A data set with coral metric value data. Used to compare to metric value calculations. Data from Florida BCG providers.
Usage
data_coral_bcg_metric_qc
Format
A data frame with 100 observations on the following 19 variables.
SAMPLEID
a character vector
INDEX_NAME
a character vector
INDEX_CLASS
a character vector
transect_area_m2
a numeric vector
ncol_total
a numeric vector
lcol_total
a numeric vector
nt_total
a numeric vector
ncol_Acropora
a numeric vector
ncol_AcroOrbi_m2
a numeric vector
pcol_Acropora
a numeric vector
nt_BCG_att123
a numeric vector
nt_BCG_att1234
a numeric vector
nt_BCG_att5
a numeric vector
pt_BCG_att5
a numeric vector
LCSA3D_samp_m2
a numeric vector
LCSA3D_BCG_att1234_m2
a numeric vector
LCSA3D_LRBC_m2
a numeric vector
ncol_SmallWeedy
a numeric vector
pcol_SmallWeedy
a numeric vector
Source
example coral metric results from Florida BCG
Diatom taxa data; Indiana DEM
Description
A data set with example diatom data. Calculate metrics. Data from IDEM.
Usage
data_diatom_mmi_dev
Format
A data frame with 24797 observations on the following 38 variables.
INDEX_NAME
a character vector
INDEX_CLASS
a character vector
STATIONID
a character vector
COLLDATE
a Date
SAMPLEID
a character vector
TAXAID
a character vector
EXCLUDE
a logical vector
NONTARGET
a logical vector
N_TAXA
a numeric vector
ORDER
a character vector
FAMILY
a character vector
GENUS
a character vector
BC_USGS
a character vector
TROPHIC_USGS
a character vector
SAP_USGS
a character vector
PT_USGS
a character vector
O_USGS
a character vector
SALINITY_USGS
a character vector
BAHLS_USGS
a character vector
P_USGS
a character vector
N_USGS
a character vector
HABITAT_USGS
a character vector
N_FIXER_USGS
a character vector
MOTILITY_USGS
a character vector
SIZE_USGS
a character vector
HABIT_USGS
a character vector
MOTILE2_USGS
a character vector
TOLVAL
a numeric vector
DIATOM_ISA
a character vector
DIAT_CL
a numeric vector
POLL_TOL
a numeric vector
BEN_SES
a numeric vector
DIATAS_TP
a numeric vector
DIATAS_TN
a numeric vector
DIAT_COND
a numeric vector
DIAT_CA
a numeric vector
MOTILITY
a numeric vector
NF
a numeric vector
#'
PHYLUM
a character vector
Source
example data from IDEM
Diatom metric value data; Indiana DEM
Description
A data set with diatom metric value data. Used to compare to metric value calculations. Data from IDEM.
Usage
data_diatom_mmi_qc
Format
A data frame with 497 observations on the following 250 variables.
SAMPLEID
a character vector
INDEX_NAME
a character vector
INDEX_CLASS
a character vector
ni_total
a numeric vector
li_total
a numeric vector
nt_total
a numeric vector
nt_Achnan_Navic
a numeric vector
nt_LOW_N
a numeric vector
nt_HIGH_N
a numeric vector
nt_LOW_P
a numeric vector
nt_HIGH_P
a numeric vector
nt_BC_1
a numeric vector
nt_BC_2
a numeric vector
nt_BC_3
a numeric vector
nt_BC_4
a numeric vector
nt_BC_5
a numeric vector
nt_BC_12
a numeric vector
nt_BC_45
a numeric vector
nt_PT_1
a numeric vector
nt_PT_2
a numeric vector
nt_PT_3
a numeric vector
nt_PT_4
a numeric vector
nt_PT_5
a numeric vector
nt_PT_12
a numeric vector
nt_SALINITY_1
a numeric vector
nt_SALINITY_2
a numeric vector
nt_SALINITY_3
a numeric vector
nt_SALINITY_4
a numeric vector
nt_SALINITY_12
a numeric vector
nt_SALINITY_34
a numeric vector
nt_O_1
a numeric vector
nt_O_2
a numeric vector
nt_O_3
a numeric vector
nt_O_4
a numeric vector
nt_O_5
a numeric vector
nt_O_345
a numeric vector
nt_SESTONIC_HABIT
a numeric vector
nt_BENTHIC_HABIT
a numeric vector
nt_BAHLS_1
a numeric vector
nt_BAHLS_2
a numeric vector
nt_BAHLS_3
a numeric vector
nt_TROPHIC_1
a numeric vector
nt_TROPHIC_2
a numeric vector
nt_TROPHIC_3
a numeric vector
nt_TROPHIC_4
a numeric vector
nt_TROPHIC_5
a numeric vector
nt_TROPHIC_6
a numeric vector
nt_TROPHIC_7
a numeric vector
nt_TROPHIC_456
a numeric vector
nt_SAP_1
a numeric vector
nt_SAP_2
a numeric vector
nt_SAP_3
a numeric vector
nt_SAP_4
a numeric vector
nt_SAP_5
a numeric vector
nt_NON_N_FIXER
a numeric vector
nt_N_FIXER
a numeric vector
nt_HIGHLY_MOTILE
a numeric vector
nt_MODERATELY_MOTILE
a numeric vector
nt_NON_MOTILE
a numeric vector
nt_SLIGHTLY_MOTILE
a numeric vector
nt_WEAKLY_MOTILE
a numeric vector
nt_BIG
a numeric vector
nt_SMALL
a numeric vector
nt_MEDIUM
a numeric vector
nt_VERY_BIG
a numeric vector
nt_VERY_SMALL
a numeric vector
nt_ADNATE
a numeric vector
nt_STALKED
a numeric vector
nt_HIGHLY_MOTILE.1
a numeric vector
nt_ARAPHID
a numeric vector
nt_DIAT_CL_1
a numeric vector
nt_DIAT_CL_2
a numeric vector
nt_BEN_SES_1
a numeric vector
nt_BEN_SES_2
a numeric vector
nt_DIAT_CA_1
a numeric vector
nt_DIAT_CA_2
a numeric vector
nt_DIAT_COND_1
a numeric vector
nt_DIAT_COND_2
a numeric vector
nt_DIATAS_TN_1
a numeric vector
nt_DIATAS_TN_2
a numeric vector
nt_DIATAS_TP_1
a numeric vector
nt_DIATAS_TP_2
a numeric vector
nt_MOTILITY_1
a numeric vector
nt_MOTILITY_2
a numeric vector
nt_NF_1
a numeric vector
nt_NF_2
a numeric vector
pi_Achnan_Navic
a numeric vector
pi_HIGH_N
a numeric vector
pi_LOW_N
a numeric vector
pi_HIGH_P
a numeric vector
pi_LOW_P
a numeric vector
pi_BC_1
a numeric vector
pi_BC_2
a numeric vector
pi_BC_3
a numeric vector
pi_BC_4
a numeric vector
pi_BC_5
a numeric vector
pi_PT_1
a numeric vector
pi_PT_2
a numeric vector
pi_PT_3
a numeric vector
pi_PT_4
a numeric vector
pi_PT_5
a numeric vector
pi_PT_45
a numeric vector
pi_SALINITY_1
a numeric vector
pi_SALINITY_2
a numeric vector
pi_SALINITY_3
a numeric vector
pi_SALINITY_4
a numeric vector
pi_O_1
a numeric vector
pi_O_2
a numeric vector
pi_O_3
a numeric vector
pi_O_4
a numeric vector
pi_O_5
a numeric vector
pi_SESTONIC_HABIT
a numeric vector
pi_BENTHIC_HABIT
a numeric vector
pi_BAHLS_1
a numeric vector
pi_BAHLS_2
a numeric vector
pi_BAHLS_3
a numeric vector
pi_TROPHIC_1
a numeric vector
pi_TROPHIC_2
a numeric vector
pi_TROPHIC_3
a numeric vector
pi_TROPHIC_4
a numeric vector
pi_TROPHIC_5
a numeric vector
pi_TROPHIC_6
a numeric vector
pi_TROPHIC_7
a numeric vector
pi_SAP_1
a numeric vector
pi_SAP_2
a numeric vector
pi_SAP_3
a numeric vector
pi_SAP_4
a numeric vector
pi_SAP_5
a numeric vector
pi_NON_N_FIXER
a numeric vector
pi_N_FIXER
a numeric vector
pi_HIGHLY_MOTILE
a numeric vector
pi_MODERATELY_MOTILE
a numeric vector
pi_NON_MOTILE
a numeric vector
pi_SLIGHTLY_MOTILE
a numeric vector
pi_WEAKLY_MOTILE
a numeric vector
pi_BIG
a numeric vector
pi_SMALL
a numeric vector
pi_MEDIUM
a numeric vector
pi_VERY_BIG
a numeric vector
pi_VERY_SMALL
a numeric vector
pi_ADNATE
a numeric vector
pi_STALKED
a numeric vector
pi_HIGHLY_MOTILE.1
a numeric vector
pi_ARAPHID
a numeric vector
pi_DIAT_CL_1
a numeric vector
pi_DIAT_CL_1_ASSR
a numeric vector
pi_DIAT_CL_2
a numeric vector
pi_BEN_SES_1
a numeric vector
pi_BEN_SES_2
a numeric vector
pi_DIAT_CA_1
a numeric vector
pi_DIAT_CA_2
a numeric vector
pi_DIAT_COND_1
a numeric vector
pi_DIAT_COND_2
a numeric vector
pi_DIATAS_TN_1
a numeric vector
pi_DIATAS_TN_2
a numeric vector
pi_DIATAS_TP_1
a numeric vector
pi_DIATAS_TP_2
a numeric vector
pi_MOTILITY_1
a numeric vector
pi_MOTILITY_2
a numeric vector
pi_NF_1
a numeric vector
pi_NF_2
a numeric vector
pt_Achnan_Navic
a numeric vector
pt_HIGH_N
a numeric vector
pt_LOW_N
a numeric vector
pt_HIGH_P
a numeric vector
pt_LOW_P
a numeric vector
pt_BC_1
a numeric vector
pt_BC_2
a numeric vector
pt_BC_3
a numeric vector
pt_BC_4
a numeric vector
pt_BC_5
a numeric vector
pt_BC_12
a numeric vector
pt_BC_45
a numeric vector
pt_PT_1
a numeric vector
pt_PT_2
a numeric vector
pt_PT_3
a numeric vector
pt_PT_4
a numeric vector
pt_PT_5
a numeric vector
pt_PT_12
a numeric vector
pt_SALINITY_1
a numeric vector
pt_SALINITY_2
a numeric vector
pt_SALINITY_3
a numeric vector
pt_SALINITY_4
a numeric vector
pt_SALINITY_34
a numeric vector
pt_O_1
a numeric vector
pt_O_2
a numeric vector
pt_O_3
a numeric vector
pt_O_4
a numeric vector
pt_O_5
a numeric vector
pt_O_345
a numeric vector
pt_SESTONIC_HABIT
a numeric vector
pt_BENTHIC_HABIT
a numeric vector
pt_BAHLS_1
a numeric vector
pt_BAHLS_2
a numeric vector
pt_BAHLS_3
a numeric vector
pt_TROPHIC_1
a numeric vector
pt_TROPHIC_2
a numeric vector
pt_TROPHIC_3
a numeric vector
pt_TROPHIC_4
a numeric vector
pt_TROPHIC_5
a numeric vector
pt_TROPHIC_6
a numeric vector
pt_TROPHIC_7
a numeric vector
pt_TROPHIC_456
a numeric vector
pt_SAP_1
a numeric vector
pt_SAP_2
a numeric vector
pt_SAP_3
a numeric vector
pt_SAP_4
a numeric vector
pt_SAP_5
a numeric vector
pt_NON_N_FIXER
a numeric vector
pt_N_FIXER
a numeric vector
pt_HIGHLY_MOTILE
a numeric vector
pt_MODERATELY_MOTILE
a numeric vector
pt_NON_MOTILE
a numeric vector
pt_SLIGHTLY_MOTILE
a numeric vector
pt_WEAKLY_MOTILE
a numeric vector
pt_BIG
a numeric vector
pt_SMALL
a numeric vector
pt_MEDIUM
a numeric vector
pt_VERY_BIG
a numeric vector
pt_VERY_SMALL
a numeric vector
pt_ADNATE
a numeric vector
pt_STALKED
a numeric vector
pt_HIGHLY_MOTILE.1
a numeric vector
pt_ARAPHID
a numeric vector
pt_DIAT_CL_1
a numeric vector
pt_DIAT_CL_2
a numeric vector
pt_BEN_SES_1
a numeric vector
pt_BEN_SES_2
a numeric vector
pt_DIAT_CA_1
a numeric vector
pt_DIAT_CA_2
a numeric vector
pt_DIAT_COND_1
a numeric vector
pt_DIAT_COND_2
a numeric vector
pt_DIATAS_TN_1
a numeric vector
pt_DIATAS_TN_2
a numeric vector
pt_DIATAS_TP_1
a numeric vector
pt_DIATAS_TP_2
a numeric vector
pt_MOTILITY_1
a numeric vector
pt_MOTILITY_2
a numeric vector
pt_NF_1
a numeric vector
pt_NF_2
a numeric vector
nt_Sens_810
a numeric vector
nt_RefIndicators
a numeric vector
nt_Tol_13
a numeric vector
pi_Sens_810
a numeric vector
pi_RefIndicators
a numeric vector
pi_Tol_13
a numeric vector
pt_Sens_810
a numeric vector
pt_RefIndicators
a numeric vector
pt_Tol_13
a numeric vector
wa_POLL_TOL
a numeric vector
Source
example metric value data from IDEM
Fish data, MBSS
Description
A dataset with example fish taxa data for metric calculation.
Usage
data_fish_MBSS
Format
A data frame with 1694 observations on the following 30 variables.
SAMPLEID
a character vector
TAXAID
a character vector
N_TAXA
a numeric vector
TYPE
a character vector
TOLER
a character vector
NATIVE
a character vector
TROPHIC
a character vector
SILT
a character vector
INDEX_CLASS
a character vector
SAMP_LENGTH_M
a numeric vector
SAMP_WIDTH_M
a numeric vector
SAMP_BIOMASS
a numeric vector
INDEX_NAME
a character vector
EXCLUDE
a logical vector
BCG_ATTR
a character vector
#'
DA_MI2
a numeric vector
N_ANOMALIES
a numeric vector
FAMILY
a character vector
GENUS
a character vector
THERMAL_INDICATOR
a character vector
ELEVATION_ATTR
a character vector
GRADIENT_ATTR
a character vector
WSAREA_ATTR
a character vector
REPRODUCTION
a character vector
HABITAT
a character vector
CONNECTIVITY
a logical vector
SCC
a logical vector
HYBRID
a logical vector
BCGATTR2
a character vector
TOLVAL2
a numeric vector
Source
example data
data_metval_scmb_ibi
Description
Example data metrics
@format A data frame with 20 observations on the following 13 variables.
INDEX_NAME
a character vector
INDEX_REGION
a character vector
SampID
a character vector
nt_total
a numeric vector
nt_Mol
a numeric vector
ni_Noto
a numeric vector
pi_intol
a numeric vector
qc_nt_total
a numeric vector
qc_nt_Mol
a numeric vector
qc_ni_Noto
a numeric vector
qc_pi_intol
a numeric vector
qc_sum
a numeric vector
qc_nar
a character vector
Usage
data_metval_scmb_ibi
Format
An object of class data.frame
with 20 rows and 13 columns.
Source
example data
Metric data for metric stats for mmi development
Description
A data set with example benthic macroinvertebrate data. Calculate metrics then statistics.
Usage
data_mmi_dev
Format
A data frame with 10,574 observations on the following 34 variables.
Class
a character vector
Ref_v1
a character vector
CalVal_Class4
a character vector
Unique_ID
a character vector
BenSampID
a character vector
CollDate
a character vector
CollMeth
a character vector
TaxaID
a character vector
Individuals
a numeric vector
Exclude
a logical vector
NonTarget
a character vector
Phylum
a character vector
Benthic_MasterTaxa.Class
a character vector
Order
a character vector
Family
a character vector
Subfamily
a character vector
Tribe
a character vector
Genus
a character vector
TolVal
a character vector
FFG
a character vector
Habit
a character vector
INDEX_NAME
a character vector
SUBPHYLUM
a character vector
CLASS
a character vector
SUBCLASS
a character vector
INFRAORDER
a character vector
LIFE_CYCLE
a character vector
BCG_ATTR
a character vector
THERMAL_INDICATOR
a character vector
LONGLIVED
a character vector
NOTEWORTHY
a character vector
FFG2
a character vector
TOLVAL2
a character vector
HABITAT
a numeric vector
Source
example data
Metric data for metric stats for mmi development
Description
A data set with example benthic macroinvertebrate data. Calculate metrics then statistics.
Usage
data_mmi_dev_small
Format
A data frame with 1,374 observations on the following 34 variables.
Class
a character vector
Ref_v1
a character vector
CalVal_Class4
a character vector
Unique_ID
a character vector
BenSampID
a character vector
CollDate
a character vector
CollMeth
a character vector
TaxaID
a character vector
Individuals
a numeric vector
Exclude
a logical vector
NonTarget
a character vector
Phylum
a character vector
Benthic_MasterTaxa.Class
a character vector
Order
a character vector
Family
a character vector
Subfamily
a character vector
Tribe
a character vector
Genus
a character vector
TolVal
a character vector
FFG
a character vector
Habit
a character vector
INDEX_NAME
a character vector
SUBPHYLUM
a character vector
CLASS
a character vector
SUBCLASS
a character vector
INFRAORDER
a character vector
LIFE_CYCLE
a character vector
BCG_ATTR
a character vector
THERMAL_INDICATOR
a character vector
LONGLIVED
a character vector
NOTEWORTHY
a character vector
FFG2
a character vector
TOLVAL2
a character vector
HABITAT
a numeric vector
Source
example data
Mark "exclude" (non-distinct / non-unique / ambiguous) taxa
Description
Takes as an input data frame with Sample ID, Taxa ID, and phlogenetic name fields and returns a similar dataframe with a column for "exclude" taxa (TRUE or FALSE).
Exclude taxa are refered to by multiple names; ambiguous, non-distinct, and non-unique. The "exclude" name was chosen so as to be consistent with "non-target" taxa. That is, taxa marked as "TRUE" are treated as undesireables. Exclude taxa are those that are present in a sample when taxa of the same group are present in the same sample are identified finer level. That is, the parent is marked as exclude when child taxa are present in the same sample.
Usage
markExcluded(
df_samptax,
SampID = "SAMPLEID",
TaxaID = "TAXAID",
TaxaCount = "N_TAXA",
Exclude = "EXCLUDE",
TaxaLevels,
Exceptions = NA,
verbose = FALSE
)
Arguments
df_samptax |
Input data frame. |
SampID |
Column name in df_samptax for sample identifier. Default = "SAMPLEID". |
TaxaID |
Column name in df_samptax for organism identifier. Default = "TAXAID". |
TaxaCount |
Column name in df_samptax for organism count. Default = "N_TAXA". |
Exclude |
Column name for Exclude Taxa results in returned data frame. Default = "Exclude". |
TaxaLevels |
Column names in df_samptax that for phylogenetic names to be evaluated. Need to be in order from coarse to fine (i.e., Phylum to Species). |
Exceptions |
NA or two column data frame of synonyms or other exceptions. Default = NA Column 1 is the name used in the TaxaID column of df_samptax. Column 2 is the name used in the TaxaLevels columns of df_samptax. |
verbose |
Boolean value for if status messages are output to the console. Default = FALSE |
Details
The exclude taxa are referenced in the metric values function. These taxa are removed from the taxa richness metrics. This is because these are coarser level taxa when fine level taxa are present in the same sample.
Exceptions is a 2 column data frame of synonyms or other exceptions. Column 1 is the name used in the TaxaID column the input data frame (df_samptax). Column 2 is the name used in the TaxaLevels columns of the input data frame (df_samptax). The phylogenetic columns (TaxaLevels) will be modified from Column 2 of the Exceptions data frame to match Column 1 of the Exceptions data frame. This ensures that the algorithm for markExcluded works properly. The changes will not be stored and the original names provided in the input data frame (df_samptax) will be returned in the final result. The function example below includes a practical case.
Taxa Levels are phylogenetic names that are to be checked. They should be listed in order from course (kingdom) to fine (species). Names not appearing in the data will be skipped.
The spelling of names must be consistent (including case) for this function to produce the intended output.
Value
Returns a data frame of df_samptax with an additional column, Exclude.
Examples
# Packages
library(readxl)
library(dplyr)
library(lazyeval)
library(knitr)
# Data
df_samps_bugs <- read_excel(system.file("./extdata/Data_Benthos.xlsx",
package="BioMonTools"),
guess_max=10^6)
# Variables
SampID <- "SampleID"
TaxaID <- "TaxaID"
TaxaCount <- "N_Taxa"
Exclude <- "Exclude_New"
TaxaLevels <- c("Kingdom",
"Phylum",
"SubPhylum",
"Class",
"SubClass",
"Order",
"SubOrder",
"SuperFamily",
"Family",
"SubFamily",
"Tribe",
"Genus",
"SubGenus",
"Species",
"Variety")
# Taxa that should be treated as equivalent
Exceptions <- data.frame("TaxaID" = "Sphaeriidae",
"PhyloID" = "Pisidiidae")
# EXAMPLE 1
df_tst <- markExcluded(df_samps_bugs,
SampID = "SampleID",
TaxaID = "TaxaID",
TaxaCount = "N_Taxa",
Exclude = "Exclude_New",
TaxaLevels = TaxaLevels,
Exceptions = Exceptions)
# Compare
df_compare <- dplyr::summarise(dplyr::group_by(df_tst, SampleID),
Exclude_Import = sum(Exclude),
Exclude_R = sum(Exclude_New))
df_compare$Diff <- df_compare$Exclude_Import - df_compare$Exclude_R
#
tbl_diff <- table(df_compare$Diff)
kable(tbl_diff)
# sort
df_compare <- df_compare %>% arrange(desc(Diff))
# Number with issues
sum(abs(df_compare$Diff))
# total samples
nrow(df_compare)
# confusion matrix
tbl_results <- table(df_tst$Exclude, df_tst$Exclude_New, useNA = "ifany")
#
# Show differences
kable(tbl_results)
knitr::kable(df_compare[1:10, ])
knitr::kable(df_compare[672:678, ])
# samples with differences
samp_diff <- as.data.frame(df_compare[df_compare[,"Diff"] != 0, "SampleID"])
# results for only those with differences
df_tst_diff <- df_tst[df_tst[,"SampleID"] %in% samp_diff$SampleID, ]
# add diff field
df_tst_diff$Exclude_Diff <- df_tst_diff$Exclude - df_tst_diff$Exclude_New
# Classification Performance Metrics
class_TP <- tbl_results[2,2] # True Positive
class_FN <- tbl_results[2,1] # False Negative
class_FP <- tbl_results[1,2] # False Positive
class_TN <- tbl_results[1,1] # True Negative
class_n <- sum(tbl_results) # total
#
# sensitivity (recall); TP / (TP+FN); measure model to ID true positives
class_sens <- class_TP / (class_TP + class_FN)
# precision; TP / (TP+FP); accuracy of model positives
class_prec <- class_TP / (class_TP + class_FP)
# specifity; TN / (TN + FP); measure model to ID true negatives
class_spec <- class_TN / (class_TN + class_FP)
# overall accuracy; (TP + TN) / all cases; accuracy of all classifications
class_acc <- (class_TP + class_TN) / class_n
# F1; 2 * (class_prec*class_sens) / (class_prec+class_sens)
## balance of precision and recall
class_F1 <- 2 * (class_prec * class_sens) / (class_prec + class_sens)
#
results_names <- c("Sensitivity (Recall)",
"Precision",
"Specificity",
"Overall Accuracy",
"F1")
results_values <- c(class_sens,
class_prec,
class_spec,
class_acc,
class_F1)
#
tbl_class <- data.frame(results_names, results_values)
names(tbl_class) <- c("Performance Metrics", "Percent")
tbl_class$Percent <- round(tbl_class$Percent * 100, 2)
kable(tbl_class)
#~~~~~~~~~~~~~~~~~~~~~~~~~~
# EXAMPLE 2
## No Exceptions
df_tst2 <- markExcluded(df_samps_bugs,
SampID = "SampleID",
TaxaID = "TaxaID",
TaxaCount = "N_Taxa",
Exclude = "Exclude_New",
TaxaLevels = TaxaLevels,
Exceptions = NA)
# Compare
df_compare2 <- dplyr::summarise(dplyr::group_by(df_tst2, SampleID),
Exclude_Import = sum(Exclude),
Exclude_R = sum(Exclude_New))
df_compare2$Diff <- df_compare2$Exclude_Import - df_compare2$Exclude_R
#
tbl_diff2 <- table(df_compare2$Diff)
kable(tbl_diff2)
# sort
df_compare2 <- df_compare2 %>% arrange(desc(Diff))
# Number with issues
sum(abs(df_compare2$Diff))
# total samples
nrow(df_compare2)
# confusion matrix
tbl_results2 <- table(df_tst2$Exclude, df_tst2$Exclude_New, useNA = "ifany")
#
# Show differences
kable(tbl_results2)
knitr::kable(df_compare2[1:10, ])
knitr::kable(tail(df_compare2))
# samples with differences
(samp_diff2 <- as.data.frame(df_compare2[df_compare2[, "Diff"] != 0,
"SampleID"]))
# results for only those with differences
df_tst_diff2 <- filter(df_tst2, SampleID %in% samp_diff2$SampleID)
# add diff field
df_tst_diff2$Exclude_Diff <- df_tst_diff2$Exclude - df_tst_diff2$Exclude_New
# Classification Performance Metrics
class_TP2 <- tbl_results2[2,2] # True Positive
class_FN2 <- tbl_results2[2,1] # False Negative
class_FP2 <- tbl_results2[1,2] # False Positive
class_TN2 <- tbl_results2[1,1] # True Negative
class_n2 <- sum(tbl_results2) # total
#
# sensitivity (recall); TP / (TP+FN); measure model to ID true positives
class_sens2 <- class_TP2 / (class_TP2 + class_FN2)
# precision; TP / (TP+FP); accuracy of model positives
class_prec2 <- class_TP2 / (class_TP2 + class_FP2)
# specifity; TN / (TN + FP); measure model to ID true negatives
class_spec2 <- class_TN2 / (class_TN2 + class_FP2)
# overall accuracy; (TP + TN) / all cases; accuracy of all classifications
class_acc2 <- (class_TP2 + class_TN2) / class_n2
# F1; 2 * (class_prec*class_sens) / (class_prec+class_sens)
## balance of precision and recall
class_F12 <- 2 * (class_prec2 * class_sens2) / (class_prec2 + class_sens2)
#
results_names2 <- c("Sensitivity (Recall)",
"Precision",
"Specificity",
"Overall Accuracy",
"F1")
results_values2 <- c(class_sens2,
class_prec2,
class_spec2,
class_acc2,
class_F12)
#
tbl_class2 <- data.frame(results_names2, results_values2)
names(tbl_class2) <- c("Performance Metrics", "Percent")
tbl_class2$Percent <- round(tbl_class2$Percent * 100, 2)
kable(tbl_class2)
Score metrics
Description
This function calculates metric scores based on a Thresholds data frame. Can generate scores for categories n=3 (e.g., 1/3/5, ScoreRegime="Cat_135") or n=4 (e.g., 0/2/4/6, ScoreRegime="Cat_0246") or continuous (e.g., 0-100, ScoreRegime="Cont_0100").
Usage
metric.scores(
DF_Metrics,
col_MetricNames,
col_IndexName,
col_IndexClass,
DF_Thresh_Metric,
DF_Thresh_Index,
col_ni_total = "ni_total",
col_IndexRegion = NULL
)
Arguments
DF_Metrics |
Data frame of metric values (as columns), Index Name, and Index Region (strata). |
col_MetricNames |
Names of columns of metric values. |
col_IndexName |
Name of column with index (e.g., MBSS.2005.Bugs) |
col_IndexClass |
Name of column with relevant bioregion or site class (e.g., COASTAL). |
DF_Thresh_Metric |
Data frame of Scoring Thresholds for metrics (INDEX_NAME, INDEX_CLASS, METRIC_NAME, Direction, Thresh_Lo, Thresh_Mid, Thresh_Hi, ScoreRegime , SingleValue_Add, NormDist_Tail_Lo, NormDist_Tail_Hi, CatGrad_xvar , CatGrad_InfPt, CatGrad_Lo_m, CatGrad_Lo_b, CatGrad_Mid_m, CatGrad_Mid_b , CatGrad_Hi_m, CatGrad_Hi_b). |
DF_Thresh_Index |
Data frame of Scoring Thresholds for indices (INDEX_NAME, INDEX_CLASS,METRIC_NAME, ScoreRegime, Thresh01, Thresh02 , Thresh03, Thresh04, Thresh05, Thresh06, Thresh07 , Nar01, Nar02, Nar03, Nar04, Nar05, Nar06). |
col_ni_total |
Name of column with total number of individuals. Used for cases where sample was collected but no organisms collected. Default = ni_total.#' |
col_IndexRegion |
Name of column with relevant bioregion or site class (e.g., COASTAL). Default = NULL. DEPRECATED |
Details
The R library dplyr is needed for this function.
For all ScoreRegime cases at the index level a "sum_Index" field is computed that is the sum of all metric scores. Valid "ScoreRegime" values are:
* SUM = all metric scores added together.
* AVERAGE = all metric scores added and divided by the number of metrics. The index is on the same scale as the individual metric scores.
* AVERAGE_100 = AVERAGE is scaled 0 to 100.
FIX, 2024-01-29, v1.0.0.9060 Rename col_IndexRegion to col_IndexClass Add col_IndexRegion as variable at end to avoid breaking existing code Later remove it as an input variable but add code in the function to accept
Value
vector of scores
Examples
# Example data
library(readxl)
library(reshape2)
# Thresholds
fn_thresh <- file.path(system.file(package = "BioMonTools"),
"extdata",
"MetricScoring.xlsx")
df_thresh_metric <- read_excel(fn_thresh, sheet = "metric.scoring")
df_thresh_index <- read_excel(fn_thresh, sheet = "index.scoring")
#~~~~~~~~~~~~~~~~~~~~~~~~
# Pacific Northwest, BCG Level 1 Indicator Taxa Index
df_samps_bugs <- read_excel(system.file("extdata/Data_Benthos.xlsx"
, package = "BioMonTools")
, guess_max = 10^6)
myIndex <- "BCG_PacNW_L1"
df_samps_bugs$Index_Name <- myIndex
df_samps_bugs$Index_Class <- "ALL"
(myMetrics.Bugs <- unique(
as.data.frame(df_thresh_metric)[df_thresh_metric[,
"INDEX_NAME"] == myIndex, "METRIC_NAME"]))
# Run Function
df_metric_values_bugs <- metric.values(df_samps_bugs,
"bugs",
fun.MetricNames = myMetrics.Bugs)
# index to BCG.PacNW.L1
df_metric_values_bugs$INDEX_NAME <- myIndex
df_metric_values_bugs$INDEX_CLASS <- "ALL"
# SCORE Metrics
df_metric_scores_bugs <- metric.scores(df_metric_values_bugs,
myMetrics.Bugs,
"INDEX_NAME",
"INDEX_CLASS",
df_thresh_metric,
df_thresh_index)
# QC, table
table(df_metric_scores_bugs$Index, df_metric_scores_bugs$Index_Nar)
# QC, plot
hist(df_metric_scores_bugs$Index,
main = "PacNW BCG Example Data",
xlab = "Level 1 Indicator Taxa Index Score")
abline(v = c(21,30), col = "blue")
text(21 + c(-2, +2), 200, c("Low", "Medium"), col = "blue")
Calculate metric statistics
Description
This function calculates metric statistics for use with developing a multi-metric index.
Inputs are a data frame with
Usage
metric.stats(
fun.DF,
col_metrics,
col_SampID = "SAMPLEID",
col_RefStatus = "Ref_Status",
RefStatus_Ref = "Ref",
RefStatus_Str = "Str",
RefStatus_Oth = "Oth",
col_DataType = "Data_Type",
DataType_Cal = "Cal",
DataType_Ver = "Ver",
col_Subset = NULL,
Subset_Value = NULL
)
Arguments
fun.DF |
Data frame. |
col_metrics |
Column names for metrics. |
col_SampID |
Column name for unique sample identifier. Default = "SAMPLEID". |
col_RefStatus |
Column name for Reference Status. Default = "Ref_Status" |
RefStatus_Ref |
Reference Status name for Reference used in col_ RefStatus. Default = “Ref”. Use NULL if you don't use this value. |
RefStatus_Str |
Reference Status name for Stressed used in col_ RefStatus. Default = “Str”. Use NULL if you don't use this value. |
RefStatus_Oth |
Reference Status name for Other used in col_ RefStatus. Default = “Oth”. Use NULL if you don't use this value. |
col_DataType |
Column name for Data Type – Validation vs. Calibration. Default = "Data_Type" |
DataType_Cal |
Datatype name for Calibration used in col_DataType. Default = “Cal”. Use NULL if you don't use this value. |
DataType_Ver |
Datatype name for Verification used in col_DataType. Default = “Ver”. Use NULL if you don't use this value. |
col_Subset |
Column name to subset the data and run on each subset. Default = NULL. If NULL then no subset will be generated. |
Subset_Value |
Subset name to be used for creating subset. Default = NULL. |
Details
Summary statistics for the data are calculated.
The data is filtered by the column Subset for only a single value given by the user. If need further subsets re-run the function. If no subset is given the entire data set is used.
Statistics will be generated for up to 6 combinations for RefStatus (Ref, Oth, Str) and DataType (Cal, Ver).
The resulting dataframe will have the statistics in columns with the first 4 columns as: INDEX_CLASS (if col_Subset not provided), col_RefStatus, col_DataType, and Metric_Name.
The following statistics are generated with na.rm = TRUE.
* n = number
* min = minimum
* max = maximum
* mean = mean
* median = median
* range = range (max - min)
* sd = standard deviation
* cv = coefficient of variation (sd/mean)
* q05 = quantile, 5
* q10 = quantile, 10
* q25 = quantile, 25
* q50 = quantile, 50
* q75 = quantile, 75
* q90 = quantile, 90
* q95 = quantile, 95
Value
data frame of metrics (rows) and statistics (columns). This is in long format with columns for INDEX_CLASS, RefStatus, and DataType.
Examples
# data, benthos
df_bugs <- data_mmi_dev_small
# Munge Names
names(df_bugs)[names(df_bugs) %in% "BenSampID"] <- "SAMPLEID"
names(df_bugs)[names(df_bugs) %in% "TaxaID"] <- "TAXAID"
names(df_bugs)[names(df_bugs) %in% "Individuals"] <- "N_TAXA"
names(df_bugs)[names(df_bugs) %in% "Exclude"] <- "EXCLUDE"
names(df_bugs)[names(df_bugs) %in% "Class"] <- "INDEX_CLASS"
names(df_bugs)[names(df_bugs) %in% "Unique_ID"] <- "SITEID"
# Add Missing Columns
df_bugs$ELEVATION_ATTR <- NA_character_
df_bugs$GRADIENT_ATTR <- NA_character_
df_bugs$WSAREA_ATTR <- NA_character_
df_bugs$HABSTRUCT <- NA_character_
df_bugs$BCG_ATTR2 <- NA_character_
df_bugs$AIRBREATHER <- NA
df_bugs$UFC <- NA_real_
# Calc Metrics
cols_keep <- c("Ref_v1",
"CalVal_Class4",
"SITEID",
"CollDate",
"CollMeth")
# INDEX_NAME and INDEX_CLASS kept by default
df_metval <- metric.values(df_bugs, "bugs", fun.cols2keep = cols_keep)
# Calc Stats
col_metrics <- names(df_metval)[9:ncol(df_metval)]
col_SampID <- "SAMPLEID"
col_RefStatus <- "REF_V1"
RefStatus_Ref <- "Ref"
RefStatus_Str <- "Strs"
RefStatus_Oth <- "Other"
col_DataType <- "CALVAL_CLASS4"
DataType_Cal <- "cal"
DataType_Ver <- "verif"
col_Subset <- "INDEX_CLASS"
Subset_Value <- "CentralHills"
df_stats <- metric.stats(df_metval,
col_metrics,
col_SampID,
col_RefStatus,
RefStatus_Ref,
RefStatus_Str,
RefStatus_Oth,
col_DataType,
DataType_Cal,
DataType_Ver,
col_Subset,
Subset_Value)
# Save Results
write.table(df_stats,
file.path(tempdir(), "metric.stats.tsv"),
col.names = TRUE,
row.names = FALSE,
sep = "\t")
Secondary metric statistics
Description
This function calculates secondary statistics (DE and z-score) on metric statistics for use with developing a multi-metric index.
Usage
metric.stats2(
data_metval,
data_metstat,
col_metval_RefStatus = "RefStatus",
col_metval_DataType = "DataType",
col_metval_Subset = "INDEX_CLASS",
col_metstat_RefStatus = "RefStatus",
col_metstat_DataType = "DataType",
col_metstat_Subset = "INDEX_CLASS",
RefStatus_Ref = "Ref",
RefStatus_Str = "Str",
RefStatus_Oth = "Oth",
DataType_Cal = "Cal",
DataType_Ver = "Ver",
Subset_Value = NULL
)
Arguments
data_metval |
Data frame of metric values. |
data_metstat |
Data frame of metric statistics |
col_metval_RefStatus |
Column name for Reference Status. Default = "Ref_Status" |
col_metval_DataType |
Column name for Data Type – Validation vs. Calibration. Default = "Data_Type" |
col_metval_Subset |
Column name for INDEX_CLASS in data_metstats. Default = INDEX_CLASS |
col_metstat_RefStatus |
Column name for Reference Status. Default = "Ref_Status" |
col_metstat_DataType |
Column name for Data Type – Validation vs. Calibration. Default = "Data_Type" |
col_metstat_Subset |
Column name for INDEX_CLASS in data_metstats. Default = xx. |
RefStatus_Ref |
RefStatus value for Reference. Default = "Ref" |
RefStatus_Str |
RefStatus value for Stressed. Default = "Str" |
RefStatus_Oth |
RefStatus value for Other. Default = "Oth" |
DataType_Cal |
DataType value for Calibration. Default = "Cal" |
DataType_Ver |
DataType value for Verification. Default = "Ver" |
Subset_Value |
Subset value of INDEX_CLASS (site class). Default = NULL |
Details
Secondary metrics statistics for the data are calculated.
Inputs are metric values and metric stats outputs.
Metric values is a wide format with columns for each metric. Assumes only a single Subset.
Metrics stats is a wide format with columns for each statistic with metrics in a single column. Assumes only a single Subset.
Required fields are RefStatus, DataType, and INDEX_CLASS. The user is allowed to enter their own values for these fields for each input file.
The two statistics calculated are z-score and discrimination efficiency (DE) for each metric within each DataType (cal / val).
Z-scores are calculated using the calibration (or development) data set for a given INDEX_CLASS (or Site Class).
* (mean Ref - mean Str) / sd Ref
DE is calculated without knowing the expected direction of response for each metric for a given INDEX_CLASS (or Site Class). DE is the percentage (0-100) of **stressed** samples that fall **below** the **25th** quantile (for decreaser metrics, e.g., total taxa) or **above** the **75th** quantile (for increaser metrics, e.g., HBI) of the **reference** samples.
A data frame of the metric.stats input is returned with new columns (z_score, DE25 and DE75). The z-score is added for each Ref_Status. DE25 and DE75 are only added where Ref_Status is labeled as Stressed.
Value
A data frame of the metric.stats input is returned with new columns (z_score, DE25 and DE75).
Examples
# data, benthos
df_bugs <- data_mmi_dev_small
# Munge Names
names(df_bugs)[names(df_bugs) %in% "BenSampID"] <- "SAMPLEID"
names(df_bugs)[names(df_bugs) %in% "TaxaID"] <- "TAXAID"
names(df_bugs)[names(df_bugs) %in% "Individuals"] <- "N_TAXA"
names(df_bugs)[names(df_bugs) %in% "Exclude"] <- "EXCLUDE"
names(df_bugs)[names(df_bugs) %in% "Class"] <- "INDEX_CLASS"
names(df_bugs)[names(df_bugs) %in% "Unique_ID"] <- "SITEID"
# Add Missing Columns
df_bugs$ELEVATION_ATTR <- NA_character_
df_bugs$GRADIENT_ATTR <- NA_character_
df_bugs$WSAREA_ATTR <- NA_character_
df_bugs$HABSTRUCT <- NA_character_
df_bugs$BCG_ATTR2 <- NA_character_
df_bugs$AIRBREATHER <- NA
df_bugs$UFC <- NA_real_
# Calc Metrics
cols_keep <- c("Ref_v1",
"CalVal_Class4",
"SITEID",
"CollDate",
"CollMeth")
# INDEX_NAME and INDEX_CLASS kept by default
df_metval <- metric.values(df_bugs, "bugs", fun.cols2keep = cols_keep)
# Calc Stats
col_metrics <- names(df_metval)[9:ncol(df_metval)]
col_SampID <- "SAMPLEID"
col_RefStatus <- "REF_V1"
RefStatus_Ref <- "Ref"
RefStatus_Str <- "Strs"
RefStatus_Oth <- "Other"
col_DataType <- "CALVAL_CLASS4"
DataType_Cal <- "cal"
DataType_Ver <- "verif"
col_Subset <- "INDEX_CLASS"
Subset_Value <- "CentralHills"
df_stats <- metric.stats(df_metval,
col_metrics,
col_SampID,
col_RefStatus,
RefStatus_Ref,
RefStatus_Str,
RefStatus_Oth,
col_DataType,
DataType_Cal,
DataType_Ver,
col_Subset,
Subset_Value)
# Calc Stats2 (z-scores and DE)
data_metval <- df_metval
data_metstat <- df_stats
col_metval_RefStatus <- "REF_V1"
col_metval_DataType <- "CALVAL_CLASS4"
col_metval_Subset <- "INDEX_CLASS"
col_metstat_RefStatus <- "REF_V1"
col_metstat_DataType <- "CALVAL_CLASS4"
col_metstat_Subset <- "INDEX_CLASS"
RefStatus_Ref <- "Ref"
RefStatus_Str <- "Strs"
RefStatus_Oth <- "Other"
DataType_Cal <- "cal"
DataType_Ver <- "verif"
Subset_Value <- "CentralHills"
df_stats2 <- metric.stats2(data_metval,
data_metstat,
col_metval_RefStatus,
col_metval_DataType,
col_metval_Subset,
col_metstat_RefStatus,
col_metstat_DataType,
col_metstat_Subset,
RefStatus_Ref,
RefStatus_Str,
RefStatus_Oth,
DataType_Cal,
DataType_Ver,
Subset_Value)
# Save Results
write.table(df_stats2,
file.path(tempdir(), "metric.stats2.tsv"),
col.names = TRUE,
row.names = FALSE,
sep = "\t")
Calculate metric values
Description
This function calculates metric values for bugs, fish, algae , and coral. Inputs are a data frame with SampleID and taxa with phylogenetic and autecological information (see below for required fields by community). The dplyr package is used to generate the metric values.
Usage
metric.values(
fun.DF,
fun.Community,
fun.MetricNames = NULL,
boo.Adjust = FALSE,
fun.cols2keep = NULL,
boo.marine = FALSE,
boo.Shiny = FALSE,
verbose = FALSE,
metric_subset = NULL,
taxaid_dni = NULL
)
Arguments
fun.DF |
Data frame of taxa (list required fields) |
fun.Community |
Community name for which to calculate metric values (bugs, fish, algae, or coral) |
fun.MetricNames |
Optional vector of metric names to be returned. If none are supplied then all will be returned. Default=NULL |
boo.Adjust |
Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics. |
fun.cols2keep |
Column names of fun.DF to retain in the output. Uses column names. |
boo.marine |
Should estuary/marine metrics be included. Ignored if fun.MetricNames is not null. Default = FALSE. |
boo.Shiny |
Boolean value for if the function is accessed via Shiny. Default = FALSE. |
verbose |
Include messages to track progress. Default = FALSE |
metric_subset |
Subset of metrics to be generated. Internal function. Default = NULL |
taxaid_dni |
Taxa names to be included in DNI (Do Not Include) metrics (n = 3) but dropped for all other metrics. Only for benthic metrics. Default = NULL |
Details
All percent metric results are 0-100.
No manipulations of the taxa are performed by this routine. All benthic macroinvertebrate taxa should be identified to the appropriate operational taxonomic unit (OTU).
Any non-count taxa should be identified in the "Exclude" field as "TRUE". These taxa will be excluded from taxa richness metrics (but will count for all others).
Any non-target taxa should be identified in the "NonTarget" field as "TRUE". Non-target taxa are those that are not part of your intended #' capture list; e.g., fish, herps, water column taxa, or water surface taxa in a benthic sample. The target list will vary by program. The non-target taxa will be removed prior to any calculations.
Excluded taxa are ambiguous taxa (on a sample basis), i.e., the parent taxa when child taxa are present. For example, the parent taxa Chironomidae would be excluded when the child taxa Tanytarsini is present. Both would be excluded when Tanytarsus is present. The markExcluded function can be used to populated this field.
There are a number of required fields (see below) for metric to calculation. If any fields are missing the user will be prompted as to which are missing and if the user wants to continue or quit. If the user continues the missing fields will be added but will be filled with zero or NA (as appropriate). Any metrics based on the missing fields will not be valid.
A future update may turn these fields into function parameters. This would allow the user to tweak the function inputs to match their data rather than having to update their data to match the function.
Required fields, all communities:
* SAMPLEID (character or number, must be unique)
* TAXAID (character or number, must be unique)
* N_TAXA
* INDEX_NAME
* INDEX_CLASS (BCG or MMI site category; e.g., for BCG PacNW valid values are "hi" or "lo")
Additional Required fields, bugs:
* EXCLUDE (valid values are TRUE and FALSE)
* NONTARGET (valid values are TRUE and FALSE)
* PHYLUM, SUBPHYLUM, CLASS, SUBCLASS, INFRAORDER, ORDER, FAMILY, SUBFAMILY, TRIBE, GENUS
* FFG, HABIT, LIFE_CYCLE, TOLVAL, BCG_ATTR, THERMAL_INDICATOR, FFG2, TOLVAL2, LONGLIVED, NOTEWORTHY, HABITAT, UFC, ELEVATION_ATTR, GRADIENT_ATTR, WSAREA_ATTR, HABSTRUCT
Additional Required fields, fish:
* N_ANOMALIES
* SAMP_BIOMASS (biomass total for sample, funciton uses max in case entered for all taxa in sample)
* NATIVE: NATIVE or other text values
* DA_MI2, SAMP_WIDTH_M, SAMP_LENGTH_M, , TYPE, TOLER, TROPHIC, SILT, FAMILY, GENUS, HYBRID, BCG_ATTR, THERMAL_INDICATOR, ELEVATION_ATTR, GRADIENT_ATTR, WSAREA_ATTR, REPRODUCTION, HABITAT, CONNECTIVITY, SCC
Additional Required fields, algae:
* EXCLUDE, NONTARGET, PHYLUM, ORDER, FAMILY, GENUS, BC_USGS, TROPHIC_USGS, SAP_USGS, PT_USGS, O_USGS, SALINITY_USGS, BAHLS_USGS, P_USGS, N_USGS, HABITAT_USGS, N_FIXER_USGS, MOTILITY_USGS, SIZE_USGS, HABIT_USGS, MOTILE2_USGS, TOLVAL, DIATOM_ISA, DIAT_CL, POLL_TOL, BEN_SES, DIATAS_TP, DIATAS_TN, DIAT_COND, DIAT_CA, MOTILITY, NF
Valid values for fields:
* FFG: CG, CF, PR, SC, SH
* HABIT: BU, CB, CN, SP, SW
* LIFE_CYCLE: UNI, SEMI, MULTI
* THERMAL_INDICATOR: STENOC, COLD, COOL, WARM, STENOW, EURYTHERMAL , COWA, NA
* LONGLIVED: TRUE, FALSE
* NOTEWORTHY: TRUE, FALSE
* HABITAT: BRAC, DEPO, GENE, HEAD, RHEO, RIVE, SPEC, UNKN
* UFC: integers 1:6 (taxonomic uncertainty frequency class)
* ELEVATION_ATTR: LOW, HIGH
* GRADIENT_ATTR: LOW, MOD, HIGH
* WSAREA_ATTR: SMALL, MEDIUM, LARGE, XLARGE
* REPRODUCTION: BROADCASTER, SIMPLE NEST, COMPLEX NEST, BEARER, MIGRATORY
* CONNECTIVITY: TRUE, FALSE
* SCC (Species of Conservation Concern): TRUE, FALSE
'Columns to keep' are additional fields in the input file that the user wants retained in the output. Fields need to be those that are unique per sample and not associated with the taxa. For example, the fields used in qc.check(); Area_mi2, SurfaceArea, Density_m2, and Density_ft2.
If fun.MetricNames is provided only those metrics will be returned in the provided order. This variable can be used to sort the metrics per the user's preferences. By default the metric names will be returned in the groupings that were used for calculation.
The fields TOLVAL2 and FFG2 are provided to allow the user to calculate metrics based on alternative scenarios. For example, including both HBI and NCBI where the NCBI uses a different set of tolerance values (TOLVAL2).
If TAXAID is 'NONE' and N_TAXA is '0' then metrics **will** be calculated with that record. Other values for TAXAID with N_TAXA = 0 will be removed before calculations.
For 'Oligochete' metrics either Class or Subclass is required for calculation.
The parameter boo.Shiny can be set to TRUE when accessing this function in Shiny. Normally the QC check for required fields is interactive. Setting boo.Shiny to TRUE will always continue. The default is FALSE.
The parameter 'taxaid_dni' denotes taxa to be included in Do Not Include (DNI) metrics but dropped from all other metrics. Only for benthic metrics.
Breaking change from 0.5 to 0.6 with change from Index_Name to Index_Class.
Value
data frame of SampleID and metric values
Examples
# Example 1, data already in R
df_metval <- metric.values(BioMonTools::data_benthos_PacNW,
"bugs")
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Example 2, specific metrics or metrics in a specific order
## reuse df_samps_bugs from above
# metric names to keep (in this order)
myMetrics <- c("ni_total",
"nt_EPT",
"nt_Ephem",
"pi_tv_intol",
"pi_Ephem",
"nt_ffg_scrap",
"pi_habit_climb")
# Run Function
df_metval_myMetrics <- metric.values(BioMonTools::data_benthos_PacNW,
"bugs",
fun.MetricNames = myMetrics)
Calculate metric values, Algae
Description
Subfunction of metric.values for use with Algae.
Usage
metric.values.algae(
myDF,
MetricNames = NULL,
boo.Adjust = FALSE,
cols2keep = NULL,
MetricSort = NA,
boo.Shiny = FALSE,
verbose
)
Arguments
myDF |
Data frame of taxa. |
MetricNames |
Optional vector of metric names to be returned. |
boo.Adjust |
Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics. |
cols2keep |
Column names of fun.DF to retain in the output. Uses column names. |
MetricSort |
How metric names should be sort; NA = as is , AZ = alphabetical. Default = NULL. |
boo.Shiny |
Boolean value for if the function is accessed via Shiny. Default = FALSE. |
verbose |
Include messages to track progress. Default = FALSE |
Details
For internal use only. Called from metric.values().
Value
Data frame
Calculate metric values, Bugs
Description
Subfunction of metric.values for use with Benthic Macroinvertebrates
Usage
metric.values.bugs(
myDF,
MetricNames = NULL,
boo.Adjust = FALSE,
cols2keep = NULL,
MetricSort = NA,
boo.marine = FALSE,
boo.Shiny,
verbose,
metric_subset,
taxaid_dni = NULL
)
Arguments
myDF |
Data frame of taxa. |
MetricNames |
Optional vector of metric names to be returned. |
boo.Adjust |
Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics. |
cols2keep |
Column names of fun.DF to retain in the output. Uses column names. |
MetricSort |
How metric names should be sort; NA = as is , AZ = alphabetical. Default = NULL. |
boo.marine |
Should estuary/marine metrics be included. Ignored if fun.MetricNames is not null. Default = FALSE. |
boo.Shiny |
Boolean value for if the function is accessed via Shiny. Default = FALSE. |
verbose |
Include messages to track progress. Default = FALSE |
metric_subset |
Subset of metrics to be generated. Internal function. Default = NULL |
taxaid_dni |
Taxa names to be included in DNI (Do Not Include) metrics (n = 3) but dropped for all other metrics. Only for benthic metrics. Default = NULL |
Details
For internal use only. Called from metric.values().
Value
Data frame
Calculate metric values, coral
Description
Subfunction of metric.values for use with coral
Usage
metric.values.coral(
myDF,
MetricNames = NULL,
boo.Adjust = FALSE,
cols2keep = NULL,
MetricSort = NA,
boo.Shiny = FALSE,
verbose
)
Arguments
myDF |
Data frame of taxa. |
MetricNames |
Optional vector of metric names to be returned. |
boo.Adjust |
Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics. |
cols2keep |
Column names of fun.DF to retain in the output. Uses column names. |
MetricSort |
How metric names should be sort; NA = as is , AZ = alphabetical. Default = NULL. |
boo.Shiny |
Boolean value for if the function is accessed via Shiny. Default = FALSE. |
verbose |
Include messages to track progress. Default = FALSE |
Details
For internal use only. Called from metric.values().
Value
Data frame
Calculate metric values, Fish
Description
Subfunction of metric.values for use with Fish.
Usage
metric.values.fish(
myDF,
MetricNames = NULL,
boo.Adjust = FALSE,
cols2keep = NULL,
boo.Shiny,
verbose
)
Arguments
myDF |
Data frame of taxa. |
MetricNames |
Optional vector of metric names to be returned. |
boo.Adjust |
Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics. |
cols2keep |
Column names of fun.DF to retain in the output. Uses column names. |
boo.Shiny |
Boolean value for if the function is accessed via Shiny. Default = FALSE. |
verbose |
Include messages to track progress. Default = FALSE |
Details
For internal use only. Called from metric.values().
Value
Data frame
Metric values Groups to Excel
Description
The output of metric.values() is saved to Excel with different groups of metrics on different worksheets.
Usage
metvalgrpxl(
fun.DF.MetVal,
fun.DF.xlMetNames = NULL,
fun.Community,
fun.MetVal.Col2Keep = c("SAMPLEID", "INDEX_NAME", "INDEX_CLASS"),
fun.xlGrpCol = "Sort_Group",
file.out = NULL
)
Arguments
fun.DF.MetVal |
Data frame of metric values. |
fun.DF.xlMetNames |
Data frame of metric names and groups. Default (NULL) will use the verion of MetricNames.xlsx that is in the BioMonTools package. |
fun.Community |
Community name of calculated metric values (bugs, fish, or algae) |
fun.MetVal.Col2Keep |
Column names in metric values to keep. Default = c("SAMPLEID", "INDEX_NAME", "INDEX_CLASS") |
fun.xlGrpCol |
Column name from Excel metric names to use for Groupings. Default = Sort_Group |
file.out |
Output file name. Default (NULL) will generate a file name based on the data and time (e.g., MetricValuesGroups_bugs_20220201.xlsx) |
Details
This function will save the output of metric.values() into groups by worksheet as defined by the user.
The Excel file MetricNames.xlsx provided in the extdata folder has a column named 'Groups' that can be used as default groupings. If no groupings are provided (the default) all metrics are saved to a single worksheet. Within each group the 'sort_order' is used to sort the metrics. If this column is blank then the metrics are sorted in the order they appear in the output from metric.values() (i.e., in fun.DF).
The MetricNames data frame must include the following fields:
* Metric_Name
* Community
* Sort_Group (user defined)
Value
Saves Excel file with metrics grouped by worksheet
Examples
# Example 1, bugs
## Community
comm <- "bugs"
## Calculate Metrics
df_metval <- metric.values(BioMonTools::data_benthos_PacNW, comm)
## Metric Names and Groups
df_metnames <- readxl::read_excel(system.file("extdata/MetricNames.xlsx",
package="BioMonTools"),
guess_max = 10^6,
sheet = "MetricMetadata",
skip = 4)
## Columns to Keep
col2keep <- c("SAMPLEID", "INDEX_NAME", "INDEX_CLASS")
## Grouping Column
col_Grp <- "Sort_Group"
## File Name
file_out <- file.path(tempdir(), paste0("MetValGrps_", comm, ".xlsx"))
## Run Function
metvalgrpxl(df_metval, df_metnames, comm, col2keep, col_Grp, file_out)
QC checks on metric values
Description
Apply "QC checks" on calculated metrics and station/sample attributes to "flag" samples for the user. Examples include watershed size or total number of individuals. Can have checks for both high and low values. Checks are stored in separate file. For structure see df.checks in example.
Usage
qc.checks(df.metrics, df.checks, input.shape = "wide")
Arguments
df.metrics |
Wide data frame with metric values to be evaluated. |
df.checks |
Data frame of metric thresholds to check. |
input.shape |
Shape of df.metrics; wide or long. Default is wide. |
Details
used reshape2 package
Value
Returns a data frame of SampleID checks and results; Pass and Fail.
Examples
library(readxl)
# Calculate Metrics
df.samps.bugs <- read_excel(system.file("./extdata/Data_Benthos.xlsx",
package="BioMonTools"),
guess_max = 10^6)
# Columns to keep
myCols <- c("Area_mi2", "SurfaceArea", "Density_m2", "Density_ft2")
# Run Function
myDF <- df.samps.bugs
df.metric.values.bugs <- metric.values(myDF, "bugs", fun.cols2keep = myCols)
# Import Checks
df.checks <- read_excel(system.file("./extdata/MetricFlags.xlsx",
package="BioMonTools"),
sheet="Flags")
# Run Function
df.flags <- qc.checks(df.metric.values.bugs, df.checks)
# Summarize Results
table(df.flags[, "CHECKNAME"], df.flags[, "FLAG"], useNA = "ifany")
Quality Control Check on User Data Against Master Taxa List
Description
This function compares the user's data frame to a data frame with the official (or user supplied) master taxa list (benthic macroinvertebrates).
Usage
qc_taxa(
DF_User,
DF_Official = NULL,
fun.Community = NULL,
useOfficialTaxaInfo = "only_Official"
)
Arguments
DF_User |
User taxa data. |
DF_Official |
Official master taxa list. Can be a local file or from a URL. Default is NULL. A NULL value will use the official online files. |
fun.Community |
Community name for which to compare the master taxa list (bugs or fish). |
useOfficialTaxaInfo |
Select how to handle new/different taxa. See 'Details' for more information. Valid values are "only_Official", "only_user", "add_new". Default = "only_Official". |
Details
Output is a data frame with matches.
Messages are output to the console with the number of matches and which user taxa did not match the official list.
The official list is stored online but the user can input their own saved copy.
Any columns in the user input file that match the official master taxa list will be renamed with the "_NonOfficial" suffix.
New/different taxa in the user data are handled by the 'useOfficialTaxaInfo' parameter. For taxa that did not match the master taxa list the user has options on how to handle the differences for the phylogeny (e.g., columns for phylum, class, family, etc.) and autecology (e.g., columns for FFG, habit, tolerance value, etc.). The options are below.
* only_official = use only official master taxa information. Any non-matching taxa will not have any master taxa information.
* only_user = only use the information provided by the user. Information from the 'Official' will not be used. This should only be used for non-official calculations.
* add_new = hybrid approach that uses official master taxa information, when present, but includes user information for non-matching taxa if the column names match.
Default master taxa lists are saved as CSV files online at:
https://github.com/leppott/MBSStools_SupportFiles
The files can be downloaded with the following code.
**Benthic Macroinvertebrate**
url_mt_bugs <- "https://github.com/leppott/MBSStools_SupportFiles/raw/master/Data/CHAR_Bugs.csv" df_mt_bugs <- read.csv(url_mt_bugs)
The master taxa files are periodically updated. Update dates will be logged on the GitHub repository.
Expected fields include:
**Benthic Macroinvertebrates**
+ TAXON, Phylum, Class, Order, Family, Genus, Other_Taxa, Tribe, FFG, FAM_TV, Habit, FinalTolVal07, Comment
Value
input data frame with master taxa information added to it.
Examples
# Example 1, Master Taxa List, Bugs
url_mt_bugs <- "https://github.com/leppott/MBSStools_SupportFiles/raw/master/Data/CHAR_Bugs.csv"
df_mt_bugs <- read.csv(url_mt_bugs)
# User data
DF_User <- data_benthos_MBSS
DF_Official <- NULL # NULL df_mt_bugs
fun.Community <- "bugs"
useOfficialTaxaInfo <- "only_Official"
# modify taxa id column
DF_User[, "TAXON"] <- DF_User[, "TAXAID"]
df_qc_taxa_bugs <- qc_taxa(DF_User,
DF_Official,
fun.Community,
useOfficialTaxaInfo)
# QC input/output
dim(DF_User)
dim(df_qc_taxa_bugs)
names(DF_User)
names(df_qc_taxa_bugs)
Rarify (subsample) biological sample to fixed count
Description
Takes as an input a 3 column data frame (SampleID, TaxonID , Count) and returns a similar dataframe with revised Counts.
The other inputs are subsample size (target number of organisms in each sample) and seed. The seed is given so the results can be reproduced from the same input file. If no seed is given a random seed is used.
Usage
rarify(inbug, sample.ID, abund, subsiz, mySeed = NA, verbose = FALSE)
Arguments
inbug |
Input data frame. Needs 3 columns (SampleID, taxonomicID , Count). |
sample.ID |
Column name in inbug for sample identifier. |
abund |
Column name in inbug for organism count. |
subsiz |
Target subsample size for each sample. |
mySeed |
Seed for random number generator. If provided the results with the same inbug file will produce the same results. Default = NA (random seed will be used.) |
verbose |
Boolean value for if status messages are output to the console. Default = FALSE |
Details
rarify function: R function to rarify (subsample) a macroinvertebrate sample down to a fixed count; by John Van Sickle, USEPA. email: VanSickle.John@epa.gov ; Version 1.0, 06/10/05;
Value
Returns a data frame with the same three columns but the abund field has been modified so the total count for each sample is no longer above the target (subsiz).
Examples
# Subsample to 500 organisms (from over 500 organisms) for 12 samples.
# load bio data
df_biodata <- data_bio2rarify
dim(df_biodata)
# subsample
mySize <- 500
Seed_OR <- 18590214
Seed_WA <- 18891111
Seed_US <- 17760704
bugs_mysize <- rarify(inbug = df_biodata,
sample.ID = "SampleID",
abund = "N_Taxa",
subsiz = mySize,
mySeed = Seed_US,
verbose = FALSE)
# view results
dim(bugs_mysize)
# Compare pre- and post- subsample counts
df_compare <- merge(df_biodata,
bugs_mysize,
by = c("SampleID", "TaxaID"),
suffixes = c("_Orig","_500"))
df_compare <- df_compare[, c("SampleID",
"TaxaID",
"N_Taxa_Orig",
"N_Taxa_500")]
# compare totals
tbl_totals <- aggregate(cbind(N_Taxa_Orig, N_Taxa_500) ~ SampleID,
df_compare,
sum)
# save the data
write.table(bugs_mysize,
file.path(tempdir(), paste("bugs", mySize, "txt", sep = ".")),
sep = "\t")
Taxa Translate
Description
Convert user taxa names to those in an official project based name list.
Usage
taxa_translate(
df_user = NULL,
df_official = NULL,
df_official_metadata = NULL,
taxaid_user = "TAXAID",
taxaid_official_match = NULL,
taxaid_official_project = NULL,
taxaid_drop = NULL,
col_drop = NULL,
sum_n_taxa_boo = FALSE,
sum_n_taxa_col = NULL,
sum_n_taxa_group_by = NULL,
trim_ws = FALSE,
match_caps = FALSE
)
Arguments
df_user |
User taxa data |
df_official |
Official project taxa data (master taxa list). |
df_official_metadata |
Metadata for official project taxa data. Default is NULL |
taxaid_user |
Taxonomic identifier in user data. Default is "TAXAID". |
taxaid_official_match |
Taxonomic identifier in official data user to match with user data. This is not the project taxanomic identifier. |
taxaid_official_project |
Taxonomic identifier in official data that is specific to a project, e.g., after operational taxonomic unit (OTU) applied. |
taxaid_drop |
Official taxonomic identifier that signals a record should be dropped; e.g., DNI (Do Not Include) or -999. Default = NULL |
col_drop |
Columns to remove in output. Default = NULL |
sum_n_taxa_boo |
Boolean value for if the results should be summarized Default = FALSE DEPRECATED, values will be ignored |
sum_n_taxa_col |
Column name for number of individuals for user data when summarizing. This column will be summed. Default = NULL (suggestion = N_TAXA) DEPRECATED, values will be ignored |
sum_n_taxa_group_by |
Column names for user data to use for grouping the data when summarizing the user data. Suggestions are SAMPID and TAXA_ID. Default = NULL DEPRECATED, values will be ignored |
trim_ws |
Boolean value for taxaid to have leading and trailing white space removed. Non-braking spaces (e.g., from ITIS) also removed (including inside text). Default = FALSE |
match_caps |
Boolean value to match user and official TaxaIDs after converting to ALL CAPS. Default = FALSE |
Details
Merges user file with official file. The official file has phylogeny, autecology, and other project specific fields.
The inputs for the function uses existing data frames (or tibbles).
Any fields that match between the user file and the official file the official data column name have the 'official' version retained.
The 'col_drop' parameter can be used to remove unwanted columns; e.g., the other taxa id fields in the 'official' data file.
By default, taxa are not collapsed to the official taxaid. That is, if multiple taxa in a sample have the same name the rows will not be combined. If collapsing is desired set the parameter 'sum_n_taxa_boo' to TRUE. Will also need to provide 'sum_n_taxa_col' and 'sum_n_taxa_group_by'. This feature was DEPRECATED in v1.0.2.9040 (2024-06-12). The parameters will remain and could be reinstituted in a future version.
Slightly different than 'qc_taxa' since no options in 'taxa_translate' for using one field over another and is more generic.
The parameter 'taxaid_drop' is used to drop records that matched to a new name that should not be included in the results. Examples include "999" or "DNI" (Do Not Include). Default is NULL so no action is taken. "NA"s are always removed.
Optional parameter 'trim_ws' is used to invoke the function 'trimws' to remove from the taxa matching field any leading and trailing white space. Default is FALSE (no action). All horizontal and vertical white space characters are removed. See ?trimws for additional information. Additionally, non-breaking spaces (nbsp) inside the text string will be replaced with a normal space. This cuts down on the number of permutations need to be added to the translation table.
Optional parameter 'match_caps' is used to convert user and official taxaid values to ALL CAPS before matching. Any non-ascii characters will cause this to fail. A message is output to the console for any taxaid values that contain non-ascii characters. In the event that 'match_caps' is set to TRUE and non-ascii characters are present the matching will be done without converting to upper case as this would cause the function to fail.
The taxa list and metadata file names will be added to the results as two new columns.
Another output is the unique taxa with old and new names.
Value
A list with four elements. The first (merge) is the user data frame with additional columns from the official data appended to it. Names from the user data that overlap with the official data have the suffix '_User'. The second element (nonmatch) of the list is a vector of the non-matching taxa from the user data. The third element (metadata) includes the metadata for the official data (if provided). The fourth element (unique) is a data frame of the unique taxa names old and new.
Examples
# Example 1, PacNW
## Input Parameters
df_user <- BioMonTools::data_benthos_PacNW
fn_official <- file.path(system.file("extdata", package = "BioMonTools"),
"taxa_official",
"ORWA_TAXATRANSLATOR_20221219b.csv")
df_official <- read.csv(fn_official)
fn_official_metadata <- file.path(system.file("extdata",
package = "BioMonTools"),
"taxa_official",
"ORWA_ATTRIBUTES_METADATA_20221117.csv")
df_official_metadata <- read.csv(fn_official_metadata)
taxaid_user <- "TaxaID"
taxaid_official_match <- "Taxon_orig"
taxaid_official_project <- "OTU_MTTI"
taxaid_drop <- "DNI"
col_drop <- c("Taxon_v2", "OTU_BCG_MariNW") # non desired ID cols in Official
sum_n_taxa_boo <- TRUE
sum_n_taxa_col <- "N_TAXA"
sum_n_taxa_group_by <- c("INDEX_NAME", "INDEX_CLASS", "SampleID", "TaxaID")
## Run Function
taxatrans <- taxa_translate(df_user,
df_official,
df_official_metadata,
taxaid_user,
taxaid_official_match,
taxaid_official_project,
taxaid_drop,
col_drop,
sum_n_taxa_boo,
sum_n_taxa_col,
sum_n_taxa_group_by)
## View Results
taxatrans$nonmatch
#~~~~~
# Example 2, Multiple Stages
# Create data
TAXAID <- c(rep("Agapetus", 3), rep("Zavrelimyia", 2))
N_TAXA <- c(rep(33, 3), rep(50, 2))
STAGE <- c("A", "L", "P", "X", "")
df_user <- data.frame(TAXAID, N_TAXA, STAGE)
df_user[, "INDEX_NAME"] <- "BCG_MariNW_Bugs500ct"
df_user[, "INDEX_CLASS"] <- "HiGrad-HiElev"
df_user[, "SAMPLEID"] <- "Test2023"
df_user[, "STATIONID"] <- "Test"
df_user[, "DATE"] <- "2023-01-16"
## Input Parameters
fn_official <- file.path(system.file("extdata", package = "BioMonTools"),
"taxa_official",
"ORWA_TAXATRANSLATOR_20221219b.csv")
df_official <- read.csv(fn_official)
fn_official_metadata <- file.path(system.file("extdata",
package = "BioMonTools"),
"taxa_official",
"ORWA_ATTRIBUTES_20221212.csv")
df_official_metadata <- read.csv(fn_official_metadata)
taxaid_user <- "TAXAID"
taxaid_official_match <- "Taxon_orig"
taxaid_official_project <- "OTU_BCG_MariNW"
taxaid_drop <- NULL
col_drop <- c("Taxon_v2", "OTU_MTTI") # non desired ID cols in Official
sum_n_taxa_boo <- TRUE
sum_n_taxa_col <- "N_TAXA"
sum_n_taxa_group_by <- c("INDEX_NAME", "INDEX_CLASS", "SAMPLEID", "TAXAID")
## Run Function
taxatrans <- taxa_translate(df_user,
df_official,
df_official_metadata,
taxaid_user,
taxaid_official_match,
taxaid_official_project,
taxaid_drop,
col_drop,
sum_n_taxa_boo,
sum_n_taxa_col,
sum_n_taxa_group_by)
## View Results (before and after)
df_user
taxatrans$merge