letsRept: An Interface to the Reptile Database

This package was developed to facilitate updating reptile nomenclature by searching for species synonyms in the Reptile Database (Uetz et al., 2025).

Currently, the package retrieves a wide range of species information from the Reptile Database through an R interface.

It is useful for anyone harmonizing reptile nomenclature across databases from different sources (IUCN, species trait databases, etc.) or compiling summaries for a given higher taxon or region (e.g., snakes of Brazil). It can also print individual species accounts directly in R for a quick check.

Download

The package is available on CRAN. Check that the installed version is 1.1.0 or above; if it is not, install the development version from GitHub.

To install the package, run one of the following:

# Install CRAN version (check package version, must be 1.1.0 or above)
install.packages("letsRept")

#OR

# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("joao-svalencar/letsRept")

library(letsRept)
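Since the CRAN version must be 1.1.0 or above, a quick version check can save some confusion. This is a minimal sketch using only base R (`requireNamespace()` and `utils::packageVersion()`); the helper name `has_min_version()` is hypothetical and not part of letsRept.

```r
# Hypothetical helper (not part of letsRept): TRUE if `pkg` is installed
# at version `min_version` or above, FALSE otherwise (including when missing).
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  utils::packageVersion(pkg) >= min_version
}

# If this returns FALSE, install the development version from GitHub instead
has_min_version("letsRept", "1.1.0")
```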

List of functions and examples

NOTE: Some functions use parallel processing by default, which may affect system performance. Adjust the number of cores to your needs. A snippet showing how to check the number of available cores on your computer is provided with the relevant function examples.
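Several functions below default to half of the available cores. The sketch below shows how that number can be computed in base R with `parallel::detectCores()` (the `parallel` package ships with R). The helper `half_cores()` is hypothetical; how each letsRept function accepts a core count is not covered here, so check its help page before passing a value.

```r
# Hypothetical helper: half of the available cores, never fewer than one.
# detectCores() can return NA on some platforms, so fall back to 1 core then.
half_cores <- function() {
  n <- parallel::detectCores()
  if (is.na(n)) return(1L)
  max(1L, n %/% 2L)
}

half_cores()
```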

Function `reptSearch`:

Retrieves species information from The Reptile Database using a binomial species name.

# single species:
reptSearch(binomial = "Apostolepis adhara")

# To also retrieve the species' reference list, as listed in RDB:
reptSearch(binomial = "Bothrops pauloensis", getRef = TRUE)

reptSearch() supports synonym-based queries. If the provided binomial does not match any currently valid species name, the function automatically passes the query to reptAdvancedSearch(synonym = binomial). If the synonym can be unambiguously resolved to a single valid species, the function returns that species' information. Otherwise, it returns a link (which can be passed to reptSpecies(url = link)) to a list of all species whose synonymy includes the queried name.

Function `reptRefs`:

Retrieves the reference list for a given species from the Reptile Database. By default, it returns both the citation and a link to the reference source (if available). If the user sets getLink = FALSE, the function returns only the plain-text references.

This function is useful for extracting citation metadata for taxonomic or historical analyses.

# Basic usage
reptRefs("Boa constrictor")

# If you want only reference texts, without links:
reptRefs("Boa constrictor", getLink = FALSE)

Function `reptAdvancedSearch`:

Creates a link to the results page of an Advanced Search in the RDB (a page listing multiple species):

# create multiple species link:
link_boa <- reptAdvancedSearch(genus = "Boa", exact = TRUE) #returns a link to a list of all Boa species.

link_apo <- reptAdvancedSearch(genus = "Apostolepis") #returns a link to a list of all Apostolepis species

link <- reptAdvancedSearch(higher = "snakes", location = "Brazil") #returns a link to a list of all snake species in Brazil

link <- reptAdvancedSearch(year = "2010 OR 2011 OR 2012") #returns a link to a list of all species described from 2010 to 2012

reptAdvancedSearch(synonym = "example") returns the species information directly (via reptSearch("example")) if the synonym uniquely matches a single valid species. The exact = TRUE argument in the Boa example matches “Boa” strictly, avoiding genera such as Pseudoboa or Boaedon.

⚠️ Note:

The argument exact does not work properly for searches using logical operators (e.g., AND/OR). To force an exact match (e.g., “Boa” as a phrase) in a query with multiple terms (e.g., “Boa OR Apostolepis”), manually include quotes in the input string:

 link <- reptAdvancedSearch(synonym = "\"Boa\" OR Apostolepis")
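Building quoted multi-term strings by hand is error-prone. A minimal base-R sketch of a hypothetical helper, `quote_terms()` (not part of letsRept), that wraps every term in double quotes and joins them with a logical operator; the resulting string could then be passed to reptAdvancedSearch() as in the example above. Whether quoting every term (rather than only some, as in the example) is what you want depends on your search.

```r
# Hypothetical helper: wrap each term in double quotes, join with an operator
quote_terms <- function(terms, op = "OR") {
  paste(sprintf("\"%s\"", terms), collapse = sprintf(" %s ", op))
}

q <- quote_terms(c("Boa", "Apostolepis"))
cat(q, "\n")

# The same joining works for unquoted terms, such as the year query above
years <- paste(2010:2012, collapse = " OR ")
cat(years, "\n")
```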

Function `reptSpecies`:

Retrieves species data from the species link created by reptAdvancedSearch(). Users can also copy and paste the URL of an RDB page listing the species returned by an advanced search. The output includes higher taxa information, authors, year of description, and the species URL. By default, this function uses parallel processing with half of the available cores. Backup saving via the checkpoint argument is available only when running on a single core (without parallel processing).

# check the number of available cores (parallel ships with base R; no installation needed):
parallel::detectCores()

# sample multiple species data:
#Returns higher taxa information and species url:
boa <- reptSpecies(link_boa, taxonomicInfo = TRUE, getLink = TRUE) #link from reptAdvancedSearch(genus = "Boa")

#example 2
apo <- reptSpecies(link_apo, taxonomicInfo = TRUE, getLink = TRUE) #link from reptAdvancedSearch(genus = "Apostolepis")

#Returns only species url - Faster and recommended for large datasets:
boa <- reptSpecies(link_boa, taxonomicInfo = FALSE, getLink = TRUE) #link from reptAdvancedSearch(genus = "Boa")

#Returns only species names - Faster and recommended for large datasets for just nomenclatural check:
boa <- reptSpecies(link_boa, taxonomicInfo = FALSE, getLink = FALSE) #link from reptAdvancedSearch(genus = "Boa")

#example 2
apo <- reptSpecies(link_apo, taxonomicInfo = FALSE, getLink = TRUE) #link from reptAdvancedSearch(genus = "Apostolepis")

#With checkpoint and backups (only without parallel sampling: a lot slower but safer):
path <- "path/to/save/backup_file.rds" # placeholder path for illustration; replace with your own
apo <- reptSpecies(link_apo, taxonomicInfo = FALSE, getLink = TRUE, checkpoint=6, backup_file=path) #link from reptAdvancedSearch(genus = "Apostolepis")

⚠️ Note:

All console messages, warnings, and progress updates can be silenced. However, reptSpecies() issues a helpful warning if any species information fails to be retrieved, along with instructions on how to access those entries from the function output.

Because many functions in letsRept rely on an internet connection, connection errors or failed species sampling may occur for reasons unrelated to the package code. In such cases, reptSpecies() may issue a warning like this:

Warning message:
In reptSpecies(dataList = allReptilesLinks, getLink = FALSE, taxonomicInfo = TRUE,  :
  Data sampling completed with errors for the following species:
- Ablepharus himalayanus: cannot open the connection
- Acanthodactylus tristrami: cannot open the connection
- Amphisbaena leucocephala: cannot open the connection
- Amphisbaena vanzolinii: cannot open the connection
- Amyda ornata: cannot open the connection
... and 10 others

In this case, sampling was attempted once for every species, and for some of them (15 in the example) the page failed to load in R. The output is a data frame with the usual 7 columns returned when taxonomicInfo = TRUE, plus three extra columns: one flagging whether an error occurred, one with the corresponding error message, and the species URL (included even if getLink = FALSE). The following code extracts the failed species from the output so their sampling can be retried:

failed_spp <- df[df$species %in% df$species[df$error == TRUE], c('species', 'url')]

Here, df is the object returned by the first run of reptSpecies(). This line selects all species that raised an error in the previous run, together with their URLs, which are the entries needed to rerun reptSpecies() through the dataList argument:

failed <- reptSpecies(dataList = failed_spp, taxonomicInfo = TRUE)

To merge the original df with the resampled species, first remove the failed species (and the three extra columns) from df, then rbind the two objects and reorder by species:

df <- df[!df$species %in% df$species[df$error == TRUE], -c(8, 9, 10)] # drop failed species and the three extra columns
dfNew <- rbind(df, failed)

dfNew <- dfNew[order(dfNew$species),]

A similar approach is possible for any of the functions tied to web scraping.
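The merge step above drops the extra columns by position, which breaks if the column layout differs between runs. Below is a minimal sketch, on toy data with made-up names, of a hypothetical helper `merge_retried()` (not part of letsRept) that selects columns by name instead. The column names `error` and `url` follow the snippets above; the name `message` for the error-message column is an assumption, so adjust `drop_cols` to match your actual output.

```r
# Hypothetical helper (not part of letsRept): replace the failed rows of a
# first run with the rows obtained on retry, then sort by species.
# drop_cols: diagnostic columns to remove ("message" is an assumed name).
merge_retried <- function(first, retried, drop_cols = c("error", "message", "url")) {
  keep <- setdiff(names(first), drop_cols)
  # keep rows whose error flag is FALSE or NA
  ok <- first[!(first$error %in% TRUE), keep, drop = FALSE]
  # align the retried columns with the first run by name, then stack
  out <- rbind(ok, retried[, keep, drop = FALSE])
  out <- out[order(out$species), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data (made-up names, not real RDB output)
first <- data.frame(
  species = c("Fakeus beta", "Fakeus alpha", "Fakeus gamma"),
  family  = c("Fakeidae", NA, "Fakeidae"),
  error   = c(FALSE, TRUE, FALSE),
  message = c(NA, "cannot open the connection", NA),
  url     = c("url-b", "url-a", "url-c")
)
retried <- data.frame(species = "Fakeus alpha", family = "Fakeidae")

res <- merge_retried(first, retried)
print(res)
```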

Automated retries in reptSpecies() are planned for a future release.

Function `reptStats`:

This function summarizes the taxonomic content of a species list, typically an object created by reptSpecies() with higher taxa information (taxonomicInfo = TRUE). If no object is provided, it summarizes the internal dataset allReptiles.
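To give an idea of the kind of counting involved, here is a rough base-R analogue on toy data: genera and species per family. It is not reptStats() itself, and the column names `family`, `genus`, and `species` are assumptions about a reptSpecies() output with higher taxa information.

```r
# Toy species list (assumed column names; not real reptSpecies() output)
toy <- data.frame(
  family  = c("Elapidae", "Elapidae", "Viperidae", "Viperidae", "Viperidae"),
  genus   = c("Micrurus", "Micrurus", "Bothrops", "Bothrops", "Crotalus"),
  species = c("Micrurus corallinus", "Micrurus lemniscatus",
              "Bothrops jararaca", "Bothrops pauloensis", "Crotalus durissus")
)

# Hypothetical helper: number of distinct genera and species per family
summary_by_family <- function(x) {
  count_unique <- function(v) length(unique(v))
  data.frame(
    family    = sort(unique(x$family)),
    n_genera  = as.vector(tapply(x$genus, x$family, count_unique)),
    n_species = as.vector(tapply(x$species, x$family, count_unique))
  )
}

s <- summary_by_family(toy)
print(s)
```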

# summary of internal database allReptiles
reptStats() 

# filter by family and return summary table (internal database)
reptStats(family = "Elapidae")
reptStats(family = "Elapidae", verbose = TRUE)

# creating a subset from RDB
link <- reptAdvancedSearch(higher="snakes", location = "Brazil")
snakes_br <- reptSpecies(link, taxonomicInfo = TRUE)

# summarizing subset
reptStats(snakes_br)
reptStats(snakes_br, family = "Viperidae", verbose = TRUE)

Function `reptSynonyms`:

Samples species synonyms from either a binomial or a data frame with species names and species links (e.g., the result of reptSpecies(link, getLink = TRUE)). By default, this function uses parallel processing with half of the available cores.

I benchmarked both input options: a vector of species names and a data frame with species URLs. In general, the data frame is faster, but for a short list of species, names alone should suffice.

# check the number of available cores (parallel ships with base R; no installation needed):
parallel::detectCores()

# sample species synonyms
boa_syn <- reptSynonyms(boa) # using the data frame created with reptSpecies(link_boa, getLink = TRUE)
Bconstrictor_syn <- reptSynonyms(x = "Boa constrictor") # using species binomial

#example 2
apo_syn <- reptSynonyms(apo)

⚠️ Note:

All console messages, warnings, and progress updates can be silenced. However, reptSynonyms() issues a helpful warning if any species information fails to be retrieved, along with instructions on how to access those entries from the original data frame.

The regex pattern used to extract synonyms from The Reptile Database is quite effective, but about 0.2% of synonyms are still extracted in an incorrect format. Most of these cases involve unusual nomenclature, so users are unlikely to run into problems when matching current valid names. In any case, I have fixed (potentially) all unusual synonym formats in the internal dataset allSynonyms (last update: 23 May 2025).
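If you sample synonyms yourself and want to shortlist oddly formatted entries for manual review, a simple shape check can help. This is a minimal sketch with a deliberately simplified pattern, not the package's own regex: it assumes a well-formed name is a capitalized genus followed by one or two lowercase epithets. It will also flag some legitimate names (e.g., hyphenated ones), so treat its output as a list to inspect, not as errors. The helper `looks_malformed()` is hypothetical.

```r
# Hypothetical helper: TRUE for names that do not look like
# "Genus epithet" or "Genus epithet subepithet" (simplified assumption)
looks_malformed <- function(x) {
  !grepl("^[[:upper:]][[:lower:]]+( [[:lower:]]+){1,2}$", x)
}

syn <- c("Boa constrictor", "Boa constrictor longicauda", "Boa", "boa Constrictor")
flags <- looks_malformed(syn)
syn[flags]  # entries to inspect by hand
```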

Function `reptCompare`:

Compares the nomenclature of a user-provided set of species names (x) with the RDB nomenclature. Users may provide a vector of current names sampled from the RDB with reptSpecies(); otherwise, the names in x are compared with those in the internal dataset allReptiles (see the dataset documentation for the RDB version; currently May 2025).

The function returns species that are unmatched (“review”), matched with the RDB list (“matched”), or both, depending on the filter argument.

my_species <- data.frame(species = c("Boa constrictor", "Pantherophis guttatus", "Fake species"))

reptCompare(my_species)
reptCompare(my_species, filter = "review")
reptCompare(my_species, filter = "matched")

Species flagged for review are usually passed to reptSync(), while matched species should ideally be passed to reptSplitCheck(), with a reference date supplied to the argument pubDate.
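Conceptually, the matching step sorts names into “matched” and “review”. A minimal base-R sketch of that idea, assuming plain exact string matching after normalizing whitespace; it is not reptCompare()'s implementation, and `compare_names()` and the toy reference vector are hypothetical.

```r
# Hypothetical helper: exact matching against a reference vector of valid
# names, after trimming and collapsing repeated spaces in the input
compare_names <- function(x, reference) {
  x_clean <- gsub("\\s+", " ", trimws(x))
  data.frame(species = x,
             status  = ifelse(x_clean %in% reference, "matched", "review"))
}

ref <- c("Boa constrictor", "Pantherophis guttatus")  # toy stand-in for RDB names
res <- compare_names(c("Boa constrictor", "Pantherophis  guttatus", "Fake species"), ref)
res[res$status == "review", ]  # names to pass on for nomenclature review
```

Exact matching means any typo sends a name to “review”, which is the behavior you want before synonym resolution.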

Function `reptSync`:

Initially inspired by the function aswSync from the package AmphiNom (Liedtke, 2018).

This is the most comprehensive function in the package, building on the previous functions to provide the most likely up-to-date nomenclature for the queried species. By default, this function uses parallel processing with half of the available cores.

The function is divided into two main steps. Here is how it works:

Step 1

The function takes a vector of species names (e.g., from the IUCN or a regional list), checks their validity through reptSearch(), and returns a data frame with the current valid species names. When reptSearch() finds a species page, it assumes that this is the valid name for the queried species and returns the status "up_to_date". When reptSearch() does not find a species, it passes the binomial to reptAdvancedSearch() using the synonym filter. If reptAdvancedSearch() returns a link to a species page, that species name is considered valid for the queried synonym and the function returns the status "updated". If instead reptAdvancedSearch() returns a link to a page listing several species, the function assumes that the queried synonym could be assigned to any of those valid names and returns the status "ambiguous". If the query returns neither a species page nor a multi-species page, the function returns "not_found" in both the "RDB" and "status" columns.
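The Step 1 decision rules above can be summarized as a small function. This is a hypothetical sketch of the logic as described, not the package's source code: `found_page` stands for "reptSearch() found a valid species page" and `n_valid` for "number of valid species returned by the synonym search".

```r
# Hypothetical sketch of the Step 1 rules (not the package's source code)
# found_page: did reptSearch() find a valid species page for the query?
# n_valid:    number of valid species returned by the synonym search
step1_status <- function(found_page, n_valid) {
  if (found_page) return("up_to_date")
  if (n_valid == 1) return("updated")
  if (n_valid > 1) return("ambiguous")
  "not_found"
}

found   <- c(TRUE, FALSE, FALSE, FALSE)
n_valid <- c(0, 1, 3, 0)
statuses <- mapply(step1_status, found, n_valid)
print(statuses)
```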

Step 2

Step 2 is activated only if solveAmbiguity = TRUE. When reptAdvancedSearch() returns a link to a page listing several species, that link is passed to reptSpecies(), which collects species names and URLs and automatically passes the resulting data frame to reptSynonyms(). Finally, using the result of reptSynonyms(), the function compares the queried species with all listed synonyms. If the queried name is listed as a synonym of only one of the candidate species (e.g., for the other candidates it is not a synonym but is merely mentioned in the comments section), the function returns that valid name with the status "updated". If the queried name is a synonym of more than one valid species, the function returns all of those names and the status remains "ambiguous".
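The Step 2 comparison can be pictured as checking, for each candidate valid species, whether the queried name appears among its synonyms. A minimal sketch on toy data with made-up names; `resolve_ambiguity()` is hypothetical, and the zero-hit case (not described above) simply keeps "ambiguous" here as an assumption.

```r
# Hypothetical sketch of the Step 2 comparison (not the package's source code)
# synonyms: named list, one element per candidate valid species
resolve_ambiguity <- function(query, synonyms) {
  hits <- names(synonyms)[vapply(synonyms, function(s) query %in% s, logical(1))]
  # zero hits is not described in the text; keeping "ambiguous" is an assumption
  status <- if (length(hits) == 1) "updated" else "ambiguous"
  list(RDB = hits, status = status)
}

# Toy synonym lists (made-up names). "Fakeus delta" stands for a candidate
# returned by the advanced search only because the query is mentioned in its
# comments section, so the query is absent from its synonyms.
syn <- list(
  "Fakeus alpha" = c("Fakeus alpha", "Fakeus oldname"),
  "Fakeus delta" = c("Fakeus delta"),
  "Fakeus gamma" = c("Fakeus gamma", "Fakeus oldname", "Fakeus older")
)

r1 <- resolve_ambiguity("Fakeus older", syn)    # one hit
r2 <- resolve_ambiguity("Fakeus oldname", syn)  # two hits
str(r1)
str(r2)
```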

# check the number of available cores (parallel ships with base R; no installation needed):
parallel::detectCores()

# comparing synonyms:
query <- c("Vieira-Alencar authoristicus",
           "Boa atlantica",
           "Boa diviniloqua",
           "Boa imperator",
           "Boa constrictor longicauda")

reptSync(query)

#example 2:
query <- c("Vieira-Alencar authorisensis",
           "Apostolepis ambiniger",
           "Apostolepis cerradoensis",
           "Elapomorphus assimilis",
           "Apostolepis tertulianobeui",
           "Apostolepis goiasensis")

reptSync(query)

The column "RDB" shows the current valid name according to The Reptile Database.

Pay special attention to the "status" column:

status:

"up_to_date" - Species name provided is the current valid name found in The Reptile Database.
"updated" - Species name provided is a synonym, and the current valid name is reported unambiguously.
"ambiguous" - Species name provided is considered a synonym of more than one current valid species, likely due to a taxonomic split.
"not_found" - Species name provided in the query is neither a current valid name nor a synonym according to The Reptile Database. This status may result from a typo in the species name. In very rare cases, it can also appear when the query already uses the current valid name but that name is not found in the species' list of synonyms.
"merge" - Multiple names in the query are now considered synonyms of a single valid species, likely due to synonymization.

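The situation behind "merge" can be spotted in any query-to-valid-name table by looking for valid names shared by more than one query. A minimal base-R sketch on toy data with made-up names; the column name `query` is an assumption, while `RDB` follows the column described above.

```r
# Toy query-to-valid-name table (made-up names; "query" is an assumed column name)
res <- data.frame(
  query = c("Fakeus alpha", "Fakeus oldname", "Fakeus beta"),
  RDB   = c("Fakeus alpha", "Fakeus alpha", "Fakeus beta")
)

# TRUE for every row whose valid name is shared with another queried name
shared <- res$RDB %in% res$RDB[duplicated(res$RDB)]
res[shared, ]
```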
⚠️ ATTENTION!⚠️

letsRept does not make authoritative taxonomic decisions. It matches input names against currently accepted names in the Reptile Database (RDB).

A name marked as "up_to_date" may still refer to a taxon that has been split, and thus may not reflect the most recent population-level taxonomy; see the function reptSplitCheck below.

Function `reptSplitCheck`:

Species names in recent databases are most likely marked with the status "up_to_date". However, the function reptSync only indicates whether the queried binomial is currently valid. In some cases, a species may have undergone a taxonomic split, where part of its original populations retains the name while others have been described as new species. reptSync does not account for such cases, so it is recommended to review all "up_to_date" species for potential taxonomic splits.

To assist with this, the function reptSplitCheck queries binomial names as synonyms using reptAdvancedSearch, and checks whether any associated species were described in or after a user-defined date (e.g., the publication date of the dataset being used). By default, this function uses parallel processing with half of the available cores.

query <- c("Atractus dapsilis",
           "Atractus trefauti",
           "Atractus snethlageae",
           "Tantilla melanocephala",
           "Oxybelis aeneus",
           "Oxybelis rutherfordi")

reptSplitCheck(query, pubDate = 2019) # pubDate of Nogueira et al., Atlas of Brazilian Snakes
reptSplitCheck(query, pubDate = 2019, includeAll = TRUE)

Pay special attention to the "status" column:

status:

"up_to_date" – The species name provided is not a synonym of any species described in or after pubDate.
"check_split" – The species name provided is a synonym of at least one valid species described in or after pubDate, suggesting a possible taxonomic split.
"not_found" – A temporary status flagging species that are not listed as synonyms of themselves and therefore do not appear in the advanced search (e.g., entries to be corrected in the RDB).

OBS: With includeAll at its default (FALSE), if the queried species is a synonym only of species described in the same year as pubDate that are already included in the queried species list, the queried species receives the status "up_to_date". To check every possible taxonomic split, regardless of whether the new species is already present in the queried list, set includeAll = TRUE.
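The core comparison can be sketched in base R on toy data: keep candidate species described in or after pubDate (other than the queried name itself) and, when includeAll is FALSE, ignore those already in the query. This is a simplification, not the package's code: the `includeAll` filter is applied here to every recent species, whereas the note above describes it for species from the pubDate year. `split_status()` and the names are hypothetical.

```r
# Hypothetical, simplified sketch of the reptSplitCheck() idea
split_status <- function(name, candidates, years, pubDate, query, includeAll = FALSE) {
  # species described in or after pubDate, excluding the queried name itself
  recent <- candidates[years >= pubDate & candidates != name]
  # default: ignore recent species that are already in the queried list
  if (!includeAll) recent <- setdiff(recent, query)
  if (length(recent) > 0) "check_split" else "up_to_date"
}

cands <- c("Fakeus alpha", "Fakeus beta", "Fakeus gamma")  # made-up names
yrs   <- c(1900, 2021, 1950)

s1 <- split_status("Fakeus alpha", cands, yrs, 2019, query = "Fakeus alpha")
s2 <- split_status("Fakeus alpha", cands, yrs, 2019,
                   query = c("Fakeus alpha", "Fakeus beta"))
s3 <- split_status("Fakeus alpha", cands, yrs, 2019,
                   query = c("Fakeus alpha", "Fakeus beta"), includeAll = TRUE)
c(s1, s2, s3)
```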

Attention: The column "RDB" shows only the current valid names of species described in or after pubDate according to The Reptile Database. It does NOT show the queried species itself; note that reptSplitCheck is intended for valid names only (e.g., those “matched” by reptCompare). Therefore, the "check_split" status only highlights that users must check whether the data related to the queried species could apply to (e.g., species traits) or should be divided among (e.g., distribution) the species in the "RDB" column. In other words, in the example above, it does not mean that the current valid name of Tantilla melanocephala is T. selmae. It means that users must ensure that using the queried name does not overlook the possibility that the data, or part of them, should be assigned to T. selmae instead.

Function `reptTidySyn`:

This function was developed solely to improve the display of reptSync() and reptSplitCheck() results. Queried species with many current valid names break the data frame layout in the R console; reptTidySyn() stacks the current valid names to keep the output readable. In addition, the filter argument lets users filter the printed data frame by “status”, so they can focus on the statuses they want to evaluate.

query <- c("Vieira-Alencar authorisensis",
           "Apostolepis ambiniger",
           "Apostolepis cerradoensis",
           "Elapomorphus assimilis",
           "Apostolepis tertulianobeui",
           "Apostolepis goiasensis")

df <- reptSync(query)
reptTidySyn(df)
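The stacking idea can be reproduced in base R: split a cell holding several valid names into one row per name, repeating the query and status, and then filter by status. A minimal sketch on toy data; the `query` column name and the "; " separator are assumptions about how multiple names are stored, and `stack_names()` is hypothetical, not reptTidySyn() itself.

```r
# Toy result with several valid names packed into one cell
# ("query" column name and "; " separator are assumptions)
res <- data.frame(
  query  = c("Fakeus oldname", "Fakeus beta"),
  RDB    = c("Fakeus alpha; Fakeus delta", "Fakeus beta"),
  status = c("ambiguous", "up_to_date")
)

# Hypothetical helper: one row per valid name
stack_names <- function(df, sep = "; ") {
  parts <- strsplit(df$RDB, sep, fixed = TRUE)
  n <- lengths(parts)
  data.frame(query  = rep(df$query, n),
             RDB    = unlist(parts),
             status = rep(df$status, n))
}

tidy <- stack_names(res)
tidy[tidy$status == "ambiguous", ]  # focus on one status, as with filter
```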
