This is a tutorial for using the R package rcatfish.
rcatfish provides access to the California Academy of
Sciences Eschmeyer’s Catalog of Fishes within R (Eschmeyer et al., 1998,
Fricke et al., 2025, https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp).
The Catalog of Fishes database is the gold standard for fish taxonomy as
it provides thorough citations of the taxonomic history of fishes and is
updated continuously via standard monthly releases. While there are other
packages within R that can be used for checking the taxonomy of
organisms, including fishes (e.g. rfishbase,
taxize, ritis), the databases accessed
by these packages lack the expansive information on the taxonomic
history of fishes and are typically not up to date on the cutting edge
of fish systematics. rcatfish introduces functions to access
the various information in the California Academy of Sciences
Eschmeyer’s Catalog of Fishes. This tutorial provides a basic
introduction to using the package and its functions.
Please note when using this package that it is intended to act solely
as an interface to Eschmeyer’s Catalog of Fishes and is not affiliated
with California Academy of Sciences. As Eschmeyer’s Catalog of Fishes is
only published in a non-machine-readable format, rcatfish
is intended to parse the catalog data into a format more suitable for
analysis through web scraping. As such, rcatfish will only
be as accurate as the data in the published catalog, and spelling or
formatting errors may arise due to inconsistencies in Eschmeyer’s
Catalog of Fishes data entry.
In order to install the stable CRAN version of the
rcatfish package:
install.packages("rcatfish")
While we recommend use of the stable CRAN version of this package, the
development version can be temporarily installed from GitHub with the
package devtools if for any reason you
wish to use it:
#1. Install 'devtools' if you do not already have it installed:
install.packages("devtools")
#2. Load the 'devtools' package and temporarily install the development version of
#'rcatfish' from GitHub:
library(devtools)
dev_mode(on = TRUE)
install_github("sborstein/rcatfish") # install the package from GitHub
library(rcatfish) # load the package
#3. Leave developer mode after using the development version of 'rcatfish' so it will not remain on your system permanently.
dev_mode(on = FALSE)
To make HTTPS connections, the dependencies of rcatfish utilize curl.
On Windows 7 and later, the curl implementation in R can use
either OpenSSL or Windows Secure Channel
(Schannel). Only one of these options can be active at a
time and the default is Schannel, which conflicts with this
package. To see which one you have active, you can do the following:
curl::curl_version()$ssl_version
In the output you may have more than one option. The ones in
parentheses are not in use, while the ones lacking parentheses are in
use. If you have Schannel in use, you will need to add the
following line to your ~/.Renviron file to have curl use
OpenSSL.
CURL_SSL_BACKEND=openssl
This can be added manually or can be done directly in R by running the following line of code.
write('CURL_SSL_BACKEND=openssl', file = "~/.Renviron", append = TRUE)
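Because append = TRUE adds the line every time the command is run, repeated calls can leave duplicate entries in the file. The following is a minimal sketch of a variant that only writes the line when it is missing. The helper name add_renviron_line is hypothetical and not part of rcatfish; it uses base R only, and it assumes any existing file ends with a newline.

```r
# Hypothetical helper (not part of rcatfish): append a line to an
# .Renviron-style file only if that exact line is not already present.
add_renviron_line <- function(line, file = "~/.Renviron") {
  file <- path.expand(file)
  existing <- if (file.exists(file)) readLines(file, warn = FALSE) else character(0)
  if (line %in% trimws(existing)) {
    return(invisible(FALSE))  # nothing written
  }
  write(line, file = file, append = TRUE)
  invisible(TRUE)             # line was added
}

# Demonstration on a temporary file rather than the real ~/.Renviron
demo_file   <- tempfile(fileext = ".Renviron")
first_call  <- add_renviron_line("CURL_SSL_BACKEND=openssl", file = demo_file)
second_call <- add_renviron_line("CURL_SSL_BACKEND=openssl", file = demo_file)
demo_lines  <- readLines(demo_file)
```

Here the second call leaves the file unchanged, so demo_lines holds a single entry.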
After adding this line to your ~/.Renviron, restart R.
Then check the curl_version again. Schannel
should now be in parentheses while your OpenSSL option
should now lack parentheses, indicating it is in use.
curl::curl_version()$ssl_version
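The parentheses rule described above can also be checked programmatically. The sketch below is illustrative only: the helper active_ssl_backends is hypothetical (it is not part of rcatfish or curl), it assumes the entries in the string are separated by spaces, and the two example strings are made up to show the before and after states.

```r
# Hypothetical helper (not part of rcatfish or curl): return the SSL
# backends that are in use, i.e. the entries NOT wrapped in parentheses.
active_ssl_backends <- function(ssl_version) {
  entries <- strsplit(trimws(ssl_version), "\\s+")[[1]]
  entries[!grepl("^\\(.*\\)$", entries)]
}

# Illustrative strings only; the exact text on your machine will differ.
before <- "(OpenSSL/3.0.0) Schannel"
after  <- "OpenSSL/3.0.0 (Schannel)"

active_before <- active_ssl_backends(before)  # Schannel is the active entry
active_after  <- active_ssl_backends(after)   # the OpenSSL entry is active

# On a real system (requires the curl package):
# active_ssl_backends(curl::curl_version()$ssl_version)
```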
Once installed, you can load rcatfish and all of its
functions/data:
library(rcatfish)
Upon loading, you will see a message showing you the version of the
Catalog of Fishes as well as how to properly cite the Catalog of Fishes
and this R package. In the event you need to get the version of the
Catalog of Fishes at any point, you can use the function
rcatfish_version() which has no arguments to return the
date of the version of the Catalog of Fishes being accessed.
To use the majority of functions in rcatfish you will
need the following data as inputs. First, for all functions which search
the catalog, a query is required. This can be either a
single search term or a vector of terms to be searched in series, the
content of which will be dependent on the particular search function
being used (e.g. when searching for references, query may
be a catalog reference number).
Aside from a query, several functions require a
type as well. This is used to differentiate the search
method in functions that support multiple types (e.g. search by
type = genus or type = keyword). Similar to
query, the content of this parameter will vary based on the
function it is being used in, but unlike query it is
typically not able to be vectorized and has a specific set of options it
must match in each function that calls it.
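Since a mistyped type would only surface once a search is attempted, it can be convenient to check the value first. The following is a minimal sketch of such a check. The helper check_type is hypothetical and not part of rcatfish, and it assumes the options must match exactly, including case; the "Species" and "Genus" values are the ones this tutorial describes for rcatfish_search.

```r
# Hypothetical helper (not part of rcatfish): check a `type` value against
# the options a given function accepts before any request is sent.
check_type <- function(type, choices) {
  if (!is.character(type) || length(type) != 1L) {
    stop("`type` must be a single character string, not a vector.")
  }
  if (!type %in% choices) {
    stop("`type` must be one of: ", paste(choices, collapse = ", "))
  }
  type
}

search_types <- c("Species", "Genus")  # options described for rcatfish_search

ok <- check_type("Genus", search_types)

# tryCatch() captures the error message so the script keeps running
bad <- tryCatch(check_type(c("Genus", "Species"), search_types),
                error = function(e) conditionMessage(e))
```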
All other parameters used in this package are optional and unique to
each individual function. Several of these will be discussed below, but
each function’s options can be seen by running
?function_name or help(function_name).
rcatfish_search
To search the Catalog of Fishes’s taxonomic records, the function
rcatfish_search should be used. This function is equivalent
to using the “Search Eschmeyer’s Catalog” tab on the catalog’s website.
While it has several parameters with default arguments, only
query and type must be specified.
As an example, a search can be performed for all available species names in the family Rhincodontidae using the following function call:
# Search CoF for Available Species Names in Rhincodontidae
rhinco_species <- rcatfish_search(query = "Rhincodontidae", type = "Species")
View(rhinco_species)
When viewing the created object rhinco_species above,
you should see a large dataframe containing species results of that
family.
type Parameter in rcatfish_search
The type parameter allows you to specify what type of
results you want returned and is the equivalent of changing the radio
button on the “Search Eschmeyer’s Catalog” tab between “Genera” and
“Species” (the “References” option is available using the
rcatfish_references function which will be discussed
later). It has two acceptable inputs, either “Species” or “Genus”. This
parameter is not vectorizable, so any searches should either all be by
genus or all be by species. There is no default option, so it must
always be explicitly assigned a value during function calls. To see an
example of the difference between the two options:
# Search Rhincodontidae by Species
by_species <- rcatfish_search(query = "Rhincodontidae", type = "Species")
View(by_species)
# Search Rhincodontidae by Genus
by_genus <- rcatfish_search(query = "Rhincodontidae", type = "Genus")
View(by_genus)
When viewing the outputs from above, you will notice that despite each search containing the same query, the results differ. Please always ensure you have the correct value assigned when searching the catalog.
common.name Parameter in rcatfish_search
Currently, Eschmeyer’s Catalog of Fishes does not include the common
names of species. rcatfish does, however, have the
capability of searching for species by common name by utilizing the
rfishbase package. To do this, you can utilize the
common.name parameter in rcatfish_search.
Please note that searching by common names can only be performed on a
species level, not by genus.
By default this parameter is set to FALSE. When
explicitly changed to TRUE, the function will first match
the query input of common names to any associated
scientific names currently found in FishBase. Note that while it is
still vectorizable in this format, you cannot combine common names with
other search terms (e.g. you can search for
query = c("Humphead Wrasse", "Channel Catfish") but
searching for
query = c("Humphead Wrasse", "Lophius piscatorius") may
return unexpected results). This parameter will return a list containing
the normal rcatfish_search result dataframe and a second
dataframe showing the common names provided and the taxonomic names that
they were matched to. As an example:
# Search Catalog of Fishes by Common Name
common_name_result <- rcatfish_search(query = "Humphead wrasse", type = "Species", common.name = TRUE)
View(common_name_result) # The full list returned
View(common_name_result[[1]]) # The first dataframe in the list, the normal rcatfish_search output
View(common_name_result[[2]]) # The second dataframe in the list, the common names searched and their matches
You can search using common names from other languages if you so
desire by setting the language parameter. By default it is
set to “English”.
taxon.history Parameter in rcatfish_search
Each entry in Eschmeyer’s Catalog of Fishes also contains a complete
history of that entry’s taxonomic status. By default, this is not
captured with rcatfish_search, although it can be obtained
by setting the taxon.history parameter to
TRUE. When this is done, an additional dataframe is
returned containing each result’s original status, current status, and
every change to its status made in between. This search can be performed
both by species and by genus. Please note that, particularly for queries
with a large number of changes in their history, searching with
taxon.history = TRUE may take considerably longer than a
typical search. As an example of what this may look like:
# Search Catalog of Fishes and Return Taxonomic Histories
taxon_history_result <- rcatfish_search(query = "Platyrhina", type = "Genus", taxon.history = TRUE)
View(taxon_history_result) # The full list returned
View(taxon_history_result[[1]]) # The first dataframe in the list, the normal rcatfish_search output
View(taxon_history_result[[2]]) # The second dataframe in the list, the taxonomic histories of the results
rcatfish_search
Several other parameters exist in the rcatfish_search
function to modify minor aspects of the search function.
The verbose parameter toggles on and off the message
displayed to the user when running a search (e.g. “Now on query 1 of
100”). By default it is set to TRUE. Messages can be
disabled by changing it to FALSE.
The sleep.time parameter sets the length of time that the
search function will wait between requests to the Catalog of Fishes’s
server when performing a search of multiple terms. This is set to 10
seconds as requested by the catalog. This parameter should not
be modified. Changing this value may result in blacklisting by the
catalog.
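Because the pause applies between requests, long query vectors take a while, and duplicated terms add to that time for no benefit. The sketch below estimates the waiting time and removes duplicates first. The helper estimate_wait is hypothetical and not part of rcatfish; it assumes one pause between consecutive requests and ignores the time taken by the requests themselves, so it is only a rough lower bound.

```r
# Hypothetical helper (not part of rcatfish): rough lower bound, in seconds,
# on the time spent pausing between requests for a vector of query terms.
estimate_wait <- function(query, sleep.time = 10) {
  max(length(query) - 1, 0) * sleep.time
}

genera <- c("Cichla", "Crenicichla", "Cichla", "Geophagus")

# Remove repeated terms so each one is only requested once
queries <- unique(genera)

wait_all    <- estimate_wait(genera)   # four terms, three pauses
wait_unique <- estimate_wait(queries)  # three terms, two pauses
```

Passing queries rather than genera to a search function would save one pause in this case.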
rcatfish_references
To search through references in Eschmeyer’s Catalog of Fishes, the
function rcatfish_references should be used. This function
is equivalent to using the “Search Eschmeyer’s Catalog” tab on the
catalog’s website and selecting the “References” radio button. It has
the parameters query and type, both of which
must be specified.
The query parameter can search either by reference
number in the catalog or by keyword and can be passed as either a single
search term or a vector of terms. The type of search performed is
dictated by the type parameter, which will accept
either “RefNo” to search by reference number or “keyword” to search by
keyword. Note that when searching by reference number, the query can be
passed either as an integer or as a character string (e.g. 41479 and
“41479” will return the same results).
# Search references by keyword
keyword_reference <- rcatfish_references(query = "Tunisia", type = "keyword")
# Search references by reference number
RefNo_reference <- rcatfish_references(query = 41479, type = "RefNo")
rcatfish_references can be combined with a result from
rcatfish_search to obtain all references associated with
the resulting species. As an example:
# Search the catalog for a given species
search_result <- rcatfish_search(query = "Cichla cataractae", type = "Species")
# Retrieve references from resulting search
references <- rcatfish_references(query = search_result$DescriptionRef, type = "RefNo")
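When a search returns many rows, several of them may share a reference or lack one. The following sketch shows one way to tidy the reference numbers before passing them on. The data frame here is a made-up stand-in for a search result; only the DescriptionRef column name comes from this tutorial, and the values are invented.

```r
# Mock stand-in for an rcatfish_search result; only the DescriptionRef
# column name comes from the tutorial, the values are made up.
search_result <- data.frame(
  Species = c("sp. A", "sp. B", "sp. C", "sp. D"),
  DescriptionRef = c(101, 101, NA, 202)
)

# Drop missing values and duplicates so each reference is requested once
ref_column  <- search_result$DescriptionRef
ref_numbers <- unique(ref_column[!is.na(ref_column)])

# With rcatfish installed and an internet connection, these could be passed on:
# references <- rcatfish_references(query = ref_numbers, type = "RefNo")
```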
Eschmeyer’s Catalog of Fishes receives monthly updates. These updates
include changes to the taxonomic status of genera and species, changes
related to authorship, and the addition of newly described taxa. Users
can see these updates by using the rcatfish_updates
function. Called with no arguments, this function returns
all changes provided by the most recent update. However, users can
specify whether to return the taxonomic changes, authorship
changes, added genera, and added species by setting the corresponding
arguments to TRUE or FALSE. For example, if we want to obtain all the
changes in the most recent version of the catalog, we can do either of the
following:
updates <- rcatfish_updates()
Alternatively, we can set each argument explicitly to choose which update
components are returned. These are set to TRUE by default, but users can
change any of them by name.
updates <- rcatfish_updates(changes = TRUE, author.changes = TRUE, added.genera = TRUE, added.species = TRUE)
updates
We can see when running the above code that a list is returned of
changes made in the newest edition of the catalog (which is updated once
a month). The length of this list varies depending on which
elements the user asked to return. The elements controlled by these
arguments are Changes, AuthorshipChanges,
AddedGenera, and AddedSpecies, which contain
the taxonomic changes, authorship changes, newly added genera, and
newly added species, respectively.
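Because the list changes length depending on the arguments used, it is safer to pull elements out by name than by position. The sketch below illustrates this with a made-up list. The element names follow this tutorial, but the contents are invented, the assumption that each element is a data frame is ours, and the helper get_update is hypothetical rather than part of rcatfish.

```r
# Mock stand-in for the list returned by rcatfish_updates() when only some
# components were requested; element names follow the tutorial, contents are made up.
updates <- list(
  Changes = data.frame(Taxon = c("taxon 1", "taxon 2")),
  AddedSpecies = data.frame(Taxon = "taxon 3")
)

# Hypothetical helper: fetch a component by name, or NULL with a message if absent
get_update <- function(updates, element) {
  if (!element %in% names(updates)) {
    message("'", element, "' was not returned; was its argument set to FALSE?")
    return(NULL)
  }
  updates[[element]]
}

added_species <- get_update(updates, "AddedSpecies")
added_genera  <- get_update(updates, "AddedGenera")  # NULL: not in this mock list
```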
Eschmeyer’s Catalog of Fishes provides information on the number of species and genera described per family and subfamily via a table on the following linked page (https://researcharchive.calacademy.org/research/ichthyology/catalog/SpeciesByFamily.asp). Through the rcatfish_species_by function, rcatfish provides access to this page as well as the ability to return species totals for taxonomic ranks above family and subfamily, such as orders and classes. This function simply takes a query that is a subfamily, family, order, or class that the user wishes to obtain data for. For example, if we want to return information for the family Cichlidae we can easily do so using the following:
rcatfish_species_by("Cichlidae")
We can see that this has returned a data frame containing the number of available and valid genera and species, as well as the number of genera and species described in the last decade for the family and all subfamilies in Cichlidae. Although the Catalog of Fishes does not report these figures at higher taxonomic levels, rcatfish can. We can obtain the number of described genera and species for the order Cichliformes with the following:
rcatfish_species_by("Cichliformes")
We can see that this has provided not just the number of genera and species in each family within the Cichliformes, but has also returned the total for the entire order, which is not reported on the Catalog of Fishes.
Eschmeyer’s Catalog of Fishes provides a hierarchical classification
of fishes organized by Class, Order, Suborder, Family, and Subfamily (https://www.calacademy.org/scientists/catalog-of-fishes-classification/).
The rcatfish function rcatfish_classification
provides access to this table. This function lacks arguments and can be
simply called as follows:
# See Current Breakdown of Fish Classification, from Class Through Subfamily
fish_classification <- rcatfish_classification()
fish_classification
The function returns a data frame that progresses from left to right from most to least inclusive. In addition to providing the hierarchy for Class, Order, Suborder, Family, and Subfamily, the authorship of these taxonomic entities as well as their common name is returned.
# See a Glossary of Terms Used in the Catalog
glossary <- rcatfish_glossary()
We can see that the glossary object created by the line of
code above is a data frame containing a list of technical
terms used in the catalog along with definitions and applicable
sub-terms.
Besides citations for the references used in the Catalog of Fishes, the catalog also provides various information on the journals used for references (https://researcharchive.calacademy.org/research/ichthyology/catalog/journals.asp). For example, the Catalog of Fishes provides ISSN numbers, publishers, and comments, such as name changes for journals. Information on the journals can be accessed in rcatfish using the rcatfish_journals function. This function takes the argument query, which is a string to search for, and the argument phrase, which controls whether the query is passed in quotes so that it is searched as a single phrase. For example, to search for journals that are related to Texas, we can do the following:
rcatfish_journals("Texas")
We can see that most of these contain Texas in the title, while the entry for “Contributions in Marine Science” notes that it is a continuation of Publications of the Institute of Marine Science, University of Texas.
Note that passing the query as a phrase may impact the success of the search. For example, if we wanted to search for Journal of Zoology, the search will fail if we do not pass the query as a phrase as it will look for each word separately.
rcatfish_journals("Journal of Zoology")
We can successfully search for this query by invoking phrase = TRUE in the arguments:
rcatfish_journals("Journal of Zoology", phrase = TRUE)
Eschmeyer’s Catalog of Fishes provides information, such as collection abbreviations, locality, previous names, and online access for museum collections with fish holdings (https://researcharchive.calacademy.org/research/ichthyology/catalog/collections.asp). This information can be accessed through rcatfish via the rcatfish_collections function. rcatfish_collections allows users to search for collections by abbreviation, country, or query term. For example, if we knew the museum abbreviation we wanted to search for, such as the UMMZ for the University of Michigan Museum of Zoology, we could do the following by simply providing UMMZ to the abbreviation argument:
rcatfish_collections(abbreviation = "UMMZ", country = NULL, query = NULL, verbose = TRUE)
We can also pass information to more than one field. This can be useful for narrowing down collection results, such as for countries that have many natural history collections. For this example, let’s search for collections in the United States of America and query for collections in California and Alaska. Note that to run more than one query, we must ensure that all arguments are the same length. So, in this case, we need to pass the country twice in our search as follows:
rcatfish_collections(abbreviation = NULL, country = rep("U.S.A.",2), query = c("California","Alaska"), sleep.time = 10)
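The equal-length requirement can be met without counting by hand. The following sketch builds the country vector from the length of the query vector and checks the two before any request is made. It uses base R only; the third query term is added purely for illustration.

```r
# Build equal-length argument vectors for several collection searches
# that share one country. Uses base R only.
queries   <- c("California", "Alaska", "Hawaii")
countries <- rep("U.S.A.", length(queries))

# Check the lengths agree before sending any requests
stopifnot(length(countries) == length(queries))

# One row per search that would be performed
search_plan <- data.frame(country = countries, query = queries)

# With rcatfish installed and an internet connection:
# rcatfish_collections(abbreviation = NULL, country = countries, query = queries)
```

Deriving the repeat count from length(queries) keeps the two vectors aligned if terms are added or removed later.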
We may also want to query a phrase, such as “Museum of Zoology” to
get a list of collections that contain that name across all collections
(similar to what was covered for rcatfish_journals). In
order to do more complex queries that are phrases, we need to use the
phrase = TRUE argument. We can run this search as
follows:
rcatfish_collections(query = "Museum of Zoology", phrase = TRUE)
Most of the functions in rcatfish require a stable
internet connection to run as they connect to the online Catalog of
Fishes database. If you run into problems, we recommend checking your
internet connection as well as visiting the California Academy of
Sciences Eschmeyer’s Catalog of Fishes site (https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp),
to ensure that it is not down for routine maintenance.
Should you find any entries which appear to return different data
than anticipated, check the Catalog of Fishes directly to confirm the
error and then report it in the issues section on GitHub (repo
sborstein/rcatfish). Remember that rcatfish is designed to
return exactly what is published and will capture any mistakes that are
directly in the catalog.
Please note that the authors of this package are not affiliated with Eschmeyer’s Catalog of Fishes nor the California Academy of Sciences. As such, we are not able to correct any errors that exist on the Catalog of Fishes or fix/troubleshoot any issues with the Catalog of Fishes itself.
Further information on the functions and their usage can be found in
the help files help(package=rcatfish).
For any further issues and questions send an email with subject ‘rcatfish support’ to borstein@txstate.edu or post to the issues section on GitHub.
Eschmeyer WN (1998). Catalog of Fishes. California Academy of Sciences, San Francisco, California, 2905 pp.
Fricke R (2025). Eschmeyer’s Catalog of Fishes: References. https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp.
Fricke R, Eschmeyer WN (2025). Eschmeyer’s Catalog of Fishes: Guide to Fish Collections. https://researcharchive.calacademy.org/research/ichthyology/catalog/collections.asp.
Fricke R, Eschmeyer WN (2025). Eschmeyer’s Catalog of Fishes: Journals. https://researcharchive.calacademy.org/research/ichthyology/catalog/journals.asp.
Fricke R, Eschmeyer WN, Fong JD (2025). Eschmeyer’s Catalog of Fishes: Species by family/subfamily in the Catalog of Fishes. https://researcharchive.calacademy.org/research/ichthyology/catalog/SpeciesByFamily.asp.
Fricke R, Eschmeyer WN, van der Laan R (2025). Eschmeyer’s Catalog of Fishes: Genera, Species, References. https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp.
Fricke R, van der Laan R, Fong JD (2025). Eschmeyer’s Catalog of Fishes: Changes and Additions. https://researcharchive.calacademy.org/research/ichthyology/catalog/ChangeSummary.asp.
van der Laan R, Fricke R, Eschmeyer WN (2025). Eschmeyer’s Catalog of Fishes: Classification. https://www.calacademy.org/scientists/catalog-of-fishes-classification/.
van der Laan R, Fricke R, Fong J (2025). Eschmeyer’s Catalog of Fishes: Glossary. https://www.calacademy.org/scientists/catalog-of-fishes-glossary/.