| Type: | Package |
| Title: | Read Linguistic Data in the Cross Linguistic Data Format (CLDF) |
| Version: | 1.6.1 |
| Maintainer: | Simon J. Greenhill <simon@simon.net.nz> |
| Description: | Cross-Linguistic Data Format (CLDF) is a framework for storing cross-linguistic data, ensuring compatibility and ease of data exchange between different linguistic datasets; see Forkel et al. (2018) <doi:10.1038/sdata.2018.205>. The 'rcldf' package is designed to facilitate the manipulation and analysis of these datasets by simplifying the loading, querying, and visualisation of CLDF datasets, making it easier to conduct comparative linguistic analyses, manage language data, and apply statistical methods directly within R. |
| License: | Apache License (≥ 2.0) |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.1.0) |
| Imports: | archive, bib2df (≥ 1.1.1), csvwr, digest, dplyr, jsonlite, leaflet, logger, magrittr, purrr, readr, remotes, rlang, tools, urltools, utils, versionsort |
| Suggests: | ggplot2, patchwork, htmltools, testthat, mockthat, covr, spelling, knitr, rmarkdown, qpdf |
| URL: | https://github.com/SimonGreenhill/rcldf |
| BugReports: | https://github.com/SimonGreenhill/rcldf/issues |
| Language: | en-US |
| RoxygenNote: | 7.3.2 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-05-18 08:59:23 UTC; simon |
| Author: | Simon J. Greenhill [aut, cre] |
| Repository: | CRAN |
| Date/Publication: | 2026-05-18 09:30:02 UTC |
rcldf: Read Linguistic Data in the Cross Linguistic Data Format (CLDF)
Description
Cross-Linguistic Data Format (CLDF) is a framework for storing cross-linguistic data, ensuring compatibility and ease of data exchange between different linguistic datasets; see Forkel et al. (2018) doi:10.1038/sdata.2018.205. The 'rcldf' package is designed to facilitate the manipulation and analysis of these datasets by simplifying the loading, querying, and visualisation of CLDF datasets, making it easier to conduct comparative linguistic analyses, manage language data, and apply statistical methods directly within R.
Details
rcldf is an R library for reading Cross-Linguistic Data Format (CLDF) files.
Author(s)
Maintainer: Simon J. Greenhill simon@simon.net.nz
See Also
Useful links:
Report bugs at https://github.com/SimonGreenhill/rcldf/issues
Adds a dataframe.
Description
Adds a dataframe.
Usage
add_dataframe(table, filename, group)
Arguments
table |
a metadata section from the CLDF metadata. |
filename |
the filename. |
group |
a grouping from the metadata. |
Value
A dataframe
Extracts a CLDF table as a 'wide' dataframe by resolving all foreign key links
Description
Extracts a CLDF table as a 'wide' dataframe by resolving all foreign key links
Usage
as.cldf.wide(object, table)
Arguments
object |
the cldf object. |
table |
the name of the table to extract. |
Value
A tibble dataframe
Examples
md <- system.file("extdata/huon", "cldf-metadata.json", package = "rcldf")
cldfobj <- cldf(md)
forms <- as.cldf.wide(cldfobj, 'FormTable')
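To illustrate what resolving a foreign key link means, the following is a minimal base-R sketch on two hypothetical toy tables. It is not the implementation used by as.cldf.wide, and the column names in the real output may differ; the ".LanguageTable" suffix here is an assumption made for the illustration.

```r
# Toy tables (hypothetical data): each form points at a language via Language_ID.
forms <- data.frame(
  ID = c("f1", "f2", "f3"),
  Language_ID = c("kate", "kate", "ono"),
  Form = c("no", "ge", "na"),
  stringsAsFactors = FALSE
)
languages <- data.frame(
  ID = c("kate", "ono"),
  Name = c("Kate", "Ono"),
  stringsAsFactors = FALSE
)

# Suffix the language columns so they cannot clash with the form columns.
names(languages) <- paste0(names(languages), ".LanguageTable")

# Left join: every form row gains the columns of the language it links to.
wide <- merge(
  forms, languages,
  by.x = "Language_ID", by.y = "ID.LanguageTable",
  all.x = TRUE
)
print(wide)
```

The result has one row per form, with the language name carried alongside it, which is the general shape a 'wide' table takes.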
Reads a Cross-Linguistic Data Format dataset into an object.
Description
Reads a Cross-Linguistic Data Format dataset into an object.
read_cldf() is provided alongside cldf() for users who expect readr-style naming (e.g. readr::read_csv).
Usage
cldf(
mdpath,
load_bib = FALSE,
cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
read_cldf(
mdpath,
load_bib = FALSE,
cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
Arguments
mdpath |
the path to the directory or metadata JSON file. |
load_bib |
a boolean flag (TRUE/FALSE, default FALSE) to load the
sources.bib BibTeX file. |
cache_dir |
a directory to cache downloaded files to |
Value
A cldf object
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
Coalesce value to truthiness
Description
Determine whether the input is true, with missing values being interpreted as false.
Usage
coalesce_truth(x)
Arguments
x |
logical, |
Value
FALSE if x is anything but TRUE
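The rule described above (only TRUE counts as true; NA and FALSE both become FALSE) can be sketched in base R as follows. The function name coalesce_truth_sketch is hypothetical and this is not the package's source; in particular, whether the real function is vectorised is an assumption.

```r
# Hypothetical stand-in for illustration only.
coalesce_truth_sketch <- function(x) {
  # NA fails the first test, FALSE fails the second; only TRUE passes both.
  !is.na(x) & x == TRUE
}

coalesce_truth_sketch(c(TRUE, FALSE, NA))
# [1]  TRUE FALSE FALSE
```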
Returns a table of datasets available in cldf_meta
Description
Returns a table of datasets available in cldf_meta
Usage
datasets()
Value
A dataframe of available datasets.
Map csvw datatypes to R types
Description
Translate csvw datatypes to R types. This implementation currently targets readr::cols column specifications.
Usage
datatype_to_type(datatypes)
Arguments
datatypes |
a list of csvw datatypes |
Details
rcldf adds some overrides here to add e.g. anyURI etc.
Value
a readr::cols specification - a list of collectors
Examples
cspec <- datatype_to_type(list("double", list(base="date", format="yyyy-MM-dd")))
readr::read_csv(readr::readr_example("challenge.csv"), col_types=cspec)
CSVW default dialect
Description
The CSVW Default Dialect specification described in CSV Dialect Description Format.
Usage
default_dialect
Format
An object of class list of length 13.
Value
a list specifying a default csv dialect
Create a default table schema given a csv file and dialect
Description
If neither the table nor the group has a tableSchema annotation,
then this default schema will be used.
Usage
default_schema(filename, dialect = default_dialect)
Arguments
filename |
a csv file |
dialect |
specification of the csv's dialect (default: default_dialect) |
Value
a table schema
Returns the cache dir.
Description
Returns the cache dir.
Usage
get_cache_dir(cache_dir = NA)
Arguments
cache_dir |
a directory to use |
Value
A string of the cache dir
Returns a dataframe with details on the CLDF dataset in path.
Description
Returns a dataframe with details on the CLDF dataset in path.
Usage
get_details(path, cache_dir = NA)
Arguments
path |
the path to resolve |
cache_dir |
a directory to cache downloaded files to |
Value
A dataframe.
Returns the filesize in bytes of a directory.
Description
Returns the filesize in bytes of a directory.
Usage
get_dir_size(path)
Arguments
path |
a directory to size |
Value
A numeric of the file size in bytes
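As a rough illustration of how a directory size in bytes can be computed, here is a base-R sketch. The name dir_size_sketch is hypothetical and the approach (summing the sizes of all files found recursively) is an assumption, not necessarily how get_dir_size is implemented.

```r
# Hypothetical helper: total size in bytes of all files under `path`.
dir_size_sketch <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  sum(file.size(files))
}

# Demonstrate on a throwaway directory containing one 10-byte file.
demo_dir <- tempfile("rcldf_size_demo")
dir.create(demo_dir)
writeBin(as.raw(1:10), file.path(demo_dir, "a.bin"))
dir_size_sketch(demo_dir)
```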
Get a filename from url value in metadata (handles .zip files)
Description
Get a filename from url value in metadata (handles .zip files)
Usage
get_filename(base_dir, url)
Arguments
base_dir |
the base_dir |
url |
the url statement |
Value
A string
Returns a table of the foreign keys in a CLDF dataset.
Description
Returns a table of the foreign keys in a CLDF dataset.
Usage
get_foreign_keys(cldf_obj)
Arguments
cldf_obj |
a CLDF object |
Value
a dataframe
Examples
o <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
get_foreign_keys(o)
Downloads and installs a CLDF dataset from a Zenodo endpoint
Description
Downloads and installs a CLDF dataset from a Zenodo endpoint
Usage
get_from_zenodo(zid, load_bib = FALSE, cache_dir = NULL)
Arguments
zid |
Zenodo endpoint |
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Identifies the separator characters specified by the CLDF metadata.
Description
Identifies the separator characters specified by the CLDF metadata.
Usage
get_separators(metadata)
Arguments
metadata |
|
Value
A dataframe with three columns (name, separator, url).
Extracts a single table from a CLDF dataset.
Description
Extracts a single table from a CLDF dataset.
Usage
get_table_from(
table,
mdpath,
cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
Arguments
table |
a CLDF table type |
mdpath |
a path to a CLDF file |
cache_dir |
a directory to cache downloaded files to |
Value
a dataframe
Examples
md_json <- system.file("extdata/huon", "cldf-metadata.json", package = "rcldf")
df <- get_table_from("LanguageTable", md_json)
Convert a CLDF URL tablename to a short tablename
Description
Convert a CLDF URL tablename to a short tablename
Usage
get_tablename(conformsto, url = NA)
Arguments
conformsto |
the dc:conformsTo statement |
url |
the url statement |
Value
A string
Examples
get_tablename("http://cldf.clld.org/v1.0/terms.rdf#ValueTable")
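The general idea of shortening a CLDF term URL can be sketched by keeping only the fragment after the "#". The function tablename_sketch below is hypothetical and only illustrates that idea; the value returned by get_tablename itself may differ, and the handling of the url argument is not covered.

```r
# Hypothetical helper: keep everything after the last "#".
tablename_sketch <- function(conformsto) {
  sub("^.*#", "", conformsto)
}

tablename_sketch("http://cldf.clld.org/v1.0/terms.rdf#ValueTable")
# [1] "ValueTable"
```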
Returns TRUE if url looks like a github URL
Description
Returns TRUE if url looks like a github URL
Usage
is_github(url)
Arguments
url |
A string |
Value
A boolean TRUE/FALSE
Examples
is_github('https://github.com/SimonGreenhill/rcldf/')
Returns TRUE if url looks like a URL
Description
Returns TRUE if url looks like a URL
Usage
is_url(url)
Arguments
url |
A string |
Value
A boolean TRUE/FALSE
Examples
is_url('http://simon.net.nz')
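A check of this kind is often a simple pattern match on the scheme. The sketch below assumes that "looks like a URL" means "starts with http://, https:// or ftp://"; that rule and the name looks_like_url are assumptions for illustration, and is_url may apply different criteria.

```r
# Hypothetical helper: TRUE when the string starts with a common URL scheme.
looks_like_url <- function(x) {
  grepl("^(https?|ftp)://", x, ignore.case = TRUE)
}

looks_like_url("http://simon.net.nz")   # TRUE
looks_like_url("extdata/huon")          # FALSE
```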
Returns a dataframe of directories in the cache dir
Description
Returns a dataframe of directories in the cache dir
Usage
list_cache_files(cache_dir = NULL)
Arguments
cache_dir |
the cache directory to use. If NULL then R_user_dir will be used. |
Value
A dataframe of the directories
Returns a CLDF dataset object of the latest CLTS version.
Description
Returns a CLDF dataset object of the latest CLTS version.
Usage
load_clts(load_bib = FALSE, cache_dir = NULL)
Arguments
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Returns a CLDF dataset object of the latest Concepticon version.
Description
Returns a CLDF dataset object of the latest Concepticon version.
Usage
load_concepticon(load_bib = FALSE, cache_dir = NULL)
Arguments
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Load a CLDF dataset by name and version
Description
Looks up a dataset from the registry returned by datasets,
resolves the requested version, and downloads it from either Zenodo or
GitHub.
Usage
load_dataset(dataset, version = NULL, source = "Zenodo")
Arguments
dataset |
a character string naming the dataset (must match the
|
version |
a character string specifying the version to load (e.g.
|
source |
a character string, either "Zenodo" or "GitHub" |
Value
A cldf object.
See Also
datasets.
Examples
## Not run:
# load the latest version of a dataset
ds <- load_dataset("vanuatuvoices")
# load a specific version
ds <- load_dataset("vanuatuvoices", version = "v1.3")
# load from GitHub instead
ds <- load_dataset("vanuatuvoices", source = "GitHub")
## End(Not run)
Returns a CLDF dataset object of the latest D-PLACE version.
Description
Returns a CLDF dataset object of the latest D-PLACE version.
Usage
load_dplace(load_bib = FALSE, cache_dir = NULL)
Arguments
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Returns a CLDF dataset object of the latest Glottolog version.
Description
Returns a CLDF dataset object of the latest Glottolog version.
Usage
load_glottolog(load_bib = FALSE, cache_dir = NULL)
Arguments
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Returns the cachekey for the given path.
Description
Returns the cachekey for the given path.
Usage
make_cache_key(path)
Arguments
path |
a path to generate the cachekey for. |
Value
A string.
Converts all values specified in the CLDF metadata as null to R's NA.
Description
Note that this is run by default when loading a dataset with cldf().
Usage
nullify(cldfobj, nulls = NULL)
Arguments
cldfobj |
a CLDF Object |
nulls |
a dataframe of null values to replace (default=NULL). |
Value
A cldf object
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
cldfobj <- nullify(cldfobj)
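To show what converting metadata-declared nulls to NA amounts to, here is a base-R sketch on a hypothetical toy table. The "?" marker is an assumption for the example; in a real dataset the marker would come from the column's null property in the metadata, and nullify applies this across the whole cldf object rather than to a single column.

```r
# Toy value table (hypothetical data) where "?" stands for a missing value.
values <- data.frame(
  ID = c("v1", "v2", "v3"),
  Value = c("a", "?", "b"),
  stringsAsFactors = FALSE
)

# Assumed null marker for this column.
null_marker <- "?"

# Replace every occurrence of the marker with R's NA.
values$Value[values$Value %in% null_marker] <- NA
print(values)
```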
Override defaults
Description
Merges two lists applying override values on top of the default values.
Usage
override_defaults(...)
Arguments
... |
any number of lists with configuration values |
Value
a list with the values from the first list replacing those in the second and so on
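The precedence described in the Value section (earlier lists win over later ones) can be sketched with utils::modifyList. The function override_sketch is hypothetical and is not the package's implementation; the example keys are invented for illustration.

```r
# Hypothetical helper: earlier arguments take priority over later ones.
override_sketch <- function(...) {
  layers <- list(...)
  # Start from the last list (the defaults) and apply earlier lists on top.
  Reduce(function(acc, layer) utils::modifyList(acc, layer), rev(layers), list())
}

res <- override_sketch(
  list(delimiter = ";"),                  # overrides
  list(delimiter = ",", header = TRUE)    # defaults
)
str(res)
```

Here delimiter ends up as ";" because the first list overrides the second, while header is inherited from the defaults. Note that modifyList drops an element when the overriding value is NULL.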
Plot CLDF Languages on an Interactive Map
Description
Creates a leaflet map showing all languages in the CLDF dataset that have geographic coordinates. Longitudes are standardized to a 0-360 range to ensure a continuous Pacific-centered view.
Usage
plot_languages(x, color_by = "ID")
Arguments
x |
A cldf object. |
color_by |
Character string specifying the column in |
Value
A leaflet map object.
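The longitude standardisation mentioned in the Description can be illustrated with the modulo operator. This is a sketch of the arithmetic only, using made-up coordinates; it does not reproduce the plotting code.

```r
# Longitudes in the usual -180..180 range (hypothetical values).
lon <- c(-170, -10, 0, 145, 180)

# Map to 0..360 so that points either side of the antimeridian sit together.
lon360 <- lon %% 360
lon360
# [1] 190 350   0 145 180
```

With this mapping, a language at -170 is drawn at 190, next to one at 180, rather than on the opposite edge of the map.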
Plot Distribution of a Specific Parameter
Description
Filters the dataset for a specific Parameter ID and maps the values across languages. This function automatically resolves whether the data is in a Form or Value table and joins it with geographic data.
Usage
plot_parameter(x, parameter = "1sg_a", color_by = "Value")
Arguments
x |
A cldf object. |
parameter |
Character string. The ID of the parameter to plot (e.g., "1sg_a"). |
color_by |
Character string. The column to use for the color scale (e.g., "Value"). |
Value
A leaflet map object.
Plot Words/Forms as Text Labels on a Map
Description
Similar to plot_parameter, but instead of circles, this function renders the
actual phonetic forms (Value) as text labels directly on the map. Labels are
color-coded based on the color_by column (e.g., Cognacy).
Usage
plot_word(x, parameter = "1sg_a", color_by = "Cognacy")
Arguments
x |
A cldf object. |
parameter |
Character string. The ID of the parameter (word) to plot. |
color_by |
Character string. Column used to categorize and color the text labels. |
Value
A leaflet map object.
Summarises the CLDF file
Description
Summarises the CLDF file
Usage
## S3 method for class 'cldf'
print(x, ...)
Arguments
x |
the CLDF dataset |
... |
Arguments to be passed to or from other methods. Currently not used. |
Value
No return value, called for side effects.
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
print(cldfobj)
Prints a CLDF schema
Description
Prints a CLDF schema
Usage
## S3 method for class 'cldf_schema'
print(x, ...)
Arguments
x |
the CLDF dataset |
... |
Arguments to be passed to or from other methods. Currently not used. |
Value
No return value, called for side effects.
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
print(schema(cldfobj))
Load and access bibliographic sources from a CLDF dataset
Description
Reads and parses the BibTeX sources file from a CLDF dataset, making
bibliographic references available in bibtex format. By default, sources
are not loaded automatically when using cldf() as BibTeX parsing can be
time-consuming. Use this function to load them, or pass load_bib=TRUE to
cldf() when loading the dataset.
Usage
read_bib(object)
Arguments
object |
A cldf object containing the dataset |
Value
The cldf object, modified to include a sources list with
parsed BibTeX data
Examples
# Load a dataset with sources
ds <- cldf(system.file("extdata/huon", "cldf-metadata.json",
package="rcldf"), load_bib=TRUE)
# Or load without sources first, then add them
ds_no_bib <- cldf(system.file("extdata/huon", "cldf-metadata.json",
package="rcldf"))
ds <- read_bib(ds_no_bib)
# View the sources
ds$sources
Relabels a column in a dataset for merging.
Description
Relabels a column in a dataset for merging.
Usage
relabel(column, table)
Arguments
column |
the column name. |
table |
the tablename. |
Value
A string of "column.table"
Helper function to resolve the path (e.g. directory or md.json file)
Description
Helper function to resolve the path (e.g. directory or md.json file)
Usage
resolve_path(path, cache_dir = NA)
Arguments
path |
the path to resolve |
cache_dir |
a directory to cache downloaded files to |
Value
A list of two items:
path - string containing the path to the metadata.json file
metadata - a csvwr metadata object
Visualize CLDF Dataset Schema
Description
Extracts the CLDF dataset schema showing tables, columns, data types, and foreign key relationships.
Usage
schema(cldf_obj)
Arguments
cldf_obj |
A CLDF object created with cldf() |
Value
A schema object:
Examples
## Not run:
# Load a dataset
df <- cldf("path/to/dataset")
schema(df)
## End(Not run)
Expands all values with separators.
Description
Note that this is run by default when loading a dataset with cldf().
Usage
separate(cldfobj, separators = NULL)
Arguments
cldfobj |
a CLDF Object |
separators |
a dataframe of separator values to replace (default=NULL). |
Value
A cldf object
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
cldfobj <- separate(cldfobj)
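Expanding a separated value means splitting one string into several items. The base-R sketch below shows this on a hypothetical toy column with an assumed ";" separator; in a real dataset the separator would come from the metadata (see get_separators), and the structure that separate produces may differ from the list column used here.

```r
# Toy form table (hypothetical data) with multiple sources packed into one cell.
forms <- data.frame(
  ID = c("f1", "f2"),
  Source = c("smith2001;jones1999", "lee2010"),
  stringsAsFactors = FALSE
)

# Assumed separator for the Source column.
sep <- ";"

# Split each cell on the separator, giving a list column of character vectors.
forms$Source <- strsplit(forms$Source, sep, fixed = TRUE)
forms$Source[[1]]
# [1] "smith2001" "jones1999"
```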
Sets the cache dir for the current session.
Description
Sets the cache dir for the current session.
Usage
set_cache_dir(cache_dir = NA)
Arguments
cache_dir |
a directory to use |
Value
NULL. Sets an environment value.
Subset a CLDF object with Cascading Filters
Description
Subset a CLDF object with Cascading Filters
Usage
subset_cldf(x, expr)
Arguments
x |
A cldf object. |
expr |
A logical expression (e.g., Language_ID == 'kate') |
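One plausible reading of a cascading filter is that filtering one table also trims the tables linked to it. The base-R sketch below illustrates that reading on hypothetical toy tables; it is an assumption about the behaviour, not a description of how subset_cldf is implemented.

```r
# Toy tables (hypothetical data).
languages <- data.frame(
  ID = c("kate", "ono"),
  Name = c("Kate", "Ono"),
  stringsAsFactors = FALSE
)
forms <- data.frame(
  ID = c("f1", "f2", "f3"),
  Language_ID = c("kate", "kate", "ono"),
  stringsAsFactors = FALSE
)

# Step 1: filter the table that carries the column named in the expression.
kept_forms <- forms[forms$Language_ID == "kate", ]

# Step 2: cascade, keeping only languages still referenced by a remaining form.
kept_languages <- languages[languages$ID %in% kept_forms$Language_ID, ]

print(kept_forms)
print(kept_languages)
```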
Summarises the CLDF file
Description
Summarises the CLDF file
Usage
## S3 method for class 'cldf'
summary(object, ...)
Arguments
object |
the CLDF dataset |
... |
Arguments to be passed to or from other methods. Currently not used. |
Value
None
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
summary(cldfobj)
Updates a table tbl based on expression e.
Description
Helper function to filter a table based on a logical expression.
Usage
update_table(e, tbl)
Arguments
e |
the expression. |
tbl |
the table. |
Value
A filtered table.