In this vignette, you can see what a codebook generated from a dataset with rich metadata looks like. This dataset includes mock data for a short German Big Five personality inventory and an age variable. The dataset follows the format created when importing data from formr.org. However, data imported using the `haven` package uses similar metadata. You can also add such metadata yourself, or use the codebook package for unannotated datasets.
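Adding such metadata yourself is less mysterious than it may sound: haven-style metadata are ordinary R attributes, with a variable label in the `label` attribute and value labels in the `labels` attribute (a named vector mapping label text to stored codes). The base-R sketch below sets them by hand on a small made-up data frame. The column names and label texts are illustrative only, and haven additionally gives value-labelled vectors the `haven_labelled` class, which this sketch omits. In practice, the `labelled` package's `var_label()` (used further down) and `val_labels()` are the tidier route.

```r
# Plain data frame without any metadata (illustrative mock values)
df <- data.frame(
  BFIK_agree_1 = c(1, 4, 5, 2),
  age          = c(23, 31, 28, 40)
)

# Variable labels live in the "label" attribute, as haven stores them
attr(df$BFIK_agree_1, "label") <- "Agreeableness item 1 (illustrative)"
attr(df$age, "label") <- "Age"

# Value labels live in the "labels" attribute: label text -> stored code
attr(df$BFIK_agree_1, "labels") <- c("strongly disagree" = 1, "strongly agree" = 5)

# Collect the variable labels into a named lookup vector
var_labels <- vapply(df, function(x) {
  lab <- attr(x, "label")
  if (is.null(lab)) NA_character_ else lab
}, character(1))
print(var_labels)
```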
As you can see below, the `codebook` package automatically computes reliabilities for multi-item inventories, generates nicely labelled plots, and outputs summary statistics. The same information is also stored in a table, which you can export to various formats. Additionally, `codebook` can show you different kinds of (labelled) missing values and common missingness patterns. As you cannot see, but search engines will, the `codebook` package also generates JSON-LD metadata for the dataset. If you share your codebook as an HTML file online, this metadata should make it easier for others to find your data.
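A missingness pattern records which variables are observed in a given row; counting identical patterns shows which combinations of gaps are common. The base-R sketch below illustrates that idea on made-up data. It only shows the principle and is not `codebook`'s own implementation, whose report also distinguishes the different kinds of labelled missing values mentioned above.

```r
# Mock data with some missing responses (illustrative values only)
df <- data.frame(
  item_1 = c(1, NA, 3, 4, NA, 2),
  item_2 = c(2, 5, NA, 1, NA, 3),
  age    = c(25, 31, 40, NA, 22, 35)
)

# Encode each row's pattern as a string: "1" = observed, "0" = missing
row_pattern <- apply(!is.na(df), 1, function(r) paste(as.integer(r), collapse = ""))

# Count how often each pattern occurs, most common first
pattern_counts <- sort(table(row_pattern), decreasing = TRUE)
print(pattern_counts)

# Number of missing values per variable
missing_per_var <- colSums(is.na(df))
print(missing_per_var)
```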
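```r
# Treat a set fig.retina chunk option as a sign that pkgdown is knitting this article
knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
knitr::opts_chunk$set(warning = FALSE, message = TRUE, error = FALSE)
ggplot2::theme_set(ggplot2::theme_bw())

library(codebook)
data("bfi", package = "codebook")

# Outside pkgdown builds, keep the example small by dropping three of the five scales
if (!knit_by_pkgdown) {
  library(dplyr)
  bfi <- bfi %>% select(-starts_with("BFIK_extra"),
                        -starts_with("BFIK_open"),
                        -starts_with("BFIK_consc"))
}

# Add a mock age variable: Poisson draws with mean 30, seeded for reproducibility
set.seed(1)
bfi$age <- rpois(nrow(bfi), 30)

library(labelled)
var_label(bfi$age) <- "Alter"  # German for "age", matching the German metadata
```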
By default, we only set the required metadata attributes `name` and `description` to sensible values. However, there are a number of attributes you can set to describe the data better.
```r
metadata(bfi)$name <- "MOCK Big Five Inventory dataset (German metadata demo)"
metadata(bfi)$description <- "a small mock Big Five Inventory dataset"
metadata(bfi)$identifier <- "doi:10.5281/zenodo.1326520"
metadata(bfi)$datePublished <- "2016-06-01"
metadata(bfi)$creator <- list(
  "@type" = "Person",
  givenName = "Ruben", familyName = "Arslan",
  email = "ruben.arslan@gmail.com",
  affiliation = list("@type" = "Organization",
                     name = "MPI Human Development, Berlin"))
metadata(bfi)$citation <- "Arslan (2016). Mock BFI data."
metadata(bfi)$url <- "https://rubenarslan.github.io/codebook/articles/codebook.html"
metadata(bfi)$temporalCoverage <- "2016"
metadata(bfi)$spatialCoverage <- "Goettingen, Germany"
```
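The metadata fields set above (`name`, `description`, `datePublished`, `creator`, `temporalCoverage`, `spatialCoverage` and so on) correspond to properties of the schema.org `Dataset` type, the vocabulary search engines read from JSON-LD. To give a rough idea of what such markup looks like, the sketch below hand-builds a JSON-LD object with `jsonlite`. It assumes `jsonlite` is installed, skips some fields for brevity, and is not the exact output `codebook` produces.

```r
library(jsonlite)

# Hand-built schema.org "Dataset" description reusing the values set above.
# Illustration of the JSON-LD shape only, not codebook's own output.
dataset_ld <- list(
  "@context" = "https://schema.org/",
  "@type" = "Dataset",
  name = "MOCK Big Five Inventory dataset (German metadata demo)",
  description = "a small mock Big Five Inventory dataset",
  identifier = "doi:10.5281/zenodo.1326520",
  datePublished = "2016-06-01",
  temporalCoverage = "2016",
  spatialCoverage = "Goettingen, Germany",
  creator = list(
    "@type" = "Person",
    givenName = "Ruben", familyName = "Arslan",
    affiliation = list("@type" = "Organization",
                       name = "MPI Human Development, Berlin")
  )
)

# auto_unbox = TRUE writes length-1 vectors as JSON scalars instead of arrays
json_ld <- toJSON(dataset_ld, auto_unbox = TRUE, pretty = TRUE)
cat(json_ld)

# In an HTML page, search engines look for JSON-LD inside a script tag like this
html_snippet <- paste0('<script type="application/ld+json">\n', json_ld, "\n</script>")
```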
```r
# We don't want to look at the code in the codebook.
knitr::opts_chunk$set(warning = TRUE, message = TRUE, echo = FALSE)
```
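The codebook below reports reliability estimates for the multi-item scales. As an illustration of the most common such statistic, here is Cronbach's alpha computed directly from its definition, alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score), on made-up data. This is a sketch of the formula only; `codebook`'s own reliability output is more extensive (it also attempts confidence intervals, for example).

```r
# Cronbach's alpha for a set of items (rows = people, columns = items)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  # Listwise deletion: keep only people who answered every item
  items <- items[stats::complete.cases(items), , drop = FALSE]
  k <- ncol(items)
  item_vars <- apply(items, 2, stats::var)   # variance of each item
  total_var <- stats::var(rowSums(items))    # variance of the sum score
  k / (k - 1) * (1 - sum(item_vars) / total_var)
}

# Made-up responses of five people to a two-item scale
scale_items <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 1, 4, 3, 5)
)
alpha_est <- cronbach_alpha(scale_items)
print(alpha_est)  # 8/9, about 0.889, for these two items
```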
Dataset name: MOCK Big Five Inventory dataset (German metadata demo)
a small mock Big Five Inventory dataset
Metadata for search engines
Temporal Coverage: 2016
Spatial Coverage: Goettingen, Germany
Citation: Arslan (2016). Mock BFI data.
URL: https://rubenarslan.github.io/codebook/articles/codebook.html
Identifier: doi:10.5281/zenodo.1326520
Date published: 2016-06-01
Creator:
| name | value |
|---|---|
| @type | Person |
| givenName | Ruben |
| familyName | Arslan |
| email | ruben.arslan@gmail.com |
| affiliation | list(@type = "Organization", name = "MPI Human Development, Berlin") |
28 completed rows, 28 who entered any information, 0 only viewed the first page. There are 0 expired rows (people who did not finish filling out in the requested time frame). In total, there are 28 rows including unfinished and expired rows.
There were 28 unique participants, of which 28 finished filling out at least one survey.
This survey was not repeated.
The first session started on 2016-07-08 09:54:16, the last session on 2016-11-02 21:19:50.
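Counts like these can be derived from per-session timestamps. The sketch below shows one way to do it, assuming formr-style columns named `session`, `created`, `modified`, `ended` and `expired` that are `NA` when the event never happened. The column names, the definition of "only viewed" (created but never modified) and all values are assumptions for illustration, not `codebook`'s actual logic.

```r
# Hypothetical formr-style session data; NA means the event never happened.
# Column names and the "only viewed" definition below are assumptions.
results <- data.frame(
  session  = c("a", "b", "c", "d"),
  created  = as.POSIXct(c("2016-07-08 09:54:16", "2016-07-09 10:00:00",
                          "2016-07-10 11:00:00", "2016-07-11 12:00:00"), tz = "UTC"),
  modified = as.POSIXct(c("2016-07-08 09:56:00", "2016-07-09 10:03:00",
                          NA, "2016-07-11 12:05:00"), tz = "UTC"),
  ended    = as.POSIXct(c("2016-07-08 09:56:00", NA,
                          NA, "2016-07-11 12:05:00"), tz = "UTC"),
  expired  = as.POSIXct(c(NA, "2016-07-10 10:00:00",
                          NA, NA), tz = "UTC")
)

counts <- c(
  completed    = sum(!is.na(results$ended)),     # reached the end of the survey
  entered_any  = sum(!is.na(results$modified)),  # saved at least one answer
  only_viewed  = sum(is.na(results$modified)),   # started but never saved anything
  expired      = sum(!is.na(results$expired)),   # ran out of time
  total_rows   = nrow(results),
  participants = length(unique(results$session))
)
print(counts)

# First and last session start
first_last <- range(results$created)
print(first_last)
```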
People took on average 127.36 minutes (median 1.48) to answer the survey.
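The gap between the mean (127.36 minutes) and the median (1.48 minutes) suggests a right-skewed distribution: about half of the people answered in under 1.5 minutes, but a few sessions stayed open for a long time. The sketch below uses made-up timestamps (the `created` and `ended` names are hypothetical) to show how a single long session inflates the mean while leaving the median almost untouched.

```r
# Hypothetical start times for five sessions
created <- as.POSIXct(c("2016-07-08 09:00:00", "2016-07-08 10:00:00",
                        "2016-07-08 11:00:00", "2016-07-08 12:00:00",
                        "2016-07-08 13:00:00"), tz = "UTC")
# Four quick sessions and one left open for three days (offsets in seconds)
ended <- created + c(60, 90, 120, 75, 3 * 24 * 3600)

# Duration in minutes; explicit units avoid difftime's automatic unit choice
duration_min <- as.numeric(difftime(ended, created, units = "mins"))

mean(duration_min)    # pulled far up by the single long session
median(duration_min)  # stays at the typical quick session length
```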
# Variables