This vignette shows what a codebook generated from a dataset with rich metadata looks like. The dataset contains mock data for a short German Big Five personality inventory and an age variable. It follows the format created when importing data from formr.org, but data imported with the haven package carry similar metadata. You can also add such metadata yourself, or use the codebook package on unannotated datasets.
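As a minimal sketch of adding such metadata yourself, the following uses the labelled package (already loaded later in this vignette) on a small data frame. The data frame `survey`, its column names, and the label texts are hypothetical and only serve as illustration.

```r
library(labelled)

# Hypothetical unannotated data: two 5-point items and an age column.
survey <- data.frame(
  item_1 = c(1, 3, 5, 4),
  item_2 = c(2, 2, 4, 5),
  age    = c(23, 31, 45, 38)
)

# Value labels document the anchors of the response scale.
anchors <- c("Does not apply at all" = 1, "Applies completely" = 5)
val_labels(survey$item_1) <- anchors
val_labels(survey$item_2) <- anchors

# Variable labels describe what each column measures.
var_label(survey$item_1) <- "I worry a lot."
var_label(survey$item_2) <- "I get nervous easily."
var_label(survey$age)    <- "Age in years"

var_label(survey$age)      # the variable label set above
val_labels(survey$item_1)  # the named vector of scale anchors
```

Labels stored this way travel with the data frame, so tools that read these attributes can pick them up without a separate data dictionary.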

As you can see below, the codebook package automatically computes reliabilities for multi-item inventories, generates nicely labelled plots, and outputs summary statistics. The same information is also stored in a table, which you can export to various formats. Additionally, codebook can show you different kinds of (labelled) missing values and common missingness patterns. What you cannot see, but search engines can, is the JSON-LD metadata that the codebook package also generates for the dataset. If you share your codebook as an HTML file online, this metadata should make it easier for others to find your data. See what Google sees here.

knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
knitr::opts_chunk$set(warning = FALSE, message = TRUE, error = FALSE)
ggplot2::theme_set(ggplot2::theme_bw())

library(codebook)
data("bfi", package = 'codebook')
if (!knit_by_pkgdown) {
  library(dplyr)
  bfi <- bfi %>% select(-starts_with("BFIK_extra"),
                        -starts_with("BFIK_open"),
                        -starts_with("BFIK_consc"))
}
set.seed(1)
bfi$age <- rpois(nrow(bfi), 30)
library(labelled)
var_label(bfi$age) <- "Alter"

By default, we only set the required metadata attributes name and description to sensible values. However, there are a number of other attributes you can set to describe the data better. Find out more.

metadata(bfi)$name <- "MOCK Big Five Inventory dataset (German metadata demo)"
metadata(bfi)$description <- "a small mock Big Five Inventory dataset"
metadata(bfi)$identifier <- "doi:10.5281/zenodo.1326520"
metadata(bfi)$datePublished <- "2016-06-01"
metadata(bfi)$creator <- list(
      "@type" = "Person",
      givenName = "Ruben", familyName = "Arslan",
      email = "ruben.arslan@gmail.com", 
      affiliation = list("@type" = "Organization",
        name = "MPI Human Development, Berlin"))
metadata(bfi)$citation <- "Arslan (2016). Mock BFI data."
metadata(bfi)$url <- "https://rubenarslan.github.io/codebook/articles/codebook.html"
metadata(bfi)$temporalCoverage <- "2016" 
metadata(bfi)$spatialCoverage <- "Goettingen, Germany" 
# We don't want to look at the code in the codebook.
knitr::opts_chunk$set(warning = TRUE, message = TRUE, echo = FALSE)
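Because `echo = FALSE` is set above, the chunk that produces the rest of this page is not shown. A minimal sketch of what such a chunk might contain, assuming it simply calls `codebook()` on the annotated data frame; the arguments actually used here are not visible in the source.

```r
library(codebook)
data("bfi", package = "codebook")

# Assumption: the hidden chunk calls codebook() on the annotated data frame.
# Inside an R Markdown document, this renders the metadata, scale, item,
# and missingness sections that follow.
codebook(bfi)
```

The call is meant to be placed in an R Markdown chunk, so that its output becomes part of the knitted document.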
## Warning in doTryCatch(return(expr), name, parentenv, handler): Reliability CIs
## could not be computed for BFIK_neuro
## Warning in doTryCatch(return(expr), name, parentenv, handler): missing value
## where TRUE/FALSE needed
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## Loading required namespace: GPArotation

Metadata

Description

Dataset name: MOCK Big Five Inventory dataset (German metadata demo)

a small mock Big Five Inventory dataset

Metadata for search engines

Creator:

| name | value |
|:-----|:------|
| @type | Person |
| givenName | Ruben |
| familyName | Arslan |
| email | ruben.arslan@gmail.com |
| affiliation | list(@type = "Organization", name = "MPI Human Development, Berlin") |

Keywords: session, created, modified, ended, expired, BFIK_agree_4R, BFIK_agree_1R, BFIK_neuro_2R, BFIK_agree_3R, BFIK_neuro_3, BFIK_neuro_4, BFIK_agree_2, BFIK_agree, BFIK_neuro, age

Survey overview

28 completed rows, 28 who entered any information, 0 only viewed the first page. There are 0 expired rows (people who did not finish filling out in the requested time frame). In total, there are 28 rows including unfinished and expired rows.

There were 28 unique participants, of which 28 finished filling out at least one survey.

This survey was not repeated.

The first session started on 2016-07-08 09:54:16, the last session on 2016-11-02 21:19:50.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Starting date times

People took on average 127.36 minutes (median 1.48) to answer the survey.

## Warning: Removed 4 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

Duration people took for answering the survey
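The large gap between the mean and the median duration is what you would expect when a few participants leave the survey open for a long time. A small base R sketch with hypothetical timestamps (not the bfi data) shows how such durations can be computed from start and end times and why the two summaries diverge.

```r
# Hypothetical start times for three participants (not the bfi data).
created <- as.POSIXct(c("2016-07-08 09:54:16",
                        "2016-07-08 10:30:00",
                        "2016-07-08 12:00:00"), tz = "UTC")

# Seconds until each participant finished; the third one took ten hours.
ended <- created + c(87, 120, 36000)

minutes <- as.numeric(difftime(ended, created, units = "mins"))
mean(minutes)    # pulled upward by the one long session
median(minutes)  # closer to a typical completion time
```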

Variables

Scale: BFIK_agree

Overview

Reliability: ω_ordinal [95% CI] = 0.61 [0.37; 0.84].

Missing: 0.

Likert plot of scale BFIK_agree items

Distribution of scale BFIK_agree

Reliability details

Scale diagnosis
Reliability (internal consistency) estimates
Scale structure
Information about this scale
Dataframe: res$dat
Items: BFIK_agree_4R, BFIK_agree_1R, BFIK_agree_3R & BFIK_agree_2
Observations: 28
Positive correlations: 6
Number of correlations: 6
Percentage positive correlations: 100
Estimates assuming interval level
Omega (total): 0.82
Omega (hierarchical): 0.77
Revelle’s Omega (total): 0.89
Greatest Lower Bound (GLB): 0.89
Coefficient H: 0.88
Coefficient Alpha: 0.80

Confidence intervals

Omega (total): [0.71; 0.93]
Coefficient Alpha: [0.68; 0.92]
Estimates assuming ordinal level
Ordinal Omega (total): 0.61
Ordinal Omega (hierarch.): 0.59
Ordinal Coefficient Alpha: 0.59

Confidence intervals

Ordinal Omega (total): [0.37; 0.84]
Ordinal Coefficient Alpha: [0.33; 0.84]

Note: the normal point estimate and confidence interval for omega are based on the procedure suggested by Dunn, Baguley & Brunsden (2013) using the MBESS function ci.reliability, whereas the psych package point estimate was suggested in Revelle & Zinbarg (2008). See the help (‘?ufs::scaleStructure’) for more information.
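To make the simplest of these estimates concrete, here is a minimal base R sketch of the classical coefficient alpha formula. It is not the routine used to produce the report (which relies on the packages named in the note), it does not cover omega or the ordinal variants, and the helper name `cronbach_alpha` and the toy data are hypothetical.

```r
# Classical coefficient alpha for a data frame of items (complete cases only).
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)                          # number of items
  item_variances <- apply(items, 2, var)    # variance of each item
  total_variance <- var(rowSums(items))     # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
}

# Two identical items are perfectly consistent.
identical_items <- data.frame(a = c(1, 2, 3, 4), b = c(1, 2, 3, 4))
cronbach_alpha(identical_items)     # 1

# Two uncorrelated items share no variance.
uncorrelated_items <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 1, 3))
cronbach_alpha(uncorrelated_items)  # 0
```

Alpha grows as the items covary more strongly relative to their individual variances, which is why it is read as a measure of internal consistency.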

Eigen values

2.539, 0.732, 0.54 & 0.189

Factor analysis (reproducing only shared variance)
ML1
BFIK_agree_4R 0.872
BFIK_agree_1R 0.614
BFIK_agree_3R 0.862
BFIK_agree_2 0.509
Component analysis (reproducing full covariance matrix)
PC1
BFIK_agree_4R 0.871
BFIK_agree_1R 0.748
BFIK_agree_3R 0.880
BFIK_agree_2 0.668
Item descriptives
| item | mean | median | var | sd | IQR | se | min | q1 | q3 | max | skew | kurt | dip | n | NA | valid |
|:-----|-----:|-------:|----:|---:|----:|---:|----:|---:|---:|----:|-----:|-----:|----:|--:|---:|------:|
| BFIK_agree_4R | 2.92857142857143 | 3 | 1.4021164021164 | 1.1841099620037 | 2 | 0.223775748886852 | 1 | 2 | 4 | 5 | 0.291229375894641 | -1.00398091847633 | 0.125 | 28 | 0 | 28 |
| BFIK_agree_1R | 3 | 3 | 0.888888888888889 | 0.942809041582063 | 2 | 0.17817416127495 | 2 | 2 | 4 | 5 | 0.285562353940721 | -1.25653846153846 | 0.160714285714286 | 28 | 0 | 28 |
| BFIK_agree_3R | 3.03571428571429 | 3 | 1.66534391534392 | 1.29048204766433 | 2 | 0.243878183536658 | 1 | 2 | 4 | 5 | 0.0404804251434893 | -1.2234635793667 | 0.142857142857143 | 28 | 0 | 28 |
| BFIK_agree_2 | 3.5 | 4 | 1.59259259259259 | 1.26197963240006 | 2 | 0.238491733354234 | 1 | 2 | 5 | 5 | -0.476294014441626 | -0.815313059033989 | 0.125 | 28 | 0 | 28 |
Scattermatrix

Scatterplot

Summary statistics

| name | label | type | type_options | data_type | value_labels | optional | item_order | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|:-----|:------|:-----|:-------------|:----------|:-------------|---------:|-----------:|----------:|--------------:|----:|-------:|----:|-----:|---:|---------------:|:-----|
| BFIK_agree_4R | Ich kann mich schroff und abweisend anderen gegenüber verhalten. | rating_button | 5 | haven_labelled | 5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | 5 | 0 | 1 | 1 | 3 | 5 | 2.928571 | 1.184110 | 6 | ▂▇▁▃▁▅▁▂ |
| BFIK_agree_1R | Ich neige dazu, andere zu kritisieren. | rating_button | 5 | haven_labelled | 5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | 7 | 0 | 1 | 2 | 3 | 5 | 3.000000 | 0.942809 | 6 | ▇▁▅▁▁▆▁▁ |
| BFIK_agree_3R | Ich kann mich kalt und distanziert verhalten. | rating_button | 5 | haven_labelled | 5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | 13 | 0 | 1 | 1 | 3 | 5 | 3.035714 | 1.290482 | 6 | ▂▇▁▃▁▇▁▃ |
| BFIK_agree_2 | Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen. | rating_button | 5 | haven_labelled | 1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | 17 | 0 | 1 | 1 | 4 | 5 | 3.500000 | 1.261980 | 6 | ▂▅▁▅▁▇▁▆ |
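For the items whose names end in R, the value labels map the stored value 5 to the original response 1, which suggests that these items are stored reverse-scored before being combined into the scale. The sketch below shows that kind of recoding and aggregation in base R on hypothetical responses. The helper `reverse_score` is illustrative, and using the row mean as the aggregation function is an assumption, since the report only says "aggregated by aggregation_function".

```r
# Hypothetical raw responses on a 1-5 scale (not the bfi data).
responses <- data.frame(
  agree_1R = c(1, 4, 5),  # reverse-keyed item, raw responses
  agree_2  = c(5, 2, 3)   # regularly keyed item
)

# Reverse-score a bounded item: 1 <-> 5, 2 <-> 4, 3 stays 3.
reverse_score <- function(x, low = 1, high = 5) low + high - x
responses$agree_1R <- reverse_score(responses$agree_1R)

# Aggregate the items into a scale score (assumption: the row mean).
responses$agree <- rowMeans(responses[, c("agree_1R", "agree_2")])
responses$agree  # 5 2 2
```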

Scale: BFIK_neuro

Overview

Reliability: ω_total [95% CI] = 0.82 [not computed].

Missing: 0.

Likert plot of scale BFIK_neuro items

Distribution of scale BFIK_neuro

Reliability details

Scale diagnosis
Reliability (internal consistency) estimates
Scale structure
Information about this scale
Dataframe: res$dat
Items: BFIK_neuro_2R, BFIK_neuro_3 & BFIK_neuro_4
Observations: 28
Positive correlations: 3
Number of correlations: 3
Percentage positive correlations: 100
Estimates assuming interval level
Omega (total): 0.82
Omega (hierarchical): 0.03
Revelle’s Omega (total): 0.80
Greatest Lower Bound (GLB): 0.83
Coefficient H: 0.98
Coefficient Alpha: 0.75
Estimates assuming ordinal level
Ordinal Omega (total): 277.19
Ordinal Omega (hierarch.): 278.28
Ordinal Coefficient Alpha: 0.45

Note: the normal point estimate and confidence interval for omega are based on the procedure suggested by Dunn, Baguley & Brunsden (2013) using the MBESS function ci.reliability, whereas the psych package point estimate was suggested in Revelle & Zinbarg (2008). See the help (‘?ufs::scaleStructure’) for more information.

Eigen values

2.015, 0.723 & 0.262

Factor analysis (reproducing only shared variance)
ML1
BFIK_neuro_2R 0.444
BFIK_neuro_3 0.732
BFIK_neuro_4 0.991
Component analysis (reproducing full covariance matrix)
PC1
BFIK_neuro_2R 0.670
BFIK_neuro_3 0.863
BFIK_neuro_4 0.907
Item descriptives
| item | mean | median | var | sd | IQR | se | min | q1 | q3 | max | skew | kurt | dip | n | NA | valid |
|:-----|-----:|-------:|----:|---:|----:|---:|----:|---:|---:|----:|-----:|-----:|----:|--:|---:|------:|
| BFIK_neuro_2R | 3.10714285714286 | 3 | 0.765873015873016 | 0.875141711880434 | 2 | 0.165386237969641 | 2 | 2 | 4 | 5 | 0.137995073801346 | -0.952614195115202 | 0.160714285714286 | 28 | 0 | 28 |
| BFIK_neuro_3 | 3.07142857142857 | 3 | 1.62433862433862 | 1.27449543912037 | 2 | 0.240856998499897 | 1 | 2 | 4.5 | 5 | 0.0884692815326234 | -0.956164268627212 | 0.125 | 28 | 0 | 28 |
| BFIK_neuro_4 | 2.5 | 2 | 1.44444444444444 | 1.20185042515466 | 2.5 | 0.227128381289749 | 1 | 1 | 4 | 4 | 0.137854486635856 | -1.55175238962221 | 0.160714285714286 | 28 | 0 | 28 |
Scattermatrix

Scatterplot

Summary statistics

| name | label | type | type_options | data_type | value_labels | optional | item_order | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|:-----|:------|:-----|:-------------|:----------|:-------------|---------:|-----------:|----------:|--------------:|----:|-------:|----:|-----:|---:|---------------:|:-----|
| BFIK_neuro_2R | Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen. | rating_button | 5 | haven_labelled | 5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | 9 | 0 | 1 | 2 | 3 | 5 | 3.107143 | 0.8751417 | 6 | ▆▁▇▁▁▇▁▁ |
| BFIK_neuro_3 | Ich mache mir viele Sorgen. | rating_button | 5 | haven_labelled | 1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | 15 | 0 | 1 | 1 | 3 | 5 | 3.071429 | 1.2744954 | 6 | ▃▇▁▇▁▅▁▅ |
| BFIK_neuro_4 | Ich werde leicht nervös und unsicher. | rating_button | 5 | haven_labelled | 1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | 16 | 0 | 1 | 1 | 2 | 4 | 2.500000 | 1.2018504 | 6 | ▆▁▇▁▁▂▁▇ |

age

Alter

Distribution

Distribution of values for age

0 missing values.

Summary statistics

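| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|:-----|:------|:----------|----------:|--------------:|----:|-------:|----:|-----:|---:|:-----|
| age | Alter | numeric | 0 | 1 | 19 | 32 | 38 | 30.5 | 4.670633 | ▂▂▇▇▅ |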

Missingness report
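The n_missing and complete_rate columns reported throughout this codebook, as well as a simple tally of missingness patterns, can be illustrated in base R. This is only a sketch on a hypothetical data frame `d`, not the routine the codebook package uses for its report.

```r
# Hypothetical data with missing values (not the bfi data).
d <- data.frame(
  x = c(1, NA, 3, 4),
  y = c(NA, NA, 2, 1),
  z = c(5, 6, 7, 8)
)

n_missing     <- colSums(is.na(d))    # missing values per variable
complete_rate <- colMeans(!is.na(d))  # share of observed values per variable

# Missingness patterns: one string per row, 1 = observed, 0 = missing.
pattern <- apply(!is.na(d), 1, function(r) paste(as.integer(r), collapse = ""))
table(pattern)  # how often each pattern occurs
```

In the dataset documented here, a variable such as expired, with 28 missing values out of 28 rows, would have a complete_rate of 0.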

JSON-LD metadata

The following JSON-LD can be found by search engines if you share this codebook publicly on the web.

{
  "name": "MOCK Big Five Inventory dataset (German metadata demo)",
  "description": "a small mock Big Five Inventory dataset\n\n\n## Table of variables\nThis table contains variable names, labels, and number of missing values.\nSee the complete codebook for more.\n\n|name          |label                                                                      | n_missing|\n|:-------------|:--------------------------------------------------------------------------|---------:|\n|session       |NA                                                                         |         0|\n|created       |user first opened survey                                                   |         0|\n|modified      |user last edited survey                                                    |         0|\n|ended         |user finished survey                                                       |         0|\n|expired       |NA                                                                         |        28|\n|BFIK_agree_4R |__Ich kann mich schroff und abweisend anderen gegenüber verhalten.__       |         0|\n|BFIK_agree_1R |__Ich neige dazu, andere zu kritisieren.__                                 |         0|\n|BFIK_neuro_2R |__Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen.__ |         0|\n|BFIK_agree_3R |__Ich kann mich kalt und distanziert verhalten.__                          |         0|\n|BFIK_neuro_3  |__Ich mache mir viele Sorgen.__                                            |         0|\n|BFIK_neuro_4  |__Ich werde leicht nervös und unsicher.__                                  |         0|\n|BFIK_agree_2  |__Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen.__  |         0|\n|BFIK_agree    |4 BFIK_agree items aggregated by aggregation_function                      |         0|\n|BFIK_neuro    |3 BFIK_neuro items aggregated by aggregation_function                      |         0|\n|age           |Alter                                                                      |         0|\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.9.2).",
  "identifier": "doi:10.5281/zenodo.1326520",
  "datePublished": "2016-06-01",
  "creator": {
    "@type": "Person",
    "givenName": "Ruben",
    "familyName": "Arslan",
    "email": "ruben.arslan@gmail.com",
    "affiliation": {
      "@type": "Organization",
      "name": "MPI Human Development, Berlin"
    }
  },
  "citation": "Arslan (2016). Mock BFI data.",
  "url": "https://rubenarslan.github.io/codebook/articles/codebook.html",
  "temporalCoverage": "2016",
  "spatialCoverage": "Goettingen, Germany",
  "keywords": ["session", "created", "modified", "ended", "expired", "BFIK_agree_4R", "BFIK_agree_1R", "BFIK_neuro_2R", "BFIK_agree_3R", "BFIK_neuro_3", "BFIK_neuro_4", "BFIK_agree_2", "BFIK_agree", "BFIK_neuro", "age"],
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "name": "session",
      "@type": "propertyValue"
    },
    {
      "name": "created",
      "description": "user first opened survey",
      "@type": "propertyValue"
    },
    {
      "name": "modified",
      "description": "user last edited survey",
      "@type": "propertyValue"
    },
    {
      "name": "ended",
      "description": "user finished survey",
      "@type": "propertyValue"
    },
    {
      "name": "expired",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree_4R",
      "description": "__Ich kann mich schroff und abweisend anderen gegenüber verhalten.__",
      "value": "5. 1: Trifft überhaupt nicht zu,\n4. 2,\n3. 3,\n2. 4,\n1. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree_1R",
      "description": "__Ich neige dazu, andere zu kritisieren.__",
      "value": "5. 1: Trifft überhaupt nicht zu,\n4. 2,\n3. 3,\n2. 4,\n1. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_neuro_2R",
      "description": "__Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen.__",
      "value": "5. 1: Trifft überhaupt nicht zu,\n4. 2,\n3. 3,\n2. 4,\n1. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree_3R",
      "description": "__Ich kann mich kalt und distanziert verhalten.__",
      "value": "5. 1: Trifft überhaupt nicht zu,\n4. 2,\n3. 3,\n2. 4,\n1. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_neuro_3",
      "description": "__Ich mache mir viele Sorgen.__",
      "value": "1. 1: Trifft überhaupt nicht zu,\n2. 2,\n3. 3,\n4. 4,\n5. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_neuro_4",
      "description": "__Ich werde leicht nervös und unsicher.__",
      "value": "1. 1: Trifft überhaupt nicht zu,\n2. 2,\n3. 3,\n4. 4,\n5. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree_2",
      "description": "__Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen.__",
      "value": "1. 1: Trifft überhaupt nicht zu,\n2. 2,\n3. 3,\n4. 4,\n5. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree",
      "description": "4 BFIK_agree items aggregated by aggregation_function",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_neuro",
      "description": "3 BFIK_neuro items aggregated by aggregation_function",
      "@type": "propertyValue"
    },
    {
      "name": "age",
      "description": "Alter",
      "@type": "propertyValue"
    }
  ]
}
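To illustrate how metadata of this shape can be serialised and embedded in a web page, here is a minimal sketch using the jsonlite package. It is not how the codebook package builds its output; the list `dataset_metadata` is a trimmed-down, hypothetical subset of the fields shown above.

```r
library(jsonlite)

# A trimmed-down, hypothetical subset of the metadata shown above.
dataset_metadata <- list(
  "@context" = "http://schema.org/",
  "@type" = "Dataset",
  name = "MOCK Big Five Inventory dataset (German metadata demo)",
  variableMeasured = list(
    list("@type" = "propertyValue", name = "age", description = "Alter")
  )
)

# auto_unbox = TRUE writes length-one vectors as scalars instead of arrays.
json <- toJSON(dataset_metadata, auto_unbox = TRUE, pretty = TRUE)

# Search engines read JSON-LD from a script element of this type.
html_snippet <- paste0(
  '<script type="application/ld+json">\n', json, "\n</script>"
)
cat(html_snippet)
```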

Codebook table

| name | label | type | type_options | data_type | value_labels | optional | scale_item_names | item_order | n_missing | complete_rate | n_unique | empty | count | min | median | max | mean | sd | whitespace | n_value_labels | hist |
|:-----|:------|:-----|:-------------|:----------|:-------------|---------:|:-----------------|-----------:|----------:|--------------:|---------:|------:|:------|:----|:-------|:----|-----:|---:|-----------:|---------------:|:-----|
| session | NA | NA | NA | character | NA | NA | NA | NA | 0 | 1 | 28 | 0 | NA | 64 | NA | 64 | NA | NA | 0 | NA | NA |
| created | user first opened survey | NA | NA | POSIXct | NA | NA | NA | NA | 0 | 1 | 28 | NA | NA | 2016-07-08 09:54:16 | 2016-07-08 12:47:07 | 2016-11-02 21:19:50 | NA | NA | NA | NA | NA |
| modified | user last edited survey | NA | NA | POSIXct | NA | NA | NA | NA | 0 | 1 | 28 | NA | NA | 2016-07-08 09:55:43 | 2016-07-08 14:23:22 | 2016-11-02 21:21:53 | NA | NA | NA | NA | NA |
| ended | user finished survey | NA | NA | POSIXct | NA | NA | NA | NA | 0 | 1 | 28 | NA | NA | 2016-07-08 09:55:43 | 2016-07-08 14:23:22 | 2016-11-02 21:21:53 | NA | NA | NA | NA | NA |
| expired | NA | NA | NA | logical | NA | NA | NA | NA | 28 | 0 | NA | NA | : | NA | NA | NA | NaN | NA | NA | NA | NA |
| BFIK_agree_4R | Ich kann mich schroff und abweisend anderen gegenüber verhalten. | rating_button | 5 | haven_labelled | 5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | NA | 5 | 0 | 1 | NA | NA | NA | 1 | 3 | 5 | 2.928571 | 1.1841100 | NA | 6 | ▂▇▁▃▁▅▁▂ |
| BFIK_agree_1R | Ich neige dazu, andere zu kritisieren. | rating_button | 5 | haven_labelled | 5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | NA | 7 | 0 | 1 | NA | NA | NA | 2 | 3 | 5 | 3.000000 | 0.9428090 | NA | 6 | ▇▁▅▁▁▆▁▁ |
| BFIK_neuro_2R | Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen. | rating_button | 5 | haven_labelled | 5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | NA | 9 | 0 | 1 | NA | NA | NA | 2 | 3 | 5 | 3.107143 | 0.8751417 | NA | 6 | ▆▁▇▁▁▇▁▁ |
| BFIK_agree_3R | Ich kann mich kalt und distanziert verhalten. | rating_button | 5 | haven_labelled | 5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | NA | 13 | 0 | 1 | NA | NA | NA | 1 | 3 | 5 | 3.035714 | 1.2904820 | NA | 6 | ▂▇▁▃▁▇▁▃ |
| BFIK_neuro_3 | Ich mache mir viele Sorgen. | rating_button | 5 | haven_labelled | 1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | NA | 15 | 0 | 1 | NA | NA | NA | 1 | 3 | 5 | 3.071429 | 1.2744954 | NA | 6 | ▃▇▁▇▁▅▁▅ |
| BFIK_neuro_4 | Ich werde leicht nervös und unsicher. | rating_button | 5 | haven_labelled | 1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | NA | 16 | 0 | 1 | NA | NA | NA | 1 | 2 | 4 | 2.500000 | 1.2018504 | NA | 6 | ▆▁▇▁▁▂▁▇ |
| BFIK_agree_2 | Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen. | rating_button | 5 | haven_labelled | 1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user. | 0 | NA | 17 | 0 | 1 | NA | NA | NA | 1 | 4 | 5 | 3.500000 | 1.2619796 | NA | 6 | ▂▅▁▅▁▇▁▆ |
| BFIK_agree | 4 BFIK_agree items aggregated by aggregation_function | NA | NA | numeric | NA | NA | BFIK_agree_4R, BFIK_agree_1R, BFIK_agree_3R, BFIK_agree_2 | NA | 0 | 1 | NA | NA | NA | 1.5 | 3.0 | 4.8 | 3.116071 | 0.9316506 | NA | NA | ▂▇▅▅▃ |
| BFIK_neuro | 3 BFIK_neuro items aggregated by aggregation_function | NA | NA | numeric | NA | NA | BFIK_neuro_2R, BFIK_neuro_3, BFIK_neuro_4 | NA | 0 | 1 | NA | NA | NA | 1.3 | 2.8 | 4.3 | 2.892857 | 0.9254231 | NA | NA | ▅▇▇▆▇ |
| age | Alter | NA | NA | numeric | NA | NA | NA | NA | 0 | 1 | NA | NA | NA | 19.0 | 32.0 | 38.0 | 30.500000 | 4.6706332 | NA | NA | ▂▂▇▇▅ |
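A table like the one above can be written to a file for use outside of R. The following is a hedged sketch, assuming that `codebook_table()` returns a data frame whose columns are all atomic, so that base R can write it as CSV; the output path is a temporary file chosen only for illustration.

```r
library(codebook)
data("bfi", package = "codebook")

# One row of metadata and summary statistics per variable.
tab <- codebook_table(bfi)

# Assumption: all columns are atomic, so write.csv() can handle them.
out_file <- tempfile(fileext = ".csv")  # hypothetical output location
write.csv(tab, out_file, row.names = FALSE)
```

Other formats, such as spreadsheets, would need an additional export package, which this sketch leaves out.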