In this vignette, you can see what a codebook generated from a dataset with rich metadata looks like. This dataset includes mock data for a short German Big Five personality inventory and an age variable. The dataset follows the format created when importing data from formr.org. However, data imported using the haven package uses similar metadata. You can also add such metadata yourself, or use the codebook package for unannotated datasets.
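As a rough illustration of what adding such metadata yourself can look like, here is a minimal base-R sketch. It assumes the haven-style convention in which a variable label is stored in the `label` attribute and value labels in the `labels` attribute. The data frame `toy` and its column `mood` are hypothetical and not part of the bfi dataset.

```r
# Hypothetical toy data; not part of the bfi dataset.
toy <- data.frame(mood = c(1, 3, 2, 3))

# Variable label: a human-readable description of the column.
attr(toy$mood, "label") <- "Current mood"

# Value labels: a named vector mapping label text to the stored values.
# (Assumption: haven-style convention; haven itself additionally expects
# the "haven_labelled" class, which haven::labelled() would set.)
attr(toy$mood, "labels") <- c("bad" = 1, "okay" = 2, "good" = 3)

# Inspect the annotations.
attr(toy$mood, "label")
attr(toy$mood, "labels")
```

In practice, the labelled package (used further down in this vignette via `var_label()`) is likely the more convenient way to set these attributes.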

As you can see below, the codebook package automatically computes reliabilities for multi-item inventories, generates nicely labelled plots, and outputs summary statistics. The same information is also stored in a table, which you can export to various formats. Additionally, codebook can show you different kinds of (labelled) missing values and common missingness patterns. Although it is invisible to human readers, the codebook package also generates JSON-LD metadata for the dataset, which search engines can read. If you share your codebook as an HTML file online, this metadata should make it easier for others to find your data.

knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
knitr::opts_chunk$set(warning = FALSE, message = TRUE, error = FALSE)
ggplot2::theme_set(ggplot2::theme_bw())

library(codebook)
data("bfi", package = 'codebook')
if (!knit_by_pkgdown) {
  library(dplyr)
  bfi <- bfi %>% select(-starts_with("BFIK_extra"),
                        -starts_with("BFIK_open"),
                        -starts_with("BFIK_consc"))
}
set.seed(1)
bfi$age <- rpois(nrow(bfi), 30)
library(labelled)
var_label(bfi$age) <- "Alter"

By default, we only set the required metadata attributes, name and description, to sensible values. However, there are a number of other attributes you can set to describe the data better.

metadata(bfi)$name <- "MOCK Big Five Inventory dataset (German metadata demo)"
metadata(bfi)$description <- "a small mock Big Five Inventory dataset"
metadata(bfi)$identifier <- "doi:10.5281/zenodo.1326520"
metadata(bfi)$datePublished <- "2016-06-01"
metadata(bfi)$creator <- list(
      "@type" = "Person",
      givenName = "Ruben", familyName = "Arslan",
      email = "ruben.arslan@gmail.com", 
      affiliation = list("@type" = "Organization",
        name = "MPI Human Development, Berlin"))
metadata(bfi)$citation <- "Arslan (2016). Mock BFI data."
metadata(bfi)$url <- "https://rubenarslan.github.io/codebook/articles/codebook.html"
metadata(bfi)$temporalCoverage <- "2016" 
metadata(bfi)$spatialCoverage <- "Goettingen, Germany"

# We don't want to look at the code in the codebook.
knitr::opts_chunk$set(warning = TRUE, message = TRUE, echo = FALSE)

Metadata

Description

Dataset name: MOCK Big Five Inventory dataset (German metadata demo)

a small mock Big Five Inventory dataset

Metadata for search engines

Temporal Coverage: 2016
Spatial Coverage: Goettingen, Germany
Citation: Arslan (2016). Mock BFI data.
URL: https://rubenarslan.github.io/codebook/articles/codebook.html
Identifier: doi:10.5281/zenodo.1326520
Date published: 2016-06-01
Creator:

name	value
@type	Person
givenName	Ruben
familyName	Arslan
email	ruben.arslan@gmail.com
affiliation	Organization , MPI Human Development, Berlin

Keywords: session, created, modified, ended, expired, BFIK_agree_4R, BFIK_agree_1R, BFIK_neuro_2R, BFIK_agree_3R, BFIK_neuro_3, BFIK_neuro_4, BFIK_agree_2, BFIK_agree, BFIK_neuro, age

Survey overview

28 completed rows, 28 who entered any information, 0 only viewed the first page. There are 0 expired rows (people who did not finish filling out in the requested time frame). In total, there are 28 rows including unfinished and expired rows.

There were 28 unique participants, of which 28 finished filling out at least one survey.

This survey was not repeated.

The first session started on 2016-07-08 09:54:16, the last session on 2016-11-02 21:19:50.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Starting date times

People took on average 127.36 minutes (median 1.48) to answer the survey.
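The large gap between the mean and the median suggests that a few very long sessions dominate the average. The sketch below illustrates this with hypothetical timestamps; it assumes that duration is derived from start and end times such as the `created` and `ended` columns, which is an assumption about how the figure above was computed.

```r
# Hypothetical timestamps for three participants (not the real bfi values).
created <- as.POSIXct(c("2016-07-08 09:54:16", "2016-07-08 10:00:00",
                        "2016-07-08 11:00:00"), tz = "UTC")
ended <- as.POSIXct(c("2016-07-08 09:55:43", "2016-07-08 10:01:30",
                      "2016-07-08 17:00:00"), tz = "UTC")

# Duration in minutes for each participant.
duration <- as.numeric(difftime(ended, created, units = "mins"))

mean(duration)    # pulled upwards by the one long session
median(duration)  # robust to that outlier
```

With skewed durations like these, the median is usually the more informative summary of how long the survey takes.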

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).

Duration people took for answering the survey

Variables

Scale: BFIK_agree

Overview

Reliability: Cronbach’s α [95% CI] = 0.8 [0.68;0.92].
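For readers unfamiliar with the statistic, the following is a minimal sketch of the textbook formula for Cronbach's α, applied to hypothetical toy responses. The function `cronbach_alpha()` is written here for illustration only and is not the routine the codebook package calls.

```r
# Hypothetical responses: 5 respondents (rows) x 3 items (columns).
items <- cbind(
  item1 = c(1, 2, 3, 4, 5),
  item2 = c(2, 2, 3, 5, 5),
  item3 = c(1, 3, 3, 4, 4)
)

# Textbook Cronbach's alpha (illustrative helper, not the package's code).
cronbach_alpha <- function(items) {
  k <- ncol(items)                   # number of items
  item_vars <- apply(items, 2, var)  # variance of each item
  total_var <- var(rowSums(items))   # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

alpha_toy <- cronbach_alpha(items)
round(alpha_toy, 3)
```

The statistic grows as the items covary more strongly relative to their individual variances.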

Missing: 0.

Likert plot of scale BFIK_agree items

Distribution of scale BFIK_agree

Reliability details

Reliability

95% Confidence Interval

lower	estimate	upper
0.6815382	0.8005842	0.9196302

	raw_alpha	std.alpha	G6(smc)	average_r	S/N	ase	mean	sd	median_r
	0.8005842	0.8032578	0.8025354	0.5051216	4.082794	0.0607377	3.116071	0.9316506	0.4955289
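The reported interval is numerically consistent with a normal-approximation interval, that is, the estimate plus or minus 1.96 times the standard error in the `ase` column. The short check below reproduces the bounds from the values printed above; that the package computes the interval exactly this way is an inference from the numbers, not something the output states.

```r
raw_alpha <- 0.8005842  # estimate from the table above
ase       <- 0.0607377  # standard error (ase) from the table above

# Normal-approximation 95% interval: estimate +/- 1.96 * ase.
ci <- raw_alpha + c(lower = -1, upper = 1) * 1.96 * ase
round(ci, 4)
```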

Reliability if an item is dropped:

	raw_alpha	std.alpha	G6(smc)	average_r	S/N	alpha se	var.r	med.r
BFIK_agree_4R	0.7039106	0.7059214	0.6277253	0.4444909	2.400452	0.0922357	0.0093288	0.4566167
BFIK_agree_1R	0.7822898	0.7821901	0.7633114	0.5448449	3.591160	0.0731005	0.0413123	0.5344410
BFIK_agree_3R	0.6758982	0.6925559	0.6242359	0.4288568	2.252623	0.1072194	0.0212505	0.3469923
BFIK_agree_2	0.8180575	0.8196006	0.7841113	0.6022937	4.543256	0.0568475	0.0219955	0.5971631

Item statistics

	n	raw.r	std.r	r.cor	r.drop	mean	sd
BFIK_agree_4R	28	0.8471206	0.8503385	0.8291974	0.7057555	2.928571	1.184110
BFIK_agree_1R	28	0.7168171	0.7554255	0.6253942	0.5538583	3.000000	0.942809
BFIK_agree_3R	28	0.8820884	0.8651249	0.8439281	0.7510051	3.035714	1.290482
BFIK_agree_2	28	0.7205961	0.7010915	0.5430779	0.4825103	3.500000	1.261980

Non missing response frequency for each item

	1	2	3	4	5
BFIK_agree_4R	0.0714286	0.3928571	0.1785714	0.2500000	0.1071429
BFIK_agree_1R	0.0000000	0.3928571	0.2500000	0.3214286	0.0357143
BFIK_agree_3R	0.1071429	0.3214286	0.1428571	0.2857143	0.1428571
BFIK_agree_2	0.0714286	0.1785714	0.1785714	0.3214286	0.2500000

Summary statistics

name	label	type	type_options	data_type	value_labels	item_order	complete_rate	min	median	max	mean	sd	n_value_labels	hist
BFIK_agree_4R	Ich kann mich schroff und abweisend anderen gegenüber verhalten.	rating_button	5	haven_labelled	5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	5	1	1	3	5	2.928571	1.184110	6	▂▇▁▃▁▅▁▂
BFIK_agree_1R	Ich neige dazu, andere zu kritisieren.	rating_button	5	haven_labelled	5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	7	1	2	3	5	3.000000	0.942809	6	▇▁▅▁▁▆▁▁
BFIK_agree_3R	Ich kann mich kalt und distanziert verhalten.	rating_button	5	haven_labelled	5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	13	1	1	3	5	3.035714	1.290482	6	▂▇▁▃▁▇▁▃
BFIK_agree_2	Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen.	rating_button	5	haven_labelled	1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	17	1	1	4	5	3.500000	1.261980	6	▂▅▁▅▁▇▁▆
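Items whose names end in `R` carry reversed value labels (stored value 5 is labelled "1: Trifft überhaupt nicht zu"), which indicates reverse scoring. The scale mean reported earlier (3.116071) also equals the mean of the four item means, which is consistent with the scale being a row mean of its items. The sketch below shows both steps on hypothetical raw responses. The column names are made up, and averaging is an assumption about the unnamed `aggregation_function`.

```r
# Hypothetical raw responses on a 1-5 scale for three respondents.
raw <- data.frame(
  agree_1 = c(4, 2, 5),  # negatively keyed item, to be reversed
  agree_2 = c(2, 4, 1)   # positively keyed item
)

# Reverse a 1-5 item: 1 <-> 5, 2 <-> 4, 3 stays 3.
scored <- data.frame(
  agree_1R = 6 - raw$agree_1,
  agree_2  = raw$agree_2
)

# Scale score as the row mean of the (reversed) items.
scored$agree <- rowMeans(scored[, c("agree_1R", "agree_2")])

# Consistency check against the item means reported in the table above.
item_means <- c(2.928571, 3.000000, 3.035714, 3.500000)
mean(item_means)
```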

Scale: BFIK_neuro

Overview

Reliability: Cronbach’s α [95% CI] = 0.75 [0.61;0.9].

Missing: 0.

Likert plot of scale BFIK_neuro items

Distribution of scale BFIK_neuro

Reliability details

Reliability

95% Confidence Interval

lower	estimate	upper
0.6080638	0.7537326	0.8994015

	raw_alpha	std.alpha	G6(smc)	average_r	S/N	ase	mean	sd	median_r
	0.7537326	0.7476172	0.7145893	0.496833	2.962235	0.0743208	2.892857	0.9254231	0.440167

Reliability if an item is dropped:

	raw_alpha	std.alpha	G6(smc)	average_r	S/N	alpha se	var.r	med.r
BFIK_neuro_2R	0.8400000	0.8408387	0.7253854	0.7253854	5.2829328	0.0602555	NA	0.7253854
BFIK_neuro_3	0.5904682	0.6112721	0.4401670	0.4401670	1.5724937	0.1456732	NA	0.4401670
BFIK_neuro_4	0.4653928	0.4905053	0.3249467	0.3249467	0.9627289	0.1871605	NA	0.3249467

Item statistics

	n	raw.r	std.r	r.cor	r.drop	mean	sd
BFIK_neuro_2R	28	0.6549435	0.7217484	0.4638430	0.4100297	3.107143	0.8751417
BFIK_neuro_3	28	0.8755182	0.8383732	0.7625955	0.6528603	3.071429	1.2744954
BFIK_neuro_4	28	0.9046526	0.8854863	0.8409166	0.7420620	2.500000	1.2018504

Non missing response frequency for each item

	1	2	3	4	5
BFIK_neuro_2R	0.0000000	0.2857143	0.3571429	0.3214286	0.0357143
BFIK_neuro_3	0.1071429	0.2500000	0.2857143	0.1785714	0.1785714
BFIK_neuro_4	0.2500000	0.3214286	0.1071429	0.3214286	0.0000000

Summary statistics

name	label	type	type_options	data_type	value_labels	item_order	complete_rate	min	median	max	mean	sd	n_value_labels	hist
BFIK_neuro_2R	Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen.	rating_button	5	haven_labelled	5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	9	1	2	3	5	3.107143	0.8751417	6	▆▁▇▁▁▇▁▁
BFIK_neuro_3	Ich mache mir viele Sorgen.	rating_button	5	haven_labelled	1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	15	1	1	3	5	3.071429	1.2744954	6	▃▇▁▇▁▅▁▅
BFIK_neuro_4	Ich werde leicht nervös und unsicher.	rating_button	5	haven_labelled	1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	16	1	1	2	4	2.500000	1.2018504	6	▆▁▇▁▁▂▁▇

age

Alter

Distribution

Distribution of values for age

0 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	hist
age	Alter	numeric	0	1	19	32	38	30.5	4.670633	▂▂▇▇▅

Missingness report

JSON-LD metadata

The following JSON-LD can be found by search engines, if you share this codebook publicly on the web.

{
  "name": "MOCK Big Five Inventory dataset (German metadata demo)",
  "description": "a small mock Big Five Inventory dataset\n\n\n## Table of variables\nThis table contains variable names, labels, and number of missing values.\nSee the complete codebook for more.\n\n|name          |label                                                                      | n_missing|\n|:-------------|:--------------------------------------------------------------------------|---------:|\n|session       |NA                                                                         |         0|\n|created       |user first opened survey                                                   |         0|\n|modified      |user last edited survey                                                    |         0|\n|ended         |user finished survey                                                       |         0|\n|expired       |NA                                                                         |        28|\n|BFIK_agree_4R |__Ich kann mich schroff und abweisend anderen gegenüber verhalten.__       |         0|\n|BFIK_agree_1R |__Ich neige dazu, andere zu kritisieren.__                                 |         0|\n|BFIK_neuro_2R |__Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen.__ |         0|\n|BFIK_agree_3R |__Ich kann mich kalt und distanziert verhalten.__                          |         0|\n|BFIK_neuro_3  |__Ich mache mir viele Sorgen.__                                            |         0|\n|BFIK_neuro_4  |__Ich werde leicht nervös und unsicher.__                                  |         0|\n|BFIK_agree_2  |__Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen.__  |         0|\n|BFIK_agree    |4 BFIK_agree items aggregated by aggregation_function                      |         0|\n|BFIK_neuro    |3 BFIK_neuro items aggregated by aggregation_function                      |         0|\n|age           |Alter                                                                      |         0|\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.9.6).",
  "identifier": "doi:10.5281/zenodo.1326520",
  "datePublished": "2016-06-01",
  "creator": {
    "@type": "Person",
    "givenName": "Ruben",
    "familyName": "Arslan",
    "email": "ruben.arslan@gmail.com",
    "affiliation": {
      "@type": "Organization",
      "name": "MPI Human Development, Berlin"
    }
  },
  "citation": "Arslan (2016). Mock BFI data.",
  "url": "https://rubenarslan.github.io/codebook/articles/codebook.html",
  "temporalCoverage": "2016",
  "spatialCoverage": "Goettingen, Germany",
  "keywords": ["session", "created", "modified", "ended", "expired", "BFIK_agree_4R", "BFIK_agree_1R", "BFIK_neuro_2R", "BFIK_agree_3R", "BFIK_neuro_3", "BFIK_neuro_4", "BFIK_agree_2", "BFIK_agree", "BFIK_neuro", "age"],
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "name": "session",
      "@type": "propertyValue"
    },
    {
      "name": "created",
      "description": "user first opened survey",
      "@type": "propertyValue"
    },
    {
      "name": "modified",
      "description": "user last edited survey",
      "@type": "propertyValue"
    },
    {
      "name": "ended",
      "description": "user finished survey",
      "@type": "propertyValue"
    },
    {
      "name": "expired",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree_4R",
      "description": "__Ich kann mich schroff und abweisend anderen gegenüber verhalten.__",
      "value": "5. 1: Trifft überhaupt nicht zu,\n4. 2,\n3. 3,\n2. 4,\n1. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree_1R",
      "description": "__Ich neige dazu, andere zu kritisieren.__",
      "value": "5. 1: Trifft überhaupt nicht zu,\n4. 2,\n3. 3,\n2. 4,\n1. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_neuro_2R",
      "description": "__Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen.__",
      "value": "5. 1: Trifft überhaupt nicht zu,\n4. 2,\n3. 3,\n2. 4,\n1. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree_3R",
      "description": "__Ich kann mich kalt und distanziert verhalten.__",
      "value": "5. 1: Trifft überhaupt nicht zu,\n4. 2,\n3. 3,\n2. 4,\n1. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_neuro_3",
      "description": "__Ich mache mir viele Sorgen.__",
      "value": "1. 1: Trifft überhaupt nicht zu,\n2. 2,\n3. 3,\n4. 4,\n5. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_neuro_4",
      "description": "__Ich werde leicht nervös und unsicher.__",
      "value": "1. 1: Trifft überhaupt nicht zu,\n2. 2,\n3. 3,\n4. 4,\n5. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree_2",
      "description": "__Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen.__",
      "value": "1. 1: Trifft überhaupt nicht zu,\n2. 2,\n3. 3,\n4. 4,\n5. 5: Trifft voll und ganz zu,\nNA. Item was never rendered for this user.",
      "maxValue": 5,
      "minValue": 1,
      "measurementTechnique": "self-report",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_agree",
      "description": "4 BFIK_agree items aggregated by aggregation_function",
      "@type": "propertyValue"
    },
    {
      "name": "BFIK_neuro",
      "description": "3 BFIK_neuro items aggregated by aggregation_function",
      "@type": "propertyValue"
    },
    {
      "name": "age",
      "description": "Alter",
      "@type": "propertyValue"
    }
  ]
}
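Search engines typically pick up JSON-LD when it sits inside a `script` element of type `application/ld+json` in the HTML page. The sketch below wraps a shortened, hand-written JSON-LD string in such a tag using base R. It only illustrates the general convention and is not necessarily how the codebook package embeds its metadata.

```r
# A shortened, hand-written JSON-LD string for illustration only.
json_ld <- '{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "MOCK Big Five Inventory dataset (German metadata demo)"
}'

# Wrap it in the script tag that marks structured data in an HTML page.
script_tag <- paste0(
  '<script type="application/ld+json">\n',
  json_ld,
  '\n</script>'
)
cat(script_tag)
```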

Codebook table

name	label	type	type_options	data_type	value_labels	optional	scale_item_names	item_order	n_missing	complete_rate	n_unique	empty	count	min	median	max	mean	sd	whitespace	n_value_labels	hist
session	NA	NA	NA	character	NA	NA	NA	NA	0	1	28	0	NA	64	NA	64	NA	NA	0	NA	NA
created	user first opened survey	NA	NA	POSIXct	NA	NA	NA	NA	0	1	28	NA	NA	2016-07-08 09:54:16	2016-07-08 12:47:07.5	2016-11-02 21:19:50	NA	NA	NA	NA	NA
modified	user last edited survey	NA	NA	POSIXct	NA	NA	NA	NA	0	1	28	NA	NA	2016-07-08 09:55:43	2016-07-08 14:23:22.5	2016-11-02 21:21:53	NA	NA	NA	NA	NA
ended	user finished survey	NA	NA	POSIXct	NA	NA	NA	NA	0	1	28	NA	NA	2016-07-08 09:55:43	2016-07-08 14:23:22.5	2016-11-02 21:21:53	NA	NA	NA	NA	NA
expired	NA	NA	NA	logical	NA	NA	NA	NA	28	0	NA	NA	:	NA	NA	NA	NaN	NA	NA	NA	NA
BFIK_agree_4R	Ich kann mich schroff und abweisend anderen gegenüber verhalten.	rating_button	5	haven_labelled	5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	0	NA	5	0	1	NA	NA	NA	1	3	5	2.928571	1.1841100	NA	6	▂▇▁▃▁▅▁▂
BFIK_agree_1R	Ich neige dazu, andere zu kritisieren.	rating_button	5	haven_labelled	5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	0	NA	7	0	1	NA	NA	NA	2	3	5	3.000000	0.9428090	NA	6	▇▁▅▁▁▆▁▁
BFIK_neuro_2R	Ich bin entspannt, lasse mich durch Stress nicht aus der Ruhe bringen.	rating_button	5	haven_labelled	5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	0	NA	9	0	1	NA	NA	NA	2	3	5	3.107143	0.8751417	NA	6	▆▁▇▁▁▇▁▁
BFIK_agree_3R	Ich kann mich kalt und distanziert verhalten.	rating_button	5	haven_labelled	5. 1: Trifft überhaupt nicht zu, 4. 2, 3. 3, 2. 4, 1. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	0	NA	13	0	1	NA	NA	NA	1	3	5	3.035714	1.2904820	NA	6	▂▇▁▃▁▇▁▃
BFIK_neuro_3	Ich mache mir viele Sorgen.	rating_button	5	haven_labelled	1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	0	NA	15	0	1	NA	NA	NA	1	3	5	3.071429	1.2744954	NA	6	▃▇▁▇▁▅▁▅
BFIK_neuro_4	Ich werde leicht nervös und unsicher.	rating_button	5	haven_labelled	1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	0	NA	16	0	1	NA	NA	NA	1	2	4	2.500000	1.2018504	NA	6	▆▁▇▁▁▂▁▇
BFIK_agree_2	Ich schenke anderen leicht Vertrauen, glaube an das Gute im Menschen.	rating_button	5	haven_labelled	1. 1: Trifft überhaupt nicht zu, 2. 2, 3. 3, 4. 4, 5. 5: Trifft voll und ganz zu, NA. Item was never rendered for this user.	0	NA	17	0	1	NA	NA	NA	1	4	5	3.500000	1.2619796	NA	6	▂▅▁▅▁▇▁▆
BFIK_agree	4 BFIK_agree items aggregated by aggregation_function	NA	NA	numeric	NA	NA	BFIK_agree_4R, BFIK_agree_1R, BFIK_agree_3R, BFIK_agree_2	NA	0	1	NA	NA	NA	1.5	3.0	4.8	3.116071	0.9316506	NA	NA	▂▇▅▅▃
BFIK_neuro	3 BFIK_neuro items aggregated by aggregation_function	NA	NA	numeric	NA	NA	BFIK_neuro_2R, BFIK_neuro_3, BFIK_neuro_4	NA	0	1	NA	NA	NA	1.3	2.8	4.3	2.892857	0.9254231	NA	NA	▅▇▇▆▇
age	Alter	NA	NA	numeric	NA	NA	NA	NA	0	1	NA	NA	NA	19.0	32.0	38.0	30.500000	4.6706332	NA	NA	▂▂▇▇▅
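The `n_missing` and `complete_rate` columns are related in a simple way: the complete rate is the share of non-missing values, which is why `expired` (28 of 28 values missing) has a complete rate of 0. A minimal base-R sketch on a hypothetical data frame, not the package's own implementation:

```r
# Hypothetical data frame with some missing values.
toy <- data.frame(
  expired = c(NA, NA, NA, NA),  # never filled in
  age     = c(25, NA, 31, 40)
)

# Count missing values per column, then convert to a completeness share.
n_missing     <- colSums(is.na(toy))
complete_rate <- 1 - n_missing / nrow(toy)

data.frame(name = names(toy), n_missing, complete_rate, row.names = NULL)
```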

Codebook example with formr.org data

Ruben Arslan

2024-12-23
