Tutorial

Adding and changing metadata

Variable labels

The last codebook you generated could already be useful if the variables had meaningful names and self-explanatory values. Unfortunately, this is rarely the case. Generally, you will need more metadata: labels for variables and values, a dataset description, and so on. The codebook package can use metadata that are stored in R attributes. Attributes in R are most commonly used to store the type of a variable; for instance, datetime in R is just a number with two attributes (a time zone and a class). However, they can just as easily store other metadata; the Hmisc (Harrell, 2018), haven (Wickham & Miller, 2018), and rio (Chan & Leeper, 2018) packages, for example, use attributes to store labels. The benefit of storing variable metadata in attributes is that even datasets that are the product of merging and processing raw data retain the necessary metadata. The haven and rio packages set these attributes when importing data from SPSS or Stata files. However, it is also easy to add metadata yourself:

attributes(codebook_data$C5)$label <- "Waste my time."
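To make the role of attributes concrete, here is a minimal sketch in base R. The toy data frame and the example timestamp are assumptions for illustration only; they are not part of the bfi data.

```r
# A date-time is stored as a plain number plus attributes.
x <- as.POSIXct("2018-01-01 12:00:00", tz = "UTC")
attributes(x)   # a class and a time zone
unclass(x)      # the underlying number (seconds since 1970-01-01)

# The same mechanism can carry a label (toy data, not the bfi dataset).
toy <- data.frame(C5 = c(2, 5, 3))
attr(toy$C5, "label") <- "Waste my time."
attributes(toy$C5)$label
```

Stripping the class with unclass() shows that nothing but the number and its attributes is stored, which is why a label can travel along with a variable in the same way.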

You have just assigned a new label to a variable. Because it is inconvenient to do this over and over again, the labelled package (Larmarange, 2018) adds a few convenience functions. Load the labelled package by writing the following in your codebook.Rmd:

library(labelled)

Now label the C5 item.

var_label(codebook_data$C5) <- "Waste my time."

You can also label values in this manner (label in quotes before the equal sign, value after):

val_labels(codebook_data$C1) <- c("Very Inaccurate" = 1, "Very Accurate" = 6)
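If you want to check what these two functions store, the following sketch applies them to a small toy vector (the vector c1 is an assumption for illustration, not the bfi data) and reads the labels back.

```r
library(labelled)

# Toy responses on a 1-6 scale (not the bfi data)
c1 <- c(1, 6, 3, NA, 6)

# Label only the two end points; unlabelled values are simply left as they are.
val_labels(c1) <- c("Very Inaccurate" = 1, "Very Accurate" = 6)
var_label(c1) <- "Am exacting in my work."

var_label(c1)    # the variable label
val_labels(c1)   # a named vector: names are the labels, elements are the values
```

Reading the labels back with the same functions is a quick way to confirm that an assignment did what you intended before you knit.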

Write these labelling commands after loading the dataset and click “Knit” again. As you can see in the viewer pane, the graph for the C1 variable now has a label at the top and the lowest and highest values on the X axis are labelled. If the prospect of adding labels for every single variable seems tedious, do not fear. Many researchers already have a codebook in the form of a spreadsheet that they want to import in order to avoid entering labels one-by-one. The bfi dataset in the psych package is a good example of this, because it comes with a tabular dictionary. On the line after loading the bfi data, type the following to import this data dictionary:

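# knit_by_pkgdown is a logical flag that is assumed to be defined earlier in the document
if (knit_by_pkgdown) {
  # download the dictionary as a CSV file from the Open Science Framework
  dict <- rio::import("https://osf.io/download/cs678", "csv")
} else {
  # otherwise, read the copy of the dictionary that ships with the codebook package
  dict_path <- system.file("extdata", "bfi_dict_tutorial.rds", package = "codebook")
  dict <- readRDS(dict_path)
}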

Original source for the full dictionary: Open Science Framework

To see what you just loaded, click the “dict” row in the Environment tab in the top right panel. As you can see, the dictionary has information on the construct on which each item loads and on the direction in which it should load on that construct. You can make these metadata usable through the codebook package. You will often need to work on the data frames to help you do this; to make this easier, use the dplyr package (Wickham, François, Henry, & Müller, 2018). Load it by typing the following:

library(dplyr)

Your next goal is to use the variable labels that are already in the dictionary. Because you want to label many variables at once, you need a list of variable labels. Instead of assigning one label to one variable as you just did, you can assign many labels to the whole dataset from a named list. Here, each element of the list is one item that you want to label.

var_label(codebook_data) <- list(
  C5 = "Waste my time.",
  C1 = "Am exacting in my work."
)

There is already a list of variables and labels in your data dictionary that you can use, so you do not have to perform the tedious task of writing out the list. You do have to reshape it slightly though, because it is currently in the form of a rectangular data frame, not a named list. To do so, use a convenience function from the codebook package called dict_to_list. This function expects to receive a data frame with two columns: the first should be the variable names, the second the variable labels. To select these columns, use the select function from the dplyr package. You will also need to use a special operator, called a pipe, which looks like this %>% and allows you to read and write R code from left to right, almost like an English sentence. First, you need to take the dict dataset, then select the variable and label columns, then use the dict_to_list function. You also need to assign the result of this operation to become the variable labels of codebook_data. You can do all this in a single line using pipes. Add the following line after importing the dictionary.

var_label(codebook_data) <- dict %>% select(variable, label) %>% dict_to_list()
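The reshaping step can be pictured with a base R sketch. The two-row toy dictionary below is an assumption for illustration, and setNames() only mimics the kind of result described above; it is not necessarily how dict_to_list is implemented.

```r
# Toy dictionary with the two expected columns (not the real bfi dictionary)
dict_toy <- data.frame(
  variable = c("C1", "C5"),
  label = c("Am exacting in my work.", "Waste my time."),
  stringsAsFactors = FALSE
)

# Turn the rectangular data frame into a named list:
# one element per variable, named after the variable, holding its label.
label_list <- setNames(as.list(dict_toy$label), dict_toy$variable)
str(label_list)
```

A named list of this shape is what var_label() accepts when you assign labels to a whole data frame at once.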

Click “codebook_data” in the Environment tab again. You should now see the variable labels below the variable names. If you click “Knit” again, you will see that your codebook now contains the variable labels. They are both part of the plots and part of the codebook table at the end. They are also part of the metadata that can be found using, for example, Google Dataset Search, but this will not be visible to you.

Value labels

So far, so good. But you may have noticed that education is shown as a number. Does this indicate years of education? The average is 3, so that seems unlikely. In fact, these numbers signify levels of education. In the dict data frame, you can see that there are value labels for the levels of this variable. However, these levels of education are abbreviated, and you can probably imagine that it would be difficult for an automated program to understand how these map to the values in your dataset. You can do better, using another function from the labelled package: not var_label this time, but val_labels. Unlike var_label, val_labels expects not just one label, but a named vector, with a name for each value that you want to label. You do not need to label all values. Named vectors are created using the c() function. Add the following lines right after the last one.

val_labels(codebook_data$gender) <- c("male" = 1, "female" = 2)
val_labels(codebook_data$education) <- c("in high school" = 1,
                                         "finished high school" = 2,
                                         "some college" = 3,
                                         "college graduate" = 4,
                                         "graduate degree" = 5)

Click the “Knit” button. The bars in the graphs for education and gender should now be labelled. Now, on to the many Likert items, which all have the same value labels. You could assign them in the same way you did for gender and education, entering the lines for each variable over and over, or you could let a function do the job for you instead. Creating a function is actually very simple. Just pick a name, ideally one to remember it by—I chose add_likert_labels—and assign the keyword function followed by two different kinds of brackets. Round brackets surround the variable x. The x here is a placeholder for the many variables you will use this function for in the next step. Curly braces show that you intend to write out what you plan to do with the variable x. Inside the curly braces, use the val_labels function from above and assign a named vector.

add_likert_labels <- function(x) {
  val_labels(x) <- c("Very Inaccurate" = 1,
                     "Moderately Inaccurate" = 2,
                     "Slightly Inaccurate" = 3,
                     "Slightly Accurate" = 4,
                     "Moderately Accurate" = 5,
                     "Very Accurate" = 6)
  x
}

A function is just a tool and does nothing on its own; you have not used it yet. To use it only on the Likert items, you need a list of them. An easy way to achieve this is to subset the dict dataframe to only take those variables that are part of the Big Six. To do so, use the filter and pull functions from the dplyr package.

likert_items <- dict %>% filter(Big6 != "") %>% pull(variable)

To apply your new function to these items, use another function from the dplyr package called mutate_at. It expects a list of variables and a function to apply to each. You have both! You can now add value labels to all Likert items in the codebook_data.

codebook_data <- codebook_data %>% mutate_at(likert_items, add_likert_labels)
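In recent versions of dplyr, mutate_at is marked as superseded in favour of across(). The sketch below shows the same idea with across() on a small toy data frame; the data frame and its column names are assumptions for illustration, not the bfi data.

```r
library(dplyr)
library(labelled)

# Toy data: two Likert items and one variable that should stay unlabelled
toy <- data.frame(A1 = c(1, 6, 3), A2 = c(2, 5, 4), age = c(20, 31, 45))

add_likert_labels <- function(x) {
  val_labels(x) <- c("Very Inaccurate" = 1,
                     "Moderately Inaccurate" = 2,
                     "Slightly Inaccurate" = 3,
                     "Slightly Accurate" = 4,
                     "Moderately Accurate" = 5,
                     "Very Accurate" = 6)
  x
}

likert_items_toy <- c("A1", "A2")

# Apply the function to the listed columns only; age is left untouched.
toy <- toy %>% mutate(across(all_of(likert_items_toy), add_likert_labels))

val_labels(toy$A1)
val_labels(toy$age)   # no value labels were added here
```

Either spelling should give the same result for this use case; across() is simply the form that newer dplyr documentation recommends.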

Click “Knit” again. All items should now have value labels. However, this display is quite repetitive. How about grouping the items by the factor that they are supposed to load on? And while you are at it, how can the metadata about keying (or reverse-coded items) in your dictionary become part of the dataset?

Adding scales

The codebook package relies on a simple convention to be able to summarise psychological scales, such as the Big Five dimension extraversion, which are aggregates across several items. Your next step will be to assign a new variable, extraversion, to the result of selecting all extraversion items in the data and passing them to the aggregate_and_document_scale function. This function takes the mean of its inputs and assigns a label to the result, so that you can still tell which variables it is an aggregate of.

codebook_data$extraversion <- codebook_data %>% select(E1:E5) %>% aggregate_and_document_scale()
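The convention described above can be sketched in base R: take the row-wise mean of the items and attach metadata to the result as attributes. The toy data and the exact label text are assumptions for illustration; this is a sketch of the idea, not the implementation of aggregate_and_document_scale.

```r
# Toy item responses (not the bfi data); the last row has a missing item
toy <- data.frame(
  E1R = c(6, 2, 4),
  E2R = c(5, 1, 4),
  E3  = c(6, 3, NA)
)

items <- toy[, c("E1R", "E2R", "E3")]

# Row-wise mean across items; a row with a missing item yields NA here
extraversion <- rowMeans(items)

# Document the aggregate so that it is still clear which items went into it
attr(extraversion, "label") <- paste("Mean of", paste(names(items), collapse = ", "))
attr(extraversion, "scale_item_names") <- names(items)

attributes(extraversion)
```

Because the item names are stored with the aggregate, any later summary can look up the items that belong to the scale without relying on naming conventions alone.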

Try knitting now. In the resulting codebook, the items for extraversion have been grouped in one graph. In addition, several internal consistency coefficients have been calculated. However, they are oddly low. You need to reverse items which negatively load on the extraversion factor, such as “Don’t talk a lot.” To do so, I suggest following a simple convention early on, when you come up with names for your items, namely the format scale_numberR (e.g., bfi_extra_1R for a reverse-coded extraversion item, bfi_neuro_2 for a neuroticism item). That way, the analyst always knows how an item relates to a scale. This information is encoded in the data dictionary you just imported. Rename the reverse-coded items so that you cannot forget about their direction. First, you need to grab all items with a negative keying from your dictionary. Add the following line, together with the two commands that come after it, above the aggregate_and_document_scale() line you wrote earlier.

reversed_items <- dict %>% filter(Keying == -1) %>% pull(variable)

You can see in your Environment tab that names such as A1, C4, and C5 are now stored in the reversed_items vector. You can now refer to this vector using the rename_at function, which applies a function to all variables you list. Use the very simple function add_R, which does exactly what its name indicates.

codebook_data <- codebook_data %>%
  rename_at(reversed_items, add_R)
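The renaming step amounts to appending a suffix to selected column names. The base R sketch below illustrates this on toy data; append_R is a hypothetical stand-in written for this example, not the add_R function from the codebook package.

```r
# Toy data frame and a toy list of reverse-keyed items (not the bfi data)
toy <- data.frame(A1 = 1:3, A2 = 4:6, C5 = 3:1)
reversed_items_toy <- c("A1", "C5")

# Hypothetical helper for illustration: append "R" to a name
append_R <- function(x) paste0(x, "R")

# Rename only the columns that are listed as reverse-keyed
to_rename <- names(toy) %in% reversed_items_toy
names(toy)[to_rename] <- append_R(names(toy)[to_rename])

names(toy)
```

Only the names change at this point; the values in the renamed columns are still in their original direction.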

Click “codebook_data” in the Environment tab and you will see that some variables have been renamed: A1R, C4R, C5R, and so on. This could lead to an ambiguity: Does the suffix R mean “should be reversed before aggregation” or “has already been reversed”? With the help of metadata in the form of labelled values, there is no potential for confusion. You can reverse the underlying values, but keep the value labels right. So, if somebody responded “Very accurate,” that remains the case, but the underlying value will switch from 6 to 1 for a reversed item. The data you generally import will rarely include labels that remain correct regardless of whether underlying values are reversed, but the codebook package makes it easy to bring the data into this shape. A command using dplyr functions and the reverse_labelled_values function can easily remedy this.

codebook_data <- codebook_data %>% 
    mutate_at(vars(matches("\\dR$")), reverse_labelled_values)

All this statement does is find variable names that end with a digit followed by R (\d is the regular expression shorthand for a digit; a dollar sign denotes the end of the string) and reverse them. Because the extraversion items have been renamed, you have to amend the scale aggregation line slightly.

codebook_data$extraversion <- codebook_data %>% select(E1R:E5) %>% aggregate_and_document_scale()
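To see why reversing labelled values removes the ambiguity, consider this base R sketch of the idea. The toy responses and the helper functions flip and label_of are assumptions for illustration; this is not the implementation of reverse_labelled_values.

```r
# Toy responses to a reverse-keyed item on a 1-6 scale
responses <- c(6, 1, 2, NA)

likert_labels <- c("Very Inaccurate" = 1,
                   "Moderately Inaccurate" = 2,
                   "Slightly Inaccurate" = 3,
                   "Slightly Accurate" = 4,
                   "Moderately Accurate" = 5,
                   "Very Accurate" = 6)

# Mirror a value around the midpoint of the labelled range (6 becomes 1, and so on)
flip <- function(v) (min(likert_labels) + max(likert_labels)) - v

reversed_responses <- flip(responses)
reversed_labels <- flip(likert_labels)   # the names stay, the values are mirrored

# Look up the label that belongs to a value
label_of <- function(value, labs) names(labs)[match(value, labs)]

label_of(responses[1], likert_labels)             # label before reversing
label_of(reversed_responses[1], reversed_labels)  # same label after reversing
```

The underlying number changes, but the label a respondent chose stays attached to their answer, so a reader of the data can always tell what was actually reported.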

Try knitting again. The reliability for the extraversion scale should be much higher and all items should load positively. Adding further scales is easy: Just repeat the above line, changing the names of the scale and the items. Adding scales that integrate smaller scales is also straightforward. The data dictionary mentions the Giant Three—try adding one, Plasticity, which subsumes Extraversion and Openness.

codebook_data$plasticity <- codebook_data %>% select(E1R:E5, O1:O5R) %>% aggregate_and_document_scale()

Note that writing E1R:E5 only works if the items are all in order in your dataset. If you mixed items across constructs, you will need a different way to select them. One option is to list all items, writing select(E1R, E2R, E3, E4, E5). This can get tedious when listing many items. Another solution is to write select(starts_with("E")). Although this is quite elegant, it will not work in this case because you have more than one variable that starts with E; this command would include the education variable along with the extraversion items you want. This is a good reason to give items descriptive stems such as extraversion_ or bfik_extra. Longer stems not only make confusion less likely, they also make it possible for you to refer to groups of items by their stems, and ideally to their aggregates by only the stem. If you have already named your items too minimally, another solution is to use a regular expression, as I introduced above for matching reversed items. In this scenario, select(matches("^E\\dR?$")) would work.
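You can try out the two regular expressions used in this tutorial on a plain character vector before using them in select() or mutate_at(). The vector of names below is a toy example chosen for illustration.

```r
# A few variable names of the kind that appear in this tutorial
vars <- c("E1R", "E2R", "E3", "E4", "E5", "education", "extraversion", "A1R")

# Names that end with a digit followed by R (the reversed items)
vars[grepl("\\dR$", vars)]

# Names that consist of E, one digit, and an optional R (the extraversion items)
vars[grepl("^E\\dR?$", vars)]
```

The second pattern anchors both the start and the end of the name, which is what keeps education and extraversion out of the selection.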

The Codebook

Metadata

Description

Dataset name: 25 Personality items representing 5 factors

25 personality self report items taken from the International Personality Item Pool (ipip.ori.org)[…]

Metadata for search engines

Temporal Coverage: Spring 2010
Spatial Coverage: Online
Citation: Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of individual differences in cognition: Attention, memory, and executive control (pp. 27–49). New York, NY: Springer.
URL: https://CRAN.R-project.org/package=psych
Identifier: https://dx.doi.org/10.17605/OSF.IO/K39BG
Date published: 2010-01-01
Creator:

name	value
1	William Revelle

name	value
keywords	A1R, A2, A3, A4, A5, C1, C2, C3, C4R, C5R, E1R, E2R, E3, E4, E5, N1R, N2R, N3R, N4R, N5R, O1, O2R, O3, O4, O5R, gender, education, age, extraversion, plasticity

Variables

A1R

Am indifferent to the feelings of others.

Distribution

Distribution of values for A1R

1 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
A1R	Am indifferent to the feelings of others.	haven_labelled	1	0.9966667	1	5	6	4.662207	1.376801	6	▁▂▁▃▃▁▆▇

Value labels

Response choices
name	value
Very Inaccurate	6
Moderately Inaccurate	5
Slightly Inaccurate	4
Slightly Accurate	3
Moderately Accurate	2
Very Accurate	1

A2

Inquire about others’ well-being.

Distribution

Distribution of values for A2

2 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
A2	Inquire about others’ well-being.	haven_labelled	2	0.9933333	1	5	6	4.781879	1.132356	6	▁▁▁▂▅▁▇▇

Value labels

Response choices
name	value
Very Inaccurate	1
Moderately Inaccurate	2
Slightly Inaccurate	3
Slightly Accurate	4
Moderately Accurate	5
Very Accurate	6

A3

Know how to comfort others.

Distribution

Distribution of values for A3

3 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
A3	Know how to comfort others.	haven_labelled	3	0.99	1	5	6	4.632997	1.28004	6	▁▂▁▁▅▁▇▆

Value labels

Response choices
name	value
Very Inaccurate	1
Moderately Inaccurate	2
Slightly Inaccurate	3
Slightly Accurate	4
Moderately Accurate	5
Very Accurate	6

A4

Love children.

Distribution

Distribution of values for A4

1 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
A4	Love children.	haven_labelled	1	0.9966667	1	5	6	4.491639	1.506952	6	▁▂▁▃▃▁▆▇

Value labels

Response choices
name	value
Very Inaccurate	1
Moderately Inaccurate	2
Slightly Inaccurate	3
Slightly Accurate	4
Moderately Accurate	5
Very Accurate	6

A5

Make people feel at ease.

Distribution

Distribution of values for A5

0 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
A5	Make people feel at ease.	haven_labelled	0	1	1	5	6	4.46	1.272976	6	▁▂▁▃▅▁▇▅

Value labels

Response choices
name	value
Very Inaccurate	1
Moderately Inaccurate	2
Slightly Inaccurate	3
Slightly Accurate	4
Moderately Accurate	5
Very Accurate	6

C1

Am exacting in my work.

Distribution

Distribution of values for C1

1 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
C1	Am exacting in my work.	haven_labelled	1	0.9966667	1	5	6	4.41806	1.243429	6	▁▂▁▂▅▁▇▃

Value labels

Response choices
name	value
Very Inaccurate	1
Moderately Inaccurate	2
Slightly Inaccurate	3
Slightly Accurate	4
Moderately Accurate	5
Very Accurate	6

C2

Continue until everything is perfect.

Distribution

Distribution of values for C2

1 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
C2	Continue until everything is perfect.	haven_labelled	1	0.9966667	1	4	6	4.187291	1.302439	6	▁▂▁▃▇▁▇▃

Value labels

Response choices
name	value
Very Inaccurate	1
Moderately Inaccurate	2
Slightly Inaccurate	3
Slightly Accurate	4
Moderately Accurate	5
Very Accurate	6

C3

Do things according to a plan.

Distribution

Distribution of values for C3

2 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
C3	Do things according to a plan.	haven_labelled	2	0.9933333	1	4	6	4.291946	1.227629	6	▁▂▁▃▇▁▇▃

Value labels

Response choices
name	value
Very Inaccurate	1
Moderately Inaccurate	2
Slightly Inaccurate	3
Slightly Accurate	4
Moderately Accurate	5
Very Accurate	6

C4R

Do things in a half-way manner.

Distribution

Distribution of values for C4R

0 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
C4R	Do things in a half-way manner.	haven_labelled	0	1	1	5	6	4.283333	1.376969	6	▁▂▁▆▅▁▇▆

Value labels

Response choices
name	value
Very Inaccurate	6
Moderately Inaccurate	5
Slightly Inaccurate	4
Slightly Accurate	3
Moderately Accurate	2
Very Accurate	1

C5R

Waste my time.

Distribution

Distribution of values for C5R

0 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
C5R	Waste my time.	haven_labelled	0	1	1	3	6	3.526667	1.674905	6	▅▇▁▇▃▁▆▆

Value labels

Response choices
name	value
Very Inaccurate	6
Moderately Inaccurate	5
Slightly Inaccurate	4
Slightly Accurate	3
Moderately Accurate	2
Very Accurate	1

N1R

Get angry easily.

Distribution

Distribution of values for N1R

2 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
N1R	Get angry easily.	haven_labelled	2	0.9933333	1	4	6	4.07047	1.543445	6	▂▅▁▇▅▁▇▇

Value labels

Response choices
name	value
Very Inaccurate	6
Moderately Inaccurate	5
Slightly Inaccurate	4
Slightly Accurate	3
Moderately Accurate	2
Very Accurate	1

N2R

Get irritated easily.

Distribution

Distribution of values for N2R

1 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
N2R	Get irritated easily.	haven_labelled	1	0.9966667	1	3	6	3.501672	1.454857	6	▂▇▁▇▆▁▇▃

Value labels

Response choices
name	value
Very Inaccurate	6
Moderately Inaccurate	5
Slightly Inaccurate	4
Slightly Accurate	3
Moderately Accurate	2
Very Accurate	1

N3R

Have frequent mood swings.

Distribution

Distribution of values for N3R

0 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
N3R	Have frequent mood swings.	haven_labelled	0	1	1	4	6	3.993333	1.532247	6	▂▃▁▅▃▁▇▅

Value labels

Response choices
name	value
Very Inaccurate	6
Moderately Inaccurate	5
Slightly Inaccurate	4
Slightly Accurate	3
Moderately Accurate	2
Very Accurate	1

N4R

Often feel blue.

Distribution

Distribution of values for N4R

4 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
N4R	Often feel blue.	haven_labelled	4	0.9866667	1	4	6	3.878378	1.460928	6	▂▃▁▆▅▁▇▃

Value labels

Response choices
name	value
Very Inaccurate	6
Moderately Inaccurate	5
Slightly Inaccurate	4
Slightly Accurate	3
Moderately Accurate	2
Very Accurate	1

N5R

Panic easily.

Distribution

Distribution of values for N5R

3 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
N5R	Panic easily.	haven_labelled	3	0.99	1	4	6	4.037037	1.555931	6	▃▃▁▇▆▁▇▇

Value labels

Response choices
name	value
Very Inaccurate	6
Moderately Inaccurate	5
Slightly Inaccurate	4
Slightly Accurate	3
Moderately Accurate	2
Very Accurate	1

gender

Distribution

Distribution of values for gender

0 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
gender	gender	haven_labelled	0	1	1	2	2	1.586667	0.4932544	2	▆▁▁▁▁▁▁▇

Value labels

Response choices
name	value
male	1
female	2

education

Distribution

Distribution of values for education

45 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
education	education	haven_labelled	45	0.85	1	3	5	3.270588	1.22688	5	▂▂▁▇▁▂▁▅

Value labels

Response choices
name	value
in high school	1
finished high school	2
some college	3
college graduate	4
graduate degree	5

age

Distribution

Distribution of values for age

0 missing values.

Summary statistics

name	label	data_type	n_missing	complete_rate	min	median	max	mean	sd	hist
age	age	numeric	0	1	14	23	68	26.24667	10.25784	▇▃▁▁▁

Scale: extraversion

Overview

Reliability: Cronbach’s α [95% CI] = 0.77 [0.73;0.81].

Missing: 6.

Likert plot of scale extraversion items

Distribution of scale extraversion

Reliability details

Reliability

95% Confidence Interval

lower	estimate	upper
0.7347327	0.7744404	0.8141481

	raw_alpha	std.alpha	G6(smc)	average_r	S/N	ase	mean	sd	median_r
	0.7744404	0.7735745	0.7410532	0.4059263	3.416465	0.020259	4.163	1.046022	0.3859404

Reliability if an item is dropped:

	raw_alpha	std.alpha	G6(smc)	average_r	S/N	alpha se	var.r	med.r
E1R	0.7204801	0.7208681	0.6649939	0.3923314	2.582535	0.0261335	0.0023785	0.3859404
E2R	0.7045566	0.7068611	0.6489878	0.3761067	2.411353	0.0275699	0.0021530	0.3595373
E3	0.7518839	0.7500611	0.7008699	0.4286513	3.000978	0.0229248	0.0071417	0.4325061
E4	0.7307827	0.7308567	0.6833309	0.4043624	2.715493	0.0249602	0.0062545	0.3841405
E5	0.7516705	0.7497001	0.6999799	0.4281799	2.995207	0.0229559	0.0065778	0.4117249

Item statistics

	n	raw.r	std.r	r.cor	r.drop	mean	sd
E1R	300	0.7700636	0.7469111	0.6664852	0.5844861	4.176667	1.610622
E2R	298	0.7871682	0.7737883	0.7094964	0.6250330	3.906040	1.527932
E3	299	0.6656335	0.6867448	0.5576596	0.4882306	3.956522	1.301051
E4	299	0.7299222	0.7269809	0.6240849	0.5531044	4.381271	1.445353
E5	298	0.6661609	0.6875256	0.5602102	0.4889012	4.402685	1.304844

Non missing response frequency for each item

	1	2	3	4	5	6	miss
E1R	0.0800000	0.1133333	0.1366667	0.1533333	0.2533333	0.2633333	0.0000000
E2R	0.0604027	0.1510067	0.2181208	0.1442953	0.2449664	0.1812081	0.0066667
E3	0.0434783	0.1036789	0.1806020	0.3110368	0.2474916	0.1137124	0.0033333
E4	0.0334448	0.1137124	0.1237458	0.1538462	0.3177258	0.2575251	0.0033333
E5	0.0302013	0.0738255	0.1174497	0.2281879	0.3422819	0.2080537	0.0066667

Summary statistics

name	label	data_type	value_labels	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
E1R	Don’t talk a lot.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	0	1.0000000	1	5	6	4.176667	1.610622	6	▂▃▁▅▅▁▇▇
E2R	Find it difficult to approach others.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	2	0.9933333	1	4	6	3.906040	1.527932	6	▂▅▁▇▅▁▇▆
E3	Know how to captivate people.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	1	0.9966667	1	4	6	3.956522	1.301051	6	▁▂▁▅▇▁▆▃
E4	Make friends easily.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	1	0.9966667	1	5	6	4.381271	1.445353	6	▁▃▁▃▃▁▇▆
E5	Take charge.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	2	0.9933333	1	5	6	4.402685	1.304844	6	▁▂▁▃▅▁▇▅

Scale: plasticity

Overview

Reliability: Cronbach’s α [95% CI] = 0.73 [0.68;0.78].

Missing: 10.

Likert plot of scale plasticity items

Distribution of scale plasticity

Reliability details

Reliability

95% Confidence Interval

lower	estimate	upper
0.6846984	0.7299705	0.7752426

	raw_alpha	std.alpha	G6(smc)	average_r	S/N	ase	mean	sd	median_r
	0.7299705	0.7328124	0.7611879	0.2152363	2.742689	0.023098	4.356259	0.7419752	0.2406108

Reliability if an item is dropped:

	raw_alpha	std.alpha	G6(smc)	average_r	S/N	alpha se	var.r	med.r
E1R	0.6978393	0.7048568	0.7299326	0.2097073	2.388186	0.0262187	0.0228582	0.2289372
E2R	0.6866966	0.6960338	0.7136576	0.2028230	2.289839	0.0272275	0.0198934	0.2289372
E3	0.6974777	0.7002771	0.7296854	0.2060982	2.336415	0.0260940	0.0275501	0.2078927
E4	0.7056746	0.7120188	0.7371356	0.2155119	2.472449	0.0254654	0.0236430	0.2415198
E5	0.6922044	0.6964269	0.7283938	0.2031237	2.294100	0.0265819	0.0287037	0.1984957
O1	0.7052966	0.7044824	0.7356852	0.2094093	2.383894	0.0251979	0.0311831	0.2289372
O2R	0.7351700	0.7324679	0.7596215	0.2332510	2.737870	0.0226011	0.0299195	0.2549686
O3	0.6891396	0.6872341	0.7153288	0.1962334	2.197280	0.0265895	0.0296954	0.1782063
O4	0.7468721	0.7523346	0.7663361	0.2523492	3.037706	0.0218617	0.0191781	0.2489360
O5R	0.7211453	0.7218962	0.7487828	0.2238555	2.595779	0.0237567	0.0292759	0.2444104

Item statistics

	n	raw.r	std.r	r.cor	r.drop	mean	sd
E1R	300	0.6165587	0.5786794	0.5313795	0.4528424	4.176667	1.610622
E2R	298	0.6566691	0.6244095	0.6069212	0.5141378	3.906040	1.527932
E3	299	0.5949015	0.6026533	0.5445428	0.4640599	3.956522	1.301051
E4	299	0.5641167	0.5401217	0.4766103	0.4072927	4.381271	1.445353
E5	298	0.6241080	0.6224119	0.5646735	0.4982226	4.402685	1.304844
O1	299	0.5454705	0.5806588	0.5079744	0.4206705	4.698997	1.162775
O2R	300	0.4426466	0.4222871	0.2994091	0.2448687	4.256667	1.599787
O3	297	0.6432940	0.6681822	0.6339167	0.5356929	4.427609	1.186486
O4	300	0.2521758	0.2954242	0.1737646	0.0944162	4.866667	1.177644
O5R	300	0.4650927	0.4846979	0.3891324	0.3055567	4.493333	1.337630

Non missing response frequency for each item

	1	2	3	4	5	6	miss
E1R	0.0800000	0.1133333	0.1366667	0.1533333	0.2533333	0.2633333	0.0000000
E2R	0.0604027	0.1510067	0.2181208	0.1442953	0.2449664	0.1812081	0.0066667
E3	0.0434783	0.1036789	0.1806020	0.3110368	0.2474916	0.1137124	0.0033333
E4	0.0334448	0.1137124	0.1237458	0.1538462	0.3177258	0.2575251	0.0033333
E5	0.0302013	0.0738255	0.1174497	0.2281879	0.3422819	0.2080537	0.0066667
O1	0.0066890	0.0501672	0.0903010	0.2307692	0.3344482	0.2876254	0.0033333
O2R	0.0766667	0.0933333	0.1500000	0.1466667	0.2433333	0.2900000	0.0000000
O3	0.0202020	0.0471380	0.1279461	0.2828283	0.3333333	0.1885522	0.0100000
O4	0.0100000	0.0500000	0.0600000	0.1866667	0.3300000	0.3633333	0.0000000
O5R	0.0300000	0.0666667	0.1200000	0.2133333	0.3033333	0.2666667	0.0000000

Summary statistics

name	label	data_type	value_labels	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
E1R	Don’t talk a lot.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	0	1.0000000	1	5	6	4.176667	1.610622	6	▂▃▁▅▅▁▇▇
E2R	Find it difficult to approach others.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	2	0.9933333	1	4	6	3.906040	1.527932	6	▂▅▁▇▅▁▇▆
E3	Know how to captivate people.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	1	0.9966667	1	4	6	3.956522	1.301051	6	▁▂▁▅▇▁▆▃
E4	Make friends easily.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	1	0.9966667	1	5	6	4.381271	1.445353	6	▁▃▁▃▃▁▇▆
E5	Take charge.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	2	0.9933333	1	5	6	4.402685	1.304844	6	▁▂▁▃▅▁▇▅
O1	Am full of ideas.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	1	0.9966667	1	5	6	4.698997	1.162775	6	▁▁▁▂▆▁▇▇
O2R	Avoid difficult reading material.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	0	1.0000000	1	5	6	4.256667	1.599787	6	▂▂▁▅▅▁▇▇
O3	Carry the conversation to a higher level.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	3	0.9900000	1	5	6	4.427609	1.186486	6	▁▁▁▃▇▁▇▅
O4	Spend time reflecting on things.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	0	1.0000000	1	5	6	4.866667	1.177644	6	▁▁▁▁▅▁▇▇
O5R	Will not probe deeply into a subject.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	0	1.0000000	1	5	6	4.493333	1.337630	6	▁▂▁▃▆▁▇▇

Missingness report

Codebook table

name	label	data_type	value_labels	scale_item_names	n_missing	complete_rate	min	median	max	mean	sd	n_value_labels	hist
A1R	Am indifferent to the feelings of others.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	1	0.9966667	1	5	6	4.662207	1.3768008	6	▁▂▁▃▃▁▆▇
A2	Inquire about others’ well-being.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	2	0.9933333	1	5	6	4.781879	1.1323557	6	▁▁▁▂▅▁▇▇
A3	Know how to comfort others.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	3	0.9900000	1	5	6	4.632997	1.2800399	6	▁▂▁▁▅▁▇▆
A4	Love children.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	1	0.9966667	1	5	6	4.491639	1.5069516	6	▁▂▁▃▃▁▆▇
A5	Make people feel at ease.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	0	1.0000000	1	5	6	4.460000	1.2729761	6	▁▂▁▃▅▁▇▅
C1	Am exacting in my work.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	1	0.9966667	1	5	6	4.418060	1.2434290	6	▁▂▁▂▅▁▇▃
C2	Continue until everything is perfect.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	1	0.9966667	1	4	6	4.187291	1.3024393	6	▁▂▁▃▇▁▇▃
C3	Do things according to a plan.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	2	0.9933333	1	4	6	4.291946	1.2276290	6	▁▂▁▃▇▁▇▃
C4R	Do things in a half-way manner.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	0	1.0000000	1	5	6	4.283333	1.3769685	6	▁▂▁▆▅▁▇▆
C5R	Waste my time.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	0	1.0000000	1	3	6	3.526667	1.6749049	6	▅▇▁▇▃▁▆▆
E1R	Don’t talk a lot.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	0	1.0000000	1	5	6	4.176667	1.6106218	6	▂▃▁▅▅▁▇▇
E2R	Find it difficult to approach others.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	2	0.9933333	1	4	6	3.906040	1.5279320	6	▂▅▁▇▅▁▇▆
E3	Know how to captivate people.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	1	0.9966667	1	4	6	3.956522	1.3010512	6	▁▂▁▅▇▁▆▃
E4	Make friends easily.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	1	0.9966667	1	5	6	4.381271	1.4453526	6	▁▃▁▃▃▁▇▆
E5	Take charge.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	2	0.9933333	1	5	6	4.402685	1.3048444	6	▁▂▁▃▅▁▇▅
N1R	Get angry easily.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	2	0.9933333	1	4	6	4.070470	1.5434451	6	▂▅▁▇▅▁▇▇
N2R	Get irritated easily.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	1	0.9966667	1	3	6	3.501672	1.4548567	6	▂▇▁▇▆▁▇▃
N3R	Have frequent mood swings.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	0	1.0000000	1	4	6	3.993333	1.5322472	6	▂▃▁▅▃▁▇▅
N4R	Often feel blue.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	4	0.9866667	1	4	6	3.878378	1.4609280	6	▂▃▁▆▅▁▇▃
N5R	Panic easily.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	3	0.9900000	1	4	6	4.037037	1.5559309	6	▃▃▁▇▆▁▇▇
O1	Am full of ideas.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	1	0.9966667	1	5	6	4.698997	1.1627751	6	▁▁▁▂▆▁▇▇
O2R	Avoid difficult reading material.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	0	1.0000000	1	5	6	4.256667	1.5997875	6	▂▂▁▅▅▁▇▇
O3	Carry the conversation to a higher level.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	3	0.9900000	1	5	6	4.427609	1.1864858	6	▁▁▁▃▇▁▇▅
O4	Spend time reflecting on things.	haven_labelled	1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate	NA	0	1.0000000	1	5	6	4.866667	1.1776439	6	▁▁▁▁▅▁▇▇
O5R	Will not probe deeply into a subject.	haven_labelled	6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate	NA	0	1.0000000	1	5	6	4.493333	1.3376296	6	▁▂▁▃▆▁▇▇
gender	gender	haven_labelled	1. male, 2. female	NA	0	1.0000000	1	2	2	1.586667	0.4932544	2	▆▁▁▁▁▁▁▇
education	education	haven_labelled	1. in high school, 2. finished high school, 3. some college, 4. college graduate, 5. graduate degree	NA	45	0.8500000	1	3	5	3.270588	1.2268797	5	▂▂▁▇▁▂▁▅
age	age	numeric	NA	NA	0	1.0000000	14.0	23.0	68	26.246667	10.2578376	NA	▇▃▁▁▁
extraversion	5 E items aggregated by rowMeans	numeric	NA	E1R, E2R, E3, E4, E5	6	0.9800000	1.0	4.2	6	4.171429	1.0472783	NA	▁▃▇▇▅
plasticity	10 items aggregated by rowMeans	numeric	NA	E1R, E2R, E3, E4, E5, O1, O2R, O3, O4, O5R	10	0.9666667	2.2	4.5	6	4.362414	0.7442985	NA	▁▃▆▇▂

Tutorial

Ruben Arslan

2026-03-02
