Tutorial

Ruben Arslan

2020-06-06

This is the practical part of a tutorial manuscript for this package, which you can find in full on PsyArXiv or if you prefer a copy-edited open access version it is published in Advances in Methods and Practices in Psychological Science.

You can cite it, if you use the package in a publication:

Arslan, R. C. (2019). How to automatically document data with the codebook package to facilitate data re-use. Advances in Methods and Practices in Psychological Science, 2(2), 169–187. https://doi.org/10.1177/2515245919838783

Using the codebook package locally in RStudio

knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
knitr::opts_chunk$set(
  warning = TRUE, # show warnings during codebook generation
  message = TRUE, # show messages during codebook generation
  error = TRUE, # do not interrupt codebook generation in case of errors,
                # TRUE is usually better for debugging
  echo = TRUE  # show R code
)
ggplot2::theme_set(ggplot2::theme_bw())

Loading data

It is time to load some data. In this Tutorial, I will walk you through the process by using the “bfi” dataset made available in the psych package (Goldberg, 1999; Revelle et al., 2016; Revelle, Wilt, & Rosenthal, 2010). The bfi dataset is already very well documented in the psych R package, but using the codebook package, we can add automatically computed reliabilities, graphs, and machine-readable metadata to the mix. The dataset is available within R, but this will not usually be the case; I have therefore uploaded it to the OSF, which also features many other publicly available datasets. A new package in R, rio (Chan & Leeper, 2018), makes loading data from websites in almost any format as easy as loading local data. You can import the dataset directly from the OSF by replacing the line

library(codebook)
codebook_data <- codebook::bfi

with

codebook_data <- rio::import("https://osf.io/s87kd/download", "csv")

on line 34. R Markdown documents have to be reproducible and self-contained, so it is not enough for a dataset to be loaded locally; you must load the dataset at the beginning of the document. You can also use the document interactively, although this will not work seamlessly for the codebook package. To see how this works, execute the line you just added by pressing Command + Enter (for a Mac) or Ctrl + Enter (for other platforms).

RStudio has a convenient data viewer you can use to check whether your command worked. In the environment tab on the top right, you should see “codebook_data”. Click that row to open a spreadsheet view of the dataset in RStudio. As you can see, it is not particularly informative. Just long columns of numbers with variable names like A4. Are we talking aggressiveness, agreeableness, or the German industrial norm for paper size? The lack of useful metadata is palpable. Click “Knit” again to see what the codebook package can do with this. This time, it will take longer. Once the result is shown in the viewer tab, scroll through it. You can see a few warnings stating that the package saw items that might form part of a scale, but there was no aggregated scale. You will also see graphs of the distribution for each item and summary statistics.

Adding and changing metadata

Variable labels

The last codebook you generated could already be useful if the variables had meaningful names and self-explanatory values. Unfortunately, this is rarely the case. Generally, you will need more metadata: labels for variables and values, a dataset description, and so on. The codebook package can use metadata that are stored in R attributes. Attributes in R are most commonly used to store the type of a variable; for instance, datetime in R is just a number with two attributes (a time zone and a class). However, they can just as easily store other metadata; the Hmisc (Harrell, 2018), haven (Wickham & Miller, 2018), and rio (Chan & Leeper, 2018) packages, for example, use attributes to store labels. The benefit of storing variable metadata in attributes is that even datasets that are the product of merging and processing raw data retain the necessary metadata. The haven and rio packages set these attributes when importing data from SPSS or Stata files. However, it is also easy to add metadata yourself:

attributes(codebook_data$C5)$label <- "Waste my time."

You have just assigned a new label to a variable. Because it is inconvenient to do this over and over again, the labelled package (Larmarange, 2018) adds a few convenience functions. Load the labelled package by writing the following in your codebook.Rmd

library(labelled)

Now label the C5 item.

var_label(codebook_data$C5) <- "Waste my time."

You can also label values in this manner (label in quotes before the equal sign, value after):

val_labels(codebook_data$C1) <- c("Very Inaccurate" = 1, "Very Accurate" = 6)

Write these labelling commands after loading the dataset and click “Knit” again. As you can see in the viewer pane, the graph for the C1 variable now has a label at the top and the lowest and highest values on the X axis are labelled. If the prospect of adding labels for every single variable seems tedious, do not fear. Many researchers already have a codebook in the form of a spreadsheet that they want to import in order to avoid entering labels one-by-one. The bfi dataset in the psych package is a good example of this, because it comes with a tabular dictionary. On the line after loading the bfi data, type the following to import this data dictionary:

dict <- rio::import("https://osf.io/cs678/download", "csv")

To see what you just loaded, click the “dict” row in the environment tab in the top right panel. As you can see, the dictionary has information on the constructs on which this item loads and on the direction with which it should load on the construct. You can make these metadata usable through the codebook package. You will often need to work on the data frames to help you do this; to make this easier, use the dplyr package (Wickham, François, Henry, & Müller, 2018). Load it by typing the following

library(dplyr)

Your next goal is to use the variable labels that are already in the dictionary. Because you want to label many variables at once, you need a list of variable labels. Instead of assigning one label to one variable as you just did, you can assign many labels to the whole dataset from a named list. Here, each element of the list is one item that you want to label.

var_label(codebook_data) <- list(
        C5 = "Waste my time.", 
        C1 = "Am exacting in my work."
)

There are already a list of variables and labels in your data dictionary that you can use, so you do not have to perform the tedious task of writing out the list. You do have to reshape it slightly though, because it is currently in the form of a rectangular data frame, not a named list. To do so, use a convenience function from the codebook function called dict_to_list. This function expects to receive a data frame with two columns: the first should be the variable names, the second the variable labels. To select these columns, use the select function from the dplyr package. You will also need to use a special operator, called a pipe, which looks like this %>% and allows you to read and write R code from left to right, almost like an English sentence. First, you need to take the dict dataset, then select the variable and label columns, then use the dict_to_list function. You also need to assign the result of this operation to become the variable labels of codebook_data. You can do all this in a single line using pipes. Add the following line after importing the dictionary.

var_label(codebook_data) <- dict %>% select(variable, label) %>% dict_to_list()

Click “codebook_data” in the Environment tab again. You should now see the variable labels below the variable names. If you click “Knit” again, you will see that your codebook now contains the variable labels. They are both part of the plots and part of the codebook table at the end. They are also part of the metadata that can be found using, for example, Google Dataset Search, but this will not be visible to you.

Value labels

So far, so good. But you may have noticed that education is shown as a number. Does this indicate years of education? The average is 3, so that seems unlikely. In fact, these numbers signify levels of education. In the dict data frame, you can see that there are are value labels for the levels of this variable. However, these levels of education are abbreviated, and you can probably imagine that it would be difficult for an automated program to understand how these map to the values in your dataset. You can do better, using another function from the labelled package: not var_label this time, but val_labels. Unlike var_label, val_labels expects not just one label, but a named vector, with a name for each value that you want to label. You do not need to label all values. Named vectors are created using the c() function. Add the following lines right after the last one.

val_labels(codebook_data$gender) <- c("male" = 1, "female" = 2)
val_labels(codebook_data$education) <- c("in high school" = 1,
   "finished high school" = 2,
              "some college" = 3, 
               "college graduate" = 4, 
              "graduate degree" = 5)

Click the “Knit” button. The bars in the graphs for education and gender should now be labelled. Now, on to the many Likert items, which all have the same value labels. You could assign them in the same way you did for gender and education, entering the lines for each variable over and over, or you could let a function do the job for you instead. Creating a function is actually very simple. Just pick a name, ideally one to remember it by—I chose add_likert_labels—and assign the keyword function followed by two different kinds of brackets. Round brackets surround the variable x. The x here is a placeholder for the many variables you will use this function for in the next step. Curly braces show that you intend to write out what you plan to do with the variable x. Inside the curly braces, use the val_labels function from above and assign a named vector.

add_likert_labels <- function(x) {
  val_labels(x) <- c("Very Inaccurate" = 1, 
                  "Moderately Inaccurate" = 2, 
                  "Slightly Inaccurate" = 3,
                  "Slightly Accurate" = 4,
                  "Moderately Accurate" = 5,
                  "Very Accurate" = 6)
  x
}

A function is just a tool and does nothing on its own; you have not used it yet. To use it only on the Likert items, you need a list of them. An easy way to achieve this is to subset the dict dataframe to only take those variables that are part of the Big Six. To do so, use the filter and pull functions from the dplyr package.

likert_items <- dict %>% filter(Big6 != "") %>% pull(variable)

To apply your new function to these items, use another function from the dplyr package called mutate_at. It expects a list of variables and a function to apply to each. You have both! You can now add value labels to all Likert items in the codebook_data.

codebook_data <- codebook_data %>% mutate_at(likert_items,  add_likert_labels)

Click “Knit” again. All items should now have value labels. However, this display is quite repetitive. How about grouping the items by the factor that they are supposed to load on? And while you are at it, how can the metadata about keying (or reverse-coded items) in your dictionary become part of the dataset?

Adding scales

The codebook package relies on a simple convention to be able to summarise psychological scales, such as the Big Five dimension extraversion, which are aggregates across several items. Your next step will be to assign a new variable, extraversion, to the result of selecting all extraversion items in the data and passing them to the aggregate_and_document_scale function. This function takes the mean of its inputs and assigns a label to the result, so that you can still tell which variables it is an aggregate of.

codebook_data$extraversion <- codebook_data %>% select(E1:E5) %>% aggregate_and_document_scale()

Try knitting now. In the resulting codebook, the items for extraversion have been grouped in one graph. In addition, several internal consistency coefficients have been calculated. However, they are oddly low. You need to reverse items which negatively load on the extraversion factor, such as “Don’t talk a lot.” To do so, I suggest following a simple convention early on, when you come up with names for your items—namely the format scale_numberR (e.g., bfi_extra_1R for a reverse-coded extraversion item, bfi_neuro_2 for a neuroticism item). That way, the analyst always knows how an item relates to a scale. This information is encoded in the data dictionary from the data you just imported. Rename the reverse-coded items so that you cannot forget about its direction. First, you need to grab all items with a negative keying from your dictionary. Add the following three lines above the aggregate_and_document_scale() line from above.

reversed_items <- dict %>% filter(Keying == -1) %>% pull(variable)

You can see in your Environment tab that names such as A1, C4, and C5 are now stored in the reversed_items vector. You can now refer to this vector using the rename_at function, which applies a function to all variables you list. Use the very simple function add_R, which does exactly what its name indicates.

codebook_data <- codebook_data %>% 
  rename_at(reversed_items,  add_R)

Click “codebook_data” in the Environment tab and you will see that some variables have been renamed: A1R, C4R, and C5R, and so on. This could lead to an ambiguity: Does the suffix R means “should be reversed before aggregation” or “has already been reversed”? With the help of metadata in the form of labelled values, there is no potential for confusion. You can reverse the underlying values, but keep the value labels right. So, if somebody responded “Very accurate,” that remains the case, but the underlying value will switch from 6 to 1 for a reversed item. The data you generally import will rarely include labels that remain correct regardless of whether underlying values are reversed, but the codebook package makes it easy to bring the data into this shape. A command using dplyr functions and the reverse_labelled_values function can easily remedy this.

codebook_data <- codebook_data %>% 
    mutate_at(vars(matches("\\dR$")), reverse_labelled_values)

All this statement does is find variable names which end with a number (\d is the regular expression codeword for a number; a dollar sign denotes the end of the string) and R and reverse them. Because the extraversion items have been renamed, we have to amend our scale aggregation line slightly.

codebook_data$extraversion <- codebook_data %>% select(E1R:E5) %>% aggregate_and_document_scale()

Try knitting again. The reliability for the extraversion scale should be much higher and all items should load positively. Adding further scales is easy: Just repeat the above line, changing the names of the scale and the items. Adding scales that integrate smaller scales is also straightforward. The data dictionary mentions the Giant Three—try adding one, Plasticity, which subsumes Extraversion and Openness.

codebook_data$plasticity <- codebook_data %>% select(E1R:E5, O1:O5R) %>% aggregate_and_document_scale() 

Note that writing E1R:E5 only works if the items are all in order in your dataset. If you mixed items across constructs, you will need a different way to select them. One option is to list all items, writing select(E1R, E2R, E3, E4, E5). This can get tedious when listing many items. Another solution is to write select(starts_with("E")). Although this is quite elegant, it will not work in this case because you have more than one variable that starts with E; this command would include education items along with the extraversion items you want. This is a good reason to give items descriptive stems such as extraversion_ or bfik_extra. Longer stems not only make confusion less likely, they also make it possible for you to refer to groups of items by their stems, and ideally to their aggregates by only the stem. If you have already named your item too minimally, another solution is to use a regular expression, as I introduced above for matching reversed items. In this scenario, select(matches("^E\\dR?$")) would work.

Metadata about the entire dataset

Finally, you might want to sign your work and add a few descriptive words about the entire dataset. If you simply edit the R Markdown document to add a description, this information would not become part of the machine-readable metadata. Metadata (or attributes) of the dataset as a whole are a lot less persistent than metadata about variables. Hence, you should add your description right before calling the codebook function. Adding metadata about the dataset is very simple: Just wrap the metadata function around codebook_data and assign a value to a field. The fields “name” and “description” are required. If you do not edit them, they will be automatically generated based on the data frame name and its contents. To overwrite them, enter the following lines above the call codebook(codebook_data):

metadata(codebook_data)$name <- "25 Personality items representing 5 factors"
metadata(codebook_data)$description <- "25 personality self report items taken from the International Personality Item Pool (ipip.ori.org)[...]"

It is good practice to give datasets a canonical identifier. This way, if a dataset is described in multiple locations, it can still be identified as the same dataset. For instance, when I did this I did not want to use the URL of the R package from which I took the package because URLs can change; instead, I generated a persistent document object identifier (DOI) on the OSF and specified it here.

metadata(codebook_data)$identifier <- "https://dx.doi.org/10.17605/OSF.IO/K39BG"

In order to let others know who they can contact about the dataset, how to cite it, and where to find more information, I set the attributes creator, citation, and URL below.

metadata(codebook_data)$creator <- "William Revelle"
metadata(codebook_data)$citation <- "Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of individual differences in cognition: Attention, memory, and executive control (pp. 27–49). New York, NY: Springer."
metadata(codebook_data)$url <- "https://CRAN.R-project.org/package=psych"

Lastly, it is useful to note when and where the data was collected, as well as when it was published. Ideally, you would make more specific information available here, but this is all I know about the BFI dataset.

metadata(codebook_data)$datePublished <- "2010-01-01"
metadata(codebook_data)$temporalCoverage <- "Spring 2010" 
metadata(codebook_data)$spatialCoverage <- "Online" 

These attributes are documented in more depth on https://schema.org/Dataset. You can also add attributes that are not documented there, but they will not become part of the machine-readable metadata. Click “Knit” again. In the viewer tab, you can see that the metadata section of the codebook has been populated with your additions.

Exporting and sharing the data with metadata

Having added all the variable-level metadata, you might want to re-use the marked-up data elsewhere or share it with collaborators or the public. You can most easily export it using the rio package (Chan & Leeper, 2018), which permits embedding the variable metadata in the dataset file for those formats that support it. The only way to keep all metadata in one file is by staying in R:

rio::export(codebook_data, "bfi.rds") # to R data structure file

The variable-level metadata can also be transferred to SPSS and Stata files. Please note that this export is based on reverse-engineering the SPSS and Stata file structure, so the resulting files should be tested before sharing.

rio::export(codebook_data, "bfi.sav") # to SPSS file
rio::export(codebook_data, "bfi.dta") # to Stata file

Releasing the codebook publicly

If you want to share your codebook with others, there is a codebook.html file in the project folder you created at the start. You can email it to collaborators or upload it to the OSF file storage. However, if you want Google Dataset Search to index your dataset, this is not sufficient. The OSF will not render your HTML files for security reasons and Google will not index the content of your emails (at least not publicly). You need to post your codebook online. If you are familiar with Github or already have your own website, uploading the html file to your own website should be easy. The simplest way I found for publishing the HTML for the codebook is as follows. First, rename the codebook.html to index.html. Then create an account on netlify.com. Once you’re signed in, drag and drop the folder containing the codebook to the Netlify web page (make sure the folder does not contain anything you do not want to share, such as raw data). Netlify will upload the files and create a random URL like estranged-armadillo.netlify.com. You can change this to something more meaningful, like bfi-study.netlify.com, in the settings. Next, visit the URL to check that you can see the codebook. The last step is to publicly share a link to the codebook so that search engines can discover it; for instance, you could tweet the link with the hashtag #codebook2—ideally with a link from the repository where you are sharing the raw data or the related study’s supplementary material. For instance, I added a link to the bfi-study codebook on the OSF (https://osf.io/k39bg/), where I had also shared the data. Depending on the speed of the search engine crawler, the dataset should be findable on Google Dataset Search in anywhere from three to 21 days.

The Codebook

Metadata

Description

Dataset name: 25 Personality items representing 5 factors

25 personality self report items taken from the International Personality Item Pool (ipip.ori.org)[…]

Metadata for search engines

  • Temporal Coverage: Spring 2010

  • Spatial Coverage: Online

  • Citation: Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of individual differences in cognition: Attention, memory, and executive control (pp. 27–49). New York, NY: Springer.

  • URL: https://CRAN.R-project.org/package=psych

  • Identifier: https://dx.doi.org/10.17605/OSF.IO/K39BG

  • Date published: 2010-01-01

  • Creator:

name value
1 William Revelle
x
A1R
A2
A3
A4
A5
C1
C2
C3
C4R
C5R
E1R
E2R
E3
E4
E5
N1R
N2R
N3R
N4R
N5R
O1
O2R
O3
O4
O5R
gender
education
age
extraversion
plasticity

##Variables

A1R

Am indifferent to the feelings of others.

Distribution

Distribution of values for A1R

16 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
A1R Am indifferent to the feelings of others. haven_labelled 16 0.9942857 1 5 6 4.586566 1.407737 6 ▁▂▁▃▃▁▇▇
Value labels
Response choices
name value
Very Inaccurate 6
Moderately Inaccurate 5
Slightly Inaccurate 4
Slightly Accurate 3
Moderately Accurate 2
Very Accurate 1

A2

Inquire about others’ well-being.

Distribution

Distribution of values for A2

27 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
A2 Inquire about others’ well-being. haven_labelled 27 0.9903571 1 5 6 4.80238 1.17202 6 ▁▁▁▁▅▁▇▇
Value labels
Response choices
name value
Very Inaccurate 1
Moderately Inaccurate 2
Slightly Inaccurate 3
Slightly Accurate 4
Moderately Accurate 5
Very Accurate 6

A3

Know how to comfort others.

Distribution

Distribution of values for A3

26 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
A3 Know how to comfort others. haven_labelled 26 0.9907143 1 5 6 4.603821 1.301834 6 ▁▂▁▂▅▁▇▆
Value labels
Response choices
name value
Very Inaccurate 1
Moderately Inaccurate 2
Slightly Inaccurate 3
Slightly Accurate 4
Moderately Accurate 5
Very Accurate 6

A4

Love children.

Distribution

Distribution of values for A4

19 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
A4 Love children. haven_labelled 19 0.9932143 1 5 6 4.699748 1.479633 6 ▁▂▁▁▃▁▅▇
Value labels
Response choices
name value
Very Inaccurate 1
Moderately Inaccurate 2
Slightly Inaccurate 3
Slightly Accurate 4
Moderately Accurate 5
Very Accurate 6

A5

Make people feel at ease.

Distribution

Distribution of values for A5

16 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
A5 Make people feel at ease. haven_labelled 16 0.9942857 1 5 6 4.560345 1.258512 6 ▁▂▁▂▅▁▇▆
Value labels
Response choices
name value
Very Inaccurate 1
Moderately Inaccurate 2
Slightly Inaccurate 3
Slightly Accurate 4
Moderately Accurate 5
Very Accurate 6

C1

Am exacting in my work.

Distribution

Distribution of values for C1

21 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
C1 Am exacting in my work. haven_labelled 21 0.9925 1 5 6 4.502339 1.241346 6 ▁▁▁▂▅▁▇▅
Value labels
Response choices
name value
Very Inaccurate 1
Moderately Inaccurate 2
Slightly Inaccurate 3
Slightly Accurate 4
Moderately Accurate 5
Very Accurate 6

C2

Continue until everything is perfect.

Distribution

Distribution of values for C2

24 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
C2 Continue until everything is perfect. haven_labelled 24 0.9914286 1 5 6 4.369957 1.318347 6 ▁▂▁▂▆▁▇▅
Value labels
Response choices
name value
Very Inaccurate 1
Moderately Inaccurate 2
Slightly Inaccurate 3
Slightly Accurate 4
Moderately Accurate 5
Very Accurate 6

C3

Do things according to a plan.

Distribution

Distribution of values for C3

20 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
C3 Do things according to a plan. haven_labelled 20 0.9928571 1 5 6 4.303957 1.288552 6 ▁▂▁▂▆▁▇▅
Value labels
Response choices
name value
Very Inaccurate 1
Moderately Inaccurate 2
Slightly Inaccurate 3
Slightly Accurate 4
Moderately Accurate 5
Very Accurate 6

C4R

Do things in a half-way manner.

Distribution

Distribution of values for C4R

26 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
C4R Do things in a half-way manner. haven_labelled 26 0.9907143 1 5 6 4.446647 1.375118 6 ▁▂▁▅▅▁▇▇
Value labels
Response choices
name value
Very Inaccurate 6
Moderately Inaccurate 5
Slightly Inaccurate 4
Slightly Accurate 3
Moderately Accurate 2
Very Accurate 1

C5R

Waste my time.

Distribution

Distribution of values for C5R

16 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
C5R Waste my time. haven_labelled 16 0.9942857 1 4 6 3.703305 1.628542 6 ▃▆▁▇▅▁▇▆
Value labels
Response choices
name value
Very Inaccurate 6
Moderately Inaccurate 5
Slightly Inaccurate 4
Slightly Accurate 3
Moderately Accurate 2
Very Accurate 1

N1R

Get angry easily.

Distribution

Distribution of values for N1R

22 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
N1R Get angry easily. haven_labelled 22 0.9921429 1 4 6 4.070914 1.570917 6 ▂▅▁▆▅▁▇▇
Value labels
Response choices
name value
Very Inaccurate 6
Moderately Inaccurate 5
Slightly Inaccurate 4
Slightly Accurate 3
Moderately Accurate 2
Very Accurate 1

N2R

Get irritated easily.

Distribution

Distribution of values for N2R

21 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
N2R Get irritated easily. haven_labelled 21 0.9925 1 3 6 3.492263 1.525944 6 ▃▆▁▇▅▁▆▃
Value labels
Response choices
name value
Very Inaccurate 6
Moderately Inaccurate 5
Slightly Inaccurate 4
Slightly Accurate 3
Moderately Accurate 2
Very Accurate 1

N3R

Have frequent mood swings.

Distribution

Distribution of values for N3R

11 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
N3R Have frequent mood swings. haven_labelled 11 0.9960714 1 4 6 3.783435 1.602902 6 ▃▆▁▇▅▁▇▆
Value labels
Response choices
name value
Very Inaccurate 6
Moderately Inaccurate 5
Slightly Inaccurate 4
Slightly Accurate 3
Moderately Accurate 2
Very Accurate 1

N4R

Often feel blue.

Distribution

Distribution of values for N4R

36 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
N4R Often feel blue. haven_labelled 36 0.9871429 1 4 6 3.814399 1.569685 6 ▃▅▁▇▅▁▇▆
Value labels
Response choices
name value
Very Inaccurate 6
Moderately Inaccurate 5
Slightly Inaccurate 4
Slightly Accurate 3
Moderately Accurate 2
Very Accurate 1

N5R

Panic easily.

Distribution

Distribution of values for N5R

29 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
N5R Panic easily. haven_labelled 29 0.9896429 1 4 6 4.030314 1.618647 6 ▃▃▁▆▅▁▇▇
Value labels
Response choices
name value
Very Inaccurate 6
Moderately Inaccurate 5
Slightly Inaccurate 4
Slightly Accurate 3
Moderately Accurate 2
Very Accurate 1

gender

gender

Distribution

Distribution of values for gender

0 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
gender gender haven_labelled 0 1 1 2 2 1.671786 0.4696471 2 ▃▁▁▁▁▁▁▇
Value labels
Response choices
name value
male 1
female 2

education

education

Distribution

Distribution of values for education

223 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd n_value_labels hist
education education haven_labelled 223 0.9203571 1 3 5 3.190144 1.107714 5 ▂▂▁▇▁▂▁▃
Value labels
Response choices
name value
in high school 1
finished high school 2
some college 3
college graduate 4
graduate degree 5

age

age

Distribution

Distribution of values for age

0 missing values.

Summary statistics
name label data_type n_missing complete_rate min median max mean sd hist
age age numeric 0 1 3 26 86 28.78214 11.12755 ▃▇▂▁▁

Scale: extraversion

Overview

Reliability: ωordinal [95% CI] = 0.8 [0.78;0.81].

Missing: 87.

Likert plot of scale extraversion items

Distribution of scale extraversion

Reliability details
## No viewer found, probably documenting or testing
Scale diagnosis
Reliability (internal consistency) estimates
Scale structure
Information about this scale
Dataframe: res$dat
Items: E1R, E2R, E3, E4 & E5
Observations: 2713
Positive correlations: 10
Number of correlations: 10
Percentage positive correlations: 100
Estimates assuming interval level
Omega (total): 0.77
Omega (hierarchical): 0.65
Revelle’s Omega (total): 0.80
Greatest Lower Bound (GLB): 0.80
Coefficient H: 0.78
Coefficient Alpha: 0.76

Confidence intervals

Omega (total): [0.75; 0.78]
Coefficient Alpha [0.75; 0.78]
Estimates assuming ordinal level
Ordinal Omega (total): 0.80
Ordinal Omega (hierarch.): 0.79
Ordinal Coefficient Alpha: 0.79

Confidence intervals

Ordinal Omega (total): [0.78; 0.81]
Ordinal Coefficient Alpha [0.78; 0.81]

Note: the normal point estimate and confidence interval for omega are based on the procedure suggested by Dunn, Baguley & Brunsden (2013) using the MBESS function ci.reliability, whereas the psych package point estimate was suggested in Revelle & Zinbarg (2008). See the help (‘?ufs::scaleStructure’) for more information.

Eigen values

2.565, 0.768, 0.643, 0.561 & 0.464

Factor analysis (reproducing only shared variance)
ML1
E1R 0.608
E2R 0.730
E3 0.572
E4 0.692
E5 0.518
Component analysis (reproducing full covariance matrix)
PC1
E1R 0.700
E2R 0.780
E3 0.691
E4 0.758
E5 0.644
Item descriptives
mea n med ian var sd IQR se min q1 q3 max ske w kur t dip n NA val id
E1R 4.02838186509399 4 2.66475464468575 1.63240762209865 2 0.0313403409660919 1 2 6 6 -0.376504826875993 -1.09014414232021 0.117766310357538 2713 0 2713
E2R 3.85551050497604 4 2.58309738862486 1.60720172617654 2 0.0308564168763851 1 2 5 6 -0.221782408801999 -1.14964220379993 0.107077036490969 2713 0 2713
E3 4 4 1.82890855457227 1.35237145584054 2 0.0259639700066847 1 2 5 6 -0.472637065238653 -0.453422519621274 0.133247327681533 2713 0 2713
E4 4.42093623295245 5 2.13543171901486 1.46131164335841 2 0.0280554957846216 1 3 6 6 -0.826763197808318 -0.305165196969047 0.130114264651677 2713 0 2713
E5 4.41835606339845 5 1.78693431712491 1.33676262557154 1 0.0256642984932158 1 4 6 6 -0.784737560486994 -0.081589121721925 0.111131588647254 2713 0 2713
Scattermatrix

Scatterplot

## No viewer found, probably documenting or testing
Summary statistics
name label data_type value_labels n_missing complete_rate min median max mean sd n_value_labels hist
E1R Don’t talk a lot. haven_labelled 6. Very Inaccurate,
5. Moderately Inaccurate,
4. Slightly Inaccurate,
3. Slightly Accurate,
2. Moderately Accurate,
1. Very Accurate
23 0.9917857 1 4 6 4.025567 1.631506 6 ▃▅▁▆▅▁▇▇
E2R Find it difficult to approach others. haven_labelled 6. Very Inaccurate,
5. Moderately Inaccurate,
4. Slightly Inaccurate,
3. Slightly Accurate,
2. Moderately Accurate,
1. Very Accurate
16 0.9942857 1 4 6 3.858118 1.605210 6 ▃▅▁▇▅▁▇▆
E3 Know how to captivate people. haven_labelled 1. Very Inaccurate,
2. Moderately Inaccurate,
3. Slightly Inaccurate,
4. Slightly Accurate,
5. Moderately Accurate,
6. Very Accurate
25 0.9910714 1 4 6 4.000721 1.352719 6 ▂▃▁▃▇▁▇▃
E4 Make friends easily. haven_labelled 1. Very Inaccurate,
2. Moderately Inaccurate,
3. Slightly Inaccurate,
4. Slightly Accurate,
5. Moderately Accurate,
6. Very Accurate
9 0.9967857 1 5 6 4.422429 1.457517 6 ▁▂▁▂▃▁▇▆
E5 Take charge. haven_labelled 1. Very Inaccurate,
2. Moderately Inaccurate,
3. Slightly Inaccurate,
4. Slightly Accurate,
5. Moderately Accurate,
6. Very Accurate
21 0.9925000 1 5 6 4.416337 1.334768 6 ▁▂▁▂▅▁▇▅

Scale: plasticity

Overview

Reliability: ωordinal [95% CI] = 0.75 [0.73;0.76].

Missing: 149.