Anatomy of a Variable Name

library(dySEM)

Introduction

Most users will follow a four-step process to use dySEM:

  1. Scrape Variable Information
  2. Script a Model
  3. Fit a Model
  4. Output Desired Values

Of these steps, scraping variable information is one of the most important, yet unintuitive, as researchers do not often take the process of naming their variables for granted.

In this short tutorial, we are going to focus on the patterns of repetition that are present in the variable names in many data sets, and provide a breakdown and vocabulary for the “anatomy” of these variable names. Understanding the elements that make up a variable name will therefore help you to apply more efficient names in your own dataset(s), and to navigate the scraping step of using dySEM more easily using a function like scrapeVarCross().

Effective Variable Name Styling

Try to remember the last spreadsheet of data you interacted with. Now think of the variables (i.e., columns) within it. The names of those variables could probably be categorized in terms of a few dimensions of styling, namely, the clarity and repetition of the variable name’s stem (i.e., the text content of the variable’s name). In terms of clarity, some variable name stems you thought of were probably ambiguous or cryptic (e.g., Q1, Q19_1)–you or someone else wouldn’t immediately be able to understand what kind of data was captured in that variable based on the stem alone.

Some Examples of Ambiguous Variable Names

Other variable name stems, meanwhile, were probably more intuitive. Variables corresponding to the 18 Experiences in Close Relationships survey items, for example, might be named using the intuitive acronym of ecr, which could be more immediately recognized and understood by someone familiar with the study.

Some Examples of Intuitive Variable Names

The second dimension, repetition, is sometimes so subtle you might not immediately recognize it as a feature of variable name stems (and an important feature at that!). Some variable names–even if they refer to a set of related variables–might be highly inconsistent or idiosyncratic. The same exemplar variables corresponding to the 18 Experiences in Close Relationships survey items might be named with different descriptive text in each variable name, or using different patterns of text casing or delimitation. This kind of inconsistent naming might strike you as unlikely to occur “in the wild”. We agree, and we have therefore developed dySEM accordingly, betting that they are the exception, not the rule. The point, however, is that inconsistent variable names are, strictly speaking, conceivable. But for the most part, we anticipate related variables will be named repetitiously: the stem, casing, and delimitation applied to one of many related variables will be applied to all.

Some Examples of Inconsistent (oh, the horror) and Consistent Variable Names


These two dimensions of variable name styling play important, but different, roles when scraping variable information via dySEM. Clear names are not essential for scraping functions (e.g., scrapeVarCross()) to work properly, but they will make for quicker and less error-prone coding. Repetitious names, meanwhile, are (for now, at least) essential for scraping functions. Thus, be sure to export or prepare your dataset in such a way that the related set(s) of variables you intend to use in dySEM have repetitious names.



Double-Trouble with Dyadic Variable Names

Variable names get a bit more complicated in the context of “dyad” (KENNY REF) or “=ide” (MLM REF) datasets, because these datasets contain the same variable twice, in two different columns: one for each member of a dyad.



Each variable needs its own unique name, and so typically, dyad datasets involve appending a distinguishing character(s) or number(s) to make clear to which partner’s value a given variable refers. These distinguishers come in one of two varieties, corresponding to the type of dyads under study. If the dyads in the dataset are “indistinguishable”–that is, there’s no systematic and consistent way to assign dyad members to a particular role (e.g., friends, coworkers)–then distinguishers are usually arbitrary in name and designation. For example, some partner’s variables might be given the distinguisher “1” or “A”, while the other partner’s variables might be given the distinguisher “2” or “B”. The particular two distinguishers selected in these cases ultimately do not matter (as they do not convey any meaningful role-based information), as long as they are applied repetitiously between dyad members.