Getting started with stbl

library(stbl)

The goal of {stbl} is to help you stabilize function arguments before you use them. This is especially important when a function performs a slow or expensive operation, like writing to a database or calling a web API. You want to fail fast and with a clear error message if the inputs aren’t right.

This vignette demonstrates this principle by incrementally building a single function, register_user(), which validates several arguments before hypothetically sending them to an external service.

The register_user() Function

Here is the base function we’ll be improving. Without any checks, it’s vulnerable to bad inputs that could cause cryptic errors later on or send corrupt data to our external service.

register_user <- function(username,
                          email_address,
                          age,
                          is_premium_member,
                          interests) {
  # Imagine this is a slow API call
  list(
    username = username,
    email_address = email_address,
    age = age,
    is_premium_member = is_premium_member,
    interests = interests
  )
}
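
To make the risk concrete, the sketch below calls an unchecked copy of the function with inputs that are wrong in every position. The name register_user_unchecked() and the bad values are assumptions made for illustration; the body is the same as the base function above.

```r
# Hypothetical copy of the unchecked function, renamed so that it does not
# overwrite register_user().
register_user_unchecked <- function(username,
                                    email_address,
                                    age,
                                    is_premium_member,
                                    interests) {
  list(
    username = username,
    email_address = email_address,
    age = age,
    is_premium_member = is_premium_member,
    interests = interests
  )
}

# Every argument here is something we would not want to send to a service.
bad_request <- register_user_unchecked(
  username = "test user",
  email_address = "not-a-valid-email",
  age = "forty-two",
  is_premium_member = "maybe",
  interests = list("R", mean)
)

# No error is raised; each bad value is passed along unchanged.
str(bad_request, max.level = 1)
```

Nothing stops the call, so the first sign of trouble would come from whatever consumes this list.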

Step 1: Handling a Vector with to_*()

Let’s start adding checks. The first check will be for the interests argument. We expect this to be a character vector, but we’re not picky about the content. to_chr() will convert inputs that are character-like (like factors or a simple list of strings) into a proper character vector.

register_user <- function(username,
                          email_address,
                          age,
                          is_premium_member,
                          interests) {
  interests <- to_chr(interests)

  list(
    username = username,
    email_address = email_address,
    age = age,
    is_premium_member = is_premium_member,
    interests = interests
  )
}

# It works with a character vector or a list of strings. We use `str()` to
# make the output easier to read.
register_user(
  username = "test_user", 
  email_address = "test@example.com", 
  age = 42, 
  is_premium_member = TRUE, 
  interests = c("R", "hiking")
) |> str()
#> List of 5
#>  $ username         : chr "test_user"
#>  $ email_address    : chr "test@example.com"
#>  $ age              : num 42
#>  $ is_premium_member: logi TRUE
#>  $ interests        : chr [1:2] "R" "hiking"
register_user(
  username = "test_user", 
  email_address = "test@example.com", 
  age = 42, 
  is_premium_member = TRUE, 
  interests = list("R", "hiking")
) |> str()
#> List of 5
#>  $ username         : chr "test_user"
#>  $ email_address    : chr "test@example.com"
#>  $ age              : num 42
#>  $ is_premium_member: logi TRUE
#>  $ interests        : chr [1:2] "R" "hiking"

If the input is something that cannot be reasonably flattened to a character vector, it fails with a helpful message.

# Fails because the list contains a function, which is not character-like.
register_user(
  username = "test_user", 
  email_address = "test@example.com", 
  age = 42, 
  is_premium_member = TRUE, 
  interests = list("R", mean)
)
#> Error in `register_user()`:
#> ! Can't coerce `interests` <list> to <character>.
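
It can help to call to_chr() on its own, outside of register_user(). The sketch below assumes {stbl} is installed and relies only on the behavior described above: a factor and a simple list of strings should both come back as a plain character vector. The variable names are illustrative.

```r
library(stbl)

# A factor is character-like, so to_chr() should return its labels.
interests_factor <- factor(c("R", "hiking"))
from_factor <- to_chr(interests_factor)

# A simple list of strings is flattened, as in the register_user() call above.
from_list <- to_chr(list("R", "hiking"))

# Both routes are expected to give the same character vector.
identical(from_factor, from_list)
```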

Step 2: Simple Scalar Coercion with to_*_scalar()

Next, we’ll add checks for age and is_premium_member. These arguments must each contain a single value. We’ll use the _scalar variants: to_int_scalar() and to_lgl_scalar(). These functions are liberal in what they accept. For example, to_lgl_scalar() understands that 1, "T", and "True" all mean TRUE.

register_user <- function(username,
                          email_address,
                          age,
                          is_premium_member,
                          interests) {
  interests <- to_chr(interests)
  age <- to_int_scalar(age)
  is_premium_member <- to_lgl_scalar(is_premium_member)

  list(
    username = username,
    email_address = email_address,
    age = age,
    is_premium_member = is_premium_member,
    interests = interests
  )
}

# This works: the string "42" is coerced to the integer 42.
register_user(
  username = "test_user", 
  email_address = "test@example.com", 
  age = "42", 
  is_premium_member = TRUE, 
  interests = c("R", "hiking")
) |> str()
#> List of 5
#>  $ username         : chr "test_user"
#>  $ email_address    : chr "test@example.com"
#>  $ age              : int 42
#>  $ is_premium_member: logi TRUE
#>  $ interests        : chr [1:2] "R" "hiking"

Both functions fail with a clear error if the input isn’t a single value or can’t be coerced.

# Fails because age is not a single value.
register_user(
  username = "test_user", 
  email_address = "test@example.com", 
  age = c(30, 31), 
  is_premium_member = TRUE, 
  interests = c("R", "hiking")
)
#> Error in `register_user()`:
#> ! `age` must be a single <integer>.
#> ✖ `age` has 2 values.

# Fails because "forty-two" cannot be converted to an integer.
register_user(
  username = "test_user", 
  email_address = "test@example.com", 
  age = "forty-two", 
  is_premium_member = TRUE, 
  interests = c("R", "hiking")
)
#> Error in `register_user()`:
#> ! `age` <character> must be coercible to <integer>
#> ✖ Can't convert some values due to incompatible values.
#> • Locations: 1

Step 3: Complex Validation with stabilize_*()

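Finally, let’s add our most complex validation for username and email_address. For these, simple type coercion isn’t enough; we need to check their content and structure using stabilize_chr_scalar(). This function first coerces the input, then applies the validation rules you supply. Here, each rule is a regular expression passed through the regex argument, and the name of the rule becomes the error message. Setting the "negate" attribute to TRUE inverts a rule, so the input fails when it matches the pattern. You should prefer the faster to_*() functions and only “upgrade” to stabilize_*() when you need these additional checks.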

This is our final, fully stabilized function:

register_user <- function(username,
                          email_address,
                          age,
                          is_premium_member,
                          interests) {
  interests <- to_chr(interests)
  age <- to_int_scalar(age)
  is_premium_member <- to_lgl_scalar(is_premium_member)

  space_regex <- c("must not contain spaces" = "\\s")
  attr(space_regex, "negate") <- TRUE
  username <- stabilize_chr_scalar(
    username,
    regex = space_regex
  )

  email_regex <- "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
  email_address <- stabilize_chr_scalar(
    email_address,
    regex = c("must be a valid email address" = email_regex)
  )

  list(
    username = username,
    email_address = email_address,
    age = age,
    is_premium_member = is_premium_member,
    interests = interests
  )
}

# A successful call.
register_user(
  username = "test_user", 
  email_address = "test@example.com", 
  age = 42, 
  is_premium_member = TRUE, 
  interests = c("R", "hiking")
) |> str()
#> List of 5
#>  $ username         : chr "test_user"
#>  $ email_address    : chr "test@example.com"
#>  $ age              : int 42
#>  $ is_premium_member: logi TRUE
#>  $ interests        : chr [1:2] "R" "hiking"

And here are some examples of it failing, with informative error messages for each of our new rules.

# Fails because the username has a space.
register_user(
  username = "test user", 
  email_address = "test@example.com", 
  age = 42, 
  is_premium_member = TRUE, 
  interests = c("R", "hiking")
)
#> Error in `register_user()`:
#> ! `username` must not contain spaces
#> ✖ "test user" fails the check.

# Fails because the email address is invalid.
register_user(
  username = "test_user", 
  email_address = "not-a-valid-email", 
  age = 42, 
  is_premium_member = TRUE, 
  interests = c("R", "hiking")
)
#> Error in `register_user()`:
#> ! `email_address` must be a valid email address
#> ✖ "not-a-valid-email" fails the check.
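
To see what the two patterns accept and reject without calling {stbl} at all, they can be tried with base R. This is only a sketch: has_no_whitespace() and looks_like_email() are hypothetical helpers, not part of {stbl}, and grepl() with perl = TRUE is an assumption that may differ from the regex engine {stbl} uses internally.

```r
# Patterns copied from register_user() above.
space_regex <- "\\s"
email_regex <- "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"

# Hypothetical helpers for illustration; they are not part of stbl.
# The first one negates the match, mirroring the "negate" attribute.
has_no_whitespace <- function(x) !grepl(space_regex, x, perl = TRUE)
looks_like_email <- function(x) grepl(email_regex, x, perl = TRUE)

# Try each rule against the passing and failing inputs used above.
username_checks <- c(
  "test_user" = has_no_whitespace("test_user"),
  "test user" = has_no_whitespace("test user")
)
email_checks <- c(
  "test@example.com" = looks_like_email("test@example.com"),
  "not-a-valid-email" = looks_like_email("not-a-valid-email")
)
username_checks
email_checks
```

Note that "\\s" matches any whitespace character, so the username rule also rejects tabs and newlines, not only spaces.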