Managing Personal Access Tokens

library(gh)

gh generally sends a Personal Access Token (PAT) with its requests. Some endpoints of the GitHub API can be accessed without authenticating yourself. But once your API use becomes more frequent, you will want a PAT to prevent problems with rate limits and to access all possible endpoints.

This article describes how to store your PAT, so that gh can find it (automatically, in most cases). The function gh uses for this is gh_token().

More resources on PAT management:

PAT and host

gh::gh() allows the user to provide a PAT via the .token argument and to specify a host other than “github.com” via the .api_url argument. (Some companies and universities run their own instance of GitHub Enterprise.)

gh(endpoint, ..., .token = NULL, ..., .api_url = NULL, ...)

However, it’s annoying to always provide your PAT or host and it’s unsafe for your PAT to appear explicitly in your R code. It’s important to make it possible for the user to provide the PAT and/or API URL directly, but it should rarely be necessary. gh::gh() is designed to play well with more secure, less fiddly methods for expressing what you want.

How are .api_url and .token determined when the user does not provide them?

  1. .api_url defaults to the value of the GITHUB_API_URL environment variable and, if that is unset, falls back to "https://api.github.com". This is always done before worrying about the PAT.
  2. The PAT is obtained via a call to gh_token(.api_url). That is, the token is looked up based on the host.

The gitcreds package

gh now uses the gitcreds package to interact with the Git credential store.

gh calls gitcreds::gitcreds_get() with a URL to try to find a matching PAT. gitcreds::gitcreds_get() checks session environment variables and then the local Git credential store. Therefore, if you have previously used a PAT with, e.g., command line Git, gh may retrieve and re-use it. You can call gitcreds::gitcreds_get() directly, yourself, if you want to see what is found for a specific URL.

gitcreds::gitcreds_get()

If you see something like this:

#> <gitcreds>
#>   protocol: https
#>   host    : github.com
#>   username: PersonalAccessToken
#>   password: <-- hidden -->

that means that gitcreds could get the PAT from the Git credential store. You can call gitcreds_get()$password to see the actual PAT.

If no matching PAT is found, gitcreds::gitcreds_get() errors.

PAT in an environment variable

If you don’t have a Git installation, or your Git installation does not have a working credential store, then you can specify the PAT in an environment variable. For github.com you can set the GITHUB_PAT_GITHUB_COM or GITHUB_PAT variable. For a different GitHub host, call gitcreds::gitcreds_cache_envvar() with the API URL to see the environment variable you need to set. For example:

gitcreds::gitcreds_cache_envvar("https://github.acme.com")
#> [1] "GITHUB_PAT_GITHUB_ACME_COM"

Recommendations

On a machine used for interactive development, we recommend:

On a headless system, such as on a CI/CD platform, provide the necessary PAT(s) via secure environment variables. Regular environment variables can be used to configure less sensitive settings, such as the API host. Don’t expose your PAT by doing something silly like dumping all environment variables to a log file.

Note that on GitHub Actions, specifically, a personal access token is automatically available to the workflow as the GITHUB_TOKEN secret. That is why many workflows in the R community contain this snippet:

env:
  GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

This makes the automatic PAT available as the GITHUB_PAT environment variable. If that PAT doesn’t have the right permissions, then you’ll need to explicitly provide one that does (see link above for more).

Failure

If there is no PAT to be had, gh::gh() sends a request with no token. (Internally, the Authorization header is omitted if the PAT is found to be the empty string, "".)

What do PAT-related failures look like?

If no PAT is sent and the endpoint requires no auth, the request probably succeeds! At least until you run up against rate limits. If the endpoint requires auth, you’ll get an HTTP error, possibly this one:

GitHub API error (401): 401 Unauthorized
Message: Requires authentication

If a PAT is first discovered in an environment variable, it is taken at face value. The two most common ways to arrive here are PAT specification via .Renviron or as a secret in a CI/CD platform, such as GitHub Actions. If the PAT is invalid, the first affected request will fail, probably like so:

GitHub API error (401): 401 Unauthorized
Message: Bad credentials

This will also be the experience if an invalid PAT is provided directly via .token.

Even a valid PAT can lead to a downstream error, if it has insufficient scopes with respect to a specific request.