OCSdata Instructions

Table of Contents
Introduction
Arguments
	casestudy
	outpath
	fork_repo
How to Use
	raw_data()
	simpler_import_data()
	extra_data()
	imported_data()
	Loading RDA Files
	wrangled_csv()
	wrangled_rda()
	zip_ocs()
	clone_ocs()

Arguments

casestudy

Every OCSdata function requires a case study ID in its casestudy argument. The ID must match the case study whose data you want to download. The table below lists the case study names and their corresponding IDs.

Case Study Name	Case Study ID
Exploring global patterns of obesity across rural and urban regions	ocs-bp-rural-and-urban-obesity
Predicting Annual Air Pollution	ocs-bp-air-pollution
Vaping Behaviors in American Youth	ocs-bp-vaping-case-study
Opioids in United States	ocs-bp-opioid-rural-urban
Influence of Multicollinearity on Measured Impact of Right-to-Carry Gun Laws Part 1	ocs-bp-RTC-wrangling
Influence of Multicollinearity on Measured Impact of Right-to-Carry Gun Laws Part 2	ocs-bp-RTC-analysis
Disparities in Youth Disconnection	ocs-bp-youth-disconnection
Mental Health of American Youth	ocs-bp-youth-mental-health
School Shootings in the United States	ocs-bp-school-shootings-dashboard
Exploring CO2 emissions across time	ocs-bp-co2-emissions
Exploring global patterns of dietary behaviors associated with health risk	ocs-bp-diet

The examples below use the "Opioids in United States" case study. To download data from a different case study, change "ocs-bp-opioid-rural-urban" to the ID of the case study you are interested in (see the table above for the list of IDs).

Note: All of the case studies have at least raw, imported, and wrangled data. However, not all of them have extra or simpler_import data. Keep this in mind when using the extra_data() and simpler_import_data() functions.
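Because a mistyped ID is easy to miss, it can help to check the ID against the table before calling a download function. The following is a minimal sketch under stated assumptions: check_casestudy() and ocs_ids are hypothetical names used only for illustration, they are not part of OCSdata, and the ID list is simply copied from the table above.

```r
# Case study IDs copied from the table above.
ocs_ids <- c(
  "ocs-bp-rural-and-urban-obesity",
  "ocs-bp-air-pollution",
  "ocs-bp-vaping-case-study",
  "ocs-bp-opioid-rural-urban",
  "ocs-bp-RTC-wrangling",
  "ocs-bp-RTC-analysis",
  "ocs-bp-youth-disconnection",
  "ocs-bp-youth-mental-health",
  "ocs-bp-school-shootings-dashboard",
  "ocs-bp-co2-emissions",
  "ocs-bp-diet"
)

# check_casestudy() is a hypothetical helper, not an OCSdata function.
# It stops with an error when the ID is not in the list (exact match).
check_casestudy <- function(casestudy) {
  if (!casestudy %in% ocs_ids) {
    stop("Unknown case study ID: ", casestudy, call. = FALSE)
  }
  invisible(casestudy)
}

check_casestudy("ocs-bp-opioid-rural-urban")  # passes silently
```

The comparison is exact, so capitalization matters (for example, the "RTC" in the two Right-to-Carry IDs).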

outpath

All of the functions also have an outpath argument that specifies where the files are saved on your computer. It defaults to NULL, in which case the function asks you for a file path interactively and suggests your current working directory as an option. If your session is not interactive, the function stops with an error message telling you to supply a valid file path to outpath. Requiring an explicit file path avoids unintended overwriting.

Temporary directories are used in all of the examples provided in the package documentation. This is to prevent the functions from overwriting users’ local files. To test the package functions with our examples and actually view the downloaded data folder, replace tempdir() with the file path to the desired directory as a character string.
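One way to follow this advice, shown here only as a sketch, is to create a dedicated folder first and pass it to outpath explicitly, which also works in non-interactive scripts. The folder name "ocs_demo" and the run_download switch are assumptions made for this illustration; the download call is skipped by default so the sketch runs without network access.

```r
# "ocs_demo" is a hypothetical folder name. A temporary parent directory
# keeps this sketch away from your own files; swap in any path you like.
out_dir <- file.path(tempdir(), "ocs_demo")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Set to TRUE to actually download (needs OCSdata and internet access).
run_download <- FALSE
if (run_download) {
  OCSdata::raw_data("ocs-bp-opioid-rural-urban", outpath = out_dir)
}

# Confirm the target directory exists before relying on it.
dir.exists(out_dir)
```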

fork_repo

This logical argument is only used by clone_ocs(). FALSE will clone the repo, while TRUE will fork the repo and then clone the fork. It defaults to NA, which will fork or clone based on your repository permissions.

How to Use

The following examples illustrate all of the different functions and how you can use them to stop and start at different sections of the case study. These examples will download the data into temporary directories to prevent overwriting local files. To download them somewhere else, specify the path to the desired directory (folder) in the outpath argument.

Note: To download the data into your current working directory, change the input for outpath to getwd().

# install.packages("OCSdata")
library(OCSdata)

Starting at data import:

The raw_data function will download the raw data files that can be imported into R.

raw_data("ocs-bp-opioid-rural-urban", outpath = tempdir())

The function will create an OCS_data/data/raw directory where the raw data files can be found.

If the input to outpath is the path to a folder called “demo,” the raw data files will be in demo/OCS_data/data/raw.
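To check what a download produced, you can build the folder path from outpath and list its contents. This is a rough sketch: ocs_data_dir() is a hypothetical helper (not an OCSdata function) that assumes the OCS_data/data/<folder> layout described in this guide.

```r
# ocs_data_dir() is a hypothetical helper, not part of OCSdata.
# `stage` is one of the folder names used in this guide, e.g. "raw".
ocs_data_dir <- function(outpath, stage) {
  file.path(outpath, "OCS_data", "data", stage)
}

raw_dir <- ocs_data_dir(tempdir(), "raw")

# Lists downloaded files; returns character(0) if nothing is there yet.
list.files(raw_dir, recursive = TRUE)
```

The same helper can be pointed at the other folders described in this guide by changing the second argument.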

For file formats that are easier to import:

The simpler_import_data function will download raw data files that have been converted to file formats that are easier to import into R, typically .csv. Some case studies offer this option when the original raw files require a more complicated import step.

simpler_import_data("ocs-bp-opioid-rural-urban", outpath = tempdir())

The function will create an OCS_data/data/simpler_import directory where the data files can be found.

If the input to outpath is the path to a folder called “demo,” the data files will be in demo/OCS_data/data/simpler_import.

For more data on this topic:

The extra_data function will download raw data files that are not used in the case study, but are available for users to further analyze.

extra_data("ocs-bp-opioid-rural-urban", outpath = tempdir())

The function will create an OCS_data/data/extra directory where the data files can be found.

If the input to outpath is the path to a folder called “demo,” the data files will be in demo/OCS_data/data/extra.

Starting at data exploration/wrangling sections:

The imported_data function will download raw data files in .rda format. This means the data have already been imported into R objects.

imported_data("ocs-bp-opioid-rural-urban", outpath = tempdir())

The function will create an OCS_data/data/imported directory where the imported data files can be found.

If the input to outpath is the path to a folder called “demo,” the imported data files will be in demo/OCS_data/data/imported.

Loading RDA Files

RDA files can be loaded into R either by double-clicking the file in RStudio or by using the load() function. The following examples show both methods with the “land_area.rda” file from the imported data folder we just downloaded.

Double Click Method:

In RStudio, locate land_area.rda in the Files pane and double-click it.

Load Function Method:

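# Path to the RDA file; adjust this to wherever OCS_data was saved.
file_path <- "~/Desktop/demo/OCS_data/data/imported/land_area.rda"
# Restore the saved object(s) into the global environment.
load(file_path)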

In this case the OCS_data folder is saved to a demo folder in the Desktop directory. To use this method, replace the value assigned to file_path with the file path to your RDA file.

Both of these methods will load the RDA file into your global environment as an R object that is ready to be used.
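If you are unsure which object names an RDA file contains, load() can restore them into a separate environment and report their names. The sketch below is self-contained: it first writes a small stand-in file, so the file name land_area_demo.rda, the object name, and the values are all made up and say nothing about the contents of the real land_area.rda.

```r
# Stand-in file so the sketch runs on its own; the object and its values
# are made up and do not reflect the real land_area.rda.
file_path <- file.path(tempdir(), "land_area_demo.rda")
land_area <- data.frame(county = c("A", "B"), area = c(10, 20))
save(land_area, file = file_path)
rm(land_area)

# load() returns the names of the objects it restored. Loading into a
# fresh environment avoids overwriting objects in your global environment.
env <- new.env()
loaded <- load(file_path, envir = env)

loaded                 # names of the restored objects
str(env[[loaded[1]]])  # inspect the first restored object
```

To use this with a downloaded file, replace file_path with the path to your RDA file and drop the lines that create the stand-in.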

Starting at data visualization/analysis sections:

The following functions will download the data files that have already been wrangled and are ready to be analyzed. These come in both .csv and .rda formats.

CSV:

wrangled_csv("ocs-bp-opioid-rural-urban", outpath = tempdir())

The function will create an OCS_data/data/wrangled directory where the wrangled csv files can be found.

RDA:

wrangled_rda("ocs-bp-opioid-rural-urban", outpath = tempdir())

The function will create an OCS_data/data/wrangled directory where the wrangled rda files can be found.

These files can be loaded into R using the methods described above in the “Loading RDA Files” section.

If the input to outpath is the path to a folder called “demo,” the wrangled files will be in demo/OCS_data/data/wrangled.
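The wrangled csv files can be read with base R's read.csv(). The sketch below reads every .csv file in a folder into a named list. To stay runnable without a download, it writes a stand-in folder and file first; the names wrangled_demo and example.csv and the values are assumptions made for this illustration, not real case study data.

```r
# Stand-in for an OCS_data/data/wrangled folder; the file name and the
# values are made up for illustration.
wrangled_dir <- file.path(tempdir(), "wrangled_demo")
dir.create(wrangled_dir, showWarnings = FALSE)
write.csv(
  data.frame(year = c(2010, 2011), value = c(1.5, 2.5)),
  file.path(wrangled_dir, "example.csv"),
  row.names = FALSE
)

# Read every csv file in the folder into a list of data frames,
# named after the files (without the .csv extension).
csv_files <- list.files(wrangled_dir, pattern = "\\.csv$", full.names = TRUE)
wrangled <- lapply(csv_files, read.csv)
names(wrangled) <- tools::file_path_sans_ext(basename(csv_files))

names(wrangled)
```

With real data, point wrangled_dir at your own OCS_data/data/wrangled folder and skip the lines that write the stand-in file.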

Download case study repository zip file:

The zip_ocs function will download all of the repository files as a .zip file and unzip them into the specified directory.

zip_ocs("ocs-bp-opioid-rural-urban", outpath = tempdir())

Clone the case study GitHub repository:

The clone_ocs function will clone the specified case study’s GitHub repository with git and download the whole repository to a specified directory. This function requires your GitHub personal access token (PAT) to be registered in R/RStudio.

clone_ocs("ocs-bp-opioid-rural-urban", outpath = tempdir(), fork_repo = TRUE)

Setting fork_repo = TRUE will fork the repo first and then clone the fork, while FALSE will clone the repo directly from the Open Case Studies GitHub. The default is fork_repo = NA, which will fork or clone based on your repository permissions.
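Before cloning, a quick check for a token can save a failed attempt. The sketch below looks only at the GITHUB_PAT and GITHUB_TOKEN environment variables, which are a common convention among R GitHub tools; whether clone_ocs() reads them this way is an assumption, and has_pat_env() is a hypothetical helper rather than an OCSdata function.

```r
# has_pat_env() is a hypothetical helper. It only inspects environment
# variables, so a token kept in the git credential store is not detected.
has_pat_env <- function() {
  nzchar(Sys.getenv("GITHUB_PAT")) || nzchar(Sys.getenv("GITHUB_TOKEN"))
}

if (!has_pat_env()) {
  message("No GITHUB_PAT or GITHUB_TOKEN environment variable found.")
}
```

A FALSE result does not prove that no token is registered, only that none is exposed through these two variables.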