A tutorial for vartors

Raw material and main problem : a simple database

To be useful, vartors needs a simple database. A simple database is defined as a single-table database, with one variable per column, one observation per line, and the name of each variable in the first line (header).

As an example, we will use the bad_database.csv file provided in the vartors package.

# Load the database
raw_data <- read.csv(file = paste0(path.package("vartors"),"/examples/bad_database.csv"))

This database has 10 variables of different types and 100 observations. It seems to be OK, but if we check the class of each variable, there are some problems.

str(raw_data)
#> 'data.frame':    100 obs. of  10 variables:
#>  $ subject    : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ initial    : Factor w/ 59 levels "ge","gh","gj",..: 31 51 23 34 53 34 18 9 21 49 ...
#>  $ birth      : Factor w/ 97 levels "2000-01-01","2000-01-12",..: 1 5 77 27 36 24 78 97 25 55 ...
#>  $ sex        : Factor w/ 2 levels "female","male": 1 1 2 1 1 2 2 1 2 2 ...
#>  $ height     : Factor w/ 37 levels "","1.43","1.49",..: 7 9 18 21 28 37 27 13 4 7 ...
#>  $ weight     : Factor w/ 80 levels "*","101.9","43.1",..: 52 39 34 NA 13 12 45 28 35 47 ...
#>  $ siblings   : Factor w/ 10 levels "?","0","1","2",..: 7 3 4 2 NA 5 3 5 4 NA ...
#>  $ study_level: Factor w/ 4 levels "not provided",..: 4 2 2 3 2 3 NA 2 3 3 ...
#>  $ Q1         : int  NA 2 2 3 NA 1 2 1 1 NA ...
#>  $ Q2         : int  0 2 NA NA NA NA 1 NA NA NA ...

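We observe these issues : initial and birth are imported as factors instead of a character vector and a date; height, weight and siblings are imported as factors instead of numbers, probably because of entries such as "", "*" and "?" that appear among their levels; study_level contains a "not provided" level; and Q1 and Q2 are plain integers with no labels.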

If you want to import this data frame properly in R, you will have to transform each variable manually, for example by writing a script like this

clean_data <- raw_data
clean_data$initial <- as.character(raw_data$initial)
clean_data$birth <- as.Date(raw_data$birth, format = "%Y-%m-%d")

And so on for each variable. Here it's easy because there are only 10 variables, but it quickly becomes boring, time-consuming and error-prone with 50 variables. Furthermore, we have no information about the labels for study_level, Q1 and Q2, so we need more information about them.
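One pitfall is worth keeping in mind when converting by hand. In the str() output above, height, weight and siblings are factors, and calling as.numeric() directly on a factor returns its internal level codes rather than the values. The sketch below uses a small made-up vector (weight_raw is hypothetical, not taken from bad_database.csv) to show the usual workaround of converting through character first.

```r
# Hypothetical toy column: read as a factor because of the "*" marker
weight_raw <- factor(c("52.3", "*", "61.0", "101.9"))

# Pitfall: on a factor, as.numeric() returns the integer level codes
level_codes <- as.numeric(weight_raw)

# Going through character first recovers the values; "*" becomes NA
weight_clean <- suppressWarnings(as.numeric(as.character(weight_raw)))

print(level_codes)
print(weight_clean)
```

If your columns are already read as character (the default for read.csv in recent versions of R), as.numeric() can be applied directly and non-numeric markers still become NA.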

Create a description of the variables

Create a skeleton with descvars_skeleton

The idea is to be explicit about each variable. To achieve this, we can create a variable description table. vartors provides the function descvars_skeleton to help you create a skeleton of this variable description table.

library(vartors)
desc_skeleton <- descvars_skeleton(raw_data)
# kable comes from the knitr package
knitr::kable(desc_skeleton[, 1:12])
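| column | originalname | varlabel | description | comment | unit | type | rname | flevel1 | flabel1 | flevel2 | flabel2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | subject | subject | NA | NA | NA | integer | subject | NA | NA | NA | NA |
| B | initial | initial | NA | NA | NA | character | initial | NA | NA | NA | NA |
| C | birth | birth | NA | NA | NA | character | birth | NA | NA | NA | NA |
| D | sex | sex | NA | NA | NA | factor | sex | female | female | male | male |
| E | height | height | NA | NA | NA | character | height | NA | NA | NA | NA |
| F | weight | weight | NA | NA | NA | character | weight | NA | NA | NA | NA |
| G | siblings | siblings | NA | NA | NA | character | siblings | NA | NA | NA | NA |
| H | study_level | study_level | NA | NA | NA | factor | study_level | not provided | not provided | primary | primary |
| I | Q1 | Q1 | NA | NA | NA | factor | Q1 | 0 | 0 | 1 | 1 |
| J | Q2 | Q2 | NA | NA | NA | factor | Q2 | 0 | 0 | 1 | 1 |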

Now, you have to ask the person who gave you the database to explain each variable, and fill in this variable description table. Just edit it, for example with edit

desc_complete <- edit(desc_skeleton)

or, more conveniently, save the data.frame as a .csv file and edit it with spreadsheet software such as LibreOffice. This way, you can send the file to the person who gave you the database and ask them to fill it in, or fill it in together.

write.csv(desc_skeleton, file = "variables_description.csv")

Filling in this file is the most time-consuming part of the vartors process, but if the database is well formatted it is normally easy to do.
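As a rough illustration of this round trip, the sketch below writes a small description table to a CSV file and reads the completed file back. The two-row table and the temporary file are made up for the example. The arguments row.names = FALSE and stringsAsFactors = FALSE are choices made here, on the assumption that you want the spreadsheet columns to match the table exactly and the text to come back as plain character strings.

```r
# Hypothetical miniature of a description table (not produced by vartors)
toy_desc <- data.frame(
  originalname = c("sex", "height"),
  varlabel     = c("Gender", "Height"),
  type         = c("factor", "numeric"),
  stringsAsFactors = FALSE
)

# Write it without the extra row-name column, ready for a spreadsheet
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_desc, file = csv_path, row.names = FALSE)

# Read the (possibly edited) file back, keeping text as character
toy_back <- read.csv(csv_path, stringsAsFactors = FALSE)
print(toy_back)
```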

Import the variable description with import_vardef

The next step is to import this table of variable descriptions into a format that vartors can handle.

Import the table in R as a dataframe

# Path to csv in the vartors package. 
# It's a specific case. In real usage, use the path to your file instead
path_to_vardesc <- paste0(path.package("vartors"),
                          "/examples/variables_description_bad_database.csv")
# Import the csv
complete_vardesc <- read.csv(file = path_to_vardesc)

The result is shown below in two parts

Complete variable description, first 8 columns :

| column | originalname | varlabel | description | comment | unit | type | rname |
|---|---|---|---|---|---|---|---|
| A | subject | Unique id | NA | Don't perform any analysis. Must be unique | NA | character | id |
| B | initial | initial | First letters of the name and surname | Not useful for analysis | NA | not_used | |
| C | birth | Birthdate | NA | Use it to calculate age (reference date = 2014-07-31) | %Y-%m-%d | date | birthdate |
| D | sex | Gender | NA | NA | NA | factor | sex |
| E | height | Height | NA | NA | m | numeric | height |
| F | weight | Weight | NA | NA | kg | numeric | weight |
| G | siblings | Number of siblings | Original question : How many sisters and brothers do you have ? | count variable -> try a Poisson distribution | NA | integer | nb_siblings |
| H | study_level | Maximal study level | Original question : What's your higher education level ? | NA | NA | ordered | study_level |
| I | Q1 | Like chocolate | Original question : Do you like chocolate ? | Used a Likert scale. | NA | ordered | q1_chocolate |
| J | Q2 | Like french cheese | Original question : Do you like french cheese ? | Used a Likert scale. | NA | ordered | q2_cheese |

Complete variable description, first 2 columns and last columns :

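| column | originalname | flevel1 | flabel1 | flevel2 | flabel2 | flevel3 | flabel3 | flevel4 | flabel4 | flevel5 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | subject | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| B | initial | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| C | birth | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| D | sex | female | Women | male | Man | NA | NA | NA | NA | NA |
| E | height | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| F | weight | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| G | siblings | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| H | study_level | primary | primary | secondary | secondary | superior | superior | NA | NA | NA |
| I | Q1 | 0 | Strongly disagree | 1 | Disagree | 2 | Neither agree nor disagree | 3 | Agree | 4 |
| J | Q2 | 0 | Strongly disagree | 1 | Disagree | 2 | Neither agree nor disagree | 3 | Agree | 4 |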

This way each variable is explicit. Note that you can use the type not_used to discard a variable.
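Before going further, it can help to run a quick sanity check on the completed table. The sketch below is not part of vartors. It uses a made-up three-row excerpt (toy_desc is hypothetical) and simply lists the variables declared as factor or ordered that have no first level filled in.

```r
# Hypothetical excerpt of a completed description table
toy_desc <- data.frame(
  originalname = c("sex", "height", "Q1"),
  type         = c("factor", "numeric", "ordered"),
  flevel1      = c("female", NA, NA),
  stringsAsFactors = FALSE
)

# Rows declared as factor or ordered should list at least one level
needs_levels   <- toy_desc$type %in% c("factor", "ordered")
missing_levels <- toy_desc$originalname[needs_levels & is.na(toy_desc$flevel1)]
print(missing_levels)
```

Any name printed here points to a row that still needs its flevel and flabel columns filled in.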

Then you have to transform this data.frame into a DatabaseDef object, which vartors can understand. To do this, use import_vardef

suppressWarnings(
database_def_object <- import_vardef(complete_vardesc)
)

If you don't suppress the warnings, you will see some messages saying that your rnames are not perfect but will work.

You can display this object

database_def_object
#> An object of class "DatabaseDef"
#> Slot "variables_definitions":
#> [[1]]
#> varlabel = Unique id 
#> type = character 
#> rname = id 
#> 
#> [[2]]
#> varlabel = initial 
#> comment = First letters of the name and surname 
#> type = not_used 
#> rname = initial 
#> 
#> [[3]]
#> varlabel = Birthdate 
#> unit = %d/%m/%Y 
#> type = date 
#> rname = birthdate 
#> 
#> [[4]]
#> varlabel = Gender 
#> type = factor 
#> rname = sex 
#> levels = female, male 
#> names = Women, Man 
#> 
#> [[5]]
#> varlabel = Height 
#> type = numeric 
#> rname = height 
#> 
#> [[6]]
#> varlabel = Weight 
#> type = numeric 
#> rname = weight 
#> 
#> [[7]]
#> varlabel = Number of siblings 
#> comment = Original question : How many sisters and brothers do you have ? 
#> type = integer 
#> rname = nb_siblings 
#> 
#> [[8]]
#> varlabel = Maximal study level 
#> comment = Original question : What's your higher education level ? 
#> type = ordered 
#> rname = study_level 
#> levels = primary, secondary, superior 
#> names = primary, secondary, superior 
#> 
#> [[9]]
#> varlabel = Like chocolate 
#> comment = Original question : Do you like chocolate ? 
#> type = ordered 
#> rname = q1_chocolate 
#> levels = 0, 1, 2, 3, 4 
#> names = Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree 
#> 
#> [[10]]
#> varlabel = Like french cheese 
#> comment = Original question : Do you like french cheese ? 
#> type = ordered 
#> rname = q2_cheese 
#> levels = 0, 1, 2, 3, 4 
#> names = Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree

You can see that import_vardef parsed the table of variable definitions. For example, if you don't give an rname in your table, it derives one from the varlabel column, or from originalname if there is no varlabel.
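The exact rules that import_vardef applies are not shown here. Purely as an illustration of the idea, and not as vartors's own algorithm, base R's make.names() turns an arbitrary label into a syntactically valid name. The labels below are examples chosen for this sketch.

```r
# Illustration only: derive syntactically valid R names from labels
labels <- c("Like chocolate", "Number of siblings", "2nd visit")
candidate_names <- make.names(tolower(labels), unique = TRUE)
print(candidate_names)
```

Spaces are replaced by dots, and a name starting with a digit gets an "X" prefix, which is why explicit, hand-picked rnames usually read better.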

Create the script with create_script

It's time to create a script with this. It's really easy. Just use the create_script method

simple_script <- create_script(var_desc = database_def_object)

It's that simple! You now have a script you can explore

simple_script
#> #- Start of the script in R -#
#> ######### Importation script ##########
#> # import data
#> raw_data <- read.csv("rep_path_to_database")
#> 
#> # Change headers
#> names(raw_data) <- c('id', 'initial', 'birthdate', 'sex', 'height', 'weight', 'nb_siblings', 'study_level', 'q1_chocolate', 'q2_cheese')
#> 
#> # Make a copy
#> clean_data <- raw_data
#> 
#> 
#> 
#> ####### Clean the variable initial #####
#> # The variable initial is not used for analysis
#> clean_data$initial <- NULL
#> 
#> 
#> ####### Clean the variable birthdate #####
#> 
#> # explore the raw data
#> head(raw_data$birthdate)
#> str(raw_data$birthdate)
#> 
#> # set this variable as a date
#> clean_data$birthdate <- as.Date(raw_data$birthdate, format="%d/%m/%Y")
#> # set the label
#> attr(clean_data$birthdate, "label") <- "Birthdate"
#> 
#> head(clean_data$birthdate)
#> str(clean_data$birthdate)
#> summary(clean_data$birthdate)
#> 
#> # number of NA
#> 
#> sum(is.na(clean_data$birthdate))
#> 
#> 
#> ####### Clean the variable sex #####
#> 
#> # explore the raw data
#> head(raw_data$sex)
#> str(raw_data$sex)
#> 
#> # Set rep_varlable as a factor
#> clean_data$sex <- factor(
#>   x = raw_data$sex,
#>   levels = c('female', 'male'),
#>   labels = c('Women', 'Man')
#> )
#> # set the label
#> attr(clean_data$sex, "label") <- "Gender"
#> 
#> head(clean_data$sex)
#> str(clean_data$sex)
#> summary(clean_data$sex)
#> 
#> # number of NA
#> 
#> sum(is.na(clean_data$sex))
#> # Make a plot
#> plot(clean_data$sex)
#> 
#> 
#> ####### Clean the variable height #####
#> 
#> # explore the raw data
#> head(raw_data$height)
#> str(raw_data$height)
#> 
#> # Set rep_varlable as a numeric
#> clean_data$height <- as.numeric(raw_data$height)
#> # set the label
#> attr(clean_data$height, "label") <- "Height"
#> 
#> head(clean_data$height)
#> str(clean_data$height)
#> summary(clean_data$height)
#> 
#> # number of NA
#> 
#> sum(is.na(clean_data$height))
#> hist(clean_data$height)
#> 
#> 
#> ####### Clean the variable weight #####
#> 
#> # explore the raw data
#> head(raw_data$weight)
#> str(raw_data$weight)
#> 
#> # Set rep_varlable as a numeric
#> clean_data$weight <- as.numeric(raw_data$weight)
#> # set the label
#> attr(clean_data$weight, "label") <- "Weight"
#> 
#> head(clean_data$weight)
#> str(clean_data$weight)
#> summary(clean_data$weight)
#> 
#> # number of NA
#> 
#> sum(is.na(clean_data$weight))
#> hist(clean_data$weight)
#> 
#> 
#> ####### Clean the variable nb_siblings #####
#> 
#> # explore the raw data
#> head(raw_data$nb_siblings)
#> str(raw_data$nb_siblings)
#> 
#> # Set rep_varlable as an integer
#> clean_data$nb_siblings <- as.integer(raw_data$nb_siblings)
#> # set the label
#> attr(clean_data$nb_siblings, "label") <- "Number of siblings"
#> 
#> head(clean_data$nb_siblings)
#> str(clean_data$nb_siblings)
#> summary(clean_data$nb_siblings)
#> 
#> # number of NA
#> 
#> sum(is.na(clean_data$nb_siblings))
#> hist(clean_data$nb_siblings)
#> 
#> 
#> clean_data$study_level <- factor(
#>   x = raw_data$study_level,
#>   levels = c('primary', 'secondary', 'superior'),
#>   labels = c('primary', 'secondary', 'superior'),
#>   ordered = TRUE
#> )
#> 
#> 
#> clean_data$q1_chocolate <- factor(
#>   x = raw_data$q1_chocolate,
#>   levels = c('0', '1', '2', '3', '4'),
#>   labels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'),
#>   ordered = TRUE
#> )
#> 
#> 
#> clean_data$q2_cheese <- factor(
#>   x = raw_data$q2_cheese,
#>   levels = c('0', '1', '2', '3', '4'),
#>   labels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'),
#>   ordered = TRUE
#> )
#> ##### watch all ######
#> 
#> str(clean_data)
#> 
#> ####### Save the cleaned data ######
#> save(clean_data, file="clean_data.Rdata")
#> #- End of the script in R -#

and you can write this script to a file

write_file(object = simple_script, filepath = "my_import_script1.R")
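The generated script reads the data from the placeholder "rep_path_to_database", so the path has to be filled in before the script can run. You can do this in a text editor, or with a few lines of base R as in the sketch below. The two-line stand-in file and the path "data/my_database.csv" are made up for the example.

```r
# Stand-in for a script file written by write_file (contents are made up)
script_path <- tempfile(fileext = ".R")
writeLines(c("# import data",
             "raw_data <- read.csv(\"rep_path_to_database\")"),
           script_path)

# Replace the placeholder with the real location of the data file
script_lines <- readLines(script_path)
script_lines <- sub("rep_path_to_database", "data/my_database.csv",
                    script_lines, fixed = TRUE)
writeLines(script_lines, script_path)

print(readLines(script_path))
```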

The fast way

Once you have your table of variable definitions loaded in a data.frame, it's possible to do the whole process in one line.

Remember that we imported it earlier into a data.frame called complete_vardesc.

write_file(create_script(var_desc = complete_vardesc), filepath = "my_import_script1.R")

What about templates?

Choose a builtin with import_template

One of the strengths of vartors is its template system. So far we have created a script in R. But maybe you want it in .Rmd, to produce a report with knitr? And say you are a French user, so you want it in French.

To see which builtin templates are available, read the documentation of the import_template function

?import_template

In the Details section, you can see that one template should match our needs: template_fr.Rmd. To import it, use the import_template function and pass the name of the builtin template to the builtin argument

rmd_template <- import_template(builtin = "template_fr.Rmd")

Then recreate a script with this template

rmd_script <- create_script(var_desc = database_def_object, template = rmd_template)

And you have your script in Rmd !

rmd_script
#> #- Start of the script in Rmd -#
#> ---
#> title: "Import des données"
#> author: "Nom de l'auteur"
#> date: "30 juin 2014"
#> output:
#>   html_document:
#>     number_sections: yes
#>     toc: yes
#>   pdf_document:
#>     toc: yes
#> ---
#> 
#> ```{r, echo=FALSE}
#> # Warning: encoding = UTF-8
#> ```
#> 
#> ```{r, echo = FALSE, message = FALSE}
#> # Load ggplot2 package to plot
#> library(ggplot2)
#> 
#> # Create a label function to access easly to labels
#> # without using label function from packages
#> # like those from Hmisc
#> label <- function(object) attr(x = object, which = "label")
#> 
#> ```
#> 
#> # Import du tableau de données
#> 
#> Tableau de données au format Excel. 
#> 
#> ```{r importcsv}
#> library(openxlsx)
#> raw_data <- read.xlsx( "rep_path_to_database")
#> ```
#> 
#> Changer les noms de colonnes avec ceux définis dans le cahier de variable
#> 
#> ```{r changenoms}
#> colnames(raw_data) <-  c('id', 'initial', 'birthdate', 'sex', 'height', 'weight', 'nb_siblings', 'study_level', 'q1_chocolate', 'q2_cheese')
#> ```
#> 
#> Créer une copie du tableau de variable pour le nettoyage
#> 
#> ```{r copie}
#> clean_data <- raw_data
#> ```
#> 
#> # Nettoyage variable par variable
#> 
#> 
#> ## initial
#> ### Description
#> - Nom complet : initial
#> - Nom dans R : initial
#> - Description : NA
#> - Unitée : NA
#> - Type : not_used
#> - Commentaires : First letters of the name and surname
#> 
#> ### Exploration des données brutes
#> ```{r brute_initial}
#> head(raw_data$initial, 10)
#> ```
#> 
#> ### Transformation
#> Variable non utilisée dans l'analyse. A supprimer
#> ```{r supprime_initial}
#> clean_data$initial <- NULL
#> ```
#> 
#> 
#> ## birthdate
#> ### Description
#> - Nom complet : Birthdate
#> - Nom dans R : birthdate
#> - Description : NA
#> - Unitée : %d/%m/%Y
#> - Type : date
#> - Commentaires : NA
#> 
#> ### Exploration des données brutes
#> ```{r brute_birthdate}
#> head(raw_data$birthdate, 10)
#> ```
#> 
#> ### Transformation
#> ```{r transfo_birthdate}
#> clean_data$birthdate <- as.Date(
#>   x = raw_data$birthdate, 
#>   format = "%d/%m/%Y"
#> )
#> ```
#> 
#> Ajouter un _label_ (étiquette).
#> ```{r label_birthdate}
#> attr(clean_data$birthdate, "label") <- "Birthdate"
#> ```
#> 
#> ### Vérifier
#> ```{r check_birthdate}
#> # Premières données
#> head(clean_data$birthdate, 10)
#> 
#> # Résumé
#> summary(clean_data$birthdate)
#> 
#> # Graphique
#> qplot(clean_data$birthdate, xlab = label(clean_data$birthdate))
#> ```
#> 
#> 
#> 
#> ## sex
#> ### Description
#> - Nom complet : Gender
#> - Nom dans R : sex
#> - Description : NA
#> - Unitée : NA
#> - Type : factor
#> - Commentaires : NA
#> 
#> ### Exploration des données brutes
#> ```{r brute_sex}
#> head(raw_data$sex, 10)
#> ```
#> 
#> ### Transformation
#> ```{r transfo_sex}
#> clean_data$sex <- factor(
#>   x = raw_data$sex,
#>   levels = c('female', 'male'),
#>   labels = c('Women', 'Man')
#> )
#> ```
#> 
#> Ajouter un _label_ (étiquette).
#> ```{r label_sex}
#> attr(clean_data$sex, "label") <- "Gender"
#> ```
#> 
#> ### Vérifier
#> ```{r check_sex}
#> # Premières données
#> head(clean_data$sex, 10)
#> 
#> # Résumé
#> summary(clean_data$sex)
#> 
#> # Graphique
#> qplot(clean_data$sex, xlab = label(clean_data$sex))
#> ```
#> 
#> 
#> 
#> ## height
#> ### Description
#> - Nom complet : Height
#> - Nom dans R : height
#> - Description : NA
#> - Unitée : NA
#> - Type : numeric
#> - Commentaires : NA
#> 
#> ### Exploration des données brutes
#> ```{r brute_height}
#> head(raw_data$height, 10)
#> ```
#> 
#> ### Transformation
#> ```{r transfo_height}
#> clean_data$height <- as.numeric(raw_data$height)
#> ```
#> 
#> Ajouter un _label_ (étiquette).
#> ```{r label_height}
#> attr(clean_data$height, "label") <- "Height"
#> ```
#> 
#> ### Vérifier
#> ```{r check_height}
#> # Premières données
#> head(clean_data$height, 10)
#> 
#> # Résumé
#> summary(clean_data$height)
#> 
#> # Graphique
#> qplot(clean_data$height, xlab = label(clean_data$height))
#> ```
#> 
#> 
#> 
#> ## weight
#> ### Description
#> - Nom complet : Weight
#> - Nom dans R : weight
#> - Description : NA
#> - Unitée : NA
#> - Type : numeric
#> - Commentaires : NA
#> 
#> ### Exploration des données brutes
#> ```{r brute_weight}
#> head(raw_data$weight, 10)
#> ```
#> 
#> ### Transformation
#> ```{r transfo_weight}
#> clean_data$weight <- as.numeric(raw_data$weight)
#> ```
#> 
#> Ajouter un _label_ (étiquette).
#> ```{r label_weight}
#> attr(clean_data$weight, "label") <- "Weight"
#> ```
#> 
#> ### Vérifier
#> ```{r check_weight}
#> # Premières données
#> head(clean_data$weight, 10)
#> 
#> # Résumé
#> summary(clean_data$weight)
#> 
#> # Graphique
#> qplot(clean_data$weight, xlab = label(clean_data$weight))
#> ```
#> 
#> 
#> 
#> ## nb_siblings
#> ### Description
#> - Nom complet : Number of siblings
#> - Nom dans R : nb_siblings
#> - Description : NA
#> - Unitée : NA
#> - Type : integer
#> - Commentaires : Original question : How many sisters and brothers do you have ?
#> 
#> ### Exploration des données brutes
#> ```{r brute_nb_siblings}
#> head(raw_data$nb_siblings, 10)
#> ```
#> 
#> ### Transformation
#> ```{r transfo_nb_siblings}
#> clean_data$nb_siblings <- as.integer(raw_data$nb_siblings)
#> ```
#> 
#> Ajouter un _label_ (étiquette).
#> ```{r label_nb_siblings}
#> attr(clean_data$nb_siblings, "label") <- "Number of siblings"
#> ```
#> 
#> ### Vérifier
#> ```{r check_nb_siblings}
#> # Premières données
#> head(clean_data$nb_siblings, 10)
#> 
#> # Résumé
#> summary(clean_data$nb_siblings)
#> 
#> # Graphique
#> qplot(clean_data$nb_siblings, xlab = label(clean_data$nb_siblings))
#> ```
#> 
#> 
#> 
#> ## study_level
#> ### Description
#> - Nom complet : Maximal study level
#> - Nom dans R : study_level
#> - Description : NA
#> - Unitée : NA
#> - Type : ordered
#> - Commentaires : Original question : What's your higher education level ?
#> 
#> ### Exploration des données brutes
#> ```{r brute_study_level}
#> head(raw_data$study_level, 10)
#> ```
#> 
#> ### Transformation
#> ```{r transfo_study_level}
#> clean_data$study_level <- factor(
#>   x = raw_data$study_level,
#>   levels = c('primary', 'secondary', 'superior'),
#>   labels = c('primary', 'secondary', 'superior'),
#>   ordered = TRUE
#> )
#> ```
#> 
#> Ajouter un _label_ (étiquette).
#> ```{r label_study_level}
#> attr(clean_data$study_level, "label") <- "Maximal study level"
#> ```
#> 
#> ### Vérifier
#> ```{r check_study_level}
#> # Premières données
#> head(clean_data$study_level, 10)
#> 
#> # Résumé
#> summary(clean_data$study_level)
#> 
#> # Graphique
#> qplot(clean_data$study_level, xlab = label(clean_data$study_level))
#> ```
#> 
#> 
#> 
#> ## q1_chocolate
#> ### Description
#> - Nom complet : Like chocolate
#> - Nom dans R : q1_chocolate
#> - Description : NA
#> - Unitée : NA
#> - Type : ordered
#> - Commentaires : Original question : Do you like chocolate ?
#> 
#> ### Exploration des données brutes
#> ```{r brute_q1_chocolate}
#> head(raw_data$q1_chocolate, 10)
#> ```
#> 
#> ### Transformation
#> ```{r transfo_q1_chocolate}
#> clean_data$q1_chocolate <- factor(
#>   x = raw_data$q1_chocolate,
#>   levels = c('0', '1', '2', '3', '4'),
#>   labels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'),
#>   ordered = TRUE
#> )
#> ```
#> 
#> Ajouter un _label_ (étiquette).
#> ```{r label_q1_chocolate}
#> attr(clean_data$q1_chocolate, "label") <- "Like chocolate"
#> ```
#> 
#> ### Vérifier
#> ```{r check_q1_chocolate}
#> # Premières données
#> head(clean_data$q1_chocolate, 10)
#> 
#> # Résumé
#> summary(clean_data$q1_chocolate)
#> 
#> # Graphique
#> qplot(clean_data$q1_chocolate, xlab = label(clean_data$q1_chocolate))
#> ```
#> 
#> 
#> 
#> ## q2_cheese
#> ### Description
#> - Nom complet : Like french cheese
#> - Nom dans R : q2_cheese
#> - Description : NA
#> - Unitée : NA
#> - Type : ordered
#> - Commentaires : Original question : Do you like french cheese ?
#> 
#> ### Exploration des données brutes
#> ```{r brute_q2_cheese}
#> head(raw_data$q2_cheese, 10)
#> ```
#> 
#> ### Transformation
#> ```{r transfo_q2_cheese}
#> clean_data$q2_cheese <- factor(
#>   x = raw_data$q2_cheese,
#>   levels = c('0', '1', '2', '3', '4'),
#>   labels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'),
#>   ordered = TRUE
#> )
#> ```
#> 
#> Ajouter un _label_ (étiquette).
#> ```{r label_q2_cheese}
#> attr(clean_data$q2_cheese, "label") <- "Like french cheese"
#> ```
#> 
#> ### Vérifier
#> ```{r check_q2_cheese}
#> # Premières données
#> head(clean_data$q2_cheese, 10)
#> 
#> # Résumé
#> summary(clean_data$q2_cheese)
#> 
#> # Graphique
#> qplot(clean_data$q2_cheese, xlab = label(clean_data$q2_cheese))
#> ```
#> 
#> # Exploration globale
#> 
#> Recherche de données manquantes graphiquement
#> ```{r }
#> library(dfexplore)
#> dfplot(clean_data)
#> ```
#> 
#> # Sauvegarder
#> 
#> ```{r}
#> donnees <- clean_data
#> save(donnees, file = "donnees.RData")
#> ```
#> #- End of the script in Rmd -#

Create your own template with export_template

If none of the builtin templates fits your needs, create one! Templates are just R and Rmd files with some delimiters. More information about how to write your own template is in the documentation

?template

Basically, the idea is to export a builtin template and change it. To do this, use export_template. For example, if you want to modify the preceding template, export it with :

export_template(builtin = "template_fr.Rmd", to = "mytemplate.Rmd")

When you are happy with your template, import it with import_template.

my_template <- import_template(path = "mytemplate.Rmd")

and then use it to produce your script

create_script(var_desc = database_def_object, template = my_template)

What to do with this script skeleton?

The job of vartors ends with the creation of the script skeleton. We chose to produce a script with this package, rather than the cleaned database directly, so that users can adapt the import phase precisely to their needs. So what should you do with this script?

  1. Run the script line by line to check that it works, adapt it where necessary, and then save the well-formatted data.frame.
  2. When you have a clean script, use tools like knitr to produce a report in HTML, PDF or another format.
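While adapting the script, keep in mind that the generated code stores each variable label in a "label" attribute. The sketch below reuses the small helper defined in the Rmd template shown earlier. The height vector is made up for the example.

```r
# Same helper as the one defined in the Rmd template shown earlier
label <- function(object) attr(x = object, which = "label")

# Hypothetical cleaned variable carrying a label
height <- c(1.62, 1.75, 1.80)
attr(height, "label") <- "Height"

print(label(height))        # label of the full vector
print(label(height[1:2]))   # subsetting with [ drops the attribute
```

Because plain subsetting drops the attribute, it is safer to read the label from the full column of clean_data than from a subset of it.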