To be useful, vartors needs a simple database. A simple database is a single-table database with one variable per column, one observation per row, and the name of each variable in the first line (header).
As an example, we will use the bad_database.csv file shipped with the vartors package.
# Load the database
raw_data <- read.csv(file = paste0(path.package("vartors"),"/examples/bad_database.csv"))
This database has 10 variables of different types and 100 observations. It seems to be OK, but if we check the class of each variable, there are some problems.
str(raw_data)
#> 'data.frame': 100 obs. of 10 variables:
#> $ subject : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ initial : Factor w/ 59 levels "ge","gh","gj",..: 31 51 23 34 53 34 18 9 21 49 ...
#> $ birth : Factor w/ 97 levels "2000-01-01","2000-01-12",..: 1 5 77 27 36 24 78 97 25 55 ...
#> $ sex : Factor w/ 2 levels "female","male": 1 1 2 1 1 2 2 1 2 2 ...
#> $ height : Factor w/ 37 levels "","1.43","1.49",..: 7 9 18 21 28 37 27 13 4 7 ...
#> $ weight : Factor w/ 80 levels "*","101.9","43.1",..: 52 39 34 NA 13 12 45 28 35 47 ...
#> $ siblings : Factor w/ 10 levels "?","0","1","2",..: 7 3 4 2 NA 5 3 5 4 NA ...
#> $ study_level: Factor w/ 4 levels "not provided",..: 4 2 2 3 2 3 NA 2 3 3 ...
#> $ Q1 : int NA 2 2 3 NA 1 2 1 1 NA ...
#> $ Q2 : int 0 2 NA NA NA NA 1 NA NA NA ...
We observe these issues:
- birth was recognized as a factor, but it is actually a date. read.csv doesn't detect dates.
- height and weight were recognized as factors too, but they are numeric. This is because missing data are coded in several ways (NA, ?, empty cell and some comments).
- study_level, Q1 and Q2 need level labels that the file does not provide.
- initial was recognized as a factor, but it should be a character. This is because read.csv has the argument stringsAsFactors = TRUE by default (in R versions before 4.0.0).
If you want to import this data frame properly in R, you will have to transform each variable manually, for example by writing a script like this
clean_data <- raw_data
clean_data$initial <- as.character(raw_data$initial)
clean_data$birth <- as.Date(raw_data$birth, format = "%Y-%m-%d")
And so on for each variable. Here it's easy because there are only 10 variables, but it quickly becomes boring, time-consuming and error-prone with 50 variables. Furthermore, we have no information about the labels for study_level, Q1 and Q2, so we need more information about each variable.
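A large part of the type trouble comes from the several spellings of missing values. As a minimal sketch, base R's read.csv can be told about them through its na.strings argument. The inline CSV below is invented for this illustration and is not the bad_database.csv file.

```r
# Hypothetical three-row extract, invented for this illustration
csv_text <- "subject,height,weight,siblings
1,1.62,55.3,2
2,,*,?
3,1.80,NA,0"

# Default import: '*' and '?' force weight and siblings to be read as text
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Declaring every missing-value marker lets read.csv infer numeric types
fixed <- read.csv(text = csv_text,
                  na.strings = c("NA", "", "*", "?"),
                  stringsAsFactors = FALSE)

sapply(raw, class)
sapply(fixed, class)
```

This only repairs the column types. It does not supply labels, units or the meaning of coded answers, which still have to come from a description of each variable.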
descvars_skeleton
The idea is to be explicit about each variable. To achieve this, we can create a variable description table. vartors provides the function descvars_skeleton to help you create a skeleton of this variable description table.
library(vartors)
library(knitr)  # provides kable()
desc_skeleton <- descvars_skeleton(raw_data)
kable(desc_skeleton[, 1:12])
column | originalname | varlabel | description | comment | unit | type | rname | flevel1 | flabel1 | flevel2 | flabel2 |
---|---|---|---|---|---|---|---|---|---|---|---|
A | subject | subject | NA | NA | NA | integer | subject | NA | NA | NA | NA |
B | initial | initial | NA | NA | NA | character | initial | NA | NA | NA | NA |
C | birth | birth | NA | NA | NA | character | birth | NA | NA | NA | NA |
D | sex | sex | NA | NA | NA | factor | sex | female | female | male | male |
E | height | height | NA | NA | NA | character | height | NA | NA | NA | NA |
F | weight | weight | NA | NA | NA | character | weight | NA | NA | NA | NA |
G | siblings | siblings | NA | NA | NA | character | siblings | NA | NA | NA | NA |
H | study_level | study_level | NA | NA | NA | factor | study_level | not provided | not provided | primary | primary |
I | Q1 | Q1 | NA | NA | NA | factor | Q1 | 0 | 0 | 1 | 1 |
J | Q2 | Q2 | NA | NA | NA | factor | Q2 | 0 | 0 | 1 | 1 |
Now you have to ask the person who gave you the database to explain each variable, and fill in this variable description table. You can edit it directly, for example with edit
desc_complete <- edit(desc_skeleton)
or, more conveniently, save the data.frame as a .csv file and edit it with spreadsheet software such as LibreOffice. This way, you can send it to the person who sent you the database and ask them to fill it in, or do it together with them.
write.csv(desc_skeleton, file = "variables_description.csv")
Filling in this file is the most time-consuming part of the vartors process, but if the database is well formatted it is normally easy to do.
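The spreadsheet round trip can be sketched in base R as follows. The two-row table is invented for this illustration, and row.names = FALSE is a suggestion rather than something vartors is known to require: it simply keeps an extra column of row numbers out of the file.

```r
# Hypothetical two-row description table, invented for this illustration
desc <- data.frame(
  originalname = c("sex", "height"),
  varlabel     = c("Gender", "Height"),
  type         = c("factor", "numeric"),
  stringsAsFactors = FALSE
)

csv_path <- file.path(tempdir(), "variables_description.csv")

# row.names = FALSE avoids writing a leading column of row numbers
write.csv(desc, file = csv_path, row.names = FALSE)

# ... the file is edited in a spreadsheet, then read back ...
desc_back <- read.csv(csv_path, stringsAsFactors = FALSE)

names(desc_back)
```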
import_vardesc
The next step is to import this variable description table into a format that vartors can handle.
First, import the table into R as a data.frame
# Path to csv in the vartors package.
# It's a specific case. In real usage, use the path to your file instead
path_to_vardesc <- paste0(path.package("vartors"),
"/examples/variables_description_bad_database.csv")
# Import the csv
complete_vardesc <- read.csv(file = path_to_vardesc)
The result is shown below in two parts.
Complete variable description, first 8 columns:
column | originalname | varlabel | description | comment | unit | type | rname |
---|---|---|---|---|---|---|---|
A | subject | Unique id | NA | Don't perform any analysis. Must be unique | NA | character | id |
B | initial | initial | First letters of the name and surname | Not useful for analysis | NA | not_used | |
C | birth | Birthdate | NA | Use it to calculate age (reference date = 2014-07-31) | %Y-%m-%d | date | birthdate |
D | sex | Gender | NA | NA | NA | factor | sex |
E | height | Height | NA | NA | m | numeric | height |
F | weight | Weight | NA | NA | kg | numeric | weight |
G | siblings | Number of siblings | Original question : How many sisters and brothers do you have ? | count variable -> try a Poisson distribution | NA | integer | nb_siblings |
H | study_level | Maximal study level | Original question : What's your higher education level ? | NA | NA | ordered | study_level |
I | Q1 | Like chocolate | Original question : Do you like chocolate ? | Used a Likert scale. | NA | ordered | q1_chocolate |
J | Q2 | Like french cheese | Original question : Do you like french cheese ? | Used a Likert scale. | NA | ordered | q2_cheese |
Complete variable description, first 2 columns and last columns :
column | originalname | flevel1 | flabel1 | flevel2 | flabel2 | flevel3 | flabel3 | flevel4 | flabel4 | flevel5 |
---|---|---|---|---|---|---|---|---|---|---|
A | subject | NA | NA | NA | NA | NA | NA | NA | NA | NA |
B | initial | NA | NA | NA | NA | NA | NA | NA | NA | NA |
C | birth | NA | NA | NA | NA | NA | NA | NA | NA | NA |
D | sex | female | Women | male | Man | NA | NA | NA | NA | NA |
E | height | NA | NA | NA | NA | NA | NA | NA | NA | NA |
F | weight | NA | NA | NA | NA | NA | NA | NA | NA | NA |
G | siblings | NA | NA | NA | NA | NA | NA | NA | NA | NA |
H | study_level | primary | primary | secondary | secondary | superior | superior | NA | NA | NA |
I | Q1 | 0 | Strongly disagree | 1 | Disagree | 2 | Neither agree nor disagree | 3 | Agree | 4 |
J | Q2 | 0 | Strongly disagree | 1 | Disagree | 2 | Neither agree nor disagree | 3 | Agree | 4 |
This way, each variable is explicit. Note that you should use the type not_used to discard a variable.
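To make the role of the flevel and flabel columns concrete, here is a minimal base R sketch of what such a level/label pairing amounts to for an ordered variable like Q1. The raw answers are invented, and this is an illustration of the idea rather than code taken from vartors.

```r
# Hypothetical raw answers, coded 0 to 4 as in the Q1 row of the table
raw_q1 <- c(2, 0, 4, NA, 3)

q1 <- factor(
  x = raw_q1,
  levels = c(0, 1, 2, 3, 4),                  # the flevel columns
  labels = c("Strongly disagree", "Disagree",
             "Neither agree nor disagree",
             "Agree", "Strongly agree"),      # the matching flabel columns
  ordered = TRUE                              # type "ordered" in the table
)

# Count each answer, keeping missing values visible
table(q1, useNA = "ifany")
```

Any value that is not listed among the levels becomes NA, so an incomplete description shows up as unexpected missing values.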
Then you have to transform this data.frame into a DatabaseDef object, which vartors can understand. To do this, use import_vardef
suppressWarnings(
database_def_object <- import_vardef(complete_vardesc)
)
If you don't suppress the warnings, you will see messages saying that your rnames are not perfect but will work.
You can display this object
database_def_object
#> An object of class "DatabaseDef"
#> Slot "variables_definitions":
#> [[1]]
#> varlabel = Unique id
#> type = character
#> rname = id
#>
#> [[2]]
#> varlabel = initial
#> comment = First letters of the name and surname
#> type = not_used
#> rname = initial
#>
#> [[3]]
#> varlabel = Birthdate
#> unit = %d/%m/%Y
#> type = date
#> rname = birthdate
#>
#> [[4]]
#> varlabel = Gender
#> type = factor
#> rname = sex
#> levels = female, male
#> names = Women, Man
#>
#> [[5]]
#> varlabel = Height
#> type = numeric
#> rname = height
#>
#> [[6]]
#> varlabel = Weight
#> type = numeric
#> rname = weight
#>
#> [[7]]
#> varlabel = Number of siblings
#> comment = Original question : How many sisters and brothers do you have ?
#> type = integer
#> rname = nb_siblings
#>
#> [[8]]
#> varlabel = Maximal study level
#> comment = Original question : What's your higher education level ?
#> type = ordered
#> rname = study_level
#> levels = primary, secondary, superior
#> names = primary, secondary, superior
#>
#> [[9]]
#> varlabel = Like chocolate
#> comment = Original question : Do you like chocolate ?
#> type = ordered
#> rname = q1_chocolate
#> levels = 0, 1, 2, 3, 4
#> names = Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree
#>
#> [[10]]
#> varlabel = Like french cheese
#> comment = Original question : Do you like french cheese ?
#> type = ordered
#> rname = q2_cheese
#> levels = 0, 1, 2, 3, 4
#> names = Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree
You can see that import_vardef parsed the variable definition table. For example, if you don't give an rname in your variable definition table, it will derive one from the varlabel column, or from originalname if there is no varlabel.
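The fallback order described above can be sketched in base R. The helper pick_rname is hypothetical and is not how vartors implements it. The final make.names() call is also an assumption, added only to guarantee a syntactically valid R name.

```r
# Hypothetical helper, not part of vartors: returns the first usable name
pick_rname <- function(rname, varlabel, originalname) {
  # A value is blank when it is NA or contains only whitespace
  is_blank <- function(x) is.na(x) | trimws(x) == ""
  out <- ifelse(!is_blank(rname), rname,
                ifelse(!is_blank(varlabel), varlabel, originalname))
  # Assumption: sanitise the result into a valid R name
  make.names(out)
}

pick_rname(
  rname        = c("id", "", NA),
  varlabel     = c("Unique id", "initial", NA),
  originalname = c("subject", "initial", "birth")
)
```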
create_script
It's time to create a script from this. It's really easy: just use the create_script method
simple_script <- create_script(var_desc = database_def_object)
It's that simple! You now have a script you can explore
simple_script
#> #- Start of the script in R -#
#> ######### Importation script ##########
#> # import data
#> raw_data <- read.csv("rep_path_to_database")
#>
#> # Change headers
#> names(raw_data) <- c('id', 'initial', 'birthdate', 'sex', 'height', 'weight', 'nb_siblings', 'study_level', 'q1_chocolate', 'q2_cheese')
#>
#> # Make a copy
#> clean_data <- raw_data
#>
#>
#>
#> ####### Clean the variable initial #####
#> # The variable initial is not used for analysis
#> clean_data$initial <- NULL
#>
#>
#> ####### Clean the variable birthdate #####
#>
#> # explore the raw data
#> head(raw_data$birthdate)
#> str(raw_data$birthdate)
#>
#> # set this variable as a date
#> clean_data$birthdate <- as.Date(raw_data$birthdate, format="%d/%m/%Y")
#> # set the label
#> attr(clean_data$birthdate, "label") <- "Birthdate"
#>
#> head(clean_data$birthdate)
#> str(clean_data$birthdate)
#> summary(clean_data$birthdate)
#>
#> # number of NA
#>
#> sum(is.na(clean_data$birthdate))
#>
#>
#> ####### Clean the variable sex #####
#>
#> # explore the raw data
#> head(raw_data$sex)
#> str(raw_data$sex)
#>
#> # Set rep_varlable as a factor
#> clean_data$sex <- factor(
#> x = raw_data$sex,
#> levels = c('female', 'male'),
#> labels = c('Women', 'Man')
#> )
#> # set the label
#> attr(clean_data$sex, "label") <- "Gender"
#>
#> head(clean_data$sex)
#> str(clean_data$sex)
#> summary(clean_data$sex)
#>
#> # number of NA
#>
#> sum(is.na(clean_data$sex))
#> # Make a plot
#> plot(clean_data$sex)
#>
#>
#> ####### Clean the variable height #####
#>
#> # explore the raw data
#> head(raw_data$height)
#> str(raw_data$height)
#>
#> # Set rep_varlable as a numeric
#> clean_data$height <- as.numeric(raw_data$height)
#> # set the label
#> attr(clean_data$height, "label") <- "Height"
#>
#> head(clean_data$height)
#> str(clean_data$height)
#> summary(clean_data$height)
#>
#> # number of NA
#>
#> sum(is.na(clean_data$height))
#> hist(clean_data$height)
#>
#>
#> ####### Clean the variable weight #####
#>
#> # explore the raw data
#> head(raw_data$weight)
#> str(raw_data$weight)
#>
#> # Set rep_varlable as a numeric
#> clean_data$weight <- as.numeric(raw_data$weight)
#> # set the label
#> attr(clean_data$weight, "label") <- "Weight"
#>
#> head(clean_data$weight)
#> str(clean_data$weight)
#> summary(clean_data$weight)
#>
#> # number of NA
#>
#> sum(is.na(clean_data$weight))
#> hist(clean_data$weight)
#>
#>
#> ####### Clean the variable nb_siblings #####
#>
#> # explore the raw data
#> head(raw_data$nb_siblings)
#> str(raw_data$nb_siblings)
#>
#> # Set rep_varlable as an integer
#> clean_data$nb_siblings <- as.integer(raw_data$nb_siblings)
#> # set the label
#> attr(clean_data$nb_siblings, "label") <- "Number of siblings"
#>
#> head(clean_data$nb_siblings)
#> str(clean_data$nb_siblings)
#> summary(clean_data$nb_siblings)
#>
#> # number of NA
#>
#> sum(is.na(clean_data$nb_siblings))
#> hist(clean_data$nb_siblings)
#>
#>
#> clean_data$study_level <- factor(
#> x = raw_data$study_level,
#> levels = c('primary', 'secondary', 'superior'),
#> labels = c('primary', 'secondary', 'superior'),
#> ordered = TRUE
#> )
#>
#>
#> clean_data$q1_chocolate <- factor(
#> x = raw_data$q1_chocolate,
#> levels = c('0', '1', '2', '3', '4'),
#> labels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'),
#> ordered = TRUE
#> )
#>
#>
#> clean_data$q2_cheese <- factor(
#> x = raw_data$q2_cheese,
#> levels = c('0', '1', '2', '3', '4'),
#> labels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'),
#> ordered = TRUE
#> )
#> ##### watch all ######
#>
#> str(clean_data)
#>
#> ####### Save the cleaned data ######
#> save(clean_data, file="clean_data.Rdata")
#> #- End of the script in R -#
and you can write this script to a file
write_file(object = simple_script, filepath = "my_import_script1.R")
Once your variable definition table is loaded in a data.frame, the whole process can be done in one line.
Remember that we imported it earlier into a data.frame called complete_vardesc.
write_file(create_script(var_desc = complete_vardesc), filepath = "my_import_script1.R")
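The generated script is a skeleton meant to be reviewed and adapted. As a hedged reminder, two base R behaviours are worth checking while doing so. The values below are invented for this illustration.

```r
# A column read as a factor, as happens with stringsAsFactors = TRUE
height_raw <- factor(c("1.62", "1.80", "1.43"))

# as.numeric() on a factor returns the internal level codes ...
codes <- as.numeric(height_raw)
# ... so convert to character first to recover the actual values
heights <- as.numeric(as.character(height_raw))

# A date format that does not match the data silently gives NA
bad_date  <- as.Date("2000-01-12", format = "%d/%m/%Y")
good_date <- as.Date("2000-01-12", format = "%Y-%m-%d")

codes
heights
is.na(bad_date)
```

Counting missing values after each conversion, as the generated script does with sum(is.na(...)), is a quick way to catch the second problem.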
import_template
One of the strengths of vartors is its template system. We have just created a script in R, but maybe you want it in .Rmd, to then produce a report using knitr?
And let's say you are a French user, so you want it in French.
To see which builtin templates are available, read the documentation of the import_template function
?import_template
In the Details section, you can see that there is a template that should match our needs: template_fr.Rmd. To import it, use the import_template function and pass the name of the builtin template in the builtin argument
rmd_template <- import_template(builtin = "template_fr.Rmd")
Then recreate a script with this template
rmd_script <- create_script(var_desc = database_def_object, template = rmd_template)
And you have your script in Rmd!
rmd_script
#> #- Start of the script in Rmd -#
#> ---
#> title: "Import des données"
#> author: "Nom de l'auteur"
#> date: "30 juin 2014"
#> output:
#> html_document:
#> number_sections: yes
#> toc: yes
#> pdf_document:
#> toc: yes
#> ---
#>
#> ```{r, echo=FALSE}
#> # Warning: encoding = UTF-8
#> ```
#>
#> ```{r, echo = FALSE, message = FALSE}
#> # Load ggplot2 package to plot
#> library(ggplot2)
#>
#> # Create a label function to access easly to labels
#> # without using label function from packages
#> # like those from Hmisc
#> label <- function(object) attr(x = object, which = "label")
#>
#> ```
#>
#> # Import du tableau de données
#>
#> Tableau de données au format Excel.
#>
#> ```{r importcsv}
#> library(openxlsx)
#> raw_data <- read.xlsx( "rep_path_to_database")
#> ```
#>
#> Changer les noms de colonnes avec ceux définis dans le cahier de variable
#>
#> ```{r changenoms}
#> colnames(raw_data) <- c('id', 'initial', 'birthdate', 'sex', 'height', 'weight', 'nb_siblings', 'study_level', 'q1_chocolate', 'q2_cheese')
#> ```
#>
#> Créer une copie du tableau de variable pour le nettoyage
#>
#> ```{r copie}
#> clean_data <- raw_data
#> ```
#>
#> # Nettoyage variable par variable
#>
#>
#> ## initial
#> ### Description
#> - Nom complet : initial
#> - Nom dans R : initial
#> - Description : NA
#> - Unitée : NA
#> - Type : not_used
#> - Commentaires : First letters of the name and surname
#>
#> ### Exploration des données brutes
#> ```{r brute_initial}
#> head(raw_data$initial, 10)
#> ```
#>
#> ### Transformation
#> Variable non utilisée dans l'analyse. A supprimer
#> ```{r supprime_initial}
#> clean_data$initial <- NULL
#> ```
#>
#>
#> ## birthdate
#> ### Description
#> - Nom complet : Birthdate
#> - Nom dans R : birthdate
#> - Description : NA
#> - Unitée : %d/%m/%Y
#> - Type : date
#> - Commentaires : NA
#>
#> ### Exploration des données brutes
#> ```{r brute_birthdate}
#> head(raw_data$birthdate, 10)
#> ```
#>
#> ### Transformation
#> ```{r transfo_birthdate}
#> clean_data$birthdate <- as.Date(
#> x = raw_data$birthdate,
#> format = "%d/%m/%Y"
#> )
#> ```
#>
#> Ajouter un _label_ (étiquette).
#> ```{r label_birthdate}
#> attr(clean_data$birthdate, "label") <- "Birthdate"
#> ```
#>
#> ### Vérifier
#> ```{r check_birthdate}
#> # Premières données
#> head(clean_data$birthdate, 10)
#>
#> # Résumé
#> summary(clean_data$birthdate)
#>
#> # Graphique
#> qplot(clean_data$birthdate, xlab = label(clean_data$birthdate))
#> ```
#>
#>
#>
#> ## sex
#> ### Description
#> - Nom complet : Gender
#> - Nom dans R : sex
#> - Description : NA
#> - Unitée : NA
#> - Type : factor
#> - Commentaires : NA
#>
#> ### Exploration des données brutes
#> ```{r brute_sex}
#> head(raw_data$sex, 10)
#> ```
#>
#> ### Transformation
#> ```{r transfo_sex}
#> clean_data$sex <- factor(
#> x = raw_data$sex,
#> levels = c('female', 'male'),
#> labels = c('Women', 'Man')
#> )
#> ```
#>
#> Ajouter un _label_ (étiquette).
#> ```{r label_sex}
#> attr(clean_data$sex, "label") <- "Gender"
#> ```
#>
#> ### Vérifier
#> ```{r check_sex}
#> # Premières données
#> head(clean_data$sex, 10)
#>
#> # Résumé
#> summary(clean_data$sex)
#>
#> # Graphique
#> qplot(clean_data$sex, xlab = label(clean_data$sex))
#> ```
#>
#>
#>
#> ## height
#> ### Description
#> - Nom complet : Height
#> - Nom dans R : height
#> - Description : NA
#> - Unitée : NA
#> - Type : numeric
#> - Commentaires : NA
#>
#> ### Exploration des données brutes
#> ```{r brute_height}
#> head(raw_data$height, 10)
#> ```
#>
#> ### Transformation
#> ```{r transfo_height}
#> clean_data$height <- as.numeric(raw_data$height)
#> ```
#>
#> Ajouter un _label_ (étiquette).
#> ```{r label_height}
#> attr(clean_data$height, "label") <- "Height"
#> ```
#>
#> ### Vérifier
#> ```{r check_height}
#> # Premières données
#> head(clean_data$height, 10)
#>
#> # Résumé
#> summary(clean_data$height)
#>
#> # Graphique
#> qplot(clean_data$height, xlab = label(clean_data$height))
#> ```
#>
#>
#>
#> ## weight
#> ### Description
#> - Nom complet : Weight
#> - Nom dans R : weight
#> - Description : NA
#> - Unitée : NA
#> - Type : numeric
#> - Commentaires : NA
#>
#> ### Exploration des données brutes
#> ```{r brute_weight}
#> head(raw_data$weight, 10)
#> ```
#>
#> ### Transformation
#> ```{r transfo_weight}
#> clean_data$weight <- as.numeric(raw_data$weight)
#> ```
#>
#> Ajouter un _label_ (étiquette).
#> ```{r label_weight}
#> attr(clean_data$weight, "label") <- "Weight"
#> ```
#>
#> ### Vérifier
#> ```{r check_weight}
#> # Premières données
#> head(clean_data$weight, 10)
#>
#> # Résumé
#> summary(clean_data$weight)
#>
#> # Graphique
#> qplot(clean_data$weight, xlab = label(clean_data$weight))
#> ```
#>
#>
#>
#> ## nb_siblings
#> ### Description
#> - Nom complet : Number of siblings
#> - Nom dans R : nb_siblings
#> - Description : NA
#> - Unitée : NA
#> - Type : integer
#> - Commentaires : Original question : How many sisters and brothers do you have ?
#>
#> ### Exploration des données brutes
#> ```{r brute_nb_siblings}
#> head(raw_data$nb_siblings, 10)
#> ```
#>
#> ### Transformation
#> ```{r transfo_nb_siblings}
#> clean_data$nb_siblings <- as.integer(raw_data$nb_siblings)
#> ```
#>
#> Ajouter un _label_ (étiquette).
#> ```{r label_nb_siblings}
#> attr(clean_data$nb_siblings, "label") <- "Number of siblings"
#> ```
#>
#> ### Vérifier
#> ```{r check_nb_siblings}
#> # Premières données
#> head(clean_data$nb_siblings, 10)
#>
#> # Résumé
#> summary(clean_data$nb_siblings)
#>
#> # Graphique
#> qplot(clean_data$nb_siblings, xlab = label(clean_data$nb_siblings))
#> ```
#>
#>
#>
#> ## study_level
#> ### Description
#> - Nom complet : Maximal study level
#> - Nom dans R : study_level
#> - Description : NA
#> - Unitée : NA
#> - Type : ordered
#> - Commentaires : Original question : What's your higher education level ?
#>
#> ### Exploration des données brutes
#> ```{r brute_study_level}
#> head(raw_data$study_level, 10)
#> ```
#>
#> ### Transformation
#> ```{r transfo_study_level}
#> clean_data$study_level <- factor(
#> x = raw_data$study_level,
#> levels = c('primary', 'secondary', 'superior'),
#> labels = c('primary', 'secondary', 'superior'),
#> ordered = TRUE
#> )
#> ```
#>
#> Ajouter un _label_ (étiquette).
#> ```{r label_study_level}
#> attr(clean_data$study_level, "label") <- "Maximal study level"
#> ```
#>
#> ### Vérifier
#> ```{r check_study_level}
#> # Premières données
#> head(clean_data$study_level, 10)
#>
#> # Résumé
#> summary(clean_data$study_level)
#>
#> # Graphique
#> qplot(clean_data$study_level, xlab = label(clean_data$study_level))
#> ```
#>
#>
#>
#> ## q1_chocolate
#> ### Description
#> - Nom complet : Like chocolate
#> - Nom dans R : q1_chocolate
#> - Description : NA
#> - Unitée : NA
#> - Type : ordered
#> - Commentaires : Original question : Do you like chocolate ?
#>
#> ### Exploration des données brutes
#> ```{r brute_q1_chocolate}
#> head(raw_data$q1_chocolate, 10)
#> ```
#>
#> ### Transformation
#> ```{r transfo_q1_chocolate}
#> clean_data$q1_chocolate <- factor(
#> x = raw_data$q1_chocolate,
#> levels = c('0', '1', '2', '3', '4'),
#> labels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'),
#> ordered = TRUE
#> )
#> ```
#>
#> Ajouter un _label_ (étiquette).
#> ```{r label_q1_chocolate}
#> attr(clean_data$q1_chocolate, "label") <- "Like chocolate"
#> ```
#>
#> ### Vérifier
#> ```{r check_q1_chocolate}
#> # Premières données
#> head(clean_data$q1_chocolate, 10)
#>
#> # Résumé
#> summary(clean_data$q1_chocolate)
#>
#> # Graphique
#> qplot(clean_data$q1_chocolate, xlab = label(clean_data$q1_chocolate))
#> ```
#>
#>
#>
#> ## q2_cheese
#> ### Description
#> - Nom complet : Like french cheese
#> - Nom dans R : q2_cheese
#> - Description : NA
#> - Unitée : NA
#> - Type : ordered
#> - Commentaires : Original question : Do you like french cheese ?
#>
#> ### Exploration des données brutes
#> ```{r brute_q2_cheese}
#> head(raw_data$q2_cheese, 10)
#> ```
#>
#> ### Transformation
#> ```{r transfo_q2_cheese}
#> clean_data$q2_cheese <- factor(
#> x = raw_data$q2_cheese,
#> levels = c('0', '1', '2', '3', '4'),
#> labels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'),
#> ordered = TRUE
#> )
#> ```
#>
#> Ajouter un _label_ (étiquette).
#> ```{r label_q2_cheese}
#> attr(clean_data$q2_cheese, "label") <- "Like french cheese"
#> ```
#>
#> ### Vérifier
#> ```{r check_q2_cheese}
#> # Premières données
#> head(clean_data$q2_cheese, 10)
#>
#> # Résumé
#> summary(clean_data$q2_cheese)
#>
#> # Graphique
#> qplot(clean_data$q2_cheese, xlab = label(clean_data$q2_cheese))
#> ```
#>
#> # Exploration globale
#>
#> Recherche de données manquantes graphiquement
#> ```{r }
#> library(dfexplore)
#> dfplot(clean_data)
#> ```
#>
#> # Sauvegarder
#>
#> ```{r}
#> donnees <- clean_data
#> save(donnees, file = "donnees.RData")
#> ```
#> #- End of the script in Rmd -#
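Both generated scripts store each variable's label in a "label" attribute, and the Rmd template defines a small accessor for it. A minimal sketch of that mechanism in base R, with invented values:

```r
# Same accessor as the one defined at the top of the generated Rmd
label <- function(object) attr(x = object, which = "label")

weight <- c(55.3, 61.0, 72.4)
attr(weight, "label") <- "Weight"

# The label travels with the vector ...
label(weight)

# ... but plain subsetting drops custom attributes, so the label is lost
label(weight[1:2])
```

If labels matter later in the analysis, it may be worth re-applying them after subsetting.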
export_template
If none of the builtin templates fits your needs, create one!
Templates are just R and Rmd files with some delimiters. More information about how to write your own template is in the documentation
?template
Basically, the idea is to export a builtin template and change it. To do this, use export_template. For example, if you want to modify the preceding template, export it with:
export_template(builtin = "template_fr.Rmd", to = "mytemplate.Rmd")
When you are happy with your template, import it with import_template.
my_template <- import_template(path = "mytemplate.Rmd")
and then use it to produce your script
create_script(var_desc = database_def_object, template = my_template)
The job of vartors ends with the creation of the script skeleton. We chose to produce a script with this package, rather than the cleaned database directly, so that the user can adapt the import phase precisely to their needs. So what should you do with this script? For example, use knitr to produce a report in HTML, PDF or another format.
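As a rough sketch of that last step, an Rmd file can be rendered with the rmarkdown package, which drives knitr and pandoc. The file name and its tiny content are invented here. In practice the input would be the Rmd script written with write_file.

```r
# Hypothetical file, standing in for a script written by write_file()
rmd_path <- file.path(tempdir(), "my_import_script.Rmd")
fence <- strrep("`", 3)  # chunk delimiter, built here to keep the example readable

writeLines(c(
  "---",
  "title: \"Import\"",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(c(1.62, 1.80, 1.43))",
  fence
), rmd_path)

# Render only when rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
}
```

Replacing "html_document" with "pdf_document" should produce a PDF, provided a LaTeX installation is present.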