BiVariAn Description

BiVariAn is a package designed to facilitate bivariate and multivariate statistical analysis. Its functions extend conventional workflows by looping over variables for common analyses such as correlation analysis, two-group comparisons, and multi-group comparisons. Each function automatically performs parametric and non-parametric tests according to the situation and accepts user-defined arguments that are passed on to the methods used inside the function. Beyond bivariate analyses, BiVariAn can automate predictor selection based on p-value significance thresholds through functions such as step_bw_p and step_bw_firth. The package also automates the creation of several types of graphs with user-customizable arguments, including density plots, bar charts, box plots, violin plots, and pie charts.
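To make the idea of choosing between a parametric and a non-parametric test concrete, here is a minimal base-R sketch for a two-group comparison. It is not BiVariAn's implementation: the helper name `two_group_test`, the 0.05 cutoff, and the rule "use a t-test only if both groups pass Shapiro-Wilk" are assumptions made for illustration.

```r
# Hypothetical helper for illustration only; not part of BiVariAn.
# Assumed rule: if every group passes Shapiro-Wilk at `alpha`, use Welch's
# t-test, otherwise fall back to the Wilcoxon rank-sum test.
two_group_test <- function(x, g, alpha = 0.05) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)

  # Shapiro-Wilk p-value within each group
  p_norm <- vapply(split(x, g), function(v) shapiro.test(v)$p.value, numeric(1))
  normal <- all(p_norm > alpha)

  test <- if (normal) {
    t.test(x ~ g)
  } else {
    wilcox.test(x ~ g, exact = FALSE)  # exact = FALSE avoids warnings with ties
  }

  list(normal = normal, method = test$method, p.value = test$p.value)
}

# ToothGrowth ships with base R: tooth length by supplement type (OJ vs VC)
res <- two_group_test(ToothGrowth$len, ToothGrowth$supp)
res$method
res$p.value
```

The helper returns the name of the test that was actually run, so the choice stays visible to the reader instead of being hidden inside the function.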

Loading the package

library(BiVariAn)
#> Registered S3 method overwritten by 'openxlsx':
#>   method               from         
#>   as.character.formula formula.tools

Render an automatic Shapiro-Wilk table for a simple dataset

auto_shapiro_raw(cars)

Variable   p_shapiro   Normality
speed      0.45763     Normal
dist       0.0391      Non-normal


For comparison, the common use of shapiro.test on a single variable

shapiro.test(cars$speed)
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  cars$speed
#> W = 0.97765, p-value = 0.4576

Return the Shapiro-Wilk results as a data frame

auto_shapiro_raw(cars, flextableformat = FALSE)
#>       Variable p_shapiro  Normality
#> speed    speed   0.45763     Normal
#> dist      dist    0.0391 Non-normal
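For readers who want to see what such a table involves, the following base-R sketch builds a similar data frame by looping `shapiro.test` over the numeric columns. The helper name `shapiro_table` is hypothetical, and the 0.05 cutoff and rounding to five decimals are assumptions inferred from the output above, not taken from the BiVariAn source.

```r
# Hypothetical base-R helper for illustration; not the BiVariAn implementation.
# Assumptions: normality is judged at alpha = 0.05 and p-values are rounded
# to five decimals.
shapiro_table <- function(data, alpha = 0.05) {
  # Keep numeric columns only, since shapiro.test needs numeric input
  num <- data[vapply(data, is.numeric, logical(1))]

  # One Shapiro-Wilk p-value per column
  p <- vapply(num, function(col) shapiro.test(col)$p.value, numeric(1))

  data.frame(
    Variable  = names(num),
    p_shapiro = unname(round(p, 5)),
    Normality = unname(ifelse(p > alpha, "Normal", "Non-normal")),
    row.names = names(num)
  )
}

tab <- shapiro_table(cars)
tab
```

Unlike the output shown above, this sketch keeps `p_shapiro` numeric and does not format very small values as `<0.001*`.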

Render an automatic Shapiro-Wilk table for a more complex dataset

# Load riskCommunicator to access Framingham dataset
library(riskCommunicator)
# Load dplyr to select specific columns
library(dplyr)
data(cvdd)

For shapiro.test, the sample size must be between 3 and 5000.

Let’s select only 300 observations (an arbitrary number)

set.seed(081224)
ex_sample <- slice_sample(cvdd, n = 300)
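The sample-size limit mentioned above can be checked directly in base R. This sketch uses simulated data rather than the Framingham dataset; the seed and the vector length of 6000 are arbitrary choices for illustration.

```r
set.seed(123)       # arbitrary seed, only for reproducibility
x <- rnorm(6000)    # simulated vector longer than the 5000-value limit

# shapiro.test() stops with an error when n is outside 3..5000;
# tryCatch() captures the message instead of aborting the script
msg <- tryCatch(
  {
    shapiro.test(x)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Drawing a random subsample brings the size back within the allowed range
x_sub <- sample(x, size = 300)
res <- shapiro.test(x_sub)
res$p.value
```

Subsampling is one simple way to stay within the limit; the subsample size affects the power of the test, so the resulting p-values depend on that choice.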

Now, let’s select specific columns from the dataset and render the table


auto_shapiro_raw(ex_sample %>% select(TOTCHOL, SYSBP, DIABP, BMI, HEARTRTE))

Variable   p_shapiro   Normality
TOTCHOL    0.00789     Non-normal
SYSBP      <0.001*     Non-normal
DIABP      <0.001*     Non-normal
BMI        <0.001*     Non-normal
HEARTRTE   <0.001*     Non-normal

Common use of shapiro.test

shapiro.test(ex_sample$TOTCHOL)
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  ex_sample$TOTCHOL
#> W = 0.98654, p-value = 0.007891

Return the same Shapiro-Wilk results as a data frame

auto_shapiro_raw(ex_sample %>% select(TOTCHOL, SYSBP, DIABP, BMI, HEARTRTE), flextableformat = FALSE)
#>          Variable p_shapiro  Normality
#> TOTCHOL   TOTCHOL   0.00789 Non-normal
#> SYSBP       SYSBP   <0.001* Non-normal
#> DIABP       DIABP   <0.001* Non-normal
#> BMI           BMI   <0.001* Non-normal
#> HEARTRTE HEARTRTE   <0.001* Non-normal