| Type: | Package | 
| Title: | Simplifies Exploratory Data Analysis | 
| Version: | 1.3.6 | 
| Description: | Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis. | 
| License: | MIT + file LICENSE | 
| URL: | https://rolkra.github.io/explore/, https://github.com/rolkra/explore | 
| BugReports: | https://github.com/rolkra/explore/issues | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | cli, dplyr (≥ 1.1.0), DT (≥ 0.3.0), forcats (≥ 1.0.0), ggplot2 (≥ 3.4.0), grDevices, gridExtra, magrittr, palmerpenguins, plotly, rlang (≥ 1.1.0), rmarkdown, rpart, rpart.plot, shiny, stats, stringr, tibble | 
| Suggests: | knitr, MASS, randomForest, xgboost, testthat (≥ 3.0.0) | 
| VignetteBuilder: | knitr | 
| Config/testthat/edition: | 3 | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-10-02 12:16:19 UTC; rolkra | 
| Author: | Roland Krasser [aut, cre] | 
| Maintainer: | Roland Krasser <roland.krasser@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-10-02 12:50:02 UTC | 
explore: Simplifies Exploratory Data Analysis
Description
 
Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis.
Author(s)
Maintainer: Roland Krasser roland.krasser@gmail.com
See Also
Useful links:
- Report bugs at https://github.com/rolkra/explore/issues 
A/B testing
Description
A/B testing
Usage
abtest(data, expr, n, target, sign_level = 0.05, color = "grey")
Arguments
| data | A dataset. If no data is provided, a shiny app is launched | 
| expr | Logical expression that evaluates to FALSE/TRUE | 
| n | A Variable for number of observations (count data) | 
| target | Target variable | 
| sign_level | Significance Level (typical 0.01/0.05/0.10) | 
| color | Fill color of bar/violin-plot | 
Value
Plot that shows if difference is significant
Examples
## Using chi2-test or t-test depending on target type
data <- create_data_buy(obs = 100)
abtest(data, female_ind == 1, target = buy)  # chi2 test
abtest(data, city_ind == 1, target = age)    # t test
## If small number of observations, Fisher's Exact test
## is used for a binary target (if <= 5 observations in a subgroup)
data <- create_data_buy(obs = 25, seed = 1)
abtest(data, female_ind == 1, target = buy)  # Fisher's Exact test
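The test selection described in the comments above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name choose_ab_test is hypothetical, and "any cell of the group-by-target table holds 5 or fewer observations" is an assumed reading of the small-sample rule.

```r
# Hypothetical helper (not part of explore): choose a significance test
# from the target type, following the rules stated in the examples above.
choose_ab_test <- function(group, target) {
  binary <- all(target %in% c(0, 1, NA))
  if (!binary) {
    # Numeric target: compare the two group means
    return(list(test = "t-test", p_value = t.test(target ~ group)$p.value))
  }
  tab <- table(group, target)
  if (any(tab <= 5)) {
    # Assumed small-sample rule: some cell has <= 5 observations
    list(test = "fisher", p_value = fisher.test(tab)$p.value)
  } else {
    list(test = "chi2", p_value = chisq.test(tab)$p.value)
  }
}

set.seed(1)
group <- rep(c("A", "B"), each = 100)
buy <- rbinom(200, 1, 0.5)              # binary target
age <- rnorm(200, mean = 40, sd = 10)   # numeric target
choose_ab_test(group, buy)$test
choose_ab_test(group, age)$test
```

The returned p-value would then be compared with sign_level to decide whether the difference counts as significant.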
A/B testing interactive
Description
Launches a shiny app to A/B test
Usage
abtest_shiny(
  size_a = 100,
  size_b = 100,
  success_a = 10,
  success_b = 20,
  success_unit = "percent",
  sign_level = 0.05
)
Arguments
| size_a | Size of Group A | 
| size_b | Size of Group B | 
| success_a | Success of Group A | 
| success_b | Success of Group B | 
| success_unit | Unit of success ("count"|"percent") | 
| sign_level | Significance Level (typical 0.01/0.05/0.10) | 
Examples
# Only run examples in interactive R sessions
if (interactive())  {
   abtest_shiny()
}
A/B testing comparing two means
Description
A/B testing comparing two means
Usage
abtest_targetnum(data, expr, target, sign_level = 0.05, color = "grey")
Arguments
| data | A dataset | 
| expr | Expression that evaluates to FALSE/TRUE | 
| target | Target variable (must be numeric) | 
| sign_level | Significance Level (typical 0.01/0.05/0.10) | 
| color | fill color | 
Value
Plot that shows if difference is significant
Examples
data <- create_data_buy(obs = 100)
abtest(data, city_ind == 1, target = age)
A/B testing comparing percent per group
Description
A/B testing comparing percent per group
Usage
abtest_targetpct(
  data,
  expr,
  n,
  target,
  sign_level = 0.05,
  group_label,
  ab_label = FALSE,
  color = "grey"
)
Arguments
| data | A dataset | 
| expr | Expression that evaluates to FALSE/TRUE | 
| n | A Variable for number of observations (count data) | 
| target | Target variable (must be 0/1 or FALSE/TRUE) | 
| sign_level | Significance Level (typical 0.01/0.05/0.10) | 
| group_label | Label of groups (default = expr) | 
| ab_label | Label Groups as A and B (default = FALSE) | 
| color | color of bar | 
Value
Plot that shows if difference is significant
Examples
data <- create_data_buy(obs = 100)
abtest(data, female_ind == 1, target = buy)
abtest(data, age >= 40, target = buy)
Add an id variable as the first column of a dataset
Description
Add an id variable as the first column of a dataset
Usage
add_var_id(data, name = "id", overwrite = FALSE)
Arguments
| data | A dataset | 
| name | Name of new variable (as string) | 
| overwrite | Can new id variable overwrite an existing variable in dataset? | 
Value
Data set containing new id variable
Examples
library(magrittr)
iris %>% add_var_id() %>% head()
iris %>% add_var_id(name = "iris_nr") %>% head()
Add a random 0/1 variable to dataset
Description
Add a random 0/1 variable to dataset
Usage
add_var_random_01(
  data,
  name = "random_01",
  prob = c(0.5, 0.5),
  overwrite = TRUE,
  seed
)
Arguments
| data | A dataset | 
| name | Name of new variable (as string) | 
| prob | Vector of probabilities | 
| overwrite | Can new random variable overwrite an existing variable in dataset? | 
| seed | Seed for random number generation (integer) | 
Value
Dataset containing new random variable
Examples
library(magrittr)
iris %>% add_var_random_01() %>% head()
iris %>% add_var_random_01(name = "my_var") %>% head()
Add a random categorical variable to dataset
Description
Add a random categorical variable to dataset
Usage
add_var_random_cat(
  data,
  name = "random_cat",
  cat = LETTERS[1:6],
  prob,
  overwrite = TRUE,
  seed
)
Arguments
| data | A dataset | 
| name | Name of new variable (as string) | 
| cat | Vector of categories | 
| prob | Vector of probabilities | 
| overwrite | Can new random variable overwrite an existing variable in dataset? | 
| seed | Seed for random number generation (integer) | 
Value
Dataset containing new random variable
Examples
library(magrittr)
iris %>% add_var_random_cat() %>% head()
iris %>% add_var_random_cat(name = "my_cat") %>% head()
iris %>% add_var_random_cat(cat = c("Version A", "Version B")) %>% head()
iris %>% add_var_random_cat(cat = c(1,2,3,4,5)) %>% head()
Add a random double variable to dataset
Description
Add a random double variable to dataset
Usage
add_var_random_dbl(
  data,
  name = "random_dbl",
  min_val = 0,
  max_val = 100,
  overwrite = TRUE,
  seed
)
Arguments
| data | A dataset | 
| name | Name of new variable (as string) | 
| min_val | Minimum value of the random numbers | 
| max_val | Maximum value of the random numbers | 
| overwrite | Can new random variable overwrite an existing variable in dataset? | 
| seed | Seed for random number generation (integer) | 
Value
Dataset containing new random variable
Examples
library(magrittr)
iris %>% add_var_random_dbl() %>% head()
iris %>% add_var_random_dbl(name = "random_var") %>% head()
iris %>% add_var_random_dbl(min_val = 1, max_val = 10) %>% head()
Add a random integer variable to dataset
Description
Add a random integer variable to dataset
Usage
add_var_random_int(
  data,
  name = "random_int",
  min_val = 1,
  max_val = 10,
  overwrite = TRUE,
  seed
)
Arguments
| data | A dataset | 
| name | Name of new variable (as string) | 
| min_val | Minimum random integers | 
| max_val | Maximum random integers | 
| overwrite | Can new random variable overwrite an existing variable in dataset? | 
| seed | Seed for random number generation (integer) | 
Value
Dataset containing new random variable
Examples
library(magrittr)
iris %>% add_var_random_int() %>% head()
iris %>% add_var_random_int(name = "random_var") %>% head()
iris %>% add_var_random_int(min_val = 1, max_val = 10) %>% head()
Add a random moon variable to dataset
Description
Add a random moon variable to dataset
Usage
add_var_random_moon(data, name = "random_moon", overwrite = TRUE, seed)
Arguments
| data | A dataset | 
| name | Name of new variable (as string) | 
| overwrite | Can new random variable overwrite an existing variable in dataset? | 
| seed | Seed for random number generation (integer) | 
Value
Dataset containing new random variable
Examples
library(magrittr)
iris %>% add_var_random_moon() %>% head()
Add a random starsign variable to dataset
Description
Add a random starsign variable to dataset
Usage
add_var_random_starsign(
  data,
  name = "random_starsign",
  lang = "en",
  overwrite = TRUE,
  seed
)
Arguments
| data | A dataset | 
| name | Name of new variable (as string) | 
| lang | Language used for starsign (en = English, de = Deutsch, es = Espanol) | 
| overwrite | Can new random variable overwrite an existing variable in dataset? | 
| seed | Seed for random number generation (integer) | 
Value
Dataset containing new random variable
Examples
library(magrittr)
iris %>% add_var_random_starsign() %>% head()
iris %>% add_var_random_starsign(lang = "de") %>% head()
Balance target variable
Description
Balances the target variable in your dataset using downsampling. Target must be 0/1, FALSE/TRUE or no/yes.
Usage
balance_target(data, target, min_prop = 0.1, seed)
Arguments
| data | A dataset | 
| target | Target variable (0/1, TRUE/FALSE, yes/no) | 
| min_prop | Minimum proportion of one of the target categories | 
| seed | Seed for random number generator | 
Value
Data
Examples
iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
balanced <- balance_target(iris, target = is_versicolor, min_prop = 0.5)
describe(balanced, is_versicolor)
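Downsampling itself can be sketched in base R. The helper below is hypothetical and is not the package's code; it assumes that min_prop is the share the smaller target class should reach after rows of the larger class are randomly removed.

```r
# Hypothetical helper (not part of explore): downsample the majority class
# until the minority class makes up at least min_prop of the rows.
downsample_majority <- function(data, target, min_prop = 0.5, seed = 1) {
  set.seed(seed)
  y <- as.character(data[[target]])
  counts <- table(y)
  minority <- names(counts)[which.min(counts)]
  idx_min <- which(y == minority)
  idx_maj <- which(y != minority)
  # Largest number of majority rows that keeps the minority share >= min_prop
  n_keep <- min(length(idx_maj),
                floor(length(idx_min) * (1 - min_prop) / min_prop))
  keep <- c(idx_min, idx_maj[sample.int(length(idx_maj), n_keep)])
  data[sort(keep), , drop = FALSE]
}

df <- iris
df$is_versicolor <- ifelse(df$Species == "versicolor", 1, 0)
balanced_sketch <- downsample_majority(df, "is_versicolor", min_prop = 0.5)
table(balanced_sketch$is_versicolor)
```

All minority rows are kept; only the majority class loses observations, which is what distinguishes downsampling from oversampling.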
Check vector for low variance
Description
Check vector for low variance
Usage
check_vec_low_variance(values, max_prop = 0.99)
Arguments
| values | Vector of values | 
| max_prop | Maximum proportion of values without variance | 
Value
TRUE/FALSE (low variance)
Examples
## Not run: 
values <- c(1, rep(0, 1000))
check_vec_low_variance(values, max_prop = 0.9)
## End(Not run)
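One plausible reading of "proportion of values without variance" is the share of the single most frequent value. The sketch below computes that share in base R; the helper name and the comparison against max_prop are assumptions, not the package's code.

```r
# Hypothetical helper: share of the most frequent value in a vector
most_common_share <- function(values) {
  max(table(values, useNA = "ifany")) / length(values)
}

values <- c(1, rep(0, 1000))
share <- most_common_share(values)
share         # 1000 of the 1001 values are identical
share > 0.9   # above max_prop = 0.9 under this assumed rule
```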
Clean variable
Description
Clean variable (replace NA values, set min_val and max_val)
Usage
clean_var(
  data,
  var,
  na = NA,
  min_val = NA,
  max_val = NA,
  max_cat = NA,
  rescale01 = FALSE,
  simplify_text = FALSE,
  name = NA
)
Arguments
| data | A dataset | 
| var | Name of variable | 
| na | Value that replaces NA | 
| min_val | All values < min_val are converted to min_val (var numeric or character) | 
| max_val | All values > max_val are converted to max_val (var numeric or character) | 
| max_cat | Maximum number of different factor levels for categorical variable (if more, .OTHER is added) | 
| rescale01 | If TRUE, value is rescaled between 0 and 1 (var must be numeric) | 
| simplify_text | If TRUE, a character variable is simplified (trim, upper, ...) | 
| name | New name of variable (as string) | 
Value
Dataset
Examples
library(magrittr)
iris %>% clean_var(Sepal.Width, max_val = 3.5, name = "sepal_width") %>% head()
iris %>% clean_var(Sepal.Width, rescale01 = TRUE) %>% head()
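For orientation, the two transformations used in these examples can be written in base R. This is a sketch under the assumption that max_val caps values from above and that rescale01 is plain min-max scaling; it does not reproduce the package's internals.

```r
x <- iris$Sepal.Width

# Cap values above 3.5, as max_val = 3.5 is described to do
capped <- pmin(x, 3.5)

# Min-max scaling to the interval [0, 1] (assumed meaning of rescale01)
rescaled <- (x - min(x)) / (max(x) - min(x))

max(capped)
range(rescaled)
```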
Adds percentage to dplyr::count()
Description
Adds variables total and pct (percentage) to dplyr::count()
Usage
count_pct(data, ...)
Arguments
| data | A dataset | 
| ... | Other parameters passed to count() | 
Value
Dataset
Examples
count_pct(iris, Species)
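The columns described above can be approximated in base R. This is only a sketch: the 0 to 100 scale for pct is an assumption, and the real function returns the result of dplyr::count() with the extra columns added.

```r
# Count observations per category
counts <- as.data.frame(table(Species = iris$Species), responseName = "n")

# Add the total and the percentage per category (assumed 0-100 scale)
counts$total <- sum(counts$n)
counts$pct <- 100 * counts$n / counts$total
counts
```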
Create data of A/B testing
Description
Data that can be used for unit-testing or teaching
Usage
create_data_abtest(
  n_a = 100,
  n_b = 100,
  success_a = 10,
  success_b = 5,
  success_unit = "count",
  count = TRUE
)
Arguments
| n_a | Total size of group A | 
| n_b | Total size of group B | 
| success_a | Success in group A | 
| success_b | Success in group B | 
| success_unit | Unit ("count"|"percent") | 
| count | Create as count-data (FALSE|TRUE) | 
Value
A dataset as tibble
Examples
library(dplyr)
create_data_abtest() %>% abtest()
create_data_abtest(
  n_a = 100,
  n_b = 100,
  success_a = 20,
  success_b = 30,
  success_unit = "count"
) %>% abtest()
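Count data of this kind (group sizes plus number of successes) can also be checked directly in base R. The sketch below uses stats::prop.test() as a stand-in; for two groups it is equivalent to a chi-squared test with continuity correction. The column names are illustrative and not taken from the package.

```r
# Count data: one row per group (column names are illustrative)
ab <- data.frame(
  group   = c("A", "B"),
  success = c(20, 30),
  n       = c(100, 100)
)

# Two-sample test of equal proportions
res <- prop.test(x = ab$success, n = ab$n)
sign_level <- 0.05

res$p.value
res$p.value < sign_level   # TRUE would indicate a significant difference
```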
Create data app
Description
Artificial data that can be used for unit-testing or teaching
Usage
create_data_app(obs = 1000, add_id = FALSE, seed = 123)
Arguments
| obs | Number of observations | 
| add_id | Add an id-variable to data? | 
| seed | Seed for randomization (integer) | 
Value
A dataset as tibble
Examples
create_data_app()
Create data buy
Description
Artificial data that can be used for unit-testing or teaching
Usage
create_data_buy(
  obs = 1000,
  target_name = "buy",
  factorise_target = FALSE,
  target1_prob = 0.5,
  add_extreme = TRUE,
  flip_gender = FALSE,
  add_id = FALSE,
  seed = 123
)
Arguments
| obs | Number of observations | 
| target_name | Variable name of target | 
| factorise_target | Should target variable be factorised (from 0/1 to factor no/yes)? | 
| target1_prob | Probability that target = 1 | 
| add_extreme | Add an observation with extreme values? | 
| flip_gender | Should Male/Female be flipped in data? | 
| add_id | Add an id-variable to data? | 
| seed | Seed for randomization | 
Details
Variables in dataset:
- id = Identifier 
- period = Year & Month (YYYYMM) 
- city_ind = Indicating if customer is residing in a city (1 = yes, 0 = no) 
- female_ind = Gender of customer is female (1 = yes, 0 = no) 
- fixedvoice_ind = Customer has a fixed voice product (1 = yes, 0 = no) 
- fixeddata_ind = Customer has a fixed data product (1 = yes, 0 = no) 
- fixedtv_ind = Customer has a fixed TV product (1 = yes, 0 = no) 
- mobilevoice_ind = Customer has a mobile voice product (1 = yes, 0 = no) 
- mobiledata_prd = Customer has a mobile data product (NO/MOBILE STICK/BUSINESS) 
- bbi_speed_ind = Customer has a Broadband Internet (BBI) with extra speed 
- bbi_usg_gb = Broadband Internet (BBI) usage in Gigabyte (GB) last month 
- hh_single = Expected to be a Single Household (1 = yes, 0 = no) 
Target in dataset:
- buy (may be renamed) = Did customer buy a new product in next month? (1 = yes, 0 = no) 
Value
A dataset as tibble
Examples
create_data_buy()
Create data churn
Description
Artificial data that can be used for unit-testing or teaching
Usage
create_data_churn(
  obs = 1000,
  target_name = "churn",
  factorise_target = FALSE,
  target1_prob = 0.4,
  add_id = FALSE,
  seed = 123
)
Arguments
| obs | Number of observations | 
| target_name | Variable name of target | 
| factorise_target | Should target variable be factorised? | 
| target1_prob | Probability that target = 1 | 
| add_id | Add an id-variable to data? | 
| seed | Seed for randomization (integer) | 
Value
A dataset as tibble
Examples
create_data_churn()
Create an empty dataset
Description
Create an empty dataset
Usage
create_data_empty(obs = 1000, add_id = FALSE)
Arguments
| obs | Number of observations | 
| add_id | Add an id | 
Value
Dataset as tibble
Examples
create_data_empty(obs = 100)
create_data_empty(obs = 100, add_id = TRUE)
Create data esoteric
Description
Random data that can be used for unit-testing or teaching
Usage
create_data_esoteric(obs = 1000, add_id = FALSE, seed = 123)
Arguments
| obs | Number of observations | 
| add_id | Add an id-variable to data? | 
| seed | Seed for randomization | 
Details
Variables in dataset:
- id = Identifier 
- starsign = random starsign 
- chinese = random chinese zodiac 
- moon = random moon phase 
- blood = random blood type 
- fingers_crossed = random fingers crossed (1 = yes, 0 = no) 
- success = random success (1 = yes, 0 = no) 
Value
A dataset as tibble
Examples
create_data_esoteric(obs = 100)
Create data newsletter
Description
Artificial data that can be used for unit-testing or teaching (fairness & AI bias)
Usage
create_data_newsletter(obs = 1000, add_id = FALSE, seed = 123)
Arguments
| obs | Number of observations | 
| add_id | Add an id-variable to data? | 
| seed | Seed for randomization (integer) | 
Value
A dataset as tibble
Examples
create_data_newsletter()
Create data person
Description
Artificial data that can be used for unit-testing or teaching
Usage
create_data_person(obs = 1000, add_id = FALSE, seed = 123)
Arguments
| obs | Number of observations | 
| add_id | Add an id | 
| seed | Seed for randomization (integer) | 
Value
A dataset as tibble
Examples
create_data_person()
Create data random
Description
Random data that can be used for unit-testing or teaching
Usage
create_data_random(
  obs = 1000,
  vars = 10,
  target_name = "target_ind",
  factorise_target = FALSE,
  target1_prob = 0.5,
  add_id = TRUE,
  seed = 123
)
Arguments
| obs | Number of observations | 
| vars | Number of variables | 
| target_name | Variable name of target | 
| factorise_target | Should target variable be factorised (from 0/1 to factor no/yes)? | 
| target1_prob | Probability that target = 1 | 
| add_id | Add an id-variable to data? | 
| seed | Seed for randomization | 
Details
Variables in dataset:
- id = Identifier 
- var_X = variable containing values between 0 and 100 
Target in dataset:
- target_ind (may be renamed) = random values (1 = yes, 0 = no) 
Value
A dataset as tibble
Examples
create_data_random(obs = 100, vars = 5)
Create data unfair
Description
Artificial data that can be used for unit-testing or teaching (fairness & AI bias)
Usage
create_data_unfair(
  obs = 1000,
  target_name = "target_ind",
  factorise_target = FALSE,
  target1_prob = 0.25,
  add_id = FALSE,
  seed = 123
)
Arguments
| obs | Number of observations | 
| target_name | Variable name of target | 
| factorise_target | Should target variable be factorised? | 
| target1_prob | Probability that target = 1 | 
| add_id | Add an id-variable to data? | 
| seed | Seed for randomization (integer) | 
Value
A dataset as tibble
Examples
create_data_unfair()
Generate a notebook
Description
Generate an RMarkdown Notebook template for a report. You must provide an output directory (parameter output_dir). The default file name is "notebook-explore.Rmd" (an existing file with the same name may be overwritten).
Usage
create_notebook_explore(output_file = "notebook-explore.Rmd", output_dir)
Arguments
| output_file | Filename of the notebook (.Rmd) | 
| output_dir | Directory where the notebook is saved | 
Examples
create_notebook_explore(output_file = "explore.Rmd", output_dir = tempdir())
Cut a variable
Description
Cut a variable
Usage
cut_vec_num_avg(values, bins = 8)
Arguments
| values | Variable | 
| bins | Number of bins | 
Value
Data frame
Create a data dictionary Markdown file
Description
Create a data dictionary Markdown file
Usage
data_dict_md(
  data,
  title = "",
  description = NA,
  output_file = "data_dict.md",
  output_dir
)
Arguments
| data | A dataframe (data dictionary for all variables) | 
| title | Title of the data dictionary | 
| description | Detailed description of variables in data (dataframe with columns 'variable' and 'description') | 
| output_file | Output filename for Markdown file | 
| output_dir | Directory where the Markdown file is saved | 
Value
Create Markdown file
Examples
# Data dictionary of a dataframe
data_dict_md(iris,
             title = "iris flower data set",
             output_dir = tempdir())
# Data dictionary of a dataframe with additional description of variables
description <- data.frame(
                 variable = c("Species"),
                 description = c("Species of Iris flower"))
data_dict_md(iris,
             title = "iris flower data set",
             description = description,
             output_dir = tempdir())
Decrypt text
Description
Decrypt text
Usage
decrypt(text, codeletters = c(toupper(letters), letters, 0:9), shift = 18)
Arguments
| text | A text (character) | 
| codeletters | A string of letters that are used for decryption | 
| shift | Number of elements shifted | 
Value
Decrypted text
Examples
decrypt("zw336 E693v")
Describe a dataset or variable
Description
Describe a dataset or variable (depending on input parameters)
Usage
describe(data, var, n, target, out = "text", ...)
Arguments
| data | A dataset | 
| var | A variable of the dataset | 
| n | Weights variable for count-data | 
| target | Target variable (0/1 or FALSE/TRUE) | 
| out | Output format ("text"|"list") of variable description | 
| ... | Further arguments | 
Value
Description as table, text or list
Examples
# Load package
library(magrittr)
# Describe a dataset
iris %>% describe()
# Describe a variable
iris %>% describe(Species)
iris %>% describe(Sepal.Length)
Describe all variables of a dataset
Description
Describe all variables of a dataset
Usage
describe_all(data, out = "large")
Arguments
| data | A dataset | 
| out | Output format ("small"|"large") | 
Value
Dataset (tibble)
Examples
describe_all(iris)
Describe categorical variable
Description
Describe categorical variable
Usage
describe_cat(data, var, n, max_cat = 10, out = "text", margin = 0)
Arguments
| data | A dataset | 
| var | Variable or variable name | 
| n | Weights variable for count-data | 
| max_cat | Maximum number of categories displayed | 
| out | Output format ("text"|"list"|"tibble"|"df") | 
| margin | Left margin for text output (number of spaces) | 
Value
Description as text or list
Examples
describe_cat(iris, Species)
Describe numerical variable
Description
Describe numerical variable
Usage
describe_num(data, var, n, out = "text", margin = 0)
Arguments
| data | A dataset | 
| var | Variable or variable name | 
| n | Weights variable for count-data | 
| out | Output format ("text"|"list") | 
| margin | Left margin for text output (number of spaces) | 
Value
Description as text or list
Examples
describe_num(iris, Sepal.Length)
Describe table
Description
Describe table (e.g. number of rows and columns of dataset)
Usage
describe_tbl(data, n, target, out = "text")
Arguments
| data | A dataset | 
| n | Weights variable for count-data | 
| target | Target variable (binary) | 
| out | Output format ("text"|"list") | 
Value
Description as text or list
Examples
describe_tbl(iris)
iris[1,1] <- NA
describe_tbl(iris)
Drop all observations where expression is true
Description
Drop all observations where expression is true
Usage
drop_obs_if(data, expr)
Arguments
| data | Data frame | 
| expr | Expression | 
Value
Data frame
Examples
drop_obs_if(iris, Species == "setosa")
drop_obs_if(iris, Sepal.Length < 5 | Sepal.Length > 7)
Drop all observations with NA-values
Description
Drop all observations with NA-values
Usage
drop_obs_with_na(data)
Arguments
| data | Data frame | 
Value
Data frame
Examples
data <- data.frame(a = 1:10, b = rep("A",10))
data[1,1] <- NA
drop_obs_with_na(data)
Drop variables by name
Description
Drop variables by name
Usage
drop_var_by_names(data, var_names)
Arguments
| data | Data frame | 
| var_names | Vector of variable names (as string) | 
Value
Data frame
Examples
drop_var_by_names(iris, "Species")
drop_var_by_names(iris, c("Sepal.Length", "Sepal.Width"))
Drop all variables with low variance
Description
Drop all variables with low variance
Usage
drop_var_low_variance(data, max_prop = 0.99)
Arguments
| data | Data frame | 
| max_prop | Maximum proportion of values without variance | 
Value
Data frame
Examples
data <- data.frame(a = 1:100, b = c(0, rep(1, 99)))
drop_var_low_variance(data, max_prop = 0.9)
Drop all variables with no variance
Description
Drop all variables with no variance
Usage
drop_var_no_variance(data)
Arguments
| data | Data frame | 
Value
Data frame
Examples
data <- data.frame(a = 1:10, b = rep(1,10))
drop_var_no_variance(data)
Drop all non-numeric variables
Description
Drop all non-numeric variables
Usage
drop_var_not_numeric(data)
Arguments
| data | Data frame | 
Value
Data frame
Examples
data <- data.frame(a = 1:10, b = rep("A",10))
drop_var_not_numeric(data)
Drop all variables with NA-values
Description
Drop all variables with NA-values
Usage
drop_var_with_na(data)
Arguments
| data | Data frame | 
Value
Data frame
Examples
data <- data.frame(a = 1:10, b = rep(NA,10))
drop_var_with_na(data)
Encrypt text
Description
Encrypt text
Usage
encrypt(text, codeletters = c(toupper(letters), letters, 0:9), shift = 18)
Arguments
| text | A text (character) | 
| codeletters | A string of letters that are used for encryption | 
| shift | Number of elements shifted | 
Value
Encrypted text
Examples
encrypt("hello world")
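The arguments codeletters and shift suggest a shift cipher over a fixed alphabet. The sketch below is an assumption about the mechanism, not the package's source, and shift_text is a hypothetical name. With the default alphabet and a shift of 18, it maps "hello world" to the string used in the decrypt() example, and a negative shift reverses it.

```r
# Hypothetical helper: shift every character found in codeletters by
# `shift` positions, wrapping around at the end of the alphabet.
shift_text <- function(text,
                       codeletters = c(toupper(letters), letters, 0:9),
                       shift = 18) {
  n <- length(codeletters)
  shifted <- codeletters[((seq_len(n) - 1 + shift) %% n) + 1]
  # Characters outside codeletters (e.g. spaces) are left unchanged
  chartr(paste(codeletters, collapse = ""),
         paste(shifted, collapse = ""),
         text)
}

secret <- shift_text("hello world")
secret
shift_text(secret, shift = -18)
```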
Explain a target using Random Forest.
Description
Explain a target using Random Forest.
Usage
explain_forest(data, target, ntree = 50, out = "plot", ...)
Arguments
| data | A dataset | 
| target | Target variable (binary) | 
| ntree | Number of trees used for Random Forest | 
| out | Output of the function ("plot"|"model"|"importance"|"all") | 
| ... | Further arguments | 
Value
Plot of importance (if out = "plot")
Examples
data <- create_data_buy()
explain_forest(data, target = buy)
Explain a binary target using a logistic regression (glm).
Model chosen by AIC in a Stepwise Algorithm (MASS::stepAIC()).
Description
Explain a binary target using a logistic regression (glm).
Model chosen by AIC in a Stepwise Algorithm (MASS::stepAIC()).
Usage
explain_logreg(data, target, out = "tibble", ...)
Arguments
| data | A dataset | 
| target | Target variable (binary) | 
| out | Output of the function ("tibble"|"model") | 
| ... | Further arguments | 
Value
Dataset with results (term, estimate, std.error, z.value, p.value)
Examples
data <- iris
data$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
data$Species <- NULL
explain_logreg(data, target = is_versicolor)
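The shape of the result table can be illustrated with a plain stats::glm() fit. This sketch skips the stepwise AIC selection that the function is described to perform, so the set of terms may differ; it only shows where the listed columns come from.

```r
data <- iris
data$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
data$Species <- NULL

# Logistic regression on all remaining variables (no stepwise selection)
fit <- glm(is_versicolor ~ ., data = data, family = binomial())
coefs <- summary(fit)$coefficients

# Rearrange the coefficient matrix into the documented column layout
result <- data.frame(
  term      = rownames(coefs),
  estimate  = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  z.value   = coefs[, "z value"],
  p.value   = coefs[, "Pr(>|z|)"],
  row.names = NULL
)
result
```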
Explain a target using a simple decision tree (classification or regression)
Description
Explain a target using a simple decision tree (classification or regression)
Usage
explain_tree(
  data,
  target,
  n,
  max_cat = 10,
  max_target_cat = 5,
  maxdepth = 3,
  minsplit = 20,
  cp = 0,
  weights = NA,
  size = 0.7,
  out = "plot",
  ...
)
Arguments
| data | A dataset | 
| target | Target variable | 
| n | weights variable (for count data) | 
| max_cat | Drop categorical variables with higher number of levels | 
| max_target_cat | Maximum number of categories to be plotted for target (except NA) | 
| maxdepth | Set the maximum depth of any node of the final tree, with the root node counted as depth 0. With values greater than 30, rpart will give nonsense results on 32-bit machines. | 
| minsplit | the minimum number of observations that must exist in a node in order for a split to be attempted. | 
| cp | complexity parameter. Any split that does not decrease the overall lack of fit by a factor of cp is not attempted. | 
| weights | optional case weights. | 
| size | Text size of plot | 
| out | Output of function ("plot"|"model") | 
| ... | Further arguments | 
Value
Plot, or the model (if out = "model")
Examples
data <- iris
data$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
data$Species <- NULL
explain_tree(data, target = is_versicolor)
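The arguments maxdepth, minsplit and cp mirror rpart::rpart.control(). The sketch below fits a comparable tree with rpart directly, assuming the documented defaults are passed through unchanged; it is not the function's actual code and omits the plot.

```r
library(rpart)

data <- iris
data$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
data$Species <- NULL

# Classification tree with the default limits listed in Usage
fit <- rpart(
  is_versicolor ~ .,
  data = data,
  method = "class",
  control = rpart.control(maxdepth = 3, minsplit = 20, cp = 0)
)

# rpart numbers its nodes so that node k sits at depth floor(log2(k))
node_depth <- floor(log2(as.numeric(rownames(fit$frame))))
max(node_depth)
```

Lowering maxdepth or raising cp gives a smaller tree that is easier to read.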
Explain a binary target using xgboost
Description
Based on the hyperparameters defined in the setup parameter, XGBoost hyperparameter tuning is carried out using cross-validation. The best model is chosen and returned. By default, the function returns the feature-importance plot. To get all outputs, use parameter out = "all".
Usage
explain_xgboost(
  data,
  target,
  log = TRUE,
  nthread = 1,
  setup = list(cv_nfold = 2, max_nrounds = 1000, early_stopping_rounds = 50, grid_xgboost
    = list(eta = c(0.3, 0.1, 0.01), max_depth = c(3, 5), gamma = 0, colsample_bytree =
    0.8, subsample = 0.8, min_child_weight = 1, scale_pos_weight = 1)),
  out = "plot"
)
Arguments
| data | Data frame, must contain variable defined in target, but should not contain any customer-IDs or date/period columns | 
| target | Target variable (must be binary 0/1, FALSE/TRUE, no/yes) | 
| log | Log? | 
| nthread | Number of threads used for training | 
| setup | Setup of model | 
| out | Output of the function ("plot"|"model"|"importance"|"all") | 
Value
Plot of importance (if out = "plot")
Examples
data <- use_data_iris()
data$is_versicolor <- ifelse(data$Species == "versicolor", 1, 0)
data$Species <- NULL
explain_xgboost(data, target = is_versicolor, log = FALSE)
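To get a feeling for the size of the tuning job, the default setup list from Usage can be expanded into its hyperparameter combinations. This sketch assumes that every combination in grid_xgboost is evaluated (a full grid search); the source does not state the search strategy explicitly.

```r
# Default setup as shown in Usage
setup <- list(
  cv_nfold = 2,
  max_nrounds = 1000,
  early_stopping_rounds = 50,
  grid_xgboost = list(
    eta = c(0.3, 0.1, 0.01),
    max_depth = c(3, 5),
    gamma = 0,
    colsample_bytree = 0.8,
    subsample = 0.8,
    min_child_weight = 1,
    scale_pos_weight = 1
  )
)

# One row per hyperparameter combination
grid <- expand.grid(setup$grid_xgboost)
nrow(grid)
```

Adding more values to any vector in grid_xgboost multiplies the number of rows, and with it the number of cross-validation runs under the full-grid assumption.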
Explore a dataset or variable
Description
Explore a dataset or variable
Usage
explore(
  data,
  var,
  var2,
  n,
  target,
  targetpct,
  split,
  min_val = NA,
  max_val = NA,
  auto_scale = TRUE,
  na = NA,
  ...
)
Arguments
| data | A dataset | 
| var | A variable | 
| var2 | A variable for checking correlation | 
| n | A Variable for number of observations (count data) | 
| target | Target variable (0/1 or FALSE/TRUE) | 
| targetpct | Plot variable as target% (FALSE/TRUE) | 
| split | Alternative to targetpct (split = !targetpct) | 
| min_val | All values < min_val are converted to min_val | 
| max_val | All values > max_val are converted to max_val | 
| auto_scale | Use 0.2 and 0.98 quantile for min_val and max_val (if min_val and max_val are not defined) | 
| na | Value to replace NA | 
| ... | Further arguments (like flip = TRUE/FALSE) | 
Value
Plot object
Examples
## Launch Shiny app (in interactive R sessions)
if (interactive())  {
   explore(iris)
}
## Explore graphically
# Load library
library(magrittr)
# Explore a variable
iris %>% explore(Species)
iris %>% explore(Sepal.Length)
iris %>% explore(Sepal.Length, min_val = 4, max_val = 7)
# Explore a variable with a target
iris$is_virginica <- ifelse(iris$Species == "virginica", 1, 0)
iris %>% explore(Species, target = is_virginica)
iris %>% explore(Sepal.Length, target = is_virginica)
# Explore correlation between two variables
iris %>% explore(Species, Petal.Length)
iris %>% explore(Sepal.Length, Petal.Length)
# Explore correlation between two variables and split by target
iris %>% explore(Sepal.Length, Petal.Length, target = is_virginica)
Explore all variables
Description
Explore all variables of a dataset (create plots)
Usage
explore_all(
  data,
  n,
  target,
  ncol = 2,
  targetpct,
  color = c("#ADD8E6", "#7BB8DA"),
  split = TRUE
)
Arguments
| data | A dataset | 
| n | Weights variable (only for count data) | 
| target | Target variable (0/1 or FALSE/TRUE) | 
| ncol | Layout of plots (number of columns) | 
| targetpct | Plot variable as target% (FALSE/TRUE) | 
| color | Force a default color (if possible) | 
| split | Split by target (TRUE|FALSE) | 
Value
Plot
Examples
explore_all(iris)
iris$is_virginica <- ifelse(iris$Species == "virginica", 1, 0)
explore_all(iris, target = is_virginica)
Explore categorical variable using bar charts
Description
Create a barplot to explore a categorical variable. If a target is selected, the barplot is created for all levels of the target.
Usage
explore_bar(
  data,
  var,
  target,
  flip = NA,
  title = "",
  numeric = NA,
  max_cat = 30,
  max_target_cat = 5,
  color = c("#ADD8E6", "#7BB8DA"),
  legend_position = "right",
  label,
  label_size = 2.7,
  ...
)
Arguments
| data | A dataset | 
| var | variable | 
| target | target (can have more than 2 levels) | 
| flip | Should plot be flipped? (change of x and y) | 
| title | Title of the plot (if empty var name) | 
| numeric | Display variable as numeric (not category) | 
| max_cat | Maximum number of categories to be plotted | 
| max_target_cat | Maximum number of categories to be plotted for target (except NA) | 
| color | Color for bar | 
| legend_position | Position of the legend ("right"|"bottom"|"top"|"none") | 
| label | Show labels? (if empty, automatic) | 
| label_size | Size of labels | 
| ... | Further arguments | 
Value
Plot object (bar chart)
Explore data without aggregation (label + value)
Description
Label and Value are in the data. Create a bar plot where the heights of the bars represent the values for each label.
Usage
explore_col(
  data,
  var_label,
  var_value,
  title = NA,
  subtitle = "",
  numeric = FALSE,
  max_cat = 30,
  na = 0,
  flip = NA,
  color = "#ADD8E6"
)
Arguments
| data | A dataset (categories + frequency) | 
| var_label | Variable containing the label | 
| var_value | Variable containing the value | 
| title | Title of the plot | 
| subtitle | Subtitle of the plot | 
| numeric | Display variable as numeric (not category) | 
| max_cat | Maximum number of categories to be plotted | 
| na | Value to use for NA | 
| flip | Flip plot? (for categorical variables) | 
| color | Color for bar | 
Value
Plot object
Examples
library(magrittr)
data <- data.frame(label = LETTERS[1:5], value = c(1.5,2,1.2,3,2.6))
data %>% explore_col(label, value)
Explore the correlation between two variables
Description
Explore the correlation between two variables
Usage
explore_cor(
  data,
  x,
  y,
  target,
  bins = 8,
  min_val = NA,
  max_val = NA,
  auto_scale = TRUE,
  title = NA,
  color = c("#ADD8E6", "#7BB8DA"),
  ...
)
Arguments
| data | A dataset | 
| x | Variable on x axis | 
| y | Variable on y axis | 
| target | Target variable (categorical) | 
| bins | Number of bins | 
| min_val | All values < min_val are converted to min_val | 
| max_val | All values > max_val are converted to max_val | 
| auto_scale | Use 0.02 and 0.98 quantile for min_val and max_val (if min_val and max_val are not defined) | 
| title | Title of the plot | 
| color | Color of the plot | 
| ... | Further arguments | 
Value
Plot
Examples
explore_cor(iris, x = Sepal.Length, y = Sepal.Width)
Explore count data (categories + frequency)
Description
Create a plot to explore count data (categories + frequency). A variable named 'n' is automatically detected as the frequency.
Usage
explore_count(
  data,
  cat,
  n,
  target,
  pct = FALSE,
  split = TRUE,
  title = NA,
  numeric = FALSE,
  max_cat = 30,
  max_target_cat = 5,
  color = c("#ADD8E6", "#7BB8DA"),
  flip = NA
)
Arguments
| data | A dataset (categories + frequency) | 
| cat | Categorical variable | 
| n | Number of observations (frequency) | 
| target | Target variable | 
| pct | Show as percent? | 
| split | Split by target (FALSE/TRUE) | 
| title | Title of the plot | 
| numeric | Display variable as numeric (not category) | 
| max_cat | Maximum number of categories to be plotted | 
| max_target_cat | Maximum number of categories to be plotted for target (except NA) | 
| color | Color for bar | 
| flip | Flip plot? (for categorical variables) | 
Value
Plot object
Examples
library(dplyr)
iris %>%
  count(Species) %>%
  explore_count(Species)
Explore density of variable
Description
Create a density plot to explore numerical variable
Usage
explore_density(
  data,
  var,
  target,
  title = "",
  min_val = NA,
  max_val = NA,
  color = c("#ADD8E6", "#7BB8DA"),
  auto_scale = TRUE,
  max_target_cat = 5,
  ...
)
Arguments
| data | A dataset | 
| var | Variable | 
| target | Target variable (0/1 or FALSE/TRUE) | 
| title | Title of the plot (if empty var name) | 
| min_val | All values < min_val are converted to min_val | 
| max_val | All values > max_val are converted to max_val | 
| color | Color of plot | 
| auto_scale | Use 0.02 and 0.98 quantile for min_val and max_val (if min_val and max_val are not defined) | 
| max_target_cat | Maximum number of levels of target shown in the plot (except NA). | 
| ... | Further arguments | 
Value
Plot object (density plot)
Examples
explore_density(iris, Sepal.Length)
iris$is_virginica <- ifelse(iris$Species == "virginica", 1, 0)
explore_density(iris, Sepal.Length, target = is_virginica)
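The min_val, max_val and auto_scale arguments describe clamping values to limits taken from the 0.02 and 0.98 quantiles. A base R sketch of that idea follows. clamp_to_quantiles() is a hypothetical helper, not part of explore, and the package may compute its limits differently.

```r
# Clamp a numeric vector to its 0.02 and 0.98 quantiles (base R only).
# clamp_to_quantiles() is a hypothetical helper for illustration.
clamp_to_quantiles <- function(x, lower = 0.02, upper = 0.98) {
  limits <- stats::quantile(x, probs = c(lower, upper),
                            na.rm = TRUE, names = FALSE)
  # Values below the lower limit become the lower limit,
  # values above the upper limit become the upper limit
  pmin(pmax(x, limits[1]), limits[2])
}

x <- c(1:99, 1000)        # one extreme outlier
clamped <- clamp_to_quantiles(x)
range(x)                  # original range
range(clamped)            # outlier pulled in to the upper limit
```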
Explore dataset interactively
Description
Launches a shiny app to explore a dataset
Usage
explore_shiny(data, target, color = c("#ADD8E6", "#7BB8DA"))
Arguments
| data | A dataset | 
| target | Target variable (0/1 or FALSE/TRUE) | 
| color | Color for plots (vector) | 
Examples
# Only run examples in interactive R sessions
if (interactive())  {
   explore_shiny(iris)
}
Explore variable + binary target (values 0/1)
Description
Create a plot to explore the relation between a variable and a binary target as target percent. The target variable is chosen automatically if possible (name starts with 'target').
Usage
explore_targetpct(
  data,
  var,
  target = NULL,
  title = NA,
  min_val = NA,
  max_val = NA,
  auto_scale = TRUE,
  na = NA,
  flip = NA,
  ...
)
Arguments
| data | A dataset | 
| var | Numerical variable | 
| target | Target variable (0/1 or FALSE/TRUE) | 
| title | Title of the plot | 
| min_val | All values < min_val are converted to min_val | 
| max_val | All values > max_val are converted to max_val | 
| auto_scale | Use 0.02 and 0.98 quantile for min_val and max_val (if min_val and max_val are not defined) | 
| na | Value to replace NA | 
| flip | Flip plot? (for categorical variables) | 
| ... | Further arguments | 
Value
Plot object
Examples
iris$target01 <- ifelse(iris$Species == "versicolor",1,0)
explore_targetpct(iris)
Explore table
Description
Explore a table. Plots variable types, variables with no variance and variables with NA
Usage
explore_tbl(data, n)
Arguments
| data | A dataset | 
| n | Weight variable for count data | 
Examples
explore_tbl(iris)
Format number as character string (auto)
Description
Formats a number depending on the value as number with space, scientific or big number as k (1 000), M (1 000 000) or B (1 000 000 000)
Usage
format_num_auto(number = 0, digits = 1)
Arguments
| number | A number (integer or real) | 
| digits | Number of digits | 
Value
Formatted number as text
Examples
format_num_auto(5500, digits = 2)
Format number as character string (kMB)
Description
Formats a big number as k (1 000), M (1 000 000) or B (1 000 000 000)
Usage
format_num_kMB(number = 0, digits = 1)
Arguments
| number | A number (integer or real) | 
| digits | Number of digits | 
Value
Formatted number as text
Examples
format_num_kMB(5500, digits = 2)
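The k/M/B scaling described above can be sketched in base R. kmb_sketch() is a hypothetical helper, and the exact strings returned by format_num_kMB() may differ (for example in spacing or rounding).

```r
# Hypothetical sketch of k/M/B formatting; not the package implementation.
# Assumes finite, non-missing input.
kmb_sketch <- function(number, digits = 1) {
  vapply(number, function(x) {
    a <- abs(x)
    if (a >= 1e9) {
      paste0(round(x / 1e9, digits), "B")   # billions
    } else if (a >= 1e6) {
      paste0(round(x / 1e6, digits), "M")   # millions
    } else if (a >= 1e3) {
      paste0(round(x / 1e3, digits), "k")   # thousands
    } else {
      as.character(round(x, digits))        # small numbers stay as they are
    }
  }, character(1))
}

labels <- kmb_sketch(c(950, 5500, 2300000), digits = 2)
labels
```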
Format number as character string (space as big.mark)
Description
Formats a big number using space as big.mark (1000 = 1 000)
Usage
format_num_space(number = 0, digits = 1)
Arguments
| number | A number (integer or real) | 
| digits | Number of digits | 
Value
Formatted number as text
Examples
format_num_space(5500, digits = 2)
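For comparison, base R's format() produces the same kind of output through its big.mark argument. This is only a related base R call and makes no claim about how format_num_space() is implemented.

```r
# format() accepts any string as the thousands separator via big.mark
small <- format(5500, big.mark = " ")
large <- format(1234567, big.mark = " ")
small
large
```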
Format target
Description
Formats a target as a 0/1 variable. If target is numeric, 1 = above average.
Usage
format_target(target)
Arguments
| target | Variable as vector | 
Value
Formatted target
Examples
iris$is_virginica <- ifelse(iris$Species == "virginica", "yes", "no")
iris$target <- format_target(iris$is_virginica)
table(iris$target)
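The numeric rule "1 = above average" can be sketched in base R as follows. above_average01() is a hypothetical helper, and how format_target() treats ties, missing values or non-numeric input is not covered here.

```r
# Hypothetical sketch: code a numeric vector as 0/1, where 1 = above average
above_average01 <- function(x) {
  as.integer(x > mean(x, na.rm = TRUE))
}

sepal_flag <- above_average01(iris$Sepal.Length)
table(sepal_flag)
```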
Format type description
Description
Format type description of variable to 3 letters (int|dbl|lgl|chr|dat)
Usage
format_type(type)
Arguments
| type | Type description ("integer", "double", "logical", "character", "date") | 
Value
Formatted type description (int|dbl|lgl|chr|dat)
Examples
format_type(typeof(iris$Species))
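The mapping to three-letter codes can be pictured as a lookup table. The sketch below is a hypothetical base R version, and its handling of unknown types (NA) is an assumption. Note that typeof() reports a factor such as iris$Species as "integer".

```r
# Hypothetical lookup from type description to three-letter code
type_abbrev <- c(integer = "int", double = "dbl", logical = "lgl",
                 character = "chr", date = "dat")

abbrev_type <- function(type) {
  unname(type_abbrev[type])   # unknown types yield NA in this sketch
}

typeof(iris$Species)              # factors are stored as integers
abbrev_type(typeof(iris$Species))
abbrev_type(c("double", "character"))
```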
Get predefined colors
Description
Get predefined colors
Usage
get_color(name, fill = FALSE, fill_color = "#DDDDDD", fill_n = 10)
Arguments
| name | Name of color/color-vector | 
| fill | Fill color vector? | 
| fill_color | Color to use to fill color vector | 
| fill_n | Number of color codes to return | 
Value
Vector of color-codes
Examples
get_color("mario")
show_color(get_color("mario"))
show_color(get_color("mario", fill = TRUE, fill_n = 10))
col <- get_color("mario")
explore(iris, Sepal.Length, target = Species,
  color = col)
explore(iris, Sepal.Length, target = Species,
  color = c(col["peach"], col["bowser"], col["donkeykong"]))
Get number of rows for a grid plot
Description
This function is deprecated, please use total_fig_height() instead.
Usage
get_nrow(varnames, exclude = 0, ncol = 2)
Arguments
| varnames | List of variables to be plotted | 
| exclude | Number of variables that will be excluded from plot | 
| ncol | Number of columns (default = 2) | 
Value
Number of rows
Examples
## Not run: 
get_nrow(names(iris), ncol = 2)
## End(Not run)
Return type of variable
Description
Return value of typeof, except if variable contains hide, then return "other"
Usage
get_type(var)
Arguments
| var | A vector (dataframe column) | 
Value
Value of typeof or "other"
Examples
get_type(iris$Species)
Put variables into "buckets" to create a set of plots instead of one large plot
Description
Put variables into "buckets" to create a set of plots instead of one large plot
Usage
get_var_buckets(data, bucket_size = 100, var_name_target = NA, var_name_n = NA)
Arguments
| data | A dataset | 
| bucket_size | Maximum number of variables in one bucket | 
| var_name_target | Name of the target variable (if defined) | 
| var_name_n | Name of the weight (n) variable (if defined) | 
Value
Buckets as a list
Examples
get_var_buckets(iris)
get_var_buckets(iris, bucket_size = 2)
get_var_buckets(iris, bucket_size = 2, var_name_target = "Species")
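The bucketing idea, splitting the variable names into groups of at most bucket_size, can be sketched in base R. bucket_names() is a hypothetical helper. How get_var_buckets() places the target and weight variables is not reproduced here.

```r
# Hypothetical sketch: split variable names into chunks of bucket_size
bucket_names <- function(var_names, bucket_size = 100) {
  # Group index 1, 1, 2, 2, 3, ... for bucket_size = 2
  groups <- ceiling(seq_along(var_names) / bucket_size)
  unname(split(var_names, groups))
}

buckets <- bucket_names(names(iris), bucket_size = 2)
buckets
```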
Return if variable is categorical or numerical
Description
Guess if variable is categorical or numerical based on name, type and values of variable
Usage
guess_cat_num(var, descr)
Arguments
| var | A vector (dataframe column) | 
| descr | A description of the variable (optional) | 
Value
"cat" (categorical), "num" (numerical) or "oth" (other)
Examples
guess_cat_num(iris$Species)
Make an explore-plot interactive
Description
Make an explore-plot interactive
Usage
interact(obj, lower_title = TRUE, hide_geom_text = TRUE)
Arguments
| obj | An object (e.g. ggplot2-object) | 
| lower_title | Lower the title in the ggplot2-object? (default TRUE) | 
| hide_geom_text | Hide geom_text in the ggplot2-object? (default TRUE) | 
Value
Plot object
Examples
library(dplyr)
if (interactive())  {
   iris %>% explore(Sepal.Length, target = Species) %>% interact()
}
Log conditional
Description
Log conditional
Usage
log_info_if(log = TRUE, text = "log")
Arguments
| log | log (TRUE|FALSE) | 
| text | text string to be logged | 
Value
Prints the log text on screen (if log == TRUE).
Mix colors
Description
Mix colors
Usage
mix_color(color1, color2 = NA, n = 5)
Arguments
| color1 | Color 1 | 
| color2 | Color 2 | 
| n | Number of different colors that should be generated | 
Value
Vector of color-codes
Examples
mix_color("blue", n = 10)
mix_color("gold", "red", n = 4)
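A related base R technique for generating intermediate colors is grDevices::colorRampPalette(). The sketch below only illustrates interpolation between two colors and does not claim to reproduce the output of mix_color().

```r
# Interpolate 4 colors between gold and red with base R (grDevices)
ramp <- grDevices::colorRampPalette(c("gold", "red"))
cols <- ramp(4)
cols   # hex color codes, from gold to red
```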
Plots a legend that can be used for explore_all with a binary target
Description
Plots a legend that can be used for explore_all with a binary target
Usage
plot_legend_targetpct(border = TRUE)
Arguments
| border | Draw a border? | 
Value
Base plot
Examples
plot_legend_targetpct(border = TRUE)
Plot a text
Description
Plots a text (base plot) and lets you choose text-size and color
Usage
plot_text(text = "hello world", size = 1.2, color = "black", ggplot = FALSE)
Arguments
| text | Text as string | 
| size | Text-size | 
| color | Text-color | 
| ggplot | return a ggplot-object? (or base plot) | 
Value
Plot
Examples
plot_text("hello", size = 2, color = "red")
Plot a variable info
Description
Creates a ggplot with the variable-name as title and a text
Usage
plot_var_info(data, var, info = "")
Arguments
| data | A dataset | 
| var | Variable | 
| info | Text to plot | 
Value
Plot (ggplot)
Predict target using a trained model.
Description
Predict target using a trained model.
Usage
predict_target(data, model, name = "prediction")
Arguments
| data | A dataset (data.frame or tbl) | 
| model | A model created with  | 
| name | Prefix of variable-name for prediction | 
Value
data containing predicted probabilities for target values
Examples
data_train <- create_data_buy(seed = 1)
data_test <- create_data_buy(seed = 2)
model <- explain_tree(data_train, target = buy, out = "model")
data <- predict_target(data = data_test, model = model)
describe(data)
Replace NA
Description
Replace NA values of a variable in a dataframe
Usage
replace_na_with(data, var_name, with)
Arguments
| data | A dataframe | 
| var_name | Name of variable where NAs are replaced | 
| with | Value instead of NA | 
Value
Updated dataframe
Examples
data <- data.frame(nr = c(1,2,3,NA,NA))
replace_na_with(data, "nr", 0)
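The same replacement can be written in base R, which may help when reading the arguments. replace_na_sketch() is a hypothetical helper, not the package's code.

```r
# Hypothetical base R equivalent: replace NA in one column of a data frame
replace_na_sketch <- function(data, var_name, with) {
  missing_rows <- is.na(data[[var_name]])
  data[[var_name]][missing_rows] <- with
  data   # the input data frame itself is not modified
}

data <- data.frame(nr = c(1, 2, 3, NA, NA))
cleaned <- replace_na_sketch(data, "nr", 0)
cleaned
```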
Generate a report of all variables
Description
Generate a report of all variables. If target is defined, the relation to the target is reported.
Usage
report(data, n, target, targetpct, split, color, output_file, output_dir)
Arguments
| data | A dataset | 
| n | Weights variable for count data | 
| target | Target variable (0/1 or  | 
| targetpct | Plot variable as target% ( | 
| split | Alternative to targetpct (split = !targetpct) | 
| color | User defined colors for plots (vector) | 
| output_file | Filename of the html report | 
| output_dir | Directory where to save the html report | 
Examples
if (rmarkdown::pandoc_available("1.12.3")) {
 report(iris, output_dir = tempdir())
 }
Rescales a numeric variable into values between 0 and 1
Description
Rescales a numeric variable into values between 0 and 1
Usage
rescale01(x)
Arguments
| x | numeric vector (to be rescaled) | 
Value
vector with values between 0 and 1
Examples
rescale01(0:10)
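Rescaling to the interval 0 to 1 is commonly done with min-max scaling, (x - min) / (max - min). A base R sketch, assuming that is the intended transformation. rescale01_sketch() is a hypothetical name, and a constant vector would give NaN here.

```r
# Hypothetical min-max rescaling sketch
rescale01_sketch <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- rescale01_sketch(c(5, 10, 15))
scaled
```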
Show color vector as ggplot
Description
Show color vector as ggplot
Usage
show_color(color)
Arguments
| color | Vector of colors | 
Value
ggplot
Examples
show_color("gold")
show_color(c("blue", "red", "green"))
Simplifies a text string
Description
A text string is converted into a simplified version by trimming, converting to upper case, replacing German umlauts, dropping special characters like comma and semicolon and replacing multiple spaces with one space.
Usage
simplify_text(text)
Arguments
| text | text string | 
Value
text string
Examples
simplify_text(" Hello  World !, ")
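The steps in the description can be sketched with base R string functions. simplify_sketch() is a hypothetical helper. The replacements chosen for the umlauts and the set of characters that are dropped are assumptions, so simplify_text() may return something different.

```r
# Hypothetical sketch of the simplification steps (base R only)
simplify_sketch <- function(text) {
  # Replace German umlauts (Unicode escapes keep this source ASCII);
  # the replacement strings are an assumption
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to   <- c("ae", "oe", "ue", "AE", "OE", "UE", "ss")
  for (i in seq_along(from)) {
    text <- gsub(from[i], to[i], text, fixed = TRUE)
  }
  text <- toupper(text)
  text <- gsub("[^A-Z0-9 ]", "", text)  # keep only letters, digits, spaces
  text <- gsub(" +", " ", text)         # collapse runs of spaces
  trimws(text)                          # trim leading/trailing whitespace
}

simple <- simplify_sketch(" Hello  World !, ")
simple
```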
Explore categorical variable + target
Description
Create a plot to explore relation between categorical variable and a binary target
Usage
target_explore_cat(
  data,
  var,
  target = "target_ind",
  min_val = NA,
  max_val = NA,
  flip = TRUE,
  num2char = TRUE,
  title = NA,
  auto_scale = TRUE,
  na = NA,
  max_cat = 25,
  color = c("#ECEFF1", "#CFD8DC", "#B0BEC5", "#90A4AE"),
  legend_position = "bottom"
)
Arguments
| data | A dataset | 
| var | Categorical variable | 
| target | Target variable (0/1 or FALSE/TRUE) | 
| min_val | All values < min_val are converted to min_val | 
| max_val | All values > max_val are converted to max_val | 
| flip | Should plot be flipped? (change of x and y) | 
| num2char | If TRUE, numeric values in variable are converted into character | 
| title | Title of plot | 
| auto_scale | Not used, just for compatibility | 
| na | Value to replace NA | 
| max_cat | Maximum number of categories to be plotted | 
| color | Color vector (4 colors) | 
| legend_position | Position of legend ("right"|"bottom"|"none") | 
Value
Plot object
Explore numerical variable + target
Description
Create a plot to explore relation between numerical variable and a binary target
Usage
target_explore_num(
  data,
  var,
  target = "target_ind",
  min_val = NA,
  max_val = NA,
  bins = 10,
  flip = TRUE,
  title = NA,
  auto_scale = TRUE,
  na = NA,
  color = c("#ECEFF1", "#CFD8DC", "#B0BEC5", "#90A4AE"),
  legend_position = "bottom"
)
Arguments
| data | A dataset | 
| var | Numerical variable | 
| target | Target variable (0/1 or FALSE/TRUE) | 
| min_val | All values < min_val are converted to min_val | 
| max_val | All values > max_val are converted to max_val | 
| bins | Number of bins | 
| flip | Should plot be flipped? (change of x and y) | 
| title | Title of plot | 
| auto_scale | Use 0.02 and 0.98 quantile for min_val and max_val (if min_val and max_val are not defined) | 
| na | Value to replace NA | 
| color | Color vector (4 colors) | 
| legend_position | Position of legend ("right"|"bottom"|"none") | 
Value
Plot object
Get fig.height for RMarkdown-chunk using explore_all()
Description
Get fig.height for RMarkdown-chunk using explore_all()
Usage
total_fig_height(
  data,
  var_name_n,
  var_name_target,
  nvar = NA,
  ncol = 2,
  size = 3
)
Arguments
| data | A dataset | 
| var_name_n | Weights variable for count data? (TRUE / MISSING) | 
| var_name_target | Target variable (TRUE / MISSING) | 
| nvar | Number of variables to plot | 
| ncol | Number of columns (default = 2) | 
| size | fig.height of 1 plot (default = 3) | 
Value
Total fig.height (for all rows of plots)
Examples
total_fig_height(iris)
total_fig_height(iris, var_name_target = "Species")
total_fig_height(nvar = 5)
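The arguments suggest a simple relation: the plots are laid out in ncol columns and each row takes size units of height. A sketch under that assumption follows. fig_height_sketch() is a hypothetical helper, and the package function may additionally adjust the variable count for the target and weight columns.

```r
# Hypothetical sketch: total figure height = number of rows * height per row
fig_height_sketch <- function(nvar, ncol = 2, size = 3) {
  nrow <- ceiling(nvar / ncol)   # rows needed to place nvar plots
  nrow * size
}

height <- fig_height_sketch(5)   # 5 plots in 2 columns -> 3 rows
height
```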
Use the beer data set
Description
This data set is an incomplete collection of popular beers in Austria, Germany and Switzerland. The data were collected from various websites in 2023. Some of the collected data may be incorrect.
Usage
use_data_beer()
Value
Dataset as tibble
Examples
use_data_beer()
Use the diamonds data set
Description
This data set comes with the ggplot2 package. It contains the prices and other attributes of almost 54,000 diamonds.
Usage
use_data_diamonds()
Value
Dataset
Examples
use_data_diamonds()
Use the iris flower data set
Description
This data set comes with base R. The data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
Usage
use_data_iris()
Value
Dataset as tibble
Examples
use_data_iris()
Use the mpg data set
Description
This data set comes with the ggplot2 package. It contains a subset of the fuel economy data that the EPA makes available on https://fueleconomy.gov/. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.
Usage
use_data_mpg()
Value
Dataset
Examples
use_data_mpg()
Use the mtcars data set
Description
This data set comes with base R. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
Usage
use_data_mtcars()
Value
Dataset
Examples
use_data_mtcars()
Use the penguins data set
Description
This data set comes with the palmerpenguins package. It contains measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.
Usage
use_data_penguins(short_names = FALSE)
Arguments
| short_names | Use short variable names | 
Value
Dataset
Examples
use_data_penguins()
use_data_penguins(short_names = TRUE)
Use the starwars data set
Description
This data set comes with the dplyr package. It contains data on 87 Star Wars characters.
Usage
use_data_starwars()
Value
Dataset
Examples
use_data_starwars()
Use the titanic data set
Description
This data set comes with base R. Survival of passengers on the Titanic.
Usage
use_data_titanic(count = FALSE)
Arguments
| count | use count data | 
Value
Dataset
Examples
use_data_titanic(count = TRUE)
use_data_titanic(count = FALSE)
Use the wordle data set
Description
This data set contains the result of a real wordle challenge (in German) between two players. Wordle is a game where a player guesses a five-letter word in six tries. The variable "try" reflects the success of player A and B. Other variables like "noun", "aeiou", "unique", "common" and "rare" reflect the properties of the word.
Usage
use_data_wordle()
Value
Dataset
Examples
use_data_wordle()
Weight target variable
Description
Create weights for the target variable in your dataset so that there are equal weights for target = 0 and target = 1. Target must be 0/1, FALSE/TRUE or no/yes.
Usage
weight_target(data, target)
Arguments
| data | A dataset | 
| target | Target variable (0/1, TRUE/FALSE, yes/no) | 
Value
Weights for each observation (as a vector)
Examples
iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
weights <- weight_target(iris, target = is_versicolor)
versicolor <- iris$is_versicolor
table(versicolor, weights)
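One possible way to balance a binary target is to give each observation a weight inversely proportional to the size of its class, so that both classes carry the same total weight. The sketch below shows that scheme. balance_weights() is a hypothetical helper, and the scheme used by weight_target() may differ.

```r
# Hypothetical sketch: weights that give each class the same total weight.
# Assumes a target without missing values.
balance_weights <- function(target) {
  n_per_class <- table(target)
  n_classes <- length(n_per_class)
  # Look up the class size of every observation by its class label
  class_size <- as.numeric(n_per_class[as.character(target)])
  length(target) / (n_classes * class_size)
}

target <- c(1, 0, 0, 0)
w <- balance_weights(target)
sums <- as.numeric(tapply(w, target, sum))   # total weight per class
sums
```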
Calculate with periods (format yyyymm)
Description
Calculate with periods (format yyyymm)
Usage
yyyymm_calc(yyyymm, add_month = 0, add_year = 0, diff_to = NA)
Arguments
| yyyymm | Input vector of periods (format yyyymm) | 
| add_month | How many months to add (can be negative too) | 
| add_year | How many years to add (can be negative too) | 
| diff_to | Difference between date and yyyymm (format yyyymm) | 
Value
Vector of periods (format yyyymm) or number of months
Examples
yyyymm_calc(202412, add_month = 1)
yyyymm_calc(c(202411,202412,202501), add_month = -1, add_year = 1)
yyyymm_calc(202410, diff_to = 202501)
yyyymm_calc(c(202411,202412,202501,202502), diff_to = 202501)
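Period arithmetic on yyyymm values is usually done by converting each period to a running month index, adding, and converting back. The sketch below shows that approach in base R. All helper names are hypothetical, and the sign convention of the difference (later period minus earlier period) is an assumption that may not match diff_to.

```r
# Hypothetical sketch of yyyymm arithmetic via a running month index
to_month_index <- function(yyyymm) {
  (yyyymm %/% 100) * 12 + (yyyymm %% 100 - 1)   # year * 12 + zero-based month
}
from_month_index <- function(idx) {
  (idx %/% 12) * 100 + (idx %% 12 + 1)          # back to yyyymm
}
add_months <- function(yyyymm, add_month = 0, add_year = 0) {
  from_month_index(to_month_index(yyyymm) + add_month + 12 * add_year)
}

next_period <- add_months(202412, add_month = 1)   # rolls over into 2025
shifted <- add_months(c(202411, 202412, 202501), add_month = -1, add_year = 1)
# Months between two periods (assumed convention: later minus earlier)
months_between <- to_month_index(202501) - to_month_index(202410)
```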