Getting Started

This vignette shows how to use the tidycharts package. It contains examples of different chart types and tips for proper data visualization.

Prerequisites

This package and vignette are created for a user who:

How to download and install the package?


# install from CRAN
# install.packages("tidycharts")

library(tidycharts)

Bar charts

Data

Bar charts should be used to show structure at one moment in time. A typical use case of the bar chart is to visualize a company's profit broken down by department. The data could have the following structure:

In the example data, operational, property and bonus are parts of profit and sum up to it.
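The example data frame itself is not shown here. A minimal sketch of it, reconstructed from the value labels of the charts in this section, could look as follows. The column name `departments` is an assumption, and the Marketing bonus of 0 is inferred from the components having to add up to the profit of 2.

```r
# Example data reconstructed from the chart value labels in this section.
# `departments` is an assumed column name; the other names follow the text.
bar_data <- data.frame(
  departments = c("Services", "Production", "Marketing", "Purchasing"),
  profit      = c(17, 15, 2, -3),
  operational = c(9, 7, 1.5, -0.4),
  property    = c(4, 4, 0.5, -0.6),
  bonus       = c(4, 4, 0, -2),   # Marketing bonus of 0 is inferred
  stringsAsFactors = FALSE
)

# The three components should add up to the total profit in every row.
parts_sum <- bar_data$operational + bar_data$property + bar_data$bonus
stopifnot(isTRUE(all.equal(parts_sum, bar_data$profit)))

print(bar_data)
```

The `stopifnot` check makes the "parts sum up to profit" property explicit, so a typo in the data fails early instead of producing a misleading stacked chart.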

Basic

Creating a bar chart is simple. We use the barchart_plot function to do that. After the function is called, the chart is printed automatically. It can also be assigned to a variable as a one-element character vector with SVG content.

[Figure: bar chart of profit by department. Services 17, Production 15, Marketing 2, Purchasing -3.]

A plot should have an informative title. We can use the add_title function to add one. Commands can be chained with the pipe operator (%>%).

[Figure: bar chart titled "The company XYZ, Profit in mEUR by departments, 2020". Services 17, Production 15, Marketing 2, Purchasing -3.]

We can show the structure of each value by passing a different series argument. It can be a vector of column names. In that case, a stacked bar chart is generated.

[Figure: stacked bar chart titled "The company XYZ, Profit in mEUR by departments and profit type, 2020". Services 17 (op 9, prop 4, bon 4), Production 15 (7, 4, 4), Marketing 2, Purchasing -3.]

Normalized

A normalized bar plot should be used to show the proportions of the parts within each category. A typical goal is to visualize the percentage structure of profit across the departments of a company.

[Figure: normalized bar chart titled "The company XYZ, Profit in mEUR normalized in department". Every bar spans 100%. Labeled parts: Services op 9, prop 4, bon 4; Production 7, 4, 4; Marketing 1.5, 0.5; Purchasing -0.4, -0.6, -2.]
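To make the idea of normalization concrete, the sketch below computes the percentage share of each profit component per department in base R. It only illustrates the arithmetic; it does not claim to be how tidycharts normalizes internally. The data frame is the one reconstructed from the chart labels, and `departments` is an assumed column name.

```r
# Data reconstructed from the chart labels; `departments` is an assumed name.
bar_data <- data.frame(
  departments = c("Services", "Production", "Marketing", "Purchasing"),
  operational = c(9, 7, 1.5, -0.4),
  property    = c(4, 4, 0.5, -0.6),
  bonus       = c(4, 4, 0, -2),
  stringsAsFactors = FALSE
)

# Work on a numeric matrix so the division is done element-wise per row.
parts <- as.matrix(bar_data[c("operational", "property", "bonus")])
rownames(parts) <- bar_data$departments

# Divide every component by its row total: each department then sums to 100.
shares <- 100 * parts / rowSums(parts)

print(round(shares, 1))
```

Note that Purchasing has only negative components, so its shares are still positive percentages of a negative total.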

Referenced

Reference values (indices) mark a benchmark on the plot. In the following example, an index line shows the best result of the previous year (PY).

[Figure: bar chart titled "The company XYZ, Profit in mEUR with reference value of 10 mEUR", with an index line labeled "PY best result". Services 17, Production 15, Marketing 2, Purchasing -3.]

Grouped

To visualize 2 or 3 data series that do not sum up to a meaningful total, use a grouped bar chart. The first series is drawn as bars in the foreground, the second as bars in the background, and the third as triangles. The style of the bars and triangles indicates the type of data, the so-called scenario.

The most typical use case of this chart is to compare the profit of different departments in a company with budget and previous-year data.

[Figure: grouped bar chart titled "The company XYZ, Profit in mEUR compared to different scenarios". Legend: PL, AC. Services 17, Production 15, Marketing 2, Purchasing -3.]

Variances

Show variance in data using relative or absolute variance plots. Define the baseline and the real values. The axis of a variance plot is styled according to the scenario of the baseline data. A relative variance plot shows the difference in percent, and an absolute variance plot shows it in base units.

Example usage: visualize the difference between two scenarios, broken down by department.

[Figure: absolute variance chart titled "The company XYZ, Profit variance in mEUR between plan and actual" (plan vs. actual). Services 6, Production 2, Marketing 0, Purchasing -0.5.]
[Figure: relative variance chart titled "The company XYZ, Profit variance in % between plan and actual (plan=100%)". Services 55, Production 15, Marketing 0, Purchasing 20.]
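The two variance measures can be sketched in base R as shown below. The plan values are not given in the text; they are back-computed here from the actual profits and the absolute variance labels, so treat them as an assumption. The relative formula (difference divided by the baseline) is one common definition that happens to reproduce the rounded labels of the relative variance chart; it is not confirmed to be the package's own formula.

```r
# Actual values come from the bar chart labels; plan values are back-computed
# (actual minus the absolute variance labels) and are therefore an assumption.
variance_data <- data.frame(
  departments = c("Services", "Production", "Marketing", "Purchasing"),
  actual      = c(17, 15, 2, -3),
  plan        = c(11, 13, 2, -2.5),
  stringsAsFactors = FALSE
)

# Absolute variance: difference to the baseline in base units (mEUR).
variance_data$absolute <- variance_data$actual - variance_data$plan

# Relative variance: the same difference as a percentage of the baseline.
variance_data$relative <- 100 * variance_data$absolute / variance_data$plan

print(variance_data)
```

With this definition, a negative baseline combined with a negative difference yields a positive percentage, which is why Purchasing has a negative absolute variance but a positive relative one.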

Scatter plots

For demonstration, we will use the mtcars dataset, which is built into R.

scatter_data <- mtcars[c('hp','qsec','cyl', 'wt')]

Scatter plots, also known as point plots, are used to visualize multidimensional relationships between variables. Therefore, they are extensively used in exploratory data analysis.

Scatter

In a scatter plot, two numerical dimensions are represented by the position of a point on the Cartesian plane.

[Figure: scatter plot titled "The mtcars dataset". Axis labels: "Horsepower in hp" and "1/4 mile time in s".]

Optionally, a categorical dimension can be added as point color.

[Figure: scatter plot titled "The mtcars dataset" with points colored by "No. cylinder" (6, 4, 8). Axis labels: "Horsepower in hp" and "1/4 mile time in s".]
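In mtcars the number of cylinders is stored as a number, although it only takes three distinct values. The base R sketch below makes the categorical nature explicit by converting the column to a factor. Whether the plotting function needs a factor is not stated here, so this is only a way to inspect the categories before choosing them as the color dimension.

```r
# Same column selection as in the text.
scatter_data <- mtcars[c("hp", "qsec", "cyl", "wt")]

# Treat the cylinder count as a category rather than a continuous number.
scatter_data$cyl <- factor(scatter_data$cyl)

# Inspect the categories and how many cars fall into each of them.
print(levels(scatter_data$cyl))
print(table(scatter_data$cyl))
```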

Bubble

Bubble plots can visualize the same dimensions as scatter plots, plus a third numeric dimension encoded as point size. However, there is a tradeoff between the dimensionality and size of your data and the readability of the resulting plot, so be careful when using bubble charts.

[Figure: bubble plot titled "The mtcars dataset" with points colored by "No. cylinders" (6, 4, 8). Axis labels: "Horsepower in hp" and "1/4 mile time in s".]

Column charts

Charts with vertical columns are intended for time series data. Note that column width depends on the x-axis interval: the longer the interval, the wider the column. As a general guideline, plot up to 24 columns. If your data has more than 24 time points, see the Line charts section.

Data

Here is what an example column chart data frame could look like:

The time column consists of three-letter abbreviations of the English month names. The other columns contain artificial data, which could be, for example, sales in different countries.
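The original data frame is not shown here, so the following is only a sketch of a frame with the described shape. The country column names are taken from the chart legend, and the values are freshly generated random numbers, not the ones plotted in this vignette. `month.abb` is a base R constant holding the three-letter English month abbreviations.

```r
set.seed(2020)  # arbitrary seed so the artificial values are reproducible

# Hypothetical stand-in for the example data: one row per month,
# one numeric column per country (names assumed from the chart legend).
column_data <- data.frame(
  time     = month.abb,
  Poland   = round(runif(12, min = 1.5, max = 2.5), 2),
  Germany  = round(runif(12, min = 2.0, max = 4.0), 2),
  Slovakia = round(runif(12, min = 0.0, max = 4.0), 2),
  stringsAsFactors = FALSE
)

str(column_data)
```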

Basic

Use a basic column chart for a simple visualization of a time series. Pass the interval parameter to change the width of the columns.

A typical task for this kind of plot: show sales from different countries over the months.

[Figure: stacked column chart titled "The company XYZ, Profit in mEUR by country, 2020". Months Jan to Dec on the x-axis; series Poland, Germany, Slovakia. Monthly totals: 8.64, 5.87, 4.13, 5.03, 7.74, 9.77, 9.25, 6.66, 4.38, 4.51, 6.93, 9.41.]

Waterfall

Waterfall charts can be used to visualize contribution. We need to transform the data slightly before passing it to the plotting function.

Example usage: visualize the contribution of monthly sales to yearly sales.

[Figure: waterfall chart titled "The company XYZ, Profit in mEUR cumulative, 2020". Jan 8.64, Feb 5.87, Mar 4.13, Apr 5.03, May 7.75, Jun 9.77, Jul 9.25, Aug 6.66, Sep 4.38, Oct 4.51, Nov 6.93, Dec 9.41; total 82.33.]
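The exact transformation is not shown here. One plausible part of it, sketched below in base R, is reducing the data to a single value per month and accumulating those values, since each waterfall step starts where the previous one ended. The monthly figures are copied from the value labels of the waterfall chart; the column names are assumptions.

```r
# Monthly totals as printed on the waterfall chart (column names are assumed).
monthly <- data.frame(
  time  = month.abb,
  total = c(8.64, 5.87, 4.13, 5.03, 7.75, 9.77,
            9.25, 6.66, 4.38, 4.51, 6.93, 9.41),
  stringsAsFactors = FALSE
)

# Running total: the level at which each month's column ends.
monthly$running_total <- cumsum(monthly$total)

# The final column of the waterfall is the sum over the whole year.
year_total <- sum(monthly$total)

print(monthly)
print(year_total)
```

The last running total equals the yearly total, which matches the final value label on the chart.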

Other column charts

Other types of column charts are available, e.g. column_chart_grouped or column_chart_normalized. The same data visualization rules apply to them as to bar charts. Feel free to explore them, and see the reference page if you need help.

Line charts

Line charts, like column charts, should be used to show time series data. Some line plots, however, require a more complicated data structure.

Basic

The basic line plot uses lines with markers to show the data. It is typically used to visualize several data series that do not sum up, for example the market value of different companies over the years.

[Figure: line chart with markers titled "Some companies, Market value in mEUR, 2010...2020". Years 2010 to 2020 on the x-axis; series Alpha.inc, Beta.inc, Gamma.inc.]

Dense

Use a dense line plot to visualize up to 6 time series with more than one point per category on the x-axis. More advanced users are encouraged to use the line_chart_dense_custom function, which lets them choose the points to highlight with value labels.

The most typical example is data with a time granularity of one day spanning more than a year, for example the daily mean temperature over the course of 16 months.

[Figure: dense line chart titled "Temperature in European Cities, Daily mean in deg. C, In 2019". Month labels Jan to Dec appear twice on the x-axis; series Warsaw, London.]

Columns vs lines

You may wonder which type of plot to choose: a line chart or a column chart. The answer depends on the data. If you want to visualize only one series, both line and column charts are appropriate. The differences grow as the number of series increases. If the sum of the series is meaningful, use stacked columns or, optionally, stacked lines. If not, use a line plot.
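The rules of thumb from this vignette can be condensed into a small helper. `suggest_chart` is a hypothetical function written for illustration; it is not part of tidycharts and returns a plain description rather than a function name. Combining the stacking rule with the 24-column guideline from the Column charts section is an interpretation, not something stated explicitly.

```r
# Hypothetical helper that encodes the guidelines as a decision rule.
# n_series:          number of data series to show
# n_points:          number of time points on the x-axis
# sum_is_meaningful: TRUE if adding the series up gives a sensible total
suggest_chart <- function(n_series, n_points, sum_is_meaningful = FALSE) {
  stopifnot(n_series >= 1, n_points >= 1)
  too_many_columns <- n_points > 24  # guideline: plot up to 24 columns

  if (n_series > 1 && sum_is_meaningful) {
    return(if (too_many_columns) "stacked line chart" else "stacked column chart")
  }
  if (n_series > 1 || too_many_columns) {
    return("line chart")
  }
  "line chart or column chart"
}

print(suggest_chart(n_series = 1, n_points = 12))
print(suggest_chart(n_series = 3, n_points = 12, sum_is_meaningful = TRUE))
print(suggest_chart(n_series = 3, n_points = 12))
```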