August 2022
The litteR-software and this user manual have been
commissioned by Rijkswaterstaat,
Ministry of Infrastructure and Water
Management, The Netherlands.
litteR is a user-friendly tool for analyzing litter data (e.g., beach litter data). The current version (1.0.0) contains routines for:
The focus of this version of litteR is to provide a user-friendly, flexible, robust, transparent, and relatively simple tool for litter analysis. Although litteR is distributed as an R-package, experience with R is not required. If you need more information on how to install R, RStudio, and litteR, please consult our installation guide.
Litter data are count data. As illustrated in the histogram below (copied with permission from Hanke et al., 2019), litter data generally have skewed distributions. All procedures in litteR are based on robust statistical methods. They do not require distributional assumptions and are relatively robust to outliers.
This user guide consists of two parts. The first part describes the user interface; the second part provides the technical details.
For applications with (a previous version of) litteR see Schulz et al. (2019). litteR is the successor of the Litter Analyst software (Schulz et al., 2017).
Before litteR can be used, it should be installed, or updated if you installed litteR before. See our installation guide for details.
You need to install litteR only once, but you need to load this package each time you start RStudio.
The litteR-package should be loaded in RStudio before you can use it. This can be done by running the following code in the R-console or the RStudio-console:
library(litteR)
A startup message appears that gives some essential instructions to start using litteR.
The easiest way to start working with litteR is to create an empty project directory. This directory can be filled with example and reference files by running:
create_litter_project("d:/work/litter-projects/beach-litter")
in the RStudio-console. For more information on how to obtain and use RStudio, consult its website or read our installation guide.
The argument of function create_litter_project (i.e., the quoted part in parentheses) is an existing work directory on your computer. This can be any valid directory name with sufficient user privileges. Note for MS-Windows users: use forward slashes in the path, because R treats a single backslash as an escape character.
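Paths copied from a Windows file manager usually contain backslashes. The following is a minimal base-R sketch for converting such a path before passing it to create_litter_project; the path is the example path used above.

```r
# A Windows-style path; inside an R string each backslash must be doubled.
win_path <- "d:\\work\\litter-projects\\beach-litter"

# Replace every backslash by a forward slash (fixed = TRUE: no regular expression).
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
print(r_path)
```

The resulting string can be used as the argument of create_litter_project, provided the directory exists.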
It is also possible to run create_litter_project() without an argument. In that case, a simple graphical user interface pops up for interactive directory selection.
litteR can be started by typing litter() in the RStudio console (see the figure below).
After entering litter(), a simple graphical user interface pops up for file selection. An example of a file selection dialogue is given below.
litteR needs three input files:
These input files are described below.
The type file contains a list of all litter types that are allowed in the data file. It also indicates to which litter group each litter type belongs. Two example files, named ‘types-ospar-materials.csv’ and ‘types-ospar-sup-fish-other.csv’, are automatically generated when using the create_litter_project function, as described earlier in this tutorial. A type file assigns each litter type (type_name) to one or more litter groups. The first 10 rows of ‘types-ospar-sup-fish-other.csv’ are given in the table below.
type_name | included | SUP | FISH | OTHER |
---|---|---|---|---|
Plastic: Yokes [1] | x | x | x | |
Plastic: Bags [2] | x | x | | |
Plastic: Small_bags [3] | x | x | x | |
Plastic: Bag_ends [112] | x | x | x | |
Plastic: Drinks [4] | x | x | | |
Plastic: Cleaner [5] | x | x | | |
Plastic: Food [6] | x | x | | |
Plastic: Toiletries [7] | x | x | | |
Plastic: Oil_small [8] | x | x | | |
Plastic: Oil_large [9] | x | x | | |
The following columns are in this table:
- type_name: this column is required and gives all litter types that are allowed in the data file. Litter types given in this column need to be unique;
- included: this column indicates whether a type specified in column type_name will be used in the analysis or not. Only type_names that are included in the analysis will contribute to the total litter count (TC);
- SUP, FISH, OTHER, etc.: these columns give the definition of each litter group. In the example above three groups are given: SUP (‘single use plastics’), FISH (‘fisheries related litter’), and OTHER. A cross (x) means that a litter type in type_name is a member of a litter group; an empty cell means that it is not a member.

The user may use one of the provided type files as a template for their own type file. litteR will use the type file that has been specified in the settings file.
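The membership rule described above (a cross means ‘member’, an empty cell means ‘not a member’) can be sketched in base R as follows. This is an illustration only: the comma-separated excerpt and the helper members_of are assumptions made for this example and are not part of litteR, and the actual type file may use a different delimiter.

```r
# Assumption: a comma-separated excerpt of the type file shown in the table above.
types_csv <- "type_name,included,SUP,FISH,OTHER
Plastic: Yokes [1],x,x,x,
Plastic: Bags [2],x,x,,
Plastic: Drinks [4],x,x,,"

# check.names = FALSE keeps column names unchanged; read all cells as text.
types <- read.csv(text = types_csv, check.names = FALSE,
                  colClasses = "character")

# members_of() is a hypothetical helper: it returns the included litter
# types that have a cross (x) in the column of the given group.
members_of <- function(types, group) {
  types$type_name[types$included %in% "x" & types[[group]] %in% "x"]
}

print(members_of(types, "SUP"))
print(members_of(types, "FISH"))
```

Using %in% rather than == treats empty and missing cells alike as ‘not a member’.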
litteR performs regional aggregation at the group level. In order to perform regional aggregation at the type level (the columns in the data file), a group with only one or a few litter types of interest can be constructed in the type file, and then regionally aggregated by running litteR.
litteR supports a simple and flexible data format. It is similar to the OSPAR-format. The data are stored in so-called wide format: each row refers to a single survey, and each column to a single litter type or metadata item. The table below gives an example of a small part (i.e., the upper left corner) of a data file.
location_code | date | Plastic: Yokes [1] | Plastic: Bags [2] | Plastic… |
---|---|---|---|---|
NL001 | 2012-01-27 | 0 | 3 | … |
NL001 | 2012-04-20 | 0 | 8 | … |
NL001 | 2012-07-22 | 0 | 1 | … |
NL001 | 2012-10-19 | 0 | 2 | … |
NL001 | 2013-02-19 | 0 | 24 | … |
: | : | : | : | : |
The columns location_code and date are always required and define unique records (rows) with litter survey data for a specific date and location (e.g., a specific beach, or a location along a river). litteR will use these data to estimate statistics (such as the median and trend) for each location_code.
Column location_code may contain location codes (as in the example above), but also full names like ‘Bergen’, ‘Noordwijk’, and ‘La Grève des Courses’. Full names may be clearer when interpreting the results.
The date column gives the monitoring date in ISO format, i.e., YYYY-mm-dd (for example 2022-08-26, to indicate 26 August 2022). For convenience, the OSPAR-format (dd/mm/YYYY) is currently also supported (for example 26/08/2022, to indicate 26 August 2022).
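As an illustration of the two supported date notations, the base-R sketch below converts both to R dates. The helper parse_survey_date is hypothetical and only shows how the formats differ; it is not how litteR itself reads dates.

```r
# parse_survey_date() is an illustrative helper, not a litteR function.
# It tries the ISO format first and falls back to the OSPAR format.
parse_survey_date <- function(x) {
  iso   <- as.Date(x, format = "%Y-%m-%d")  # e.g. 2022-08-26
  ospar <- as.Date(x, format = "%d/%m/%Y")  # e.g. 26/08/2022
  out <- iso
  out[is.na(out)] <- ospar[is.na(out)]      # NA remains if neither format fits
  out
}

print(parse_survey_date(c("2022-08-26", "26/08/2022")))
```

Both strings denote 26 August 2022, so both elements of the result are the same date.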
Columns Plastic: Yokes [1], Plastic: Bags [2], … contain the counts for specific litter types. Each litter type (column name) should be listed in the litter type file. Only litter types in the litter type file are valid column names. All column names that are not valid litter types are considered as optional metadata. These columns are ignored by litteR and do not affect the results.
There is one exception: the column region_code is optional in general, but it is required when the locations (in column location_code) also need to be spatially aggregated. Each region_code is related to one or more location_code(s) that are part of that region.
In the data file below, one region_code (NL) is provided for all locations in location_code. Therefore, litteR will spatially aggregate the results for all locations (NL001 … NL004) within the specified region (NL).
region_code | location_code | date | Plastic: Yokes [1] | Plastic: Bags [2] | Plastic… |
---|---|---|---|---|---|
NL | NL001 | 2012-01-27 | 0 | 3 | … |
NL | NL001 | 2012-04-20 | 0 | 8 | … |
NL | NL001 | 2012-07-22 | 0 | 1 | … |
: | : | : | : | : | : |
NL | NL004 | 2017-04-14 | 0 | 0 | … |
NL | NL004 | 2017-07-11 | 1 | 0 | … |
NL | NL004 | 2017-10-18 | 0 | 1 | … |
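The way columns are classified in the wide format can be sketched in base R. The excerpt below is a comma-separated assumption based on the table above; the column observer is a hypothetical metadata column added for illustration, and the total count here is simply the row sum over the litter type columns.

```r
# Assumption: a comma-separated excerpt of a data file in wide format.
data_csv <- "region_code,location_code,date,Plastic: Yokes [1],Plastic: Bags [2],observer
NL,NL001,2012-01-27,0,3,A
NL,NL001,2012-04-20,0,8,B
NL,NL004,2017-07-11,1,0,A"

surveys <- read.csv(text = data_csv, check.names = FALSE,
                    stringsAsFactors = FALSE)

# Litter types allowed by the type file (here: two types from the example).
valid_types <- c("Plastic: Yokes [1]", "Plastic: Bags [2]")
reserved    <- c("region_code", "location_code", "date")

type_cols <- intersect(names(surveys), valid_types)           # counts
ignored   <- setdiff(names(surveys), c(type_cols, reserved))  # metadata

# Total count per survey (row) over the litter type columns.
surveys$TC <- rowSums(surveys[type_cols])
print(ignored)
print(surveys[c("location_code", "date", "TC")])
```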
A data file can be constructed easily from existing litter files. As an example consider the OSPAR-format below:
Beach ID | Beach name | Country | Survey date | Plastic: Yokes [1] | Plastic: Bags [2] | Plastic… |
---|---|---|---|---|---|---|
NL001 | Bergen | Netherlands | 2012-01-27 | 0 | 3 | … |
NL001 | Bergen | Netherlands | 2012-04-20 | 0 | 8 | … |
NL001 | Bergen | Netherlands | 2012-07-22 | 0 | 1 | … |
: | : | : | : | : | : | : |
One can simply rename existing columns to the names required by litteR. This can be done with a spreadsheet program or a text editor. For instance, renaming Beach ID, Country, and Survey date to location_code, region_code, and date, respectively, gives the following valid litteR format:
location_code | Beach name | region_code | date | Plastic: Yokes [1] | Plastic: Bags [2] | Plastic… |
---|---|---|---|---|---|---|
NL001 | Bergen | Netherlands | 2012-01-27 | 0 | 3 | … |
NL001 | Bergen | Netherlands | 2012-04-20 | 0 | 8 | … |
NL001 | Bergen | Netherlands | 2012-07-22 | 0 | 1 | … |
: | : | : | : | : | : | : |
Column Beach name is not recognized by litteR, and is therefore ignored.
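For users who prefer R over a spreadsheet program, the renaming described above can be sketched in base R. The small data frame is an in-memory stand-in for an OSPAR-format file, built for this example only.

```r
# Stand-in for an OSPAR-format file (check.names = FALSE keeps the spaces).
ospar <- data.frame(
  `Beach ID`    = c("NL001", "NL001"),
  `Beach name`  = c("Bergen", "Bergen"),
  Country       = c("Netherlands", "Netherlands"),
  `Survey date` = c("2012-01-27", "2012-04-20"),
  `Plastic: Bags [2]` = c(3, 8),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Old name = new name, as in the text above.
rename_map <- c("Beach ID"    = "location_code",
                "Country"     = "region_code",
                "Survey date" = "date")

# Rename only the columns listed in rename_map; leave all others untouched.
hit <- names(ospar) %in% names(rename_map)
names(ospar)[hit] <- unname(rename_map[names(ospar)[hit]])
print(names(ospar))
```

The renamed data frame could then be written to a CSV file, for example with write.csv(ospar, "my-litter-data.csv", row.names = FALSE), where the file name is a placeholder.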
As an alternative, one may also add new columns with valid litteR names to the data file and fill them with the contents of existing columns. See the example below:
region_code | location_code | date | Beach ID | Beach name | Country | Survey date | Plastic… |
---|---|---|---|---|---|---|---|
Netherlands | Bergen | 27/01/2012 | NL001 | Bergen | Netherlands | 27/01/2012 | … |
Netherlands | Bergen | 20/04/2012 | NL001 | Bergen | Netherlands | 20/04/2012 | … |
Netherlands | Bergen | 22/07/2012 | NL001 | Bergen | Netherlands | 22/07/2012 | … |
: | : | : | : | : | : | : | : |
This can be done quite easily with a spreadsheet program. The original columns of the OSPAR-format (Beach ID, Beach name, Country, and Survey date) are ignored by litteR.
It is advised to use region_codes and location_codes that are easily recognized by the user. For instance, in the example above, location_code ‘Bergen’ is easier to interpret than location_code ‘NL001’. Obviously, this choice does not affect the litteR-results.
The settings file contains all settings needed to run litteR. An example of the contents of a settings file is given in the figure below:
# litteR settings file
# Period to analyse (YYYY-mm-dd)
date_min: 2012-01-01
date_max: 2017-12-31
# Percentage of total count to analyse (0 < percentage_total_count <= 100)
percentage_total_count: 80
# Data file.
# Note: the datafile must be in the same path as the settings file
# Note: the file extension should be .csv
file_data: beach-litter-nl-2012-2017.csv
# Type file. Defines the types and their groups
file_types: types-ospar-materials.csv
# Select trend figures to plot in the report
# Note: this can be zero, one, or more than one location_code, region_code,
# group_code, and/or type_name
location_code: ["NL001", "NL004"]
region_code: ["NL"]
group_code: ["TC", "SUP", "FISH"]
type_name: ["Plastic: Bags [2]"]
# figure quality (high or low)
figure_quality: high
# cutoff value vertical axis with litter counts (percentage)
cutoff_count_axis: 100
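The constraints stated in the comments of the settings file above can be checked before running litteR. The sketch below is an illustration in base R: check_settings is a hypothetical helper (not a litteR function), the settings are typed in as a list rather than read from file, and the rule that date_min should not be later than date_max is an assumption.

```r
# Hypothetical in-memory copy of a few settings (not read from file here).
settings <- list(
  date_min = "2012-01-01",
  date_max = "2017-12-31",
  percentage_total_count = 80,
  file_data = "beach-litter-nl-2012-2017.csv",
  figure_quality = "high"
)

# check_settings() returns a character vector with one message per problem.
check_settings <- function(s) {
  problems <- character(0)
  d_min <- as.Date(s$date_min, format = "%Y-%m-%d")
  d_max <- as.Date(s$date_max, format = "%Y-%m-%d")
  if (is.na(d_min) || is.na(d_max)) {
    problems <- c(problems, "dates must be given as YYYY-mm-dd")
  } else if (d_min > d_max) {
    # Assumption: a reversed period is not meaningful.
    problems <- c(problems, "date_min is later than date_max")
  }
  p <- s$percentage_total_count
  if (!is.numeric(p) || p <= 0 || p > 100) {
    problems <- c(problems, "percentage_total_count must be in (0, 100]")
  }
  if (!grepl("\\.csv$", s$file_data)) {
    problems <- c(problems, "file_data should have extension .csv")
  }
  if (!s$figure_quality %in% c("high", "low")) {
    problems <- c(problems, "figure_quality must be high or low")
  }
  problems
}

print(check_settings(settings))
```

An empty result means that none of these checks failed. litteR performs its own validation of the input files, which this sketch does not replace.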
The settings file contains the following entries:
- date_min and date_max: the first and final date of the period to analyze. Dates should be given in ISO format, i.e., YYYY-mm-dd (for example 2022-08-26, to indicate 26 August 2022);
- percentage_total_count: the percentage of the total count used to estimate statistics. See the section on descriptive statistics for more information;
- file_data: name of the data file (including its path, e.g., c:/my-litter-directory/my-litter-data.csv);
- file_types: name of the type file (including its path, e.g., c:/my-litter-directory/types-ospar-materials.csv);
- location_code: name(s) of the location(s) to plot. These should exist in column location_code in the data file. As mentioned in the previous section, location_codes should be readily interpretable for the user, as these codes are also used in the litteR-results (tables and plots);
- region_code: name(s) of the region(s) to plot. These should exist in column region_code in the data file;
- group_code: name(s) of group(s) to plot. Litter groups should be available as column names in the type file;
- type_name: name(s) of type(s) to plot. Type names should be available in the type file and data file;
- figure_quality: quality of the plots in the report, either high or low;
- cutoff_count_axis: optional cutoff value as a percentage of the vertical count axis in trend plots. A cutoff value is useful to improve the readability of a plot in case of a few very high litter counts.

All input files are validated by litteR. The following validation rules apply:
litteR produces three output files:
For convenience, all input and output files are stored as a snapshot in a directory with a name like litteR-results-20210904T221809, where the final part of the name is a timestamp.
litteR produces an HTML-report that can best be viewed with modern web browsers like Mozilla Firefox, Google Chrome, or Safari. These browsers are freely available from the internet.
The filename of each report starts with ‘litteR-results’, followed by a timestamp (YYYYmmddTHHMMSS) and the extension html. For example:
litteR-results-20210904T221809.html
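The timestamp pattern in these names can be reproduced in base R, which may be handy when searching for or sorting result files. The snippet is only an illustration of the naming pattern; the time zone is fixed to UTC here to make the example reproducible.

```r
# A fixed moment in time: 4 September 2021, 22:18:09.
moment <- as.POSIXct("2021-09-04 22:18:09", tz = "UTC")

# YYYYmmddTHHMMSS corresponds to the format string below.
stamp <- format(moment, "%Y%m%dT%H%M%S")
report_name <- paste0("litteR-results-", stamp, ".html")
print(report_name)
```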
This section briefly describes each section in the HTML-report.
This section gives a summary of the settings in the settings file.
In this section (potential) problems in the input files are reported. These problems are also stored in the log file.
For each location_code in the data file, adjusted boxplots of the total count are given for the detection of outliers. Outliers (if any) are shown as dots in these adjusted box-and-whisker plots. Adjusted boxplots are more suitable for outlier detection in case of skewed distributions than traditional box plots. An example of these box-and-whisker plots is given below.
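To illustrate why an adjusted boxplot handles skewed data better, the sketch below computes the outlier fences of one widely used definition, the adjusted boxplot of Hubert & Vandervieren (2008), in which the whisker lengths depend on the medcouple (a robust measure of skewness). Whether litteR uses exactly these constants is not stated in this section, so treat this as an assumption. The helper adjusted_fences is hypothetical, and the medcouple is taken as a given input rather than computed.

```r
# adjusted_fences() is an illustrative helper, not a litteR function.
# q1, q3: first and third quartile; mc: medcouple, a value in [-1, 1].
adjusted_fences <- function(q1, q3, mc) {
  iqr <- q3 - q1
  if (mc >= 0) {
    c(lower = q1 - 1.5 * exp(-4 * mc) * iqr,
      upper = q3 + 1.5 * exp(3 * mc) * iqr)
  } else {
    c(lower = q1 - 1.5 * exp(-3 * mc) * iqr,
      upper = q3 + 1.5 * exp(4 * mc) * iqr)
  }
}

# Symmetric data (mc = 0): identical to the traditional boxplot fences.
print(adjusted_fences(q1 = 2, q3 = 6, mc = 0))

# Right-skewed data (mc > 0): the upper fence moves outward, so fewer
# large counts are flagged as outliers.
print(adjusted_fences(q1 = 2, q3 = 6, mc = 0.5))
```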
For each location_code and group/type name, the following statistics are estimated:
These statistics are estimated for all groups specified in the type file and, for each location, for the litter types with the greatest counts that together make up a given percentage of the total count. This percentage needs to be provided as percentage_total_count in the settings file.
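The selection of litter types by percentage of the total count can be sketched as follows. This is an assumption about how such a selection may work (sort by count, then keep types until the cumulative share reaches the percentage); the helper select_top_types and the counts are made up for illustration, and litteR may treat ties and boundary cases differently.

```r
# Hypothetical total counts per litter type at one location.
counts <- c(bags = 50, drinks = 30, food = 15, yokes = 5)

# select_top_types() is an illustrative helper, not a litteR function.
select_top_types <- function(counts, percentage) {
  ordered <- sort(counts, decreasing = TRUE)
  share <- 100 * cumsum(ordered) / sum(ordered)  # cumulative percentage
  n_keep <- which(share >= percentage)[1]        # first type reaching the target
  names(ordered)[seq_len(n_keep)]
}

print(select_top_types(counts, 80))
print(select_top_types(counts, 100))
```

With percentage_total_count set to 100, all litter types are selected.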
The descriptive statistics for the litter types and groups are stored in a CSV-file with a name starting with litteR-results and ending with a timestamp. The statistics for litter groups are also printed as a table and shown as bar plots in the report: one plot for each location_code in the data file. An example is given in the figure below. If you want other groups, or only a subset of groups, you should modify the type file.
In addition to the statistics given above, the top 10 of litter types for each location is given in a table and as a figure. This top 10 is based on median litter counts.
When the data file contains column region_code, the data for the location_codes in that region are spatially aggregated in a stepwise fashion:
location_code) within that region (region_code).

Note that these statistics are so-called intra-block statistics, i.e., data from individual location_codes are not merged.
The summary statistics are:
- regional mean: the mean of the means of the individual locations (location_code) within a region (region_code) for each litter group;
- regional median: the median of the medians of the individual locations (location_code) within a region (region_code) for each litter group;
- regional slope: the median of the Theil-Sen slopes of the individual locations (location_code) within a region (region_code) for each litter group. Data from different locations are not mixed in the computation of the Theil-Sen slopes. This method is similar to the one in Gilbert (1987), except that in our procedure all locations within a region contribute equally to the regional trend;
- p_value: the p-values for each regional trend (slope) are computed by means of the expressions given in Van Belle & Hughes, 1984 (Eqs. 2 and 7) and Gilbert, 1987 (Eqs. 17.1 - 17.5).
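The two-step character of the regional mean and median can be illustrated with a few lines of base R. The counts below are made up, and the code is a sketch of the definitions given above, not litteR's implementation.

```r
# Made-up counts for one litter group at three locations in one region.
# The locations have different numbers of surveys (2, 3, and 1).
surveys <- data.frame(
  location_code = rep(c("NL001", "NL002", "NL003"), times = c(2, 3, 1)),
  count = c(2, 4, 10, 20, 30, 7)
)

# Step 1: statistics per location (data from locations are not mixed).
loc_mean   <- tapply(surveys$count, surveys$location_code, mean)
loc_median <- tapply(surveys$count, surveys$location_code, median)

# Step 2: aggregate the per-location statistics within the region.
regional_mean   <- mean(loc_mean)
regional_median <- median(loc_median)

# For comparison: pooling all surveys gives more weight to NL002.
pooled_mean <- mean(surveys$count)
print(c(regional = regional_mean, pooled = pooled_mean))
```

Because each location enters step 2 exactly once, a location with many surveys does not dominate the regional statistic.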
In addition to the regional statistics given above, the top 10 of litter types for each region is given in a table and as a figure. This top 10 is based on median litter counts.
For each location_code, and the type names and group codes specified in the settings file, trends are estimated by means of the Theil-Sen slope estimator: a robust non-parametric estimator of slope (counts / year). The significance of the estimated slopes is tested by means of the Mann-Kendall test. The Mann-Kendall test is a non-parametric test and as such does not make distributional assumptions on the data.
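The idea behind both methods can be sketched in base R. The Theil-Sen slope is the median of the slopes between all pairs of observations, and the Mann-Kendall test is based on the statistic S, the number of increasing pairs minus the number of decreasing pairs. The helpers and data below are made up for illustration and are not litteR functions; computing the p-value is left out.

```r
# Illustrative helpers (names chosen for this example only).
theil_sen_slope_sketch <- function(t, y) {
  pairs <- combn(length(y), 2)          # all index pairs (i < j)
  dt <- t[pairs[2, ]] - t[pairs[1, ]]
  dy <- y[pairs[2, ]] - y[pairs[1, ]]
  median(dy[dt != 0] / dt[dt != 0])     # median of pairwise slopes
}

mann_kendall_s_sketch <- function(t, y) {
  y <- y[order(t)]                      # sort observations by time
  pairs <- combn(length(y), 2)
  sum(sign(y[pairs[2, ]] - y[pairs[1, ]]))
}

# Made-up yearly counts with one very high value in the last year.
years  <- c(2012, 2013, 2014, 2015, 2016)
counts <- c(10, 12, 11, 15, 40)

print(theil_sen_slope_sketch(years, counts))  # counts / year
print(mann_kendall_s_sketch(years, counts))
```

The high count in 2016 affects only some of the pairwise slopes, so it has limited influence on their median. This is what makes the estimator robust.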
The figure below gives examples of trend plots for total count (TC), single use plastics (SUP), and plastic bags at the beach of Terschelling (The Netherlands). In each plot, the black dots are the observations, the thin gray line segments connect the dots and guide the eye, and the red line is the Theil-Sen slope.