Orangutan is an R package for analyzing and visualizing measurements (morphometrics) from groups such as species or populations. It runs a full analysis pipeline that summarizes data, finds variables that differentiate groups, performs multivariate and univariate statistics, and produces publication-ready plots.

1. Loads and validates your CSV data (requires a `species` column).
2. Optionally applies allometric correction. Adjusts mensural measurements for a user-selected variable (e.g. body size).
3. Optionally removes extreme outliers within species. Uses user-specified variables and a configurable tail percentage.
   - `05_data_cleaned_outliers_removed.csv`
   - `05_qc_outlier_audit_log.csv`
4. Computes per-species summary statistics. Mean, SD, min, and max for all variables.
   - `06_summary_stats.csv`
5. Identifies variables that do not overlap between species. Finds diagnostic traits and produces publication-ready plots.
   - `07_nonoverlaps_list.csv`
   - `07_nonoverlap_plot_<species1>_vs_<species2>_<variable>.pdf`
6. Runs multivariate tests on the full dataset.
   - `08_multi_betadisper_overall_test.csv`
   - `08_multi_betadisper_pairwise_tests.csv`
   - `08_multi_permanova_species_effect.csv`
7. Performs Principal Components Analysis (PCA) on scaled variables.
   - `09_multi_pca_plot.pdf`
   - `09_multi_pca_top_loadings_PC1_PC2_plot.pdf`
   - `09_multi_pca_top_loadings_PC1_PC2.csv`
8. Runs PCA axis post-hoc tests.
   - `09_multi_pca_posthoc.csv`
9. Runs Discriminant Analysis of Principal Components (DAPC).
   - `10_multi_dapc_plot.pdf`
   - `11_multi_dapc_confusion_matrix.csv`
   - `11_multi_dapc_performance_metrics.csv`
   - `11_multi_dapc_misclassified_individuals.csv`
10. Performs univariate tests for each variable.
    - `12_uni_anova_summary.csv`
    - `12_uni_anova_plot_<variable>.pdf`
    - `13_uni_kruskalwallis_summary.csv`
    - `13_uni_kruskalwallis_plot_<variable>.pdf`
11. Automatically identifies and analyzes categorical variables.
    - `14_categorical_analysis_summary.csv`
    - `14_categorical_percentages_summary.csv`
    - `14_categorical_barplot_<variable>.pdf`
12. Ensures reproducibility. Saved in `output_dir`:
    - `00_methods_summary.txt`: human-readable methods summary alongside the exact R environment and call configurations.
13. Generates an HTML interpretation report.
    - `orangutan_report.html`

Install the latest stable release from CRAN (v2.0.0):

```r
install.packages("Orangutan")
```

Install the development version directly from GitHub (v2.1.0):

```r
install.packages("pak")
pak::pak("metalofis/Orangutan-R")
```

Quick example: `run_orangutan` called with default parameters (writes results next to the input file by default):

```r
library(Orangutan)
run_orangutan("data/my_dataset.csv")
```

Full example: `run_orangutan` called with all available arguments:

```r
library(Orangutan) # Load the Orangutan package

run_orangutan(
  # ---------- Input / output ----------
  data_path = "data/my_dataset.csv",            # Path to your input CSV dataset
  output_dir = "address/to/orangutan_outputs",  # Folder where all outputs (plots, tables) will be saved

  # ---------- Allometry ----------
  apply_allometry = TRUE,  # Whether to adjust measurements for allometry
  allometry_var = "SVL",   # Column used as the reference variable for allometry correction

  # ---------- Outlier handling ----------
  remove_outliers = TRUE,    # Whether to remove extreme values (outliers)
  outlier_vars = c("SVL"),   # Which variables to check for outliers
  outlier_tail_pct = 0.05,   # Proportion of extreme values to remove from each tail (5% here)

  # ---------- PCA / DAPC highlighting ----------
  species_to_encircle = c("carolinensis", "torresfundorai"), # Species to highlight on PCA/DAPC plots

  # ---------- Color palette ----------
  palette_name = "Paired", # Name of the color palette for plots (e.g. "Paired", "Set3", "Dark2")
  custom_colors = c(SpeciesA = "#FF0000", SpeciesB = "#00FF00"), # Optional: custom hex codes for specific species

  # ---------- Point aesthetics ----------
  point_aes = list(
    point_size = 3.5,        # Size of each individual point
    jitter_width = 0.1,      # Horizontal jitter to prevent overplotting
    jitter_alpha = 0.8,      # Transparency of points
    jitter_shape = 21,       # Shape of the points (21 = filled circle with border)
    jitter_color = "black",  # Border color of points
    jitter_stroke = 0.35     # Thickness of the point border
  ),

  # ---------- Mean point aesthetics ----------
  mean_aes = list(
    size = 1.8,       # Size of the mean point
    shape = 21,       # Shape of the mean point
    fill = "white",   # Fill color of the mean point
    color = "black",  # Border color of the mean point
    stroke = 0.6      # Thickness of the mean point border
  ),

  # ---------- Violin aesthetics ----------
  violin_aes = list(
    alpha = 0.4 # Transparency of violin plots
  ),

  # ---------- Boxplot aesthetics ----------
  box_aes = list(
    alpha = 0.4,  # Transparency of boxplots
    width = 0.15  # Width of boxplots
  ),

  # ---------- Label / text control ----------
  label_aes = list(
    text_size = 6,        # Size of text labels on plots
    axis_text_size = 10,  # Size of axis tick labels
    title_size = 12,      # Size of plot titles
    label_offset = 0.05   # Distance of labels from points
  ),

  # ---------- Optional label templates ----------
  label_templates = list(
    nonoverlap_title = "Non-Overlapping Pair: %s vs %s for %s", # Title template for non-overlapping variable plots
    pca_x = "PC1 (%s%% variance)",  # Label for PCA X-axis with explained variance
    pca_y = "PC2 (%s%% variance)",  # Label for PCA Y-axis with explained variance
    dapc_x = "LD1 (%s%%)",          # Label for DAPC X-axis with explained variance
    dapc_y = "LD2 (%s%%)",          # Label for DAPC Y-axis with explained variance
    dapc_title_1d = "DAPC – Single Discriminant Axis" # Title for one-dimensional DAPC plots
  ),

  # ---------- Multivariate test seeds ----------
  seeds = list(betadisper = 123, permanova = 456), # Seeds for reproducible dispersion and permutation tests

  # ---------- Messaging ----------
  verbose = FALSE # Whether to print progress messages in console
)
```

- `allometry_var` is the reference variable for allometric correction, used when correction is enabled (`apply_allometry = TRUE`).
- `custom_colors` optionally assigns hex codes to specific species as a named vector (e.g. `c(SpeciesA = "#FF0000")`).
- `seeds` sets the seeds that make the multivariate tests reproducible (e.g. `list(betadisper = 123, permanova = 456)`).

The input CSV needs a `species` column and one or more numeric measurement columns.

| species | main_length | Head_length | Supralabials | Color |
|---|---|---|---|---|
| allisoni | 86.5 | 25.2 | 9 | Blue |
| allisoni | 73.6 | 24.8 | 8 | Blue |
| carolinensis | 63.0 | 18.3 | 8 | Green |
| carolinensis | 59.0 | 19.17 | 8 | Green |
| torresfundorai | 66.9 | 18.7 | 7 | Green |
| torresfundorai | 70.9 | 23.6 | 7 | Green |
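
To make the input format and the idea of non-overlapping variables concrete, here is a minimal base-R sketch built on the six example rows above. It is not the package's implementation: the helper `ranges_overlap()` and the object names are hypothetical, and it assumes that "non-overlap" means the min-max ranges of two species do not intersect, which may differ from the criterion Orangutan actually applies.

```r
# Example data copied from the table above
dat <- data.frame(
  species      = rep(c("allisoni", "carolinensis", "torresfundorai"), each = 2),
  main_length  = c(86.5, 73.6, 63.0, 59.0, 66.9, 70.9),
  Head_length  = c(25.2, 24.8, 18.3, 19.17, 18.7, 23.6),
  Supralabials = c(9, 8, 8, 8, 7, 7),
  Color        = c("Blue", "Blue", "Green", "Green", "Green", "Green"),
  stringsAsFactors = FALSE
)

# Format check: a species column plus at least one numeric measurement column
stopifnot("species" %in% names(dat))
numeric_vars <- names(dat)[vapply(dat, is.numeric, logical(1))]
stopifnot(length(numeric_vars) >= 1)

# Hypothetical helper: TRUE when the min-max ranges of x and y intersect
ranges_overlap <- function(x, y) {
  min(x) <= max(y) && min(y) <= max(x)
}

# Compare every pair of species on every numeric variable
species_pairs <- combn(unique(dat$species), 2, simplify = FALSE)
nonoverlaps <- do.call(rbind, lapply(numeric_vars, function(v) {
  do.call(rbind, lapply(species_pairs, function(p) {
    x <- dat[[v]][dat$species == p[1]]
    y <- dat[[v]][dat$species == p[2]]
    data.frame(
      variable   = v,
      species1   = p[1],
      species2   = p[2],
      nonoverlap = !ranges_overlap(x, y),
      stringsAsFactors = FALSE
    )
  }))
}))

print(nonoverlaps)
```

On these six rows, `main_length` separates all three species pairs, while `Head_length` overlaps between carolinensis and torresfundorai. With only two specimens per species the ranges are illustrative only.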
Every run produces `orangutan_report.html` inside `output_dir`; no extra arguments are needed. Open it in any web browser for a plain-language summary of all analysis sections, with embedded thumbnail images of the key plots.
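
As a small convenience, the report can also be opened from the R session. The sketch below uses only base R (`file.path()`, `file.exists()`, `utils::browseURL()`); the folder name is a placeholder taken from the full example and should be replaced with the `output_dir` of your own run.

```r
# Placeholder folder: use the value you passed as output_dir
output_dir <- "address/to/orangutan_outputs"
report_path <- file.path(output_dir, "orangutan_report.html")

# Open the report in the default browser only if the run has produced it
if (file.exists(report_path)) {
  utils::browseURL(report_path)
} else {
  message("Report not found at: ", report_path)
}
```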
Torres, J. (2026). Orangutan: An R Package for Analyzing and Visualizing Phenotypic Data in the Context of Species Descriptions and Population Comparisons. Ecology and Evolution, 16(2), e73111. https://doi.org/10.1002/ece3.73111