detectXOR: XOR pattern detection and visualization in R

Provides tools for detecting XOR-like patterns in variable pairs. Includes visualizations for pattern exploration.

Overview

Traditional feature selection methods often miss complex non-linear relationships where variables interact to produce class differences. The detectXOR package specifically targets XOR patterns - relationships where class discrimination only emerges through variable interactions, not individual variables alone.
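As a minimal illustration of the idea, the following base R sketch simulates a two-variable XOR pattern and compares a rule based on one variable with a rule based on the interaction. It does not use the package; the simulated data and all object names are assumptions made for this example.

```r
# Simulated XOR data (illustrative only, not the package's XOR_data)
set.seed(42)
n <- 400
x <- runif(n, -1, 1)
y <- runif(n, -1, 1)

# Class 1 occupies the quadrants where x and y share a sign
class <- as.integer(x * y > 0)

# Rule using x alone: predict class 1 when x > 0
acc_single <- mean(as.integer(x > 0) == class)

# Rule using the interaction: predict class 1 when x and y share a sign
acc_interaction <- mean(as.integer(sign(x) == sign(y)) == class)

cat("Accuracy using x alone:        ", round(acc_single, 2), "\n")
cat("Accuracy using x-y interaction:", round(acc_interaction, 2), "\n")
```

The single-variable rule performs close to chance, while the interaction rule separates the classes by construction. This is the kind of structure that univariate feature selection tends to overlook.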

Key capabilities

🔍 XOR pattern detection - Statistical identification using χ² and Wilcoxon tests
📈 Correlation analysis - Class-wise Kendall τ coefficients
📊 Visualization - Spaghetti plots and decision boundary visualizations
⚡ Parallel processing - Multi-core acceleration for large datasets
🔬 Robust statistics - Winsorization and scaling options for outlier handling

Installation

Install the development version from GitHub:

# Install devtools if needed
if (!requireNamespace("devtools", quietly = TRUE)) { install.packages("devtools") }
# Install detectXOR
devtools::install_github("JornLotsch/detectXOR")

Dependencies

The package requires R ≥ 3.5.0 and depends on:
- dplyr, tibble (data manipulation)
- ggplot2, ggh4x, scales (visualization)
- future, future.apply, pbmcapply, parallel (parallel processing)
- reshape2, glue (data processing and string manipulation)
- DescTools (statistical tools)
- Base R packages: stats, utils, methods, grDevices

Optional packages (suggested):
- testthat, knitr, rmarkdown (development and documentation)
- doParallel, foreach (additional parallel processing options)

Quick start

Basic XOR detection

library(detectXOR)
# Load example data
data(XOR_data)
# Detect XOR patterns with default settings
results <- detect_xor(data = XOR_data, class_col = "class")
# View summary
print(results$results_df)

Usage with custom parameters

# Detection with custom thresholds and parallel processing
results <- detect_xor(
  data = XOR_data,
  class_col = "class",
  p_threshold = 0.01,
  tau_threshold = 0.4,
  max_cores = 4,
  extreme_handling = "winsorize",
  scale_data = TRUE
)

Function parameters

detect_xor() - Main detection function

| Parameter | Type | Default | Description |
|---|---|---|---|
| data | data.frame | required | Input dataset with variables and class column |
| class_col | character | "class" | Name of the class/target variable column |
| check_tau | logical | TRUE | Compute class-wise Kendall τ correlations |
| compute_axes_parallel_significance | logical | TRUE | Perform group-wise Wilcoxon tests |
| p_threshold | numeric | 0.05 | Significance threshold for statistical tests |
| tau_threshold | numeric | 0.3 | Minimum absolute τ for “strong” correlation |
| abs_diff_threshold | numeric | 20 | Minimum absolute difference for practical significance |
| split_method | character | "quantile" | Tile splitting method: "quantile" or "range" |
| max_cores | integer | NULL | Maximum cores for parallel processing (auto-detect if NULL) |
| extreme_handling | character | "winsorize" | Outlier handling: "winsorize", "remove", or "none" |
| winsor_limits | numeric vector | c(0.05, 0.95) | Winsorization percentiles |
| scale_data | logical | TRUE | Standardize variables before analysis |
| use_complete | logical | TRUE | Use only complete cases (remove NA values) |
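
The extreme_handling, winsor_limits, and scale_data options describe preprocessing steps. A rough base R sketch of winsorizing at the default percentiles and then standardizing is shown here. The helper name winsorize_vec is hypothetical, and the package's internal implementation may differ.

```r
# Hypothetical helper: clamp values to the given percentile limits
winsorize_vec <- function(v, limits = c(0.05, 0.95)) {
  bounds <- quantile(v, probs = limits, na.rm = TRUE, names = FALSE)
  pmin(pmax(v, bounds[1]), bounds[2])
}

# Toy vector with one extreme value
v <- c(1:19, 1000)

# Values outside the 5th and 95th percentiles are pulled in to those limits
v_wins <- winsorize_vec(v)

# Standardize: center to mean 0 and scale to standard deviation 1
v_scaled <- as.numeric(scale(v_wins))

print(range(v))
print(range(v_wins))
print(round(c(mean = mean(v_scaled), sd = sd(v_scaled)), 3))
```

Winsorizing keeps every observation but limits the influence of extreme values, which is why it can be preferable to removing outliers when the sample is small.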

Output structure

The detect_xor() function returns a list with two components:

results_df - Summary data frame

| Column | Description |
|---|---|
| var1, var2 | Variable pair names |
| xor_shape_detected | Logical: XOR pattern identified |
| chi_sq_p_value | χ² test p-value for tile independence |
| tau_class_0, tau_class_1 | Class-wise Kendall τ coefficients |
| tau_difference | Absolute difference between class τ values |
| wilcox_p_x, wilcox_p_y | Wilcoxon test p-values for each axis |
| significant_wilcox | Logical: significant group differences detected |
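
To make tau_class_0, tau_class_1, and tau_difference more concrete, here is a small base R sketch that computes class-wise Kendall τ on simulated data. The data and object names are assumptions for illustration, and the code does not call the package.

```r
# Simulated data: class 0 has a positive x-y trend, class 1 a negative one
set.seed(1)
n <- 60
x0 <- seq(-2, 2, length.out = n)
x1 <- seq(-2, 2, length.out = n)
dat <- data.frame(
  x = c(x0, x1),
  y = c(x0 + rnorm(n, sd = 0.5), -x1 + rnorm(n, sd = 0.5)),
  class = rep(c(0, 1), each = n)
)

# Kendall tau computed separately within each class
tau_by_class <- sapply(
  split(dat, dat$class),
  function(d) cor(d$x, d$y, method = "kendall")
)

# Absolute difference between the two class-wise coefficients
tau_difference <- abs(tau_by_class[["0"]] - tau_by_class[["1"]])

# Pooled tau ignores the class labels
tau_pooled <- cor(dat$x, dat$y, method = "kendall")

print(round(tau_by_class, 2))
print(round(c(tau_difference = tau_difference, tau_pooled = tau_pooled), 2))
```

Opposite-signed coefficients within the classes, combined with a weak pooled correlation, are characteristic of an X-shaped or XOR-like arrangement.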

pair_list - Detailed results

Contains comprehensive analysis for each variable pair including:
- Tile pattern analysis results
- Statistical test outputs
- Processed data subsets
- Intermediate calculations

Visualization functions

| Function | Description | Key Parameters |
|---|---|---|
| generate_spaghetti_plot_from_results() | Creates connected line plots showing variable trajectories for XOR-detected pairs | results, data, class_col, scale_data = TRUE |
| generate_xy_plot_from_results() | Generates scatter plots with decision boundary lines for detected XOR patterns | results, data, class_col, scale_data = TRUE, quantile_lines = c(1/3, 2/3), line_method = "quantile" |

Both functions return ggplot objects that can be displayed or saved manually.

# Generate plots
generate_spaghetti_plot_from_results(results, XOR_data) 
generate_xy_plot_from_results(results, XOR_data)

Example plots

Reporting functions

| Function | Description | Key Parameters |
|---|---|---|
| generate_xor_reportConsole() | Creates console-friendly formatted report with optional plots | results, data, class_col, scale_data = TRUE, show_plots = TRUE |
| generate_xor_reportHTML() | Generates comprehensive HTML report with interactive elements | results, data, class_col, output_file, open_browser = TRUE |

Example report

# Generate formatted report 
generate_xor_reportHTML(results, XOR_data, class_col = "class")

With open_browser = TRUE (the default), the report is opened automatically in the system's default web browser.

Methodology

XOR detection pipeline

  1. Pairwise dataset creation - Extract all variable pairs with preprocessing
  2. Tile pattern analysis - Divide variable space into 2×2 tiles and test for XOR-like distributions
  3. Statistical validation - Apply χ² tests for independence and Wilcoxon tests for group differences
  4. Correlation analysis - Compute class-wise Kendall τ to quantify relationship strength
  5. Result aggregation - Combine findings into interpretable summary format
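
The steps above can be sketched in base R as follows. This is a simplified approximation under stated assumptions, not the package's implementation: the data are simulated, tiles are formed by a median split, the helper analyse_pair is hypothetical, and the output column names are borrowed from the results_df description.

```r
# Simulated data: XOR between a and b, plus one unrelated variable
set.seed(123)
n <- 400
dat <- data.frame(
  a = runif(n, -1, 1),
  b = runif(n, -1, 1),
  noise = runif(n, -1, 1)
)
dat$class <- as.integer(dat$a * dat$b > 0)

# Step 1: enumerate all variable pairs
vars <- setdiff(names(dat), "class")
pairs <- combn(vars, 2, simplify = FALSE)

# Hypothetical helper covering steps 2 to 4 for one pair
analyse_pair <- function(p) {
  v1 <- dat[[p[1]]]
  v2 <- dat[[p[2]]]
  cls <- dat$class

  # Step 2: 2x2 tiles from a median split of each variable
  tile <- interaction(v1 > median(v1), v2 > median(v2))

  # Step 3: chi-squared test of tile membership against class,
  # and Wilcoxon tests of each variable alone against class
  chi_p <- chisq.test(table(tile, cls))$p.value
  wx <- wilcox.test(v1[cls == 0], v1[cls == 1])$p.value
  wy <- wilcox.test(v2[cls == 0], v2[cls == 1])$p.value

  # Step 4: Kendall tau within each class
  tau <- sapply(
    split(seq_along(cls), cls),
    function(i) cor(v1[i], v2[i], method = "kendall")
  )

  data.frame(
    var1 = p[1], var2 = p[2],
    chi_sq_p_value = chi_p,
    wilcox_p_x = wx, wilcox_p_y = wy,
    tau_class_0 = tau[["0"]], tau_class_1 = tau[["1"]]
  )
}

# Step 5: aggregate the per-pair results into one summary data frame
summary_df <- do.call(rbind, lapply(pairs, analyse_pair))
print(summary_df)
```

In this toy setup, the pair (a, b) should show a very small χ² p-value together with opposite-signed class-wise τ values, while the pairs that involve the unrelated variable should not show that combination.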

Statistical tests

Use cases

Machine learning

Technical details

Cross-platform compatibility

Package structure

detectXOR/
├── R/                 # Package source code
├── man/               # Package documentation
├── data/              # Example dataset
├── issues/            # Problem reporting
└── analyses/          # Files used to generate or plot publication data sets (not in library)

Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests on GitHub.

License

GPL-3

Citation

For citation details or to request a formal publication reference, please contact the maintainer.