bioLeak 0.3.0
New features
- Added N-axis combined splitting via
constraints in make_split_plan(), generalizing
beyond two-axis combined CV while preserving train/test exclusion across
all declared axes.
- Added
compact = TRUE split storage (fold assignments)
for large datasets to reduce split object memory footprint.
- Added
check_split_overlap() for explicit
overlap-invariant validation across fold/group axes.
- Added
cv_ci() (with Nadeau-Bengio correction) and
integrated CI columns into fit_resample() and
tune_resample() metric summaries (*_ci_lo,
*_ci_hi).
- Added
guard_to_recipe() to map guarded preprocessing
configurations to recipes pipelines with explicit
fallback/warning behavior.
- Added
benchmark_leakage_suite() for reproducible
modality-by-mechanism benchmark grids and detection-rate summaries.
- Expanded
audit_leakage() diagnostics with mechanism
taxonomy fields (mechanism_class, taxonomy,
mechanism_summary) and richer risk attribution
outputs.
- Added FDR-aware target scan outputs (
p_value_adj,
flag_fdr) with selectable multiple-testing correction
(target_p_adjust, target_alpha).
- Added
feature_space
(raw/rank) and duplicate_scope
(train_test/all) controls for duplicate
diagnostics.
- Strengthened permutation auditing with explicit
perm_mode handling for rsample-derived splits and safer
perm_refit = "auto" behavior.
- Extended tidymodels interoperability: rsample conversion and
metadata inference are more robust (
split_cols = "auto",
mode/perm-mode propagation, stricter compatibility checks).
- Improved nested tuning safety in
tune_resample(): final
refit now aggregates hyperparameters across outer folds
(median/majority) instead of selecting a single best outer fold.
- Added binomial threshold tuning support in
tune_resample() using inner-fold predictions
(tune_threshold, threshold_grid,
threshold_metric).
- Added structured fold-status tracking (
fold_status) and
elapsed timing in both fitting and tuning paths for better failure-mode
observability.
- Added strict-mode and validation-policy infrastructure
(
bioLeak.strict, bioLeak.validation_mode) with
structured condition classes for safer recipe and workflow
guardrails.
- Added provenance capture (
.bio_capture_provenance) and
attached provenance metadata to LeakFit,
LeakAudit, and LeakTune.
- Improved
summary.LeakAudit() output with explicit
Mechanism Risk Assessment reporting.
- Hardened recipe preprocessing in
fit_resample() to
avoid fold-time failures when recipes reference split metadata columns
(for example subject).
- Updated simulation defaults and audit settings for more practical
runtime (
simulate_leakage_suite() default B,
auto refit cap handling).
- Updated manuscript/simulation assets under
paper/ with
refreshed large-scale simulation outputs and case-study artifacts.
bioLeak 0.2.0
New features
- Leak-safe hyperparameter tuning via
tune_resample(): nested cross-validation using tidymodels
tune/dials with leakage-aware outer
splits.
- Tidymodels interoperability:
fit_resample() now accepts rsample rset/rsplit
objects as splits, recipes::recipe for
preprocessing, workflows::workflow as learner,
and yardstick::metric_set for metrics.
as_rsample() converts LeakSplits to an
rsample rset.
- Parsnip model specs accepted directly as the
learner argument in fit_resample().
- Diagnostics polish: new
calibration_summary() and plot_calibration()
for probability calibration checks;
confounder_sensitivity() and
plot_confounder_sensitivity() for sensitivity
analysis.
- Simulation utility
simulate_leakage_suite() for generating controlled leakage
scenarios and benchmarking audit sensitivity.
- HTML audit report via
audit_report():
renders a self-contained HTML summary of all audit results for sharing
and review.
- Multi-learner auditing with
audit_leakage_by_learner() to audit each learner in a
multi-model fit separately.
- Multivariate target leakage scan enabled by default
in
audit_leakage() for supported tasks, complementing the
existing univariate scan.
- Refit-based permutations
(
perm_refit = TRUE or "auto") in
audit_leakage() for a more powerful permutation gap test
when refit data are available.
- Class weights support in
fit_resample() for imbalanced classification tasks.
- New plotting functions:
plot_fold_balance(),
plot_overlap_checks(),
plot_perm_distribution(),
plot_time_acf().
Improvements
- S4 classes (
LeakSplits, LeakFit,
LeakAudit) now include setValidity checks for
slot consistency.
summary() methods for LeakFit,
LeakAudit, and LeakTune improved with clearer
console output and edge-case handling.
impute_guarded() gains enhanced diagnostics and RNG
safety.
.guard_fit() and .guard_ensure_levels()
made more robust with better error messages.
- Permutation label factory (
permute_labels) gains
verbose mode, digest-based caching, and improved stratification
safety.
audit_leakage() handles NA metrics gracefully and
enriches trail metadata.
make_split_plan() improved stratification logic and
reproducible seeding.
audit_report() now renders from a temporary copy of the
Rmd template to avoid write failures on read-only file systems
(e.g. during R CMD check).
- Comprehensive vignette (
bioLeak-intro) rewritten with
guided workflow and leaky-vs-correct comparisons.
Bug fixes
- Fixed
fit_resample() result aggregation when folds fail
during preprocessing.
- Fixed
missForest preprocessing dropping rows.
- Fixed single-level factors causing errors in guarded
preprocessing.
- Fixed filter keep-column alignment by name.
- Fixed
glmnet folds receiving non-numeric design
matrices.
- Fixed constant imputation for categorical data.
- Fixed RANN self-neighbour filter in duplicate detection.
- Fixed various edge cases in outcome extraction and hashing
utilities.
- Resolved multiple CRAN check issues (Rd formatting, example runtime,
read-only file-system writes).
bioLeak 0.1.0
- Initial release.
- Core pipeline:
make_split_plan() for
leakage-aware splitting (subject-grouped, batch-blocked, study
leave-out, time-ordered); fit_resample() for
cross-validated fitting with built-in guarded preprocessing (train-only
imputation, normalisation, filtering, feature selection).
- Leakage auditing:
audit_leakage() with
label-permutation gap test, batch/study association tests, univariate
target leakage scan, and near-duplicate detection.
- Guarded preprocessing helpers:
impute_guarded(), predict_guard(),
.guard_fit(), .guard_ensure_levels().
- S4 class system:
LeakSplits, LeakFit,
LeakAudit.
- Support for binomial, multiclass, regression, and survival
tasks.
- Built-in learners:
glm, glmnet,
ranger, xgboost (via
custom_learners).
SummarizedExperiment input support.
- Vignette and comprehensive documentation.