---
title: "Conformal prediction with different choices"
output:
  rmarkdown::html_vignette: default
vignette: >
  %\VignetteIndexEntry{Conformal prediction with different choices}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r cp-knit-opts, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6, fig.height = 4,
  fig.align = "center"
)
```

```{r setup}
library(MetaHunt)
set.seed(1)
```

This vignette covers the three conformal-prediction interfaces
exported by MetaHunt. The validity of all three rests on the
exchangeability assumption A3 in
`vignette("metahunt-intro", package = "MetaHunt")` §"Key
assumptions"; we do not re-derive it here.

## Why conformal prediction here

Conformal prediction wraps any black-box prediction rule and
produces a band around its forecast that contains the truth with
probability at least `(1 - alpha)`, where the probability is taken
over the random draw of the calibration studies and the new study.
The key word is *marginal*: the guarantee averages over new studies
and is not conditional on a specific covariate value. All you need
is for the calibration studies to be exchangeable with the new study
(assumption A3); no distributional assumptions are made on the noise
or on the weight model.
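
To make the guarantee concrete, here is a minimal base-R sketch of split conformal for a scalar prediction, assuming absolute-residual nonconformity scores. The helper `split_conformal_interval_sketch()` and the toy numbers are hypothetical and are not MetaHunt's internals; the rank `ceiling((1 - alpha) * (n_cal + 1))` is the standard finite-sample correction.

```{r cp-sketch-split}
# Hypothetical helper (not part of MetaHunt): split-conformal interval for a
# scalar target using absolute-residual nonconformity scores.
split_conformal_interval_sketch <- function(pred_cal, y_cal, pred_new, alpha) {
  scores <- abs(y_cal - pred_cal)               # calibration nonconformity scores
  n_cal <- length(scores)
  k <- ceiling((1 - alpha) * (n_cal + 1))       # finite-sample corrected rank
  q <- if (k > n_cal) Inf else sort(scores)[k]  # Inf when too few calibration points
  list(lower = pred_new - q, upper = pred_new + q, quantile = q)
}

# Toy calibration set: residuals are 0.1, 0.2, ..., 1.0 (n_cal = 10).
pred_cal <- rep(0, 10)
y_cal <- seq(0.1, 1, by = 0.1)
split_conformal_interval_sketch(pred_cal, y_cal, pred_new = 5, alpha = 0.2)
```

With `alpha = 0.2` and ten calibration residuals the rank is `ceiling(0.8 * 11) = 9`, so the half-width is the ninth-smallest residual, 0.9, and the interval is `[4.1, 5.9]`. When the rank exceeds `n_cal` the quantile is `Inf`, which is the failure mode discussed at the end of this vignette.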

## A small standalone simulation

```{r cp-simulate}
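# m = 80 studies with cal_frac = 0.5 gives n_cal = 40 calibration studies,
# enough for a finite conformal quantile at alpha = 0.05.
m <- 80; G <- 20; K_true <- 3
x <- seq(0, 1, length.out = G)                     # common evaluation grid
basis <- rbind(sin(pi * x), cos(pi * x), x)        # K_true x G basis functions
W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))      # study-level covariates
beta <- cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))   # 3rd column = reference category
# Softmax weights: each row of pi_true lies on the simplex.
pi_true <- exp(as.matrix(W) %*% beta); pi_true <- pi_true / rowSums(pi_true)
# m x G noisy study-level functions.
F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)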
```
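
The simulation draws each study's weights from a multinomial-logit (softmax) map of its covariates, with the third column of `beta` fixed at zero as the reference category, then mixes the three basis curves with those weights and adds Gaussian noise. The standalone sketch below isolates the weight map; `softmax_weights_sketch()` is a hypothetical helper used only for illustration.

```{r cp-sketch-softmax}
# Hypothetical helper: row-wise softmax of a linear predictor, the same map
# used to generate pi_true. Columns of beta_mat index the K components.
softmax_weights_sketch <- function(W_mat, beta_mat) {
  eta <- W_mat %*% beta_mat          # m x K linear predictors
  eta <- eta - apply(eta, 1, max)    # subtract each row's max for numerical stability
  p <- exp(eta)
  p / rowSums(p)                     # each row sums to 1
}

beta_demo <- cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
W_demo <- rbind(c(0, 0), c(1, -0.5))
softmax_weights_sketch(W_demo, beta_demo)
```

Subtracting the row maximum does not change the result, because softmax is invariant to adding a constant to every entry of a row; it only guards against overflow in `exp()`. A study with `w1 = w2 = 0` gets equal weight on all three components.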

```{r cp-wnew}
# Covariates of three new studies at which to predict
W_new <- data.frame(w1 = c(0, 1, -1), w2 = c(0, -0.5, 1))
```

## The three flavours

All three return an object you can plot directly with `plot()`.

| Function | When to use |
|---|---|
| `split_conformal()` | Default. One train/calibration split. Fastest; some variance from the random split. |
| `cross_conformal()` | You want less variance from the random split, or have too few studies for a single split to give finite bands. Refits the pipeline `n_folds + 1` times. |
| `conformal_from_fit()` | You've already fit a pipeline (e.g. after tuning `K`) and want intervals without refitting. See `?conformal_from_fit`. |

## Split conformal

`split_conformal()` does a single train / calibration split. With
small `m`, set `cal_frac = 0.5` so the calibration set is large
enough for the chosen `alpha`.

```{r cp-split-pointwise}
res_pw <- split_conformal(F_hat, W, W_new, K = K_true, alpha = 0.05,
                          cal_frac = 0.5, seed = 1,
                          dfspa_args = list(denoise = FALSE))
plot(res_pw, target_idx = 1, x_axis = x)
```

The shaded region is the pointwise 95% conformal band. Its width is finite because `n_cal = 40` calibration studies are enough for `alpha = 0.05`: the required rank, `ceiling(0.95 * 41) = 39`, does not exceed 40.
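
The pointwise band applies the scalar split-conformal rule separately at each grid point. The sketch below assumes absolute-residual scores and a calibration residual matrix with one row per calibration study and one column per grid point; `pointwise_halfwidth_sketch()` is hypothetical and need not match how `split_conformal()` scores residuals internally.

```{r cp-sketch-pointwise}
# Hypothetical helper: per-grid-point conformal half-widths.
# resid_cal: n_cal x G matrix of (observed - predicted) calibration residuals.
pointwise_halfwidth_sketch <- function(resid_cal, alpha) {
  n_cal <- nrow(resid_cal)
  k <- ceiling((1 - alpha) * (n_cal + 1))
  if (k > n_cal) return(rep(Inf, ncol(resid_cal)))   # too few calibration studies
  apply(abs(resid_cal), 2, function(s) sort(s)[k])   # k-th smallest score per column
}

# Toy residual matrix: 40 calibration studies, 3 grid points whose residual
# scale grows from left to right.
resid_demo <- outer(seq(-1, 1, length.out = 40), c(0.5, 1, 2))
pointwise_halfwidth_sketch(resid_demo, alpha = 0.05)
```

With 40 calibration rows the rank is 39, so each column's half-width is its 39th-smallest absolute residual; here that is 0.5, 1 and 2, tracking the residual scale at each grid point.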

```{r cp-split-scalar}
res_scalar <- split_conformal(F_hat, W, W_new, K = K_true,
                              wrapper = mean, alpha = 0.05,
                              cal_frac = 0.5, seed = 1,
                              dfspa_args = list(denoise = FALSE))
data.frame(prediction = res_scalar$prediction,
           lower      = res_scalar$lower,
           upper      = res_scalar$upper)
```
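
With `wrapper = mean`, each function is reduced to a single number before calibration, so the result is one interval per new study rather than a band. A hedged sketch of that reduction, assuming the wrapper is applied row-wise to a matrix holding one function per row on the common grid (the helper and object names are illustrative):

```{r cp-sketch-wrapper}
# Illustrative only: apply a scalar wrapper to each row (one function per row).
apply_wrapper_sketch <- function(F_mat, wrapper = mean) {
  apply(F_mat, 1, wrapper)
}

grid_demo <- seq(0, 1, length.out = 5)
F_demo <- rbind(grid_demo, 2 * grid_demo, 1 - grid_demo)  # three toy functions
apply_wrapper_sketch(F_demo)        # one scalar (the mean) per function
apply_wrapper_sketch(F_demo, max)   # any function-to-scalar map can be used
```

After the reduction, calibration proceeds exactly as for any scalar prediction.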

## Cross conformal

`cross_conformal()` reduces the variance of the band that comes
from the random split, at the cost of refitting `n_folds + 1`
times.
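
One natural reading of the `n_folds + 1` refits is one fit per fold, each leaving that fold out to produce out-of-fold residuals, plus a final fit on all studies for the prediction. The sketch below shows that generic cross-conformal pattern with a deliberately trivial stand-in model (the mean of the training targets) in place of the MetaHunt pipeline; the fold assignment, the stand-in model and `cross_conformal_sketch()` are assumptions, not the package's implementation.

```{r cp-sketch-cross}
# Generic cross-conformal sketch (not MetaHunt's code). The "model" is the
# mean of the training targets, standing in for the full pipeline.
cross_conformal_sketch <- function(y, n_folds, alpha) {
  n <- length(y)
  folds <- rep_len(seq_len(n_folds), n)        # deterministic fold labels
  oof_scores <- numeric(n)
  for (f in seq_len(n_folds)) {                # n_folds fits
    held_out <- folds == f
    fit_f <- mean(y[!held_out])                # fit without fold f
    oof_scores[held_out] <- abs(y[held_out] - fit_f)
  }
  final_fit <- mean(y)                         # the "+ 1" fit on all studies
  k <- ceiling((1 - alpha) * (n + 1))
  q <- if (k > n) Inf else sort(oof_scores)[k]
  c(prediction = final_fit, lower = final_fit - q, upper = final_fit + q)
}

cross_conformal_sketch(y = c(1, 3, 2, 5, 4, 6, 3, 2, 4, 5), n_folds = 5, alpha = 0.2)
```

Every study contributes a calibration score, which is why cross-fitting helps when `m` is small. Note that this pooled out-of-fold construction does not by itself carry the exact finite-sample `1 - alpha` guarantee of the single split; check `?cross_conformal` for the guarantee the package states.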

```{r cp-cross}
res_cross <- cross_conformal(F_hat, W, W_new, K = K_true, n_folds = 4,
                             wrapper = mean, alpha = 0.1, seed = 1,
                             dfspa_args = list(denoise = FALSE))
res_cross
```

## Pre-fit conformal

If you have already run `metahunt()` (for instance after tuning
`K`) and do not want to refit, `conformal_from_fit()` recycles the
existing fit to produce calibrated intervals. The example below
re-uses the training data as the calibration set *for demonstration
only*; in real use, hold out a separate calibration set so the
exchangeability argument applies to genuinely unseen studies.
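
Holding out the calibration studies matters because residuals on studies the model was fit to tend to be smaller than residuals on unseen studies, which would make the band too narrow. A minimal base-R sketch of the recommended pattern, using `lm()` as a hypothetical stand-in for an existing MetaHunt fit and a deterministic odd/even split into fitting and calibration studies:

```{r cp-sketch-prefit}
# Stand-in for an already-fitted pipeline: a linear model fit on the
# "fitting" studies only. Data and split are illustrative.
x_demo <- seq(0, 1, length.out = 40)
y_demo <- 2 * x_demo + 0.3 * sin(25 * x_demo)   # deterministic "noise"
fit_idx <- seq(1, 40, by = 2)                   # odd positions: fit
cal_idx <- seq(2, 40, by = 2)                   # even positions: calibrate
prefit <- lm(y ~ x, data = data.frame(x = x_demo[fit_idx], y = y_demo[fit_idx]))

# Calibrate the fixed model on the held-out studies only.
pred_cal_demo <- predict(prefit, newdata = data.frame(x = x_demo[cal_idx]))
scores <- abs(y_demo[cal_idx] - pred_cal_demo)
alpha_demo <- 0.1
k <- ceiling((1 - alpha_demo) * (length(scores) + 1))
q_hat <- if (k > length(scores)) Inf else sort(scores)[k]

pred_new <- predict(prefit, newdata = data.frame(x = 0.5))
c(lower = unname(pred_new - q_hat), upper = unname(pred_new + q_hat))
```

The fitted model is never updated during calibration; only the held-out residuals determine the half-width, which is the same division of labour that `conformal_from_fit()` offers for a MetaHunt fit.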

```{r cp-prefit}
fit <- metahunt(F_hat, W, K = K_true, dfspa_args = list(denoise = FALSE))
# Optional: project the studies onto the fitted bases (not used by conformal_from_fit() below).
pi_hat <- project_to_simplex(F_hat, fit$dfspa_fit$bases)
res_pre <- conformal_from_fit(
  dfspa_fit = fit$dfspa_fit, weight_model = fit$weight_model,
  F_cal = F_hat, W_cal = W, W_new = W_new,
  wrapper = mean, alpha = 0.1
)
res_pre
```

## Pointwise vs scalar bands

A pointwise band gives a `(1 - alpha)` interval at each grid point
but no joint guarantee across grid points: the probability that the
truth lies inside the entire band simultaneously is generally lower
than `1 - alpha`. A scalar wrapper (e.g. `wrapper = mean`) collapses
each function to a single number and gives one calibrated interval,
which is the right tool when your claim concerns that one summary of
the function. If you need a coverage statement that holds jointly
across the grid, apply a multiple-testing correction such as
Bonferroni (calibrate each grid point at `alpha / G`), bearing in
mind that this raises the number of calibration studies required.
If a single summary answers your question, replace the pointwise
band with a scalar wrapper instead; see
`vignette("wrapper-scalar", package = "MetaHunt")`.
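
A quick calculation shows how far pointwise coverage can be from joint coverage, and what a Bonferroni correction costs in calibration studies. The independence assumption below is purely illustrative, since neighbouring grid points are correlated in practice; `finite_quantile()` is a hypothetical helper.

```{r cp-sketch-joint}
alpha_j <- 0.05   # target miscoverage
G_j <- 20         # number of grid points, as in the simulation

# Joint coverage if the G pointwise events were independent (illustration only).
joint_if_independent <- (1 - alpha_j)^G_j
round(joint_if_independent, 3)

# Bonferroni: calibrate each grid point at alpha / G. The union bound then
# gives joint coverage of at least 1 - alpha under any dependence.
alpha_per_point <- alpha_j / G_j

# Is the split-conformal quantile finite with n_cal calibration studies?
finite_quantile <- function(n_cal, a) ceiling((1 - a) * (n_cal + 1)) <= n_cal
finite_quantile(40, alpha_j)          # pointwise band at alpha
finite_quantile(40, alpha_per_point)  # Bonferroni-corrected band
```

Under independence, twenty 95% intervals cover jointly only about 36% of the time. The Bonferroni-corrected band, by contrast, would be unbounded with the simulation's 40 calibration studies: at level `alpha / G` the rank rule needs at least `G / alpha - 1 = 399` of them.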

## Small-`m` warning on `cal_frac`

> With too few calibration studies for the chosen `alpha`, the
> conformal quantile is `Inf` and the intervals are unbounded. The
> finite-sample rule needs the rank
> `ceiling((1 - alpha) * (n_cal + 1))` to be at most `n_cal`, which,
> because `n_cal` is an integer, is equivalent to
> `n_cal >= 1 / alpha - 1`. Below that threshold the package warns
> and the bands become unbounded. Either raise `cal_frac`, raise
> `alpha`, or switch to `cross_conformal()`.
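
The threshold can also be checked numerically. The standalone sketch below (`min_n_cal()` is a hypothetical helper) searches for the smallest calibration-set size whose rank `ceiling((1 - alpha) * (n_cal + 1))` does not exceed `n_cal`:

```{r cp-sketch-threshold}
# Smallest calibration-set size for which the split-conformal rank
# ceiling((1 - alpha) * (n_cal + 1)) does not exceed n_cal.
min_n_cal <- function(alpha, n_max = 10000) {
  for (n in seq_len(n_max)) {
    if (ceiling((1 - alpha) * (n + 1)) <= n) return(n)
  }
  NA_integer_   # no size up to n_max works
}
min_n_cal(0.05)
min_n_cal(0.10)
```

It returns 19 for `alpha = 0.05` and 9 for `alpha = 0.1`, matching `1 / alpha - 1`. So the 40 calibration studies in the main example suffice at `alpha = 0.05`, while the roughly 15 produced by 30 studies with `cal_frac = 0.5` do not.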

Below we deliberately keep only the first 30 of our 80 studies, so with `cal_frac = 0.5` only about 15 studies land in the calibration set, too few for `alpha = 0.05`. The package issues a warning and returns an `Inf` quantile, so the corresponding intervals are unbounded. Besides the remedies listed above, supplying more studies also fixes the problem.

```{r cp-small-m-warning, warning = TRUE}
m_small <- 30  # too small for alpha = 0.05 with cal_frac = 0.5
F_small <- F_hat[1:m_small, , drop = FALSE]
W_small <- W[1:m_small, , drop = FALSE]
res_inf <- split_conformal(F_small, W_small, W_new, K = K_true,
                           alpha = 0.05, cal_frac = 0.5, seed = 1,
                           dfspa_args = list(denoise = FALSE))
res_inf$quantile        # Inf — quantile is unbounded
range(res_inf$lower)    # -Inf
range(res_inf$upper)    #  Inf
```

## See also

- `vignette("metahunt-intro", package = "MetaHunt")` — pipeline
  context and the A3 exchangeability assumption.
- `?split_conformal` — single-split conformal calibration.
- `?cross_conformal` — cross-fitting conformal calibration.
- `?conformal_from_fit` — calibration using an existing fit.
- `?coverage` — empirical coverage diagnostics for conformal bands.
- `?plot.metahunt_conformal` — plotting method for the returned
  objects.