---
title: "An introduction to MetaHunt"
output:
  rmarkdown::html_vignette: default
vignette: >
  %\VignetteIndexEntry{An introduction to MetaHunt}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r intro-knitr-opts, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6, fig.height = 4,
  fig.align = "center"
)
```

```{r setup}
library(MetaHunt)
set.seed(1)
```

## The problem: function-valued meta-analysis

You have several **studies** — trial centres, cohorts, sites — and each
one has produced a fitted model for some function of patient-level
covariates. That function might be a regression curve, a CATE, or a
dose-response curve. You also have **study-level metadata** for each
study: region, sample composition, year, and so on. You want to predict
that function for a *new* study, characterised only by its metadata,
together with a prediction band, and you want to do it without ever
pooling patient-level data across studies.

MetaHunt is designed for exactly that workflow. It assumes a small
number of latent **basis functions** drive cross-study heterogeneity,
recovers them from the per-study estimates, and learns how the mixing
weights depend on study-level metadata. The result is a single fitted
object that you can `predict()` for new metadata and combine with
conformal prediction for uncertainty quantification. If you would
rather see one short example before reading the full pipeline, start
with `vignette("get-started", package = "MetaHunt")`.

## Key assumptions

The MetaHunt pipeline rests on four assumptions. We state each one in
plain, applied terms; the manuscript contains the formal statements
and proofs.

### A1. Low-rank cross-study heterogeneity

We assume the true study-level functions all live inside the convex
hull of a small number $K$ of shared *latent basis functions*. Each
study is a convex combination of the bases, with non-negative weights
that sum to one. Geometrically, every study is a point inside a
$(K-1)$-simplex whose vertices are the basis functions; the bases are
identifiable as the vertices of that simplex.

Formally, there exist basis functions $g_1, \ldots, g_K$ with $K < m$ such that, for every study $i \in \{0, 1, \ldots, m\}$,
$$f^{(i)}(\boldsymbol{x}) \;=\; \sum_{k=1}^{K} \pi_{ik}\, g_k(\boldsymbol{x}), \qquad \boldsymbol{x} \in \mathcal{X},$$
with weight vector $\boldsymbol{\pi}_i = (\pi_{i1}, \ldots, \pi_{iK})^\top \in \Delta_{K-1}$, the $(K-1)$-simplex. The bases are non-degenerate, i.e. $g_2 - g_1, \ldots, g_K - g_1$ are linearly independent in $L^2(\mu)$.

*What this buys you:* the entire $m$-by-grid table of study functions
collapses to $K$ shared shapes plus an $m$-by-$K$ table of weights, so
heterogeneity becomes low-dimensional and shareable.
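
A1 is also easy to probe empirically: a matrix of noiseless mixtures
evaluated on a grid has numerical rank exactly $K$. A minimal base-R
check on a toy example (every object below is illustrative, not part
of the package API):

```{r a1-rank-check, eval = FALSE}
# Toy check of A1: convex mixtures of K = 3 bases give a rank-3 function matrix.
xx <- seq(0, 1, length.out = 50)
g  <- rbind(sin(pi * xx), cos(pi * xx), xx)  # 3 bases evaluated on the grid
Pi <- matrix(rexp(10 * 3), 10, 3)
Pi <- Pi / rowSums(Pi)                       # 10 random weight vectors on the simplex
Fm <- Pi %*% g                               # 10 noiseless study functions
round(svd(Fm)$d, 8)                          # only the first 3 singular values are non-zero
```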

### A2. Weight model

The mixing-weight vector $\boldsymbol{\pi}_i$ for study $i$ is drawn
from some conditional distribution given that study's covariates
$\boldsymbol{W}_i$. The distribution can be anything you can
estimate — a Dirichlet regression, an RKHS-based Dirichlet model, a
multinomial logit, a nearest-neighbour smoother. MetaHunt does not
commit you to a specific functional form.

Formally, for $i = 0, 1, \ldots, m$, the weight vectors are independent draws
$$\boldsymbol{\pi}_i \mid \boldsymbol{W}_i \;\stackrel{\text{ind.}}{\sim}\; \mathcal{P}_{\boldsymbol{\pi} \mid \boldsymbol{W}}(\,\cdot \mid \boldsymbol{W}_i),$$
where $\mathcal{P}_{\boldsymbol{\pi}\mid\boldsymbol{W}}$ is an arbitrary distributional map from the covariate space $\mathcal{W}$ to the simplex $\Delta_{K-1}$.

*What this buys you:* once you can map metadata to weights, predicting
the function for a brand-new target population reduces to predicting
its weights and re-mixing the recovered bases.
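
As a concrete instance, the simulation later in this vignette uses a
multinomial-logit (softmax) map from covariates to weights. A minimal
sketch of that map, where `beta` is an illustrative coefficient matrix
rather than a package object:

```{r a2-weight-map, eval = FALSE}
# One concrete P(pi | W): multinomial logit (softmax) in the study covariates.
softmax_weights <- function(W, beta) {       # W: m x p covariates; beta: p x K
  eta <- as.matrix(W) %*% beta               # linear predictors, one column per basis
  p   <- exp(eta - apply(eta, 1, max))       # subtract row maxima for numerical stability
  p / rowSums(p)                             # each row lies on the (K-1)-simplex
}
```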

### A3. Exchangeability

The studies (including the target) are exchangeable units drawn from a
common data-generating process: their joint distribution is invariant
under reordering. This is much weaker than i.i.d. and is the standard
condition under which conformal prediction is valid.

Formally, the study-level covariates $\boldsymbol{W}_0, \boldsymbol{W}_1, \ldots, \boldsymbol{W}_m$ are exchangeable. Combined with A1–A2, this implies that the triples
$$\bigl(\boldsymbol{W}_i,\, \boldsymbol{\pi}_i,\, f^{(i)}\bigr), \qquad i = 0, 1, \ldots, m,$$
are jointly exchangeable across $i$.

*What this buys you:* the calibration step in conformal prediction
inherits a finite-sample, distribution-free coverage guarantee with
no need for parametric error models.
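
To make the guarantee concrete, here is the standard split-conformal
quantile with its finite-sample correction, in base R. This is a
generic sketch, not MetaHunt's conformal code; see
`vignette("conformal-prediction")` for the packaged version.

```{r a3-conformal-quantile, eval = FALSE}
# Split-conformal band radius from m exchangeable calibration scores: the
# ceiling((1 - alpha) * (m + 1))-th smallest score. Exchangeability alone
# guarantees coverage of at least 1 - alpha for the target.
conformal_radius <- function(scores, alpha = 0.1) {
  k <- ceiling((1 - alpha) * (length(scores) + 1))
  if (k > length(scores)) return(Inf)        # too few studies for this alpha
  sort(scores)[k]
}
```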

### A4. Estimation error control

Each study reports a noisy version of its true function,
$\hat f^{(i)} = f^{(i)} + \epsilon^{(i)}$. We assume each study's
estimation error vanishes uniformly in $\boldsymbol{x}$ as its sample
size grows, and that the number of studies $m$ does not grow too fast
relative to those sample sizes. In words: every study should be
reasonably well estimated, and you should not have many studies whose
individual sample sizes $n_i$ are small.

Formally, write $\hat f^{(i)}(\boldsymbol{x}) = f^{(i)}(\boldsymbol{x}) + \epsilon^{(i)}(\boldsymbol{x})$. We assume
$$\sup_{\boldsymbol{x} \in \mathcal{X}} \mathbb{E}\!\left[\epsilon^{(i)}(\boldsymbol{x})^2\right] = O\!\left(n_i^{-r}\right) \quad \text{for some } r > 0,$$
and that $m = o\bigl(\inf_i n_i^{\,a}\bigr)$ for some $0 < a < r$, where $n_i$ is study $i$'s sample size.

*What this buys you:* the noise in $\hat F$ does not accumulate
through the pipeline, basis recovery is consistent, and conformal
intervals retain their asymptotic coverage despite a multi-stage
estimator.
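
A4 is not directly checkable from the data, but a back-of-envelope
screen can flag risky designs. A sketch with an illustrative exponent
`a` (the manuscript gives the admissible range $0 < a < r$):

```{r a4-rate-check, eval = FALSE}
# Back-of-envelope screen for m = o(min(n_i)^a): the number of studies should
# sit comfortably below min(n_i)^a. All numbers here are illustrative.
n_i <- c(800, 1200, 950, 2000)               # per-study sample sizes
a   <- 1 / 2
length(n_i) < min(n_i)^a                     # TRUE suggests the regime is plausible
```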

## The three-step pipeline

`metahunt()` is a thin wrapper around three exported steps. You can
also call them individually if you want fine-grained control.

### Step 1. Basis hunting via d-fSPA

`dfspa()` extends the Successive Projection Algorithm to functions.
It iteratively selects the study whose current residual norm is
largest, projects all studies onto the orthogonal complement of the
selected function, and repeats until $K$ bases have been found. A
denoising step averages each study with its near neighbours before
the search; this trades a small bias for a substantially smaller
variance when the per-study estimates are noisy.
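
The core successive-projection idea fits in a few lines. Here is a
stripped-down sketch, without the denoising step, to make the geometry
concrete; `dfspa()` is the exported, denoised implementation.

```{r spa-sketch, eval = FALSE}
# Bare successive projection on a function matrix Fm (m x G): select the row
# with the largest residual norm, project every row onto the orthogonal
# complement of the selection, repeat K times. No denoising, unlike dfspa().
spa_vertices <- function(Fm, K) {
  R   <- Fm
  idx <- integer(K)
  for (k in seq_len(K)) {
    norms  <- rowSums(R^2)
    idx[k] <- which.max(norms)               # current extreme study
    u      <- R[idx[k], ] / sqrt(norms[idx[k]])
    R      <- R - tcrossprod(R %*% u, u)     # remove the selected direction
  }
  Fm[idx, , drop = FALSE]                    # selected rows are the basis estimates
}
```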

### Step 2. Fitting the weight model

For each study, `project_to_simplex()` finds the convex weights that
best reconstruct its observed function from the recovered bases.
`fit_weight_model()` then regresses these weight vectors on the
study-level covariates `W` using a method of your choice (default:
Dirichlet regression).
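
For intuition, the per-study projection is an ordinary quadratic
program. A hand-rolled equivalent of one `project_to_simplex()` call,
assuming the quadprog package is available (an illustrative
re-implementation, not the package's own code):

```{r simplex-qp-sketch, eval = FALSE}
# Convex weights for one study: minimise || f - t(B) %*% w ||^2 subject to
# sum(w) = 1 and w >= 0. Requires the quadprog package.
weights_for_study <- function(f, B) {        # f: length-G function; B: K x G bases
  K    <- nrow(B)
  Dmat <- tcrossprod(B)                      # B %*% t(B); positive definite under A1
  dvec <- as.numeric(B %*% f)
  Amat <- cbind(rep(1, K), diag(K))          # column 1: sum-to-one; rest: w_k >= 0
  quadprog::solve.QP(Dmat, dvec, Amat, bvec = c(1, rep(0, K)), meq = 1)$solution
}
```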

### Step 3. Target prediction

`predict.metahunt()` takes a new metadata row $\boldsymbol{W}_0$,
predicts its weight vector through the fitted weight model, and
returns the convex combination of the recovered bases. With
`wrapper = mean` (or any other reduction) it instead returns a scalar
summary per target — for example, an ATE under uniform grid
weighting.
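
In matrix form the prediction step is a single multiplication; the
names below are illustrative, not package objects:

```{r remix-sketch, eval = FALSE}
# What Step 3 computes, schematically: predicted weights (n0 x K) times the
# recovered bases (K x G). pi0_hat and B_hat are illustrative names.
f0_hat <- pi0_hat %*% B_hat                  # n0 x G matrix of target functions
rowMeans(f0_hat)                             # wrapper = mean: one scalar per target
```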

## A worked end-to-end example

For the rest of the vignette we work with a simulated `(F_hat, W)` so
the truth is known. In your own data, replace this block with the
data-prep onramp described in `vignette("data-prep")`.

```{r simulate}
G <- 40; m <- 120; K_true <- 3
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)         # 3 true bases on the grid

W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))       # study-level covariates
beta <- cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))    # coefficients: 2 covariates -> 3 weights
pi_true <- exp(as.matrix(W) %*% beta)               # softmax weight map (assumption A2)
pi_true <- pi_true / rowSums(pi_true)               # rows lie on the 2-simplex

F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)  # mixtures + noise
dim(F_hat)
```

In real data, `F_hat[i, ]` would be `predict(model_i, newdata = grid)`
and `W[i, ]` would be that centre's metadata. The data-prep vignette
describes a one-line `lm`-based onramp.
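
If your per-centre fits are plain `lm` objects on a common covariate,
the onramp can also be written by hand. A sketch, assuming `models` is
a list of per-centre fits; `vignette("data-prep")` covers the packaged
helpers `build_grid()` and `f_hat_from_models()`:

```{r lm-onramp-sketch, eval = FALSE}
# By-hand onramp: one row of F_hat per centre, each model evaluated on the same
# grid. `models` is an illustrative list of per-centre lm fits in covariate x.
grid  <- data.frame(x = seq(0, 1, length.out = G))
F_hat <- t(vapply(models, predict, numeric(G), newdata = grid))
```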

Fit the full pipeline and inspect the recovered bases:

```{r fit}
fit <- metahunt(F_hat, W, K = 3)
fit
```

```{r plot-bases}
plot(fit, x_axis = x,
     col = c("#0072B2", "#D55E00", "#009E73"))
```

Predict the target functions for three new metadata profiles and plot
them:

```{r predict}
W_new <- data.frame(w1 = c(0, 1, -1), w2 = c(0, -0.5, 1),
                    row.names = c("baseline", "high w1, low w2", "low w1, high w2"))
f_pred <- predict(fit, newdata = W_new)
dim(f_pred)
oldpar <- par(mar = c(4, 4.5, 3, 1))
matplot(x, t(f_pred), type = "l", lty = 1,
        col = c("#0072B2", "#D55E00", "#009E73"),
        xlab = "x", ylab = expression(tilde(f)(x)),
        main = "Predicted target functions")
legend("topright", legend = rownames(W_new),
       col = 1:3, lty = 1, bty = "n")
par(oldpar)
```

Pass a `wrapper` for a scalar summary per target (an ATE under uniform
grid weights):

```{r predict-wrapped}
predict(fit, newdata = W_new, wrapper = mean)
```

## Where to next

- `vignette("data-prep")` — turning per-centre fitted models into
  `(F_hat, W)` via `build_grid()` and `f_hat_from_models()`,
  including a self-contained `lm` onramp.
- `vignette("grid-weights")` — choosing the `grid_weights` argument
  and the underlying $L^2(\mu)$ inner product.
- `vignette("choosing-k-denoising")` — picking the rank $K$ and the
  d-fSPA denoising knobs `(N, Delta)` via elbow and CV diagnostics.
- `vignette("conformal-prediction")` — split, cross, and from-fit
  conformal bands around the target function.
- `vignette("wrapper-scalar")` — using `wrapper = mean` and other
  reductions to turn function-valued predictions into ATEs and
  related scalar estimands.
- `vignette("minmax-baseline")` — when to prefer the covariate-free
  worst-case-regret aggregator and how it compares to MetaHunt.
- Function-level references: `?metahunt`, `?dfspa`,
  `?fit_weight_model`, `?predict_target`.

## References

Shi, W., Imai, K., and Zhang, Y. (2024). *Privacy-preserving
meta-analysis through low-rank basis hunting.*

Zhang, Y., Huang, M., and Imai, K. (2024). *Minimax regret estimation
for generalizing heterogeneous treatment effects with multisite data.*
arXiv:2412.11136.
