---
title: "Getting Started with sssvcqr"
author: "Houjian Hou"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Getting Started with sssvcqr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 5, fig.align = "center")
options(prompt = "R> ", continue = "+  ", width = 80, useFancyQuotes = FALSE)
```

## What this vignette covers

This vignette walks through the core workflow of `sssvcqr` on a small
synthetic example: simulating a global--local quantile regression problem,
fitting the SS-SVCQR estimator, inspecting the global-versus-local
selection, querying the fitted object through the standard R model methods,
predicting at new locations, tuning the penalties with spatially blocked
cross-validation, and reading the first-order KKT diagnostics.

**The model.** SS-SVCQR fits the conditional quantile

$$
Q_\tau(y_i \mid z_i, x_i, u_i)
= z_i^\top \alpha + \sum_{j=1}^p x_{ij}\{\beta_{G,j} + \delta_j(u_i)\},
$$

with global controls `Z`, candidate spatially varying covariates `X`,
coordinates `u`, a group penalty that may shrink an entire deviation
vector $\delta_j$ exactly to zero, and a graph-Laplacian smoothness
penalty over a $k$-nearest-neighbor proximity graph. The companion JSS
software paper covers the algorithm and case study in detail; this
vignette is the short interactive walkthrough.
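
To make the decomposition concrete, the quantile above can be evaluated by hand in base R. The coefficient objects below (`alpha`, `beta_G`, `delta`) are hypothetical stand-ins for the estimated blocks, not package internals:

```{r model-by-hand}
set.seed(1)
n <- 5; q <- 2; p <- 3
Z <- matrix(rnorm(n * q), n, q)                 # global controls
X <- matrix(rnorm(n * p), n, p)                 # candidate varying covariates
alpha  <- c(0.5, -0.2)                          # global block
beta_G <- c(1, 0, -1)                           # global baselines
delta  <- matrix(rnorm(n * p, sd = 0.1), n, p)  # deviation fields delta_j(u_i)
delta[, 2] <- 0                                 # a "global" covariate: zero deviations
## Q_tau(y_i | z_i, x_i, u_i) = z_i' alpha + sum_j x_ij (beta_Gj + delta_j(u_i))
Q <- drop(Z %*% alpha) + rowSums(X * sweep(delta, 2, beta_G, `+`))
Q
```

A covariate whose entire deviation column is zero contributes only its global baseline, which is exactly the structure the group penalty targets.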

## Simulating data with known structure

`simulate_sssvcqr_data()` returns synthetic data on the unit square together
with the ground truth used to generate it: the true active set, the
global-baseline coefficients, and the deviation fields. The default `p = 3`
design has `x1` and `x3` spatially varying and `x2` globally constant.

```{r simulate}
library("sssvcqr")
dat <- simulate_sssvcqr_data(n = 120, q = 2, p = 3, seed = 20260505)
str(dat[c("y", "Z", "X", "u", "active", "alpha_true", "beta_G_true")],
  max.level = 1)
dat$active
```

## Fitting the model

`ss_svcqr()` uses an explicit matrix interface so that the analyst's
`Z`/`X` partition is unambiguous. The penalties `lambda1` and `lambda2`
control sparsity (group selection) and graph smoothness, respectively.

```{r fit}
fit <- ss_svcqr(
  y = dat$y, Z = dat$Z, X = dat$X, u = dat$u,
  tau = 0.5,
  lambda1 = 5, lambda2 = 0.1, k_nn = 8,
  control = list(max_iter = 180, warn_nonconvergence = FALSE)
)
fit
```

## Selection and global--local summary

The `summary()` method prints the model size, tuning parameters, ADMM
convergence status, the global block, and the $L_2$ norm of each estimated
deviation field. These norms are the visual signature of
global-versus-local selection: the second deviation field is exactly zero
because the group penalty has correctly identified `x2` as global.

```{r summary}
summary(fit)
selection_recovery_table(fit, dat)
```

## Inspecting fitted values, residuals, and coefficients

Standard R methods are wired through the `sssvcqr` class so that fitted
values and residuals can be extracted directly, and `coef()` returns a
list with the global block and the deviation matrix.

```{r methods}
round(unlist(coef(fit)[1:2]), 4)
head(round(fitted(fit), 4))
head(round(residuals(fit), 4))
```

## Visualizing the fit

The `plot()` method renders the fitted deviation field, the local total
coefficient surface, residuals, or ADMM convergence traces. By default the
spatial plots use an inverse-distance-weighted gridded surface over the
first two coordinate columns, with a diverging color scale.

```{r plots, fig.width = 7, fig.height = 6}
plot(fit, type = "deviation", index = 1)
plot(fit, type = "coefficient", index = 1)
plot(fit, type = "convergence")
```

## Predicting at new locations

`predict()` returns either fitted conditional quantiles (the default) or
local coefficient surfaces (`type = "coefficients"`). For new locations
the deviation is extrapolated by inverse-distance-weighted averaging of
the $k$ nearest training deviations.
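
The k-nearest-neighbor inverse-distance weighting can be sketched in a few lines of base R. Everything below is an illustration of the scheme described above, not the package's internal code:

```{r idw-sketch}
set.seed(3)
u_train   <- matrix(runif(40), 20, 2)   # 20 training coordinates
dev_train <- sin(4 * u_train[, 1])      # one deviation field at those sites
u_new <- c(0.5, 0.5)                    # a new location
k <- 5
dist2 <- colSums((t(u_train) - u_new)^2)  # squared distances to u_new
nn <- order(dist2)[1:k]                   # k nearest training sites
w <- 1 / (sqrt(dist2[nn]) + 1e-12)        # inverse-distance weights
w <- w / sum(w)                           #   (offset guards a zero distance)
dev_new <- sum(w * dev_train[nn])         # extrapolated deviation
dev_new
```

Because the weights form a convex combination, the extrapolated deviation always lies within the range of the `k` neighboring training deviations.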

```{r predict}
unew <- dat$u[1:3, , drop = FALSE] + 0.01
round(predict(fit,
  Znew = dat$Z[1:3, , drop = FALSE],
  Xnew = dat$X[1:3, , drop = FALSE],
  unew = unew), 4)
round(predict(fit, type = "coefficients")[1:3, ], 4)
```

## Tuning by spatially blocked cross-validation

`cv_ss_svcqr()` evaluates a $(\lambda_1,\lambda_2)$ grid using spatial
folds (k-means clusters on the coordinate matrix by default), so that
nearby observations are not split between training and validation. The
example below uses a small grid and three folds for speed; empirical
applications should use a broader grid and more iterations.
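
The default fold construction can be mimicked with `stats::kmeans()` directly. The sketch below shows the idea with hypothetical coordinates; it is an illustration of the strategy, not the package's exact fold-assignment code:

```{r spatial-folds-sketch}
set.seed(20260505)
u <- matrix(runif(80 * 2), 80, 2)   # coordinates on the unit square
K <- 3
fold <- kmeans(u, centers = K, nstart = 10)$cluster  # spatial cluster = fold id
table(fold)                          # fold sizes are the cluster sizes
```

Each cluster is held out in turn, so validation points are spatially separated from the training points, which guards against optimistic error estimates driven by spatial autocorrelation.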

```{r cv}
dat_cv <- simulate_sssvcqr_data(n = 80, q = 2, p = 3, seed = 20260505)
cv <- cv_ss_svcqr(
  y = dat_cv$y, Z = dat_cv$Z, X = dat_cv$X, u = dat_cv$u,
  tau = 0.5,
  lambda1_seq = c(1, 2),
  lambda2_seq = c(0.5, 1),
  K_folds = 3, adaptive_weights = FALSE,
  control = list(max_iter = 25, warn_nonconvergence = FALSE)
)
cv
cv$best
```

## First-order KKT diagnostics

`kkt_sssvcqr()` reports the first-order conditions of the SS-SVCQR
objective: gradient norms for the unpenalized global block, stationarity
residuals for nonzero deviation groups, zero-group margins for groups
shrunk to zero, and the maximum violation of the per-component
degree-weighted centering constraints. Centering violations should be at
the level of solver round-off when the fit has converged.
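
The centering check itself is simple to state: for each component $j$, the degree-weighted sum of the deviations must vanish. A self-contained illustration with hypothetical degrees and deviation fields (not the package's code):

```{r centering-sketch}
set.seed(2)
n <- 50; p <- 3
d <- sample(4:8, n, replace = TRUE)              # node degrees in the k-NN graph
delta <- matrix(rnorm(n * p), n, p)              # raw deviation fields
delta <- sweep(delta, 2, colSums(d * delta) / sum(d))  # degree-weighted centering
centering_violation <- max(abs(colSums(d * delta)))    # should be ~ round-off
centering_violation
```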

```{r kkt}
kkt <- kkt_sssvcqr(dat$y, dat$Z, dat$X, fit)
signif(kkt$max_violation, 3)
signif(kkt$max_centering_violation, 3)
```

## Where to look next

- The companion vignette *Lucas County housing example* mirrors the case
  study in the JSS software paper.
- The replication directory at the project root reproduces every numerical
  table and figure in the JSS manuscript.
- The function help pages (`?ss_svcqr`, `?cv_ss_svcqr`,
  `?build_graph_laplacian`, `?kkt_sssvcqr`) document each argument and
  return value in full.
