This vignette walks through the full SS-SVCQR workflow on the small
Lucas County housing subset shipped with the package
(lucas_housing_sample, \(n =
150\)). The aim is to make every step of the case study
reproducible without having to download external data: the model is the
same one used in the JSS software paper but on a subset rather than the
full \(n =
25{,}357\) panel. The replication script at the project root
reproduces the full-sample case study from
data/lucas_housing_clean.csv.
The hedonic interpretation is conventional: log sale price is regressed on global controls (log total living area, log lot size, sale-year indicators) and on candidate spatially varying age effects. Because the Lucas County region is geographically compact, longitude and latitude serve as a pragmatic spatial index for graph construction; for larger study regions, projected coordinates should be used.
library("sssvcqr")
data("lucas_housing_sample")
housing <- lucas_housing_sample
str(housing)
#> 'data.frame': 150 obs. of 8 variables:
#> $ log_price : num 11.9 11 11.4 11.1 11.3 ...
#> $ log_TLA : num 7.92 6.51 7.45 7.29 7.86 ...
#> $ log_lotsize: num 10.69 8.48 9.9 8.78 8.82 ...
#> $ age_scaled : num 0.09 0.56 0.76 0.63 0.9 0.71 0.99 0.63 1.07 0.49 ...
#> $ age2_scaled: num 0.0081 0.3136 0.5776 0.3969 0.81 ...
#> $ longitude : num -83.8 -83.5 -83.7 -83.6 -83.6 ...
#> $ latitude : num 41.7 41.7 41.6 41.6 41.7 ...
#> $ sale_year : int 1994 1997 1997 1997 1996 1995 1996 1996 1997 1998 ...
summary(housing[, c("log_price", "log_TLA", "log_lotsize", "age_scaled")])
#> log_price log_TLA log_lotsize age_scaled
#> Min. : 8.434 Min. :6.275 Min. : 7.601 Min. :0.0300
#> 1st Qu.:10.612 1st Qu.:7.020 1st Qu.: 8.434 1st Qu.:0.3700
#> Median :11.097 Median :7.207 Median : 8.716 Median :0.5650
#> Mean :10.989 Mean :7.217 Mean : 8.887 Mean :0.5703
#> 3rd Qu.:11.460 3rd Qu.:7.416 3rd Qu.: 9.113 3rd Qu.:0.7600
#> Max. :12.801 Max. :8.595 Max. :12.257 Max. :1.1700y, Z, X, and
coordinatesThe matrix interface separates always-global controls
(Z) from candidate spatially varying covariates
(X). The coordinates (u) drive both the
proximity graph and the location-indexed deviation fields.
It is good practice to query the graph before fitting, both to check
that the proximity structure is connected and to inspect the degree
distribution. build_graph_laplacian() returns the sparse
adjacency, the degree vector, the chosen Laplacian, and component
membership.
The fit below uses fixed penalties for speed; the JSS software paper tunes \((\lambda_1, \lambda_2)\) by spatially blocked cross-validation on the full sample. Tighter ADMM tolerances are appropriate for the moderate sample sizes encountered in this vignette.
fit <- ss_svcqr(
y = y, Z = Z, X = X, u = u,
tau = 0.5,
lambda1 = 3, lambda2 = 1, k_nn = 8,
control = list(max_iter = 100, warn_nonconvergence = FALSE)
)
summary(fit)
#> Sparse-smooth SVC quantile regression summary
#> n = 150 q = 4 p = 2 tau = 0.5
#> lambda1 = 3 lambda2 = 1
#> iterations = 70 converged = TRUE
#>
#> alpha:
#> [1] -67.52811362 0.92689808 0.11572866 0.03569393
#> beta_G:
#> [1] 0.2598358 -1.2339241
#> delta L2 norms:
#> [1] 0.5186381 0.0000000The deviation L2 norms make the global-versus-local decision explicit: a norm at exact zero means the group penalty has classified the corresponding candidate as global.
For a single candidate effect, the package’s plot()
method renders the local total coefficient surface over the first two
coordinate columns. Inverse-distance-weighted interpolation gives a
smooth visual summary; the observed locations are overlaid as small
reference marks.
For real analyses, the penalties should be tuned rather than fixed by hand. The example below evaluates a small grid with three spatial folds on this subset; empirical applications should use a broader grid and more iterations.
cv <- cv_ss_svcqr(
y = y, Z = Z, X = X, u = u,
tau = 0.5,
lambda1_seq = c(2, 3),
lambda2_seq = c(0.5, 1),
K_folds = 3, adaptive_weights = FALSE,
control = list(max_iter = 25, warn_nonconvergence = FALSE)
)
cv
#> Spatially blocked CV for SS-SVCQR
#> tau = 0.5
#> best lambda1 = 2
#> best lambda2 = 0.5
#> best mean check loss = 0.1643256kkt_sssvcqr() returns first-order optimality summaries
and the maximum violation of the per-component degree-weighted centering
constraints. Both should be small after a converged fit.
The predict() method returns fitted conditional
quantiles when given new \(Z\), \(X\), and \(u\) (type = "response", the
default), or local coefficient surfaces when called with
type = "coefficients". New-location deviations are
extrapolated by inverse-distance-weighted averaging of the \(k\) nearest training deviations.
?ss_svcqr,
?cv_ss_svcqr, ?build_graph_laplacian,
?kkt_sssvcqr, ?predict.sssvcqr) document each
argument and return value.