Lucas County Housing Example

Scope

This vignette walks through the full SS-SVCQR workflow on the small Lucas County housing subset shipped with the package (lucas_housing_sample, \(n = 150\)). The aim is to make every step of the case study reproducible without having to download external data: the model is the same one used in the JSS software paper but on a subset rather than the full \(n = 25{,}357\) panel. The replication script at the project root reproduces the full-sample case study from data/lucas_housing_clean.csv.

The hedonic interpretation is conventional: log sale price is regressed on global controls (log total living area, log lot size, sale-year indicators) and on candidate spatially varying age effects. Because the Lucas County region is geographically compact, longitude and latitude serve as a pragmatic spatial index for graph construction; for larger study regions, projected coordinates should be used.

Loading the package subset

library("sssvcqr")
data("lucas_housing_sample")
housing <- lucas_housing_sample
str(housing)
#> 'data.frame':    150 obs. of  8 variables:
#>  $ log_price  : num  11.9 11 11.4 11.1 11.3 ...
#>  $ log_TLA    : num  7.92 6.51 7.45 7.29 7.86 ...
#>  $ log_lotsize: num  10.69 8.48 9.9 8.78 8.82 ...
#>  $ age_scaled : num  0.09 0.56 0.76 0.63 0.9 0.71 0.99 0.63 1.07 0.49 ...
#>  $ age2_scaled: num  0.0081 0.3136 0.5776 0.3969 0.81 ...
#>  $ longitude  : num  -83.8 -83.5 -83.7 -83.6 -83.6 ...
#>  $ latitude   : num  41.7 41.7 41.6 41.6 41.7 ...
#>  $ sale_year  : int  1994 1997 1997 1997 1996 1995 1996 1996 1997 1998 ...
summary(housing[, c("log_price", "log_TLA", "log_lotsize", "age_scaled")])
#>    log_price         log_TLA       log_lotsize       age_scaled    
#>  Min.   : 8.434   Min.   :6.275   Min.   : 7.601   Min.   :0.0300  
#>  1st Qu.:10.612   1st Qu.:7.020   1st Qu.: 8.434   1st Qu.:0.3700  
#>  Median :11.097   Median :7.207   Median : 8.716   Median :0.5650  
#>  Mean   :10.989   Mean   :7.217   Mean   : 8.887   Mean   :0.5703  
#>  3rd Qu.:11.460   3rd Qu.:7.416   3rd Qu.: 9.113   3rd Qu.:0.7600  
#>  Max.   :12.801   Max.   :8.595   Max.   :12.257   Max.   :1.1700

Assembling `y`, `Z`, `X`, and coordinates

The matrix interface separates always-global controls (Z) from candidate spatially varying covariates (X). The coordinates (u) drive both the proximity graph and the location-indexed deviation fields.

y <- housing$log_price
Z <- model.matrix(~ log_TLA + log_lotsize + sale_year, data = housing)
X <- as.matrix(housing[, c("age_scaled", "age2_scaled")])
u <- scale(as.matrix(housing[, c("longitude", "latitude")]))
c(n = nrow(housing), q = ncol(Z), p = ncol(X))
#>   n   q   p 
#> 150   4   2

Inspecting graph choices

It is good practice to query the graph before fitting, both to check that the proximity structure is connected and to inspect the degree distribution. build_graph_laplacian() returns the sparse adjacency, the degree vector, the chosen Laplacian, and component membership.

graph <- build_graph_laplacian(u, k = 8L)
length(graph$components_list)
#> [1] 1
range(graph$D_vec)
#> [1] 1.678900e-21 8.446148e+00

Fitting SS-SVCQR

The fit below uses fixed penalties for speed; the JSS software paper tunes \((\lambda_1, \lambda_2)\) by spatially blocked cross-validation on the full sample. Tighter ADMM tolerances are appropriate for the moderate sample sizes encountered in this vignette.

fit <- ss_svcqr(
  y = y, Z = Z, X = X, u = u,
  tau = 0.5,
  lambda1 = 3, lambda2 = 1, k_nn = 8,
  control = list(max_iter = 100, warn_nonconvergence = FALSE)
)
summary(fit)
#> Sparse-smooth SVC quantile regression summary
#>   n = 150  q = 4  p = 2  tau = 0.5 
#>   lambda1 = 3  lambda2 = 1 
#>   iterations = 70  converged = TRUE 
#> 
#> alpha:
#> [1] -67.52811362   0.92689808   0.11572866   0.03569393
#> beta_G:
#> [1]  0.2598358 -1.2339241
#> delta L2 norms:
#> [1] 0.5186381 0.0000000

The deviation L2 norms make the global-versus-local decision explicit: a norm at exact zero means the group penalty has classified the corresponding candidate as global.

Visualizing the fitted local coefficient

For a single candidate effect, the package’s plot() method renders the local total coefficient surface over the first two coordinate columns. Inverse-distance-weighted interpolation gives a smooth visual summary; the observed locations are overlaid as small reference marks.

plot(fit, type = "coefficient", index = 1)

Spatially blocked cross-validation

For real analyses, the penalties should be tuned rather than fixed by hand. The example below evaluates a small grid with three spatial folds on this subset; empirical applications should use a broader grid and more iterations.

cv <- cv_ss_svcqr(
  y = y, Z = Z, X = X, u = u,
  tau = 0.5,
  lambda1_seq = c(2, 3),
  lambda2_seq = c(0.5, 1),
  K_folds = 3, adaptive_weights = FALSE,
  control = list(max_iter = 25, warn_nonconvergence = FALSE)
)
cv
#> Spatially blocked CV for SS-SVCQR
#>   tau = 0.5 
#>   best lambda1 = 2 
#>   best lambda2 = 0.5 
#>   best mean check loss = 0.1643256

KKT diagnostics

kkt_sssvcqr() returns first-order optimality summaries and the maximum violation of the per-component degree-weighted centering constraints. Both should be small after a converged fit.

kkt <- kkt_sssvcqr(y, Z, X, fit)
signif(kkt$max_violation, 3)
#> [1] 2000
signif(kkt$max_centering_violation, 3)
#> [1] 4.16e-16

Predicting at new locations

The predict() method returns fitted conditional quantiles when given new \(Z\), \(X\), and \(u\) (type = "response", the default), or local coefficient surfaces when called with type = "coefficients". New-location deviations are extrapolated by inverse-distance-weighted averaging of the \(k\) nearest training deviations.

unew <- u[1:3, , drop = FALSE]
round(predict(fit,
  Znew = Z[1:3, , drop = FALSE],
  Xnew = X[1:3, , drop = FALSE],
  unew = unew), 3)
#> [1] 12.235 10.549 11.320
round(predict(fit, type = "coefficients")[1:3, ], 3)
#>       [,1]   [,2]
#> [1,] 0.254 -1.234
#> [2,] 0.300 -1.234
#> [3,] 0.306 -1.234

Where to look next

The companion vignette Getting started with sssvcqr covers the synthetic example with full prediction, plotting, CV, and diagnostic calls.
The replication directory at the project root reproduces the full-sample Lucas County case study used in the JSS software paper, including the all-six-coefficients map and the multi-quantile age-coefficient comparison.
Function help pages (?ss_svcqr, ?cv_ss_svcqr, ?build_graph_laplacian, ?kkt_sssvcqr, ?predict.sssvcqr) document each argument and return value.