Introduction to wstdiff

library(wstdiff)

Overview

The wstdiff package implements the Welch-Satterthwaite approximation for differences of non-standardized t-distributed random variables.

Background

While the classical Welch-Satterthwaite approximation handles combinations of sample variances, this package extends the framework to t-distribution differences, which arise in:

Univariate Example

Consider two independent t-distributed variables with different parameters:

# X1 ~ t(mu=0, sigma=1, nu=10)
# X2 ~ t(mu=0, sigma=1.5, nu=15)
result <- ws_tdiff_univariate(
  mu1 = 0, sigma1 = 1, nu1 = 10,
  mu2 = 0, sigma2 = 1.5, nu2 = 15
)

print(result)
#> Welch-Satterthwaite Approximation (Univariate)
#> ==============================================
#> Location (mu1 - mu2): 0.0000
#> Effective scale (sigma*): 1.9612
#> Effective df (nu*): 16.9421

# The difference Z = X1 - X2 is approximately:
# Z ~ t(mu = mu1 - mu2, sigma = sigma_star, nu = nu_star)
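
The printed values above are consistent with a simple closed form, which the base-R sketch below reproduces. This is inferred from the output shown in this vignette, not taken from the package source: `ws_sketch` and its internal names are hypothetical, and the `nu > 4` check is an assumption suggested by the `nu - 4` terms in the ratio.

```r
# Hypothetical helper (not part of wstdiff): reproduces the printed values
# under the assumption that
#   sigma_star^2 = v1 + v2, with v_i = sigma_i^2 * nu_i / (nu_i - 2)
#   nu_star      = (v1 + v2)^2 / (v1^2 / (nu1 - 4) + v2^2 / (nu2 - 4))
ws_sketch <- function(mu1, sigma1, nu1, mu2, sigma2, nu2) {
  # Assumption: the ratio below is only meaningful when both df exceed 4
  stopifnot(all(nu1 > 4), all(nu2 > 4))

  # Variance of each non-standardized t variable
  v1 <- sigma1^2 * nu1 / (nu1 - 2)
  v2 <- sigma2^2 * nu2 / (nu2 - 2)

  list(
    mu_diff    = mu1 - mu2,
    sigma_star = sqrt(v1 + v2),
    nu_star    = (v1 + v2)^2 / (v1^2 / (nu1 - 4) + v2^2 / (nu2 - 4))
  )
}

fit <- ws_sketch(mu1 = 0, sigma1 = 1, nu1 = 10,
                 mu2 = 0, sigma2 = 1.5, nu2 = 15)
print(round(fit$sigma_star, 4))
#> [1] 1.9612
print(round(fit$nu_star, 4))
#> [1] 16.9421
```

Because the arithmetic is vectorized, the same sketch also accepts parameter vectors of equal length.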

Using Distribution Functions

Once we have the approximation, we can compute probabilities and quantiles:

# Probability that difference is negative
p_negative <- ptdiff(0, result)
print(paste("P(X1 - X2 < 0) =", round(p_negative, 4)))
#> [1] "P(X1 - X2 < 0) = 0.5"

# Central 95% interval of the approximating distribution (2.5% and 97.5% quantiles)
ci_95 <- qtdiff(c(0.025, 0.975), result)
print(paste("95% interval:", round(ci_95[1], 3), "to", round(ci_95[2], 3)))
#> [1] "95% interval: -4.139 to 4.139"

# Generate random samples
samples <- rtdiff(1000, result)
hist(samples, breaks = 30, main = "Simulated Difference Distribution",
     xlab = "X1 - X2", probability = TRUE)

# Overlay theoretical density
x_seq <- seq(min(samples), max(samples), length.out = 100)
lines(x_seq, dtdiff(x_seq, result), col = "red", lwd = 2)
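
For readers who want to see what these helpers plausibly compute, the sketch below does the same four operations with base R's standard t functions, shifting by the location and scaling by the effective scale. It assumes the approximating distribution is a location-scale t with scale `sigma_star`, which is what the interval printed above suggests. The numeric inputs are the rounded values from the earlier printout, and none of this is taken from the package source.

```r
# Values copied from the univariate result printed earlier (rounded)
mu_diff    <- 0
sigma_star <- 1.9612
nu_star    <- 16.9421

# CDF: standardize, then apply the standard t CDF
p_negative <- pt((0 - mu_diff) / sigma_star, df = nu_star)

# Quantiles: scale and shift standard t quantiles
interval_95 <- mu_diff + sigma_star * qt(c(0.025, 0.975), df = nu_star)

# Density: standard t density divided by the scale (change of variables)
dens_at_0 <- dt((0 - mu_diff) / sigma_star, df = nu_star) / sigma_star

# Random draws: scale and shift standard t draws
set.seed(1)  # arbitrary seed, for reproducibility only
draws <- mu_diff + sigma_star * rt(1000, df = nu_star)

print(p_negative)
#> [1] 0.5
print(round(interval_95, 3))
```

The interval printed by the last line should land close to the one reported by `qtdiff()` above; small differences can come from the rounded inputs.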

Multivariate Example

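For several independent components, the approximation is applied to each component separately, giving one effective scale and one effective df per component: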

result_multi <- ws_tdiff_multivariate_independent(
  mu1 = c(0, 1, 2),
  sigma1 = c(1, 1.5, 2),
  nu1 = c(10, 12, 15),
  mu2 = c(0, 0, 0),
  sigma2 = c(1.2, 1, 1.8),
  nu2 = c(15, 20, 25)
)

print(result_multi)
#> Welch-Satterthwaite Approximation (Multivariate Independent)
#> =============================================================
#> Location difference:
#> [1] 0 1 2
#> 
#> Effective scale:
#> [1] 1.706323 1.952207 2.852564
#> 
#> Effective df:
#> [1] 16.57649 14.69487 26.20081
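
The component-wise output above can be reproduced with vectorized base-R arithmetic. As a sketch under assumptions: the formulas are inferred from the printed numbers rather than from the package source, and the variable names are illustrative only.

```r
# Same inputs as the multivariate example above
mu1 <- c(0, 1, 2); sigma1 <- c(1, 1.5, 2);   nu1 <- c(10, 12, 15)
mu2 <- c(0, 0, 0); sigma2 <- c(1.2, 1, 1.8); nu2 <- c(15, 20, 25)

# Per-component variances of the two non-standardized t variables
v1 <- sigma1^2 * nu1 / (nu1 - 2)
v2 <- sigma2^2 * nu2 / (nu2 - 2)

# Assumed formulas, applied element-wise (one value per component)
mu_diff    <- mu1 - mu2
sigma_star <- sqrt(v1 + v2)
nu_star    <- (v1 + v2)^2 / (v1^2 / (nu1 - 4) + v2^2 / (nu2 - 4))

print(mu_diff)
print(sigma_star)
print(nu_star)
```

No cross-component terms appear anywhere in this calculation, which is what independence between components buys.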

Special Cases

The package provides a shortcut for the case where both variables share the same parameters:

# When both distributions have identical parameters
result_equal <- ws_tdiff_equal_params(mu = 0, sigma = 1, nu = 10)
print(result_equal)
#> Welch-Satterthwaite Approximation (Univariate)
#> ==============================================
#> Location (mu1 - mu2): 0.0000
#> Effective scale (sigma*): 1.5811
#> Effective df (nu*): 12.0000

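# Verify: with equal parameters, nu_star reduces to 2 * (nu - 4) = 12
# (compare with a tolerance rather than exact floating-point equality)
stopifnot(isTRUE(all.equal(result_equal$nu_star, 12)))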

Approximation Quality

The approximation quality depends on:

  1. Degrees of freedom: Higher values give better approximations
  2. Parameter similarity: More similar parameters improve accuracy
  3. Dimension: In multivariate cases, using a single nu* may be less accurate

# Low degrees of freedom - use with caution
result_low_df <- ws_tdiff_univariate(0, 1, 5, 0, 1, 6)
print(paste("Warning: nu_star =", round(result_low_df$nu_star, 2)))
#> [1] "Warning: nu_star = 2.57"

# High degrees of freedom - excellent approximation
result_high_df <- ws_tdiff_univariate(0, 1, 100, 0, 1.5, 150)
print(paste("Good: nu_star =", round(result_high_df$nu_star, 2)))
#> [1] "Good: nu_star = 234.49"
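
One way to gauge the approximation for a particular parameter set is to simulate the difference directly and compare quantiles. The sketch below does this in base R for the univariate example. It assumes the approximating distribution is a location-scale t with scale `sigma_star`, and that the effective parameters follow the formulas implied by the printed outputs. The seed, sample size, and variable names are arbitrary illustrative choices.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
n_sim <- 1e5   # arbitrary simulation size

sigma1 <- 1;   nu1 <- 10
sigma2 <- 1.5; nu2 <- 15

# Simulate Z = X1 - X2 directly (both locations are 0 here)
z <- sigma1 * rt(n_sim, df = nu1) - sigma2 * rt(n_sim, df = nu2)

# Effective parameters under the assumed formulas
v1 <- sigma1^2 * nu1 / (nu1 - 2)
v2 <- sigma2^2 * nu2 / (nu2 - 2)
sigma_star <- sqrt(v1 + v2)
nu_star    <- (v1 + v2)^2 / (v1^2 / (nu1 - 4) + v2^2 / (nu2 - 4))

# Put simulated and approximate quantiles side by side
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
comparison <- rbind(
  simulated   = quantile(z, probs, names = FALSE),
  approximate = sigma_star * qt(probs, df = nu_star)
)
colnames(comparison) <- paste0(100 * probs, "%")
print(round(comparison, 3))
```

Rerunning with lower or higher degrees of freedom shows how the gap between the two rows changes, especially in the tails.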

References

Yamaguchi, Y., Homma, G., Maruo, K., & Takeda, K. Welch-Satterthwaite Approximation for Difference of Non-Standardized t-Distributed Variables. (unpublished).