The MMD copula package: robust estimation of parametric copula models by MMD minimization

library(MMDCopula)
library(VineCopula)
set.seed(1)

Simulation of a synthetic dataset

We simulate 500 points from a Gaussian copula with parameter \(0.5\). From this data, the parameter of the Gaussian copula is estimated, by Canonical Maximum Likelihood Estimation (MLE) and by Maximum Mean Discrepancy Minimization (MMD).


my_data = BiCopSim(N = 500, family = 1, par = 0.5)

estimator_MLE = BiCopEst(u1 = my_data[,1], u2 = my_data[,2],
                         family = 1, method = "mle")

estimator_MMD = BiCopEstMMD(u1 = my_data[,1], u2 = my_data[,2], family = 1)

print(estimator_MLE)
#> Bivariate copula: Gaussian (par = 0.5, tau = 0.33)
print(estimator_MMD)
#> Bivariate copula: Gaussian (par = 0.49, tau = 0.33)

Both estimators are close to the true value, and the MLE estimator is slightly closer.

Adding contamination

We now add a contamination on 20 points of the dataset that are replaced by outliers in the top-left corner of the unit square \([0 , 0.001] \times [0.999, 1]\).


my_data_contam = my_data

number_outliers = 20
q = 0.001
my_data_contam[1:number_outliers, 1] = runif(n = number_outliers, min = 0, max = q)
my_data_contam[1:number_outliers, 2] = runif(n = number_outliers, min = 1-q, max = 1)

estimator_MLE = BiCopEst(u1 = my_data_contam[,1], u2 = my_data_contam[,2],
                         family = 1, method = "mle")

estimator_MMD = BiCopEstMMD(u1 = my_data_contam[,1], u2 = my_data_contam[,2], family = 1)
print(estimator_MLE)
#> Bivariate copula: Gaussian (par = 0.03, tau = 0.02)
print(estimator_MMD)
#> Bivariate copula: Gaussian (par = 0.46, tau = 0.3)

We can see that now the MLE estimator is very far away from the true value while the MMD estimators is still near the true value (even if it is farther away than in the uncontaminated case).

Constructing confidence intervals for the MMD estimator

We now compare the confidence intervals for the estimated parameter in the contaminated and in the non-contaminated case. These intervals are constructed using either the nonparametric bootstrap or using subsampling.


# With the nonparametric bootstrap
CI_bootstrap = BiCopConfIntMMD(x1 = my_data[,1], x2 = my_data[,2],
                              family = 1, level = 0.95)

# With the subsampling
CI_subsampling = BiCopConfIntMMD(x1 = my_data[,1], x2 = my_data[,2],
                              family = 1, level = 0.95, subsamplingSize = 100)

print(CI_bootstrap$CI.Par)
#>       2.5      97.5 
#> 0.4081262 0.5701880
print(CI_subsampling$CI.Par)
#>       2.5      97.5 
#> 0.3526828 0.7005315