Enriching family objects: exponential family of distributions

Ioannis Kosmidis

2020-01-09

Introduction

The enrichwith R package provides the enrich method to enrich list-like R objects with new, relevant components. The resulting objects preserve their class, so all methods associated with them still apply.

This vignette is a demo of the available enrichment options for family objects, focusing on objects that correspond to members of the exponential family of distributions.

Exponential family and family objects

family objects specify characteristics of the models used by functions such as glm. The families implemented in the stats package include binomial, gaussian, Gamma, inverse.gaussian, and poisson, which obvious corresponding distributions. These distributions are all special cases of the exponential family of distributions with probability mass or density function of the form \[ f_{Y}(y) = \exp\left\{\frac{y \theta - b(\theta) - c_1(y)}{\phi/m} - \frac{1}{2}a\left(-\frac{m}{\phi}\right) + c_2(y) \right\} \] for some sufficiently smooth functions \(b(.)\), \(c_1(.)\), \(a(.)\) and \(c_2(.)\), and a fixed weight \(m\). The expected value and the variance of \(Y\) are then \[\begin{align*} E(Y) & = \mu = b'(\theta) \\ Var(Y) & = \frac{\phi}{m}b''(\theta) = \frac{\phi}{m}V(\mu) \end{align*}\] where \(V(\mu)\) and \(\phi\) are the variance function and the dispersion parameter, respectively. Below we list the characteristics of the distributions supported by family objects.

Normal with mean \(\mu\) and variance \(\phi/m\)

\(\theta = \mu\), \(\displaystyle b(\theta) = \frac{\theta^2}{2}\), \(\displaystyle c_1(y) = \frac{y^2}{2}\), \(\displaystyle a(\zeta) = -\log(-\zeta)\), \(\displaystyle c_2(y) = -\frac{1}{2}\log(2\pi)\)

Binomial with index \(m\) and probability \(\mu\)

\(\displaystyle \theta = \log\frac{\mu}{1- \mu}\), \(\displaystyle b(\theta) = \log(1 + e^\theta)\), \(\displaystyle \phi = 1\), \(\displaystyle c_1(y) = 0\), \(\displaystyle a(\zeta) = 0\), \(\displaystyle c_2(y) = \log{m\choose{my}}\)

Poisson with mean \(\mu\)

\(\displaystyle \theta = \log\mu\), \(\displaystyle b(\theta) = e^\theta\), \(\displaystyle \phi = 1\), \(\displaystyle c_1(y) = 0\), \(\displaystyle a(\zeta) = 0\), \(\displaystyle c_2(y) = -\log\Gamma(y + 1)\)

Gamma with mean \(\mu\) and shape \(1/\phi\)

\(\displaystyle \theta = -\frac{1}{\mu}\), \(\displaystyle b(\theta) = -\log(-\theta)\), \(\displaystyle c_1(y) = -\log y\), \(\displaystyle a(\zeta) = 2 \log \Gamma(-\zeta) + 2 \zeta \log\left(-\zeta\right)\), \(\displaystyle c_2(y) = -\log y\)

Inverse Gaussian with mean \(\mu\) and variance \(\phi\mu^3\)

\(\displaystyle \theta = -\frac{1}{2\mu^2}\), \(\displaystyle b(\theta) = -\sqrt{-2\theta}\), \(\displaystyle c_1(y) = \frac{1}{2y}\), \(\displaystyle a(\zeta) = -\log(-\zeta)\), \(\displaystyle c_2(y) = -\frac{1}{2}\log\left(\pi y^3\right)\)

Components in family objects

family objects provide functions for the variance function (variance), a specification of deviance residuals (dev.resids) and the Akaike information criterion (aic). For example

inverse.gaussian()$dev.resids
## function (y, mu, wt) 
## wt * ((y - mu)^2)/(y * mu^2)
## <bytecode: 0x7fbe690297d0>
## <environment: 0x7fbe6902f4c0>
inverse.gaussian()$variance
## function (mu) 
## mu^3
## <bytecode: 0x7fbe69029990>
## <environment: 0x7fbe5f0335b0>
inverse.gaussian()$aic
## function (y, n, mu, wt, dev) 
## sum(wt) * (log(dev/sum(wt) * 2 * pi) + 1) + 3 * sum(log(y) * 
##     wt) + 2
## <bytecode: 0x7fbe69029108>
## <environment: 0x7fbe5f08ea50>

Enrichment options for family objects

The enrichwith R package provides methods for the enrichment of family objects with a function that links the natural parameter \(\theta\) with \(\mu\), the function \(b(\theta)\), the first two derivatives of \(V(\mu)\), \(a(\zeta)\) and its first four derivatives, and \(c_1(y)\) and \(c_2(y)\). To illustrate, let’s write a function that reconstructs the densities and probability mass functions from the components that result from enrichment

library("enrichwith")
dens <- function(y, m = 1, mu, phi, family) {
    object <- enrich(family)
    with(object, {
        c2 <- if (family == "binomial") c2fun(y, m) else c2fun(y)
        exp(m * (y * theta(mu) - bfun(theta(mu)) - c1fun(y))/phi -
            0.5 * afun(-m/phi) + c2)
    })
}

The following chunks test dens for a few distributions against the standard d* functions

## Normal
all.equal(dens(y = 0.2, m = 3, mu = 1, phi = 3.22, gaussian()),
          dnorm(x = 0.2, mean = 1, sd = sqrt(3.22/3)))
## [1] TRUE
## Gamma
all.equal(dens(y = 3, m = 1.44, mu = 2.3, phi = 1.3, Gamma()),
          dgamma(x = 3, shape = 1.44/1.3, 1.44/(1.3 * 2.3)))
## [1] TRUE
## Inverse gaussian
all.equal(dens(y = 0.2, m = 7.23, mu = 1, phi = 3.22, inverse.gaussian()),
          SuppDists::dinvGauss(0.2, nu = 1, lambda = 7.23/3.22))
## [1] TRUE
## Binomial
all.equal(dens(y = 0.34, m = 100, mu = 0.32, phi = 1, binomial()),
          dbinom(x = 34, size = 100, prob = 0.32))
## [1] TRUE