RMetalog

Isaac J. Faber

The R Metalog Distribution

This repo is a working project for an R package that generates functions for the metalog distribution. The metalog distribution is a highly flexible probability distribution that can be used to model data without traditional parameters. There is also a Python implementation of this package, available at https://github.com/tjefferies/pymetalog.

Metalog Distribution Background

In economics, business, engineering, science and other fields, continuous uncertainties frequently arise that are not easily or well characterized by previously named continuous probability distributions. Frequently, data are available from measurements, assessments, derivations, simulations or other sources that characterize the range of an uncertainty. But the underlying process that generated the data is either unknown or does not lend itself to a convenient derivation of equations that appropriately characterize the probability density function (PDF), cumulative distribution function (CDF) or quantile function.

The metalog distributions are a family of continuous univariate probability distributions that directly address this need. They can be used in almost any situation in which CDF data is known and a flexible, simple, and easy-to-use continuous probability distribution is needed to represent that data. Their uses, benefits, and applications span a wide range of fields and data sources.

This repository complements and extends the information found in the paper published in Decision Analysis and on the website.

Using the package

Install from CRAN:

install.packages('rmetalog')

Or install the development version from this repository:

library(devtools)
install_github('isaacfab/rmetalog')

Once the package is loaded, you start with a data set of continuous observations. For this repository, we will load the library and use an example of fish size measurements from the Pacific Northwest. Because this data set is bi-modal, it illustrates the flexibility of the metalog distribution. The data is installed with the package.

library(rmetalog)
data("fishSize")
summary(fishSize)
#>     FishSize   
#>  Min.   : 3.0  
#>  1st Qu.: 7.0  
#>  Median :10.0  
#>  Mean   :10.2  
#>  3rd Qu.:12.0  
#>  Max.   :33.0

The base function for the package to create distributions is:

metalog()

This function takes several inputs:

x - vector of numeric data
term_limit - integer between 3 and 30, specifying the largest number of terms to build; one metalog distribution is built for each number of terms up to this limit (default: 13)
bounds - numeric vector specifying the lower and/or upper bounds; not required if the distribution is unbounded
boundedness - character string specifying unbounded, semi-bounded upper, semi-bounded lower or bounded; accepts values u, su, sl and b (default: 'u')
term_lower_bound - (Optional) the smallest term to generate, used to minimize computation; must be less than term_limit (default: 2)
step_len - (Optional) size of the steps used to summarize the distribution, between 0.001 and 0.01 (roughly 1000 to 100 summarized points). This is only used if the data vector length is greater than 100.
probs - (Optional) probability quantiles, same length as x
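
To make the step_len range concrete, the base R sketch below builds an evenly spaced probability grid and reduces a long data vector to the matching empirical quantiles. It only illustrates the idea: the helper name summarize_by_step is hypothetical, the data are simulated, and the package's internal summarization may differ.

```r
# Hypothetical helper (not part of rmetalog): summarize a long vector by its
# empirical quantiles on a probability grid with spacing step_len.
summarize_by_step <- function(x, step_len = 0.01) {
  stopifnot(is.numeric(x), step_len >= 0.001, step_len <= 0.01)
  n_points <- round(1 / step_len) - 1      # 99 points for 0.01, 999 for 0.001
  probs <- seq_len(n_points) * step_len    # grid strictly inside (0, 1)
  data.frame(probs = probs, x = unname(quantile(x, probs = probs)))
}

set.seed(42)
long_vector <- rnorm(5000, mean = 10, sd = 3)  # simulated data for illustration
coarse <- summarize_by_step(long_vector, step_len = 0.01)
fine   <- summarize_by_step(long_vector, step_len = 0.001)
c(coarse = nrow(coarse), fine = nrow(fine))
```

A smaller step_len keeps more detail in the tails at the cost of a larger fitting problem.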

Here is an example of building a bounded distribution, with a lower bound of 0 and an upper bound of 60.

my_metalog <- metalog(
  fishSize$FishSize,
  term_limit = 9,
  term_lower_bound = 2,
  bounds = c(0, 60),
  boundedness = 'b',
  step_len = 0.01
  )
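
For a bounded fit, the metalog literature describes fitting an unbounded metalog to a logit-style transform of the data and then mapping the result back into the interval. The base R sketch below shows only that transform pair; to_unbounded and to_bounded are hypothetical helper names, and this is not claimed to be the package's internal code.

```r
lower <- 0
upper <- 60

# Map values in (lower, upper) onto the whole real line
to_unbounded <- function(x, lower, upper) log((x - lower) / (upper - x))

# Inverse map: sends any real z back into the interval
to_bounded <- function(z, lower, upper) lower + (upper - lower) * plogis(z)

sizes <- c(3, 7, 10, 12, 33)   # values taken from the fishSize summary
z <- to_unbounded(sizes, lower, upper)
back <- to_bounded(z, lower, upper)
all.equal(back, sizes)                        # round trip recovers the inputs
range(to_bounded(c(-50, 50), lower, upper))   # extreme z stays within the bounds
```

The practical consequence is that quantiles and samples from a bounded fit cannot fall outside the stated bounds.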

The function returns an object with classes rmetalog and list. Use summary() to get an overview of the fitted distributions.

summary(my_metalog)
#>  -----------------------------------------------
#>  Summary of Metalog Distribution Object
#>  -----------------------------------------------
#>  
#> Parameters
#>  Term Limit:  9 
#>  Term Lower Bound:  2 
#>  Boundedness:  b 
#>  Bounds (only used based on boundedness):  0 60 
#>  Step Length for Distribution Summary:  0.01 
#>  Method Use for Fitting:  any 
#>  
#> 
#>  Validation and Fit Method
#>  term valid method
#>     2   yes    OLS
#>     3   yes    OLS
#>     4   yes    OLS
#>     5   yes    OLS
#>     6   yes    OLS
#>     7   yes    OLS
#>     8   yes    OLS
#>     9   yes    OLS
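
The summary lists OLS as the fit method for every term. As a rough illustration of what a least-squares metalog fit involves, the base R sketch below fits the four-term unbounded metalog quantile function, M(y) = a1 + a2 L + a3 (y - 0.5) L + a4 (y - 0.5) with L = log(y / (1 - y)), to sorted data. The helper names and the plotting positions (i - 0.5) / n are assumptions made for this example; the package's own implementation may differ.

```r
# Basis columns of the four-term unbounded metalog quantile function
metalog_basis <- function(y) {
  logit <- log(y / (1 - y))
  cbind(1, logit, (y - 0.5) * logit, y - 0.5)
}

# Least-squares fit of the coefficients to (probability, sorted value) pairs
fit_metalog4 <- function(x) {
  x <- sort(x)
  n <- length(x)
  y <- (seq_len(n) - 0.5) / n   # assumed plotting positions inside (0, 1)
  setNames(qr.solve(metalog_basis(y), x), paste0("a", 1:4))
}

# Evaluate the fitted quantile function at probabilities y
quantile_metalog4 <- function(a, y) drop(metalog_basis(y) %*% a)

set.seed(1)
x <- rlogis(500, location = 10, scale = 2)   # simulated data for illustration
a <- fit_metalog4(x)
a
quantile_metalog4(a, c(0.25, 0.5, 0.75))
```

A logistic sample is used because the logistic quantile function is exactly the two-term metalog, so a1 and a2 should land near the location and scale while a3 and a4 stay small.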

You can also plot a quick visual comparison of the distributions by term.

plot(my_metalog)
#> $pdf

#> 
#> $cdf

Once the distributions are built, you can draw n samples from the distribution with a chosen number of terms.

s <- rmetalog(my_metalog, n = 1000, term = 9)
hist(s)
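
Because a metalog is defined through its quantile function, drawing samples can be viewed as inverse transform sampling: evaluate the quantile function at uniform random probabilities. The base R sketch below illustrates this with a two-term metalog whose coefficients are made up for the example; it is not taken from the package source.

```r
# Two-term unbounded metalog quantile function: M(y) = a1 + a2 * log(y / (1 - y))
# The coefficients are illustrative values, not fitted to fishSize.
a <- c(10, 2)
metalog2_quantile <- function(y, a) a[1] + a[2] * log(y / (1 - y))

# Inverse transform sampling: push uniform draws through the quantile function
sample_metalog2 <- function(n, a) metalog2_quantile(runif(n), a)

set.seed(123)
draws <- sample_metalog2(10000, a)
median(draws)                          # should be close to a[1]
quantile(draws, c(0.25, 0.75))         # empirical quartiles of the draws
metalog2_quantile(c(0.25, 0.75), a)    # exact quartiles for comparison
```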

You can also retrieve quantile, density, and probability values, as with other R distributions. Quantiles from probabilities:

qmetalog(my_metalog, y = c(0.25, 0.5, 0.75), term = 9)
#> [1]  7.241  9.840 12.063

Probabilities from quantiles:

pmetalog(my_metalog, q = c(3, 10, 25), term = 9)
#> [1] 0.001957 0.520058 0.992267
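
A metalog has a closed-form quantile function but, in general, no closed-form CDF, so the probability for a given value has to be found numerically. One possible approach is to solve M(y) = q for y with a root finder. The base R sketch below does this with uniroot on a two-term metalog with made-up coefficients; how pmetalog does it internally is not shown here.

```r
a <- c(10, 2)   # illustrative coefficients, not fitted to fishSize
metalog2_quantile <- function(y, a) a[1] + a[2] * log(y / (1 - y))

# Probability at q: the y in (0, 1) where the quantile function equals q
metalog2_cdf <- function(q, a, eps = 1e-10) {
  vapply(q, function(qi) {
    uniroot(function(y) metalog2_quantile(y, a) - qi,
            interval = c(eps, 1 - eps), tol = 1e-10)$root
  }, numeric(1))
}

p <- metalog2_cdf(c(3, 10, 25), a)
p
plogis(c(3, 10, 25), location = 10, scale = 2)   # exact values for this special case
```

The comparison with plogis works only because the two-term metalog coincides with the logistic distribution; with more terms there is no such closed form to check against.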

Densities from quantiles:

dmetalog(my_metalog, q = c(3, 10, 25), term = 9)
#> [1] 0.004490 0.126724 0.002264
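
For a distribution defined by its quantile function M(y), the density at x = M(y) is the reciprocal of the derivative dM/dy. The base R sketch below checks this relationship on a two-term metalog with made-up coefficients; the function names are hypothetical and the code is not the package's implementation.

```r
a <- c(10, 2)   # illustrative coefficients, not fitted to fishSize
metalog2_quantile <- function(y, a) a[1] + a[2] * log(y / (1 - y))

# dM/dy for the two-term metalog is a2 / (y * (1 - y)); the density is its reciprocal
metalog2_density_at_prob <- function(y, a) {
  dM_dy <- a[2] / (y * (1 - y))
  1 / dM_dy
}

y <- c(0.1, 0.5, 0.9)
x <- metalog2_quantile(y, a)             # values on the data scale
dens <- metalog2_density_at_prob(y, a)
cbind(y = y, x = x, density = dens)
dlogis(x, location = 10, scale = 2)      # exact densities for this special case
```

Note that the density here is indexed by probability y. To evaluate it at a data value q, the matching y has to be found first, for example with a root finder.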

As this package is under development, any feedback is appreciated! Please submit a pull request or issue if you find anything that needs to be addressed.