minerva

R package for Maximal Information-Based Nonparametric Exploration computation

Minepy Homepage
Minepy Github
Mictools Github

Install

Latest CRAN release

install.packages("minerva")

Development version

devtools::install_github('filosi/minerva')

Usage

Basic usage with helper function mine.

library(minerva)

x <- 0:200 / 200
y <- sin(10 * pi * x) + x
mine(x,y, n.cores=1)

Compute a single measure from the MINE suite using mine_stat.
- Available measures are: mic, mas, mev, mcn, tic, gmic

x <- 0:200 / 200
y <- sin(10 * pi * x) + x
mine_stat(x, y, measure="mic")

To compute the mic-r2 measure use the cor R function:

x <- 0:200 / 200
y <- sin(10 * pi * x) + x

r <- cor(x, y)    ## Pearson correlation coefficient
mm <- mine_stat(x, y, measure="mic")
mm - r^2

## The same measure is the fifth element returned by mine:
## mine(x, y, n.cores=1)[[5]]

Compute statistic on matrices

All features in a single matrix (mine_compute_pstat).
All possible combinations of features between two matrices (mine_compute_cstat).
- When comparing two matrices, the function checks that both have the same number of rows. If the row counts differ, an error is thrown.

## Rows are samples, columns are features; both matrices need the same number of rows
x <- matrix(rnorm(1000), ncol=10, nrow=100)
y <- matrix(rnorm(2000), ncol=20, nrow=100)

## Compare all pairs of features within the same matrix
pstats(x)

## Compare features of matrix x with features of matrix y
cstats(x, y)
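
The row-count requirement can also be checked up front, before the two matrices are compared. The following is a minimal base-R sketch; check_same_rows is a hypothetical helper introduced here for illustration and is not part of minerva.

```r
## Hypothetical helper (not part of minerva): fail early when two
## matrices do not share the same number of rows (samples).
check_same_rows <- function(x, y) {
  if (nrow(x) != nrow(y)) {
    stop("x has ", nrow(x), " rows but y has ", nrow(y), " rows")
  }
  invisible(TRUE)
}

set.seed(1)
a <- matrix(rnorm(200), nrow=20, ncol=10)  # 20 samples, 10 features
b <- matrix(rnorm(100), nrow=20, ncol=5)   # 20 samples, 5 features
check_same_rows(a, b)                      # passes silently

## as.matrix() on a plain vector always returns a single column,
## so build feature matrices with matrix() instead
v <- as.matrix(rnorm(100))
dim(v)
```

Calling check_same_rows(a, v) stops with an error, because a has 20 rows and v has 100.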

Mictools pipeline

This is inspired by the original Python implementation by Albanese et al., available here: https://github.com/minepy/mictools.

Reading the data from the mictools repository

datasaurus <- read.table("https://raw.githubusercontent.com/minepy/mictools/master/examples/datasaurus.txt", 
header=TRUE, row.names=1, as.is=TRUE, stringsAsFactors=FALSE)
datasaurus.m <- t(datasaurus)

Compute null distribution for `tic_e`

Automatically compute:

tic_e null distribution based on permutations.
Histogram of the null distribution with its cumulative distribution.
Observed values of tic_e for each pair of variables in datasaurus.
Observed distribution of tic_e.
P-value for each variable pair association.

ticnull <- mictools(datasaurus.m, nperm=10000, seed=1234)

## Get the names of the named list
names(ticnull)
##[1]  "tic"      "nulldist" "obstic"   "obsdist"  "pval"
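
To make the permutation idea concrete, here is a minimal base-R sketch of an empirical p-value computed from a permutation null. It is not how mictools computes tic_e: the absolute Pearson correlation is used as a stand-in statistic, and the helper name perm_pvalue, its defaults, and the +1 correction are assumptions made for illustration.

```r
## Illustrative only: absolute Pearson correlation stands in for tic_e.
perm_pvalue <- function(x, y, nperm=1000, seed=1234) {
  set.seed(seed)
  obs <- abs(cor(x, y))
  ## Null distribution: shuffle y to break any association with x
  null <- replicate(nperm, abs(cor(x, sample(y))))
  ## Fraction of null values at least as large as the observed one;
  ## the +1 terms keep the estimate strictly above zero
  pval <- (sum(null >= obs) + 1) / (nperm + 1)
  list(obs=obs, null=null, pval=pval)
}

x <- 0:200 / 200
y <- x + sin(10 * pi * x) / 10
res <- perm_pvalue(x, y)
res$pval
```

The smallest p-value this sketch can return is 1 / (nperm + 1), so the number of permutations sets the resolution of the result.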

Null Distribution

ticnull$nulldist

BinStart	BinEnd	NullCount	NullCumSum
0e+00	1e-04	0	1e+05
1e-04	2e-04	0	1e+05
2e-04	3e-04	0	1e+05
3e-04	4e-04	0	1e+05
4e-04	5e-04	0	1e+05
5e-04	6e-04	0	1e+05
…	…	…	…

Observed distribution

ticnull$obsdist

BinStart	BinEnd	Count	CountCum
0e+00	1e-04	0	325
1e-04	2e-04	0	325
2e-04	3e-04	0	325
3e-04	4e-04	0	325
4e-04	5e-04	0	325
5e-04	6e-04	0	325
…	…	…	…
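
The NullCumSum and CountCum columns start at the total count. Assuming they count the values falling in the current bin or any higher one, a table of the same shape can be built in base R as sketched below. bin_counts, the bin width of 0.1, and the random input are assumptions for illustration; this is not the code mictools runs.

```r
## Illustrative only: bin values in [0, 1], then count how many fall
## in each bin and how many fall in that bin or any higher bin.
bin_counts <- function(v, width=0.1) {
  breaks <- seq(0, 1, by=width)
  counts <- hist(v, breaks=breaks, plot=FALSE)$counts
  data.frame(BinStart=head(breaks, -1),
             BinEnd=tail(breaks, -1),
             Count=counts,
             CountCum=rev(cumsum(rev(counts))))
}

set.seed(1234)
tab <- bin_counts(runif(325))
head(tab)
```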

Plot tic_e and pvalue distribution.

hist(ticnull$tic)

hist(ticnull$pval$pval, breaks=50, freq=FALSE)

Set p.adjust.method to choose a different pvalue correction method, or use the qvalue package to compute Storey’s qvalue.

## Correct pvalues using the qvalue package
library(qvalue)
qobj <- qvalue(ticnull$pval$pval)

## Add column in the pval data.frame
ticnull$pval$qvalue <- qobj$qvalues
ticnull$pval

Same table as above with the qvalue column added at the end.

pval	I1	I2	Var1	Var2	adj.P.Val	qvalue
0.5202	1	2	away_x	bullseye_x	0.95	1
0.9533	1	3	away_x	circle_x	0.99	1
0.0442	1	4	away_x	dino_x	0.52	0
0.6219	1	5	away_x	dots_x	0.95	1
0.8922	1	6	away_x	h_lines_x	0.98	1
0.3972	1	7	away_x	high_lines_x	0.91	1
…	…	…	…	…	…	…
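
For the p.adjust.method option, base R's p.adjust shows what the available corrections do. This is a minimal sketch that assumes the option takes the same method names as p.adjust. It uses only the six pvalues displayed above, so the adjusted values will differ from the adj.P.Val column, which presumably covers all variable pairs.

```r
## The six pvalues shown in the table
p <- c(0.5202, 0.9533, 0.0442, 0.6219, 0.8922, 0.3972)

## Method names accepted by p.adjust
p.adjust.methods

bh <- p.adjust(p, method="BH")            # Benjamini-Hochberg (FDR)
bonf <- p.adjust(p, method="bonferroni")  # family-wise error rate
cbind(p, bh, bonf)
```

Bonferroni is never less conservative than Benjamini-Hochberg, so each value in the bonf column is at least as large as the matching bh value.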

Strength of the association (MIC)

## Use the FDR adjusted pvalue (column 6) and the index columns I1 and I2 (columns 2 and 3)
micres <- mic_strength(datasaurus.m, ticnull$pval, pval.col=c(6, 2, 3))

TicePval	MIC	I1	I2
0.0457	0.42	2	15
0.0000	0.63	3	16
0.0196	0.50	5	18
0.0162	0.36	9	22
0.0000	0.63	10	23
0.0000	0.57	13	26
…	…	…	…

Association strength computed based on the qvalue adjusted pvalue

## Use qvalue adjusted pvalue 
micresq <- mic_strength(datasaurus.m, ticnull$pval, pval.col=c("qvalue", "Var1", "Var2"))

TicePval	MIC	I1	I2
0.0401	0.42	bullseye_x	bullseye_y
0.0000	0.63	circle_x	circle_y
0.0172	0.50	dots_x	dots_y
0.0143	0.36	slant_up_x	slant_up_y
0.0000	0.63	star_x	star_y
0.0000	0.57	x_shape_x	x_shape_y
…	…	…	…

Citing minepy/minerva and mictools

minepy2013	Davide Albanese, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman and Cesare Furlanello. minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics (2013) 29(3): 407-408. First published online December 14, 2012.
mictools2018	Davide Albanese, Samantha Riccadonna, Claudio Donati, Pietro Franceschi. A practical tool for maximal information coefficient analysis. GigaScience (2018)