Interactive Manhattan, Q-Q, and Volcano Plots Using Plotly.js

Sahir Bhatnagar

2021-04-25

Author: Sahir Bhatnagar (sahir.bhatnagar@gmail.com)

Notes:


1 Introduction

Manhattan, Q-Q and volcano plots are popular graphical methods for visualizing results from high-dimensional data analysis such as a (epi)genome wide asssociation study (GWAS or EWAS), in which p-values, Z-scores, test statistics are plotted on a scatter plot against their genomic position. Manhattan plots are used for visualizing potential regions of interest in the genome that are associated with a phenotype. Q-Q plots tell us about the distributional assumptions of the observed test statistics. Volcano plots are the negative log10 p-values plotted against their effect size, odds ratio or log fold-change. They are used to identify clinically meaningful markers in genomic experiments, i.e., markers that are statistically significant and have an effect size greater than some threshold.

Interactive manhattan, Q-Q and volcano plots allow the inspection of specific value (e.g. rs number or gene name) by hovering the mouse over a point, as well as zooming into a region of the genome (e.g. a chromosome) by dragging a rectangle around the relevant area.

This pacakge creates interactive Q-Q, manhattan and volcano plots that are usable from the R console, the RStudio viewer pane, R Markdown documents, in Dash apps, Shiny apps, embeddable in websites and can be exported as .png files. By hovering the mouse over a point, you can see annotation information such as the SNP identifier and GENE name. You can also drag a rectangle to zoom in on a region of interest and then export the image as a .png file.

This work is based on the qqman R package and the plotly.js engine. It produces similar manhattan and Q-Q plots as the qqman::manhattan and qqman::qq functions; the main difference here is being able to interact with the plot, including extra annotation information and seamless integration with HTML.



2 Installation

You can install manhattanly from CRAN:

install.packages("manhattanly")

Alternatively, you can install the development version of manhattanly from GitHub with:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("sahirbhatnagar/manhattanly")



3 Quick Start

3.1 Manhattan plot

The manhattanly package ships with an example dataset called HapMap. See help(HapMap) for more details about how this dataset was created. Here is what the HapMap dataset looks like:

# load the manhattanly library
library(manhattanly)
## See example usage at http://sahirbhatnagar.com/manhattanly/
set.seed(12345)
HapMap.subset <- subset(HapMap, CHR %in% 4:7)
# for highlighting SNPs of interest
significantSNP <- sample(HapMap.subset$SNP, 20)
head(HapMap.subset)
##      CHR      BP         P        SNP ZSCORE EFFECTSIZE   GENE DISTANCE
## 3300   4  336758 0.7869011  rs6821220 0.2703    -0.1437 ZNF141        0
## 3301   4  992125 0.5116233  rs6855233 0.6563    -0.0128 FGFRL1     3634
## 3302   4 1155741 0.7977199   rs922697 0.2563    -0.1170  SPON2        0
## 3303   4 1302267 0.3590100 rs10025665 0.9173     0.1079   MAEA        0
## 3304   4 1388897 0.1824547 rs11731672 1.3332     0.1005 CRIPAK     9116
## 3305   4 1814818 0.4673265  rs6599401 0.7268     0.0625  LETM1        0
dim(HapMap.subset)
## [1] 3410    8

The required columns to create a manhattan plot are the chromosome, base-pair position and p-value. By default, the manhattanly function assumes these columns are named CHR, BP and P (but these can be specified by the user if they are different)

Create an interactive manhattan plot using one command:

manhattanly(HapMap.subset, snp = "SNP", gene = "GENE")

The arguments snp = "SNP" and gene = "GENE" specify that we want to add snp and gene information to each point. This information is found in the columns names "SNP" and "GENE" in the HapMap dataset. See help(manhattanly) for a full list of options.

3.2 Q-Q plot

Similarly, we can create an interactive Q-Q plot using one command (See help(qqly) for a full list of options):

qqly(HapMap.subset, snp = "SNP", gene = "GENE")

You can then save the plot as a .png file by clicking on the camera icon in the toolbar (which appears when you hover your mouse over it).

3.3 Volcano plot

You can also make a volcano plot which by default, highlights the points greater than the default genomewideline and effect_size_line arguments:

volcanoly(HapMap.subset, snp = "SNP", gene = "GENE", effect_size = "EFFECTSIZE")


3.4 Highlighting SNPs of Interest

We can also highlight SNPs of interest using the highlight argument. This package comes with a list of SNPs of interest called significantSNP (see help(significantSNP) for more details). To highlight these SNPs we simply pass this vector to the highlight argument (note that these SNPs need to be present in the "SNP" column of your data). We omit the plot here due to size constraints of the package on CRAN:

manhattanly(HapMap.subset, snp = "SNP", gene = "GENE", highlight = significantSNP)

3.5 More annotations

You can add up to 4 annotations. In the following manhattan plot we add the snp, gene, the distance to the nearest gene and the effect size. The same annotations can also be added to Q-Q and volcano plots:

manhattanly(HapMap.subset, snp = "SNP", gene = "GENE",
            annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE",
            highlight = significantSNP)


3.6 Adding Text Annotations to the Plot

The annotations in the previous plots only appear when we hover the mouse over the point. Once we have identified a SNP, or a few SNPs of interest we want to explicitly show the annotation information and save the plot. The output of the manhattanly function is an object which can be further manipulated using the %>% operator from the magrittr package:

library(magrittr)

p <- manhattanly(HapMap.subset, snp = "SNP", gene = "GENE",
            annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE",
            highlight = significantSNP)

# get the x and y coordinates from the pre-processed data
plotData <- manhattanr(HapMap.subset, snp = "SNP", gene = "GENE")[["data"]]

# annotate the smallest p-value
annotate <- plotData[which.min(plotData$P),]

# x and y coordinates of SNP with smallest p-value
xc <- annotate$pos
yc <- annotate$logp

p %>% plotly::layout(annotations = list(
  list(x = xc, y = yc,
       text = paste0(annotate$SNP,"<br>","GENE: ",annotate$GENE),
       font = list(family = "serif", size = 10))))

You can then save the plot as a .png file by clicking on the camera icon in the toolbar (which appears when you hover your mouse over it).


3.7 Sharing your Plots

By default, plotly for R runs locally in your web browser or in the R Studio viewer. It would be useful to be able to easily share these plot with for example, your collaborators or supervisor, especially in the during the exploratory data analysis stage of your project.

You can publish your charts to the web with plotly’s web service in three steps.

3.7.1 Step 1: Signup for a free plotly account

Create a free plotly account here. A plotly account is required to publish charts online. It’s free to get started, and you control the privacy of your charts.

3.7.2 Step 2: Save your authentication credentials

Find your authentication API keys in your online settings. Set them in your R session with:

Sys.setenv("plotly_username"="your_plotly_username")
Sys.setenv("plotly_api_key"="your_api_key")

3.7.3 Step 3: Publish your graphs to plotly with plotly::api_create()

library(plotly)
# p is the interactive manhattan plot we saved earlier
plotly::api_create(p, filename = "r-docs/manhattan", world_readable=TRUE)
  • filename sets the name of the file inside your online plotly account.
  • world_readable sets the privacy of your chart. If TRUE, the graph is publically viewable, if FALSE, only you can view it.



4 Dynamic Documents with knitr and R Markdown

R Markdown is a an R software package that allows the creation of dynamic documents, i.e., embed R code with text to create fully reproducible reports. Furthermore it allows easy creation of HTML reports without knowing how to code in HTML (such as this vignette). This means you can embed interactive manhattan, qq and volcano plots in HTML reports using the manhattanly package. For example, to embed the above manhattan plot I included the following code chunk in the .Rmd document:

```{r}
library(plotly)
manhattanly(subset(HapMap, CHR %in% 4:7), snp = "SNP", gene = "GENE")
```

5 Details

5.1 Data Pre-Processing

The manhattanly package splits up the data pre-processing from the rendering of the plot object (inspired by the heatmaply package. These are done by the manhattanr, qqr and volcanor functions:

# create an object of class `manhattanr`
manhattanrObject <- manhattanr(HapMap)

# whats in there
str(manhattanrObject)
## List of 10
##  $ data           :'data.frame': 14412 obs. of  6 variables:
##   ..$ CHR  : int [1:14412] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ BP   : int [1:14412] 937641 1136887 2116240 2310562 2681715 2917484 2942700 3298358 3501155 3676178 ...
##   ..$ P    : num [1:14412] 0.335 0.246 0.823 0.493 0.605 ...
##   ..$ logp : num [1:14412] 0.4745 0.6093 0.0844 0.307 0.218 ...
##   ..$ pos  : num [1:14412] 937641 1136887 2116240 2310562 2681715 ...
##   ..$ index: num [1:14412] 1 1 1 1 1 1 1 1 1 1 ...
##  $ xlabel         : chr "Chromosome"
##  $ ticks          : num [1:23] 1.24e+08 3.68e+08 5.88e+08 7.83e+08 9.68e+08 ...
##  $ labs           : int [1:23] 1 2 3 4 5 6 7 8 9 10 ...
##  $ nchr           : int 23
##  $ pName          : chr "P"
##  $ snpName        : logi NA
##  $ geneName       : logi NA
##  $ annotation1Name: logi NA
##  $ annotation2Name: logi NA
##  - attr(*, "class")= chr "manhattanr"
# the data used for plotting is a data.frame
# this data.frame can be used with any graphics function such as in base R
# you do not need to use plotly
head(manhattanrObject[["data"]])
##   CHR      BP         P       logp     pos index
## 1   1  937641 0.3353438 0.47450973  937641     1
## 2   1 1136887 0.2458571 0.60931719 1136887     1
## 3   1 2116240 0.8232859 0.08444933 2116240     1
## 4   1 2310562 0.4932038 0.30697357 2310562     1
## 5   1 2681715 0.6053916 0.21796358 2681715     1
## 6   1 2917484 0.1944431 0.71120743 2917484     1
is.data.frame(manhattanrObject[["data"]])
## [1] TRUE

This manhattanrObject which is of class manhattanr can also be passed to the manhattanly function:

manhattanly(manhattanrObject)

5.2 More annotations

We can specify more annotations in the data using the snp, gene, annotation1 and annotation2 arguments:

# create an object of class `manhattanr`
manhattanrObject <- manhattanr(HapMap, snp = "SNP", gene = "GENE",
            annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE")

# the annotation columns are now part of the data.frame
head(manhattanrObject[["data"]])
##   CHR      BP         P        SNP    GENE DISTANCE EFFECTSIZE       logp
## 1   1  937641 0.3353438  rs9697358   ISG15     1068    -0.0946 0.47450973
## 2   1 1136887 0.2458571 rs34945898 TNFRSF4        0    -0.0947 0.60931719
## 3   1 2116240 0.8232859 rs12034613  FP7162        0    -0.0741 0.08444933
## 4   1 2310562 0.4932038  rs4648633   MORN1        0     0.0146 0.30697357
## 5   1 2681715 0.6053916  rs4430271   MMEL1   127427     0.1234 0.21796358
## 6   1 2917484 0.1944431  rs6685625  ACTRT2    10421     0.1979 0.71120743
##       pos index
## 1  937641     1
## 2 1136887     1
## 3 2116240     1
## 4 2310562     1
## 5 2681715     1
## 6 2917484     1
is.data.frame(manhattanrObject[["data"]])
## [1] TRUE

5.3 Q-Q plots

Similarly the data used for the Q-Q plot can be created using the qqr function:

qqrObject <- qqr(HapMap)
str(qqrObject)
## List of 6
##  $ data           :'data.frame': 14412 obs. of  3 variables:
##   ..$ P       : num [1:14412] 6.75e-10 3.41e-09 3.95e-09 4.71e-09 5.02e-09 ...
##   ..$ OBSERVED: num [1:14412] 9.17 8.47 8.4 8.33 8.3 ...
##   ..$ EXPECTED: num [1:14412] 4.46 3.98 3.76 3.61 3.51 ...
##  $ pName          : chr "P"
##  $ snpName        : logi NA
##  $ geneName       : logi NA
##  $ annotation1Name: logi NA
##  $ annotation2Name: logi NA
##  - attr(*, "class")= chr "qqr"
head(qqrObject[["data"]])
##                P OBSERVED EXPECTED
## 4346 6.75010e-10 9.170690 4.459754
## 4347 3.41101e-09 8.467117 3.982633
## 4344 3.95101e-09 8.403292 3.760784
## 4338 4.70701e-09 8.327255 3.614656
## 4342 5.02201e-09 8.299122 3.505512
## 4341 6.22801e-09 8.205651 3.418362

This qqrObject which is of class qqr can also be passed to the qqly function (we omit the plot here for the sake of size of the rendered vignette):

qqly(qqrObject)

5.4 Volcano plots

Similarly the data used for the volcanor plot can be created using the volcanor function:

volcanorObject <- volcanor(HapMap, 
                        p = "P",
                        effect_size = "EFFECTSIZE",
                        snp = "SNP",
                        gene = "GENE")
str(volcanorObject)
## List of 8
##  $ data           :'data.frame': 14412 obs. of  5 variables:
##   ..$ EFFECTSIZE: num [1:14412] -0.0946 -0.0947 -0.0741 0.0146 0.1234 ...
##   ..$ P         : num [1:14412] 0.335 0.246 0.823 0.493 0.605 ...
##   ..$ SNP       : chr [1:14412] "rs9697358" "rs34945898" "rs12034613" "rs4648633" ...
##   ..$ GENE      : chr [1:14412] "ISG15" "TNFRSF4" "FP7162" "MORN1" ...
##   ..$ LOG10P    : num [1:14412] 0.4745 0.6093 0.0844 0.307 0.218 ...
##  $ pName          : chr "P"
##  $ effectName     : chr "EFFECTSIZE"
##  $ xlabel         : chr "EFFECTSIZE"
##  $ snpName        : chr "SNP"
##  $ geneName       : chr "GENE"
##  $ annotation1Name: logi NA
##  $ annotation2Name: logi NA
##  - attr(*, "class")= chr "volcanor"
head(volcanorObject[["data"]])
##   EFFECTSIZE         P        SNP    GENE     LOG10P
## 1    -0.0946 0.3353438  rs9697358   ISG15 0.47450973
## 2    -0.0947 0.2458571 rs34945898 TNFRSF4 0.60931719
## 3    -0.0741 0.8232859 rs12034613  FP7162 0.08444933
## 4     0.0146 0.4932038  rs4648633   MORN1 0.30697357
## 5     0.1234 0.6053916  rs4430271   MMEL1 0.21796358
## 6     0.1979 0.1944431  rs6685625  ACTRT2 0.71120743

This can be directly passed to the volcanoly function, with additional arguments:

volcanoly(volcanorObject, effect_size_line = c(-1, 0.5))