COVID-walkthrough

0. Load libraries, ArchR project, and annotation databases

Optionally filter the ArchR project to a subset of samples.

library(MOCHA)
library(ArchR)
library(TxDb.Hsapiens.UCSC.hg38.refGene) 
library(org.Hs.eg.db)
library(BSgenome.Hsapiens.UCSC.hg19)
# You should substitute this with your own ArchR project.
# You must have completed cell labeling with your ArchR project.

ArchRProj <- ArchR::loadArchRProject("/home/jupyter/FullCovid")

metadata <- data.table::as.data.table(ArchR::getCellColData(ArchRProj))
studySignal <- median(metadata$nFrags)

# Get metadata information at the sample level
lookup_table <- unique(
  metadata[, c(
    "Sample",
    "COVID_status",
    "Visit",
    "days_since_symptoms"
  ),
  with = FALSE
  ]
)

# Subset to visit 1 and extract samples
samplesToKeep <- lookup_table$Sample[
  lookup_table$Visit == "FH3 COVID-19 Visit 1" &
    lookup_table$days_since_symptoms <= 15 |
    is.na(lookup_table$days_since_symptoms)
]

# subset ArchR Project
idxSample <- BiocGenerics::which(ArchRProj$Sample %in% samplesToKeep)
cellsSample <- ArchRProj$cellNames[idxSample]
ArchRProj <- ArchRProj[cellsSample, ]

1. Setting Parameters

These should be set according to your ArchR project and investigative question.

For more details on each of these parameters, view the help pages for each function using ?callOpenTiles and ?getSampleTileMatrix

# Parameters for calling open tiles.
cellPopLabel <- "CellSubsets" 
cellPopulations <- c("CD16 Mono") 
numCores <- 20

2. Call open tiles

Get sample-tile matrices for all specified cell populations.

tileResults <- MOCHA::callOpenTiles(
  ArchRProj,
  cellPopLabel = cellPopLabel,
  cellPopulations = cellPopulations,
  TxDb = "TxDb.Hsapiens.UCSC.hg38.refGene",
  Org = "org.Hs.eg.db",
  numCores = numCores,
  studySignal = studySignal,
  outDir = tempdir()
)

3. Get consensus sample-tile matrices

…for all cell populations.

These matrices are organized by cell population RangedSummarizedExperiment object and are the primary input to downstream analyses. An advantage of MOCHA’s ability to call sample-specific open tiles is that we can determine a high-quality set of”consensus tiles determined as follows: each sample “votes” on whether a tile is open for that sample, and we keep tiles that are called open by a minimum percentage of samples. The minimum percentage of samples which a tile must be called in to be retained is controlled by threshold. groupColumn can be provided to specify a metadata column that separates your data by sample groups, e.g. if you have a case and control groups. Consensus tiles will be computed for each group, and the union of consensus tiles from each group are retained. This should be used where there are expected biological differences between the sample groups. Currently it is best utilized when each group has a similar size, as threshold will be applied to evenly each group. By default, groupColumn is null and all samples will be pooled to vote on consensus tiles.

# Computing the TSAM can take into account groupings of 
# samples when determining consensus tiles.
# Our samples can be grouped by the metadata column 'COVID_status'
# into 'Positive' and 'Negative' groups.
# Since these groupings may have unique biology and we expect differences
# in accessibility, we want to compute consensus tiles on each 
# group independently and take the union of consensus tiles from each group.
groupColumn <- "COVID_status" 

# We set the threshold to require a tile must be open in at least 
# (0.2 * the number of samples in each group) samples to be 
# retained
threshold <- 0.2

# Alternatively, you can set the threshold to 0 to keep the union of
# all samples' open tiles.
# This is equivalent to setting a threshold that would retain
# tiles that are open in at least one sample. 

SampleTileMatrices <- MOCHA::getSampleTileMatrix(
  tileResults,
  cellPopulations = "CD16 Mono",
  groupColumn = groupColumn,
  threshold = threshold,
  verbose = FALSE
)

4. (Optional) Add gene annotations

…and motifs to our SampleTileMatrices.

This info will aid further downstream analyses but is not required for differential accessibility nor co-accessibility.

# This function can also take any GRanges object
# and add annotations to its metadata.
SampleTileMatricesAnnotated <- MOCHA::annotateTiles(SampleTileMatrices)

# Load a curated motif set from library(chromVARmotifs)
# included with ArchR installation
data(human_pwms_v2)
SampleTileMatricesAnnotated <- MOCHA::addMotifSet(
  SampleTileMatricesAnnotated, 
  pwms = human_pwms_v2, 
  w = 7 # weight parameter for motifmatchr
)

5. (Optional) Plot a specific region’s coverage

Here we plot coverage at a specific region and gene by infection stage.

regionToPlot = "chr4:XXX-XXXX"

countSE <- MOCHA::extractRegion(
  SampleTileObj = SampleTileMatrices, 
  cellPopulations = "CD16 Mono", 
  region = regionToPlot, 
  groupColumn = "COVID_status", 
  numCores = numCores, 
  sampleSpecific = FALSE
)
dev.off()
pdf("ExamplePlot.pdf")
# Note that to show specific genes with the option' whichGene'
# you must have the package RMariaDB installed 
MOCHA::plotRegion(countSE = countSE, whichGene = "MYD88")
dev.off()

6. Get differential accessibility for specific cell populations

Here we are comparing MAIT cells between samples where our groupColumn “COVID_status” is Positive (our foreground) to Negative samples (our background).

cellPopulation <- "CD16 Mono"
groupColumn <- "COVID_status"
foreground <- "Positive"
background <- "Negative"

# Choose to output a GRanges or data.frame.
# Default is TRUE
outputGRanges <- TRUE

# Optional: Standard output will display the number of tiles found 
# below a false-discovery rate threshold.
# This parameter does not filter results and only affects the 
# afforementioned message. 
fdrToDisplay <- 0.2

differentials <- MOCHA::getDifferentialAccessibleTiles(
  SampleTileObj = SampleTileMatrices,
  cellPopulation = cellPopulation,
  groupColumn = groupColumn,
  foreground = foreground,
  background = background,
  fdrToDisplay = fdrToDisplay,
  outputGRanges = outputGRanges,
  numCores = numCores
)

# The output contains a GRanges with all tiles and their differential 
# test results. We can filter by FDR to get our set of 
# differentially accessible tiles:

res = head(plyranges::filter(differentials, seqnames =='chr4' & FDR < 0.2))