| Title: | Cell Ranger Output Filtering and Metrics Visualization |
| Version: | 0.3.2 |
| Description: | Sample and cell filtering as well as visualisation of output metrics from 'Cell Ranger' by Grace X.Y. Zheng et al. (2017) <doi:10.1038/ncomms14049>. 'CRMetrics' allows for easy plotting of output metrics across multiple samples as well as comparative plots including statistical assessments of these. 'CRMetrics' allows for easy removal of ambient RNA using 'SoupX' by Matthew D Young and Sam Behjati (2020) <doi:10.1093/gigascience/giaa151> or 'CellBender' by Stephen J Fleming et al. (2022) <doi:10.1101/791699>. Furthermore, it is possible to preprocess data using 'Pagoda2' by Nikolas Barkas et al. (2021) https://github.com/kharchenkolab/pagoda2 or 'Seurat' by Yuhan Hao et al. (2021) <doi:10.1016/j.cell.2021.04.048> followed by embedding of cells using 'Conos' by Nikolas Barkas et al. (2019) <doi:10.1038/s41592-019-0466-z>. Finally, doublets can be detected using 'scrublet' by Samuel L. Wolock et al. (2019) <doi:10.1016/j.cels.2018.11.005> or 'DoubletDetection' by Gayoso et al. (2020) <doi:10.5281/zenodo.2678041>. In the end, cells are filtered based on user input for use in downstream applications. |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.0.0) |
| Imports: | cowplot, dplyr, ggbeeswarm, ggplot2, ggpmisc, ggpubr, ggrepel, magrittr, Matrix, methods, R6, scales, sccore, sparseMatrixStats, stats, tibble, tidyr, utils |
| Suggests: | conos, data.table, markdown, pagoda2, reticulate, rhdf5, Seurat, SoupX, testthat (≥ 3.0.0) |
| RoxygenNote: | 7.3.1 |
| URL: | https://github.com/khodosevichlab/CRMetrics |
| BugReports: | https://github.com/khodosevichlab/CRMetrics/issues |
| Maintainer: | Rasmus Rydbirk <rrydbirk@outlook.dk> |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2024-11-07 19:24:09 UTC; ucloud |
| Author: | Rasmus Rydbirk [aut, cre], Fabienne Kick [aut], Henrietta Holze [aut], Xian Xin [ctb] |
| Repository: | CRAN |
| Date/Publication: | 2024-11-08 00:20:06 UTC |
CRMetrics class object
Description
Functions to analyze Cell Ranger count data. To initialize a new object, 'data.path' or 'cms' is needed. 'metadata' is also recommended, but not required.
Public fields
metadata: data.frame or character. Path to metadata file or name of metadata data.frame object. Metadata must contain a column named 'sample' containing sample names that must match folder names in 'data.path' (default = NULL)
data.path: character. Path(s) to Cell Ranger count data, one directory per sample. For multiple paths, use c("path1","path2") (default = NULL)
cms: list. List with count matrices (default = NULL)
cms.preprocessed: list. List with preprocessed count matrices after $doPreprocessing() (default = NULL)
cms.raw: list. List with raw, unfiltered count matrices, i.e., including all detected cell barcodes (CBs), also those from empty droplets (default = NULL)
summary.metrics: data.frame. Summary metrics from Cell Ranger (default = NULL)
detailed.metrics: data.frame. Detailed metrics, i.e., no. genes and UMIs per cell (default = NULL)
comp.group: character. A group present in the metadata to compare the metrics by, can be added with addComparison (default = NULL)
verbose: logical. Print messages or not (default = TRUE)
theme: ggplot2 theme (default: theme_bw())
pal: Plotting palette (default = NULL)
n.cores: numeric. Number of cores for calculations (default = 1)
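The requirement that the 'sample' column of the metadata matches the folder names in 'data.path' can be checked before an object is created. The following is a minimal base R sketch under stated assumptions: the temporary directory, the sample names and the variable missing.samples are made up for illustration and are not part of CRMetrics.

```r
# Illustrative directory layout: one folder per sample (names are hypothetical)
data.path <- file.path(tempdir(), "crm_example")
for (s in c("sample1", "sample2")) {
  dir.create(file.path(data.path, s), recursive = TRUE, showWarnings = FALSE)
}

# Metadata with a 'sample' column; "sample3" has no matching folder
metadata <- data.frame(sample = c("sample1", "sample2", "sample3"),
                       sex = c("male", "female", "male"))

# Compare sample names against the folder names found in data.path
folders <- basename(list.dirs(data.path, recursive = FALSE))
missing.samples <- setdiff(metadata$sample, folders)
missing.samples
```

Any name reported in missing.samples would need either a matching folder or removal from the metadata.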
Methods
Public methods
Method new()
To initialize new object, 'data.path' or 'cms' is needed. 'metadata' is also recommended, but not required.
Usage
CRMetrics$new( data.path = NULL, metadata = NULL, cms = NULL, samples = NULL, unique.names = TRUE, sep.cells = "!!", comp.group = NULL, verbose = TRUE, theme = theme_bw(), n.cores = 1, sep.meta = ",", raw.meta = FALSE, pal = NULL )
Arguments
data.path: character. Path to directory with Cell Ranger count data, one directory per sample (default = NULL).
metadata: data.frame or character. Path to metadata file (comma-separated) or name of metadata data.frame object. Metadata must contain a column named 'sample' containing sample names that must match folder names in 'data.path' (default = NULL)
cms: list. List with count matrices (default = NULL)
samples: character. Sample names. Only relevant if cms is provided (default = NULL)
unique.names: logical. Create unique cell names. Only relevant if cms is provided (default = TRUE)
sep.cells: character. Sample-cell separator. Only relevant if cms is provided and unique.names=TRUE (default = "!!")
comp.group: character. A group present in the metadata to compare the metrics by, can be added with addComparison (default = NULL)
verbose: logical. Print messages or not (default = TRUE)
theme: ggplot2 theme (default: theme_bw())
n.cores: integer. Number of cores for the calculations (default = 1)
sep.meta: character. Separator for metadata file (default = ",")
raw.meta: logical. Keep metadata in its raw format. If FALSE, classes will be converted using "type.convert" (default = FALSE)
pal: character. Plotting palette (default = NULL)
Returns
CRMetrics object
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
}
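The effect of unique.names and sep.cells can be pictured as prefixing every cell name with its sample name. The sketch below assumes that prefixing scheme; the helper make_unique_names is hypothetical and is not a CRMetrics function.

```r
# Hypothetical helper (not part of CRMetrics): prefix cell names with the sample name
make_unique_names <- function(cms, samples, sep = "!!") {
  stopifnot(length(cms) == length(samples))
  out <- Map(function(cm, s) {
    colnames(cm) <- paste(s, colnames(cm), sep = sep)
    cm
  }, cms, samples)
  names(out) <- samples
  out
}

# Two toy count matrices that share the same cell names
cms <- lapply(1:2, function(i) {
  matrix(1:6, nrow = 2,
         dimnames = list(c("gene1", "gene2"), c("cell1", "cell2", "cell3")))
})

unique.cms <- make_unique_names(cms, c("sample1", "sample2"))
colnames(unique.cms$sample2)
```

With the prefix in place, cell names no longer collide when matrices from several samples are combined.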
Method addDetailedMetrics()
Function to read in detailed metrics. This is not done upon initialization for speed.
Usage
CRMetrics$addDetailedMetrics( cms = self$cms, min.transcripts.per.cell = 100, n.cores = self$n.cores, verbose = self$verbose )
Arguments
cms: list. List of (sparse) count matrices (default = self$cms)
min.transcripts.per.cell: numeric. Minimal number of transcripts per cell (default = 100)
n.cores: integer. Number of cores for the calculations (default = self$n.cores).
verbose: logical. Print messages or not (default = self$verbose).
Returns
Count matrices
Examples
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Run function
crm$addDetailedMetrics()
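Detailed metrics are described above as the number of genes and UMIs per cell. The base R sketch below shows how such per-cell values can be derived from a single count matrix. It is an illustration only: the column names UMI_count and gene_count and the use of >= for min.transcripts.per.cell are assumptions, not the package's confirmed behaviour.

```r
# Toy count matrix: genes in rows, cells in columns
cm <- matrix(c(0, 5, 120,
               3, 0,  80,
               0, 0,  10),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

umi.per.cell  <- colSums(cm)      # total transcripts (UMIs) per cell
gene.per.cell <- colSums(cm > 0)  # number of detected genes per cell

# Assumption: cells with at least min.transcripts.per.cell transcripts are retained
min.transcripts.per.cell <- 100
keep <- umi.per.cell >= min.transcripts.per.cell

detailed <- data.frame(cell = colnames(cm)[keep],
                       UMI_count = umi.per.cell[keep],
                       gene_count = gene.per.cell[keep])
detailed
```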
Method addComparison()
Add comparison group for statistical testing.
Usage
CRMetrics$addComparison(comp.group, metadata = self$metadata)
Arguments
comp.group: character. Comparison metric (default = self$comp.group).
metadata: data.frame. Metadata for samples (default = self$metadata).
Returns
Vector
Examples
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add metadata
crm$metadata <- data.frame(sex = c("male","female"))
# Add comparison group
crm$addComparison(comp.group = "sex")
Method plotSamples()
Plot the number of samples.
Usage
CRMetrics$plotSamples( comp.group = self$comp.group, h.adj = 0.05, exact = FALSE, metadata = self$metadata, second.comp.group = NULL, pal = self$pal )
Arguments
comp.group: character. Comparison metric, must match a column name of metadata (default = self$comp.group).
h.adj: numeric. Position of statistics test p value as % of max(y) (default = 0.05).
exact: logical. Whether to calculate exact p values (default = FALSE).
metadata: data.frame. Metadata for samples (default = self$metadata).
second.comp.group: character. Second comparison metric, must match a column name of metadata (default = NULL).
pal: character. Plotting palette (default = self$pal)
Returns
ggplot2 object
Examples
samples <- c("sample1", "sample2")
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
names(testdata.cms) <- samples
# Create metadata
metadata <- data.frame(sample = samples,
sex = c("male","female"),
condition = c("a","b"))
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, metadata = metadata, n.cores = 1)
# Plot
crm$plotSamples(comp.group = "sex", second.comp.group = "condition")
Method plotSummaryMetrics()
Plot all summary stats or a selected list.
Usage
CRMetrics$plotSummaryMetrics(
comp.group = self$comp.group,
second.comp.group = NULL,
metrics = NULL,
h.adj = 0.05,
plot.stat = TRUE,
stat.test = c("non-parametric", "parametric"),
exact = FALSE,
metadata = self$metadata,
summary.metrics = self$summary.metrics,
plot.geom = "bar",
se = FALSE,
group.reg.lines = FALSE,
secondary.testing = TRUE,
pal = self$pal
)
Arguments
comp.group: character. Comparison metric (default = self$comp.group).
second.comp.group: character. Second comparison metric, used for the metric "samples per group" or when "comp.group" is a numeric or an integer (default = NULL).
metrics: character. Metrics to plot (default = NULL).
h.adj: numeric. Position of statistics test p value as % of max(y) (default = 0.05)
plot.stat: logical. Show statistics in plot. Will be FALSE if "comp.group" = "sample" or if "comp.group" is a numeric or an integer (default = TRUE)
stat.test: character. Statistical test to perform to compare means. Can either be "non-parametric" or "parametric" (default = "non-parametric").
exact: logical. Whether to calculate exact p values (default = FALSE).
metadata: data.frame. Metadata for samples (default = self$metadata).
summary.metrics: data.frame. Summary metrics (default = self$summary.metrics).
plot.geom: character. Which geometric is used to plot the data (default = "bar").
se: logical. For regression lines, show SE (default = FALSE)
group.reg.lines: logical. For regression lines, if FALSE show one line, if TRUE show line per group defined by second.comp.group (default = FALSE)
secondary.testing: logical. Whether to show post hoc testing (default = TRUE)
pal: character. Plotting palette (default = self$pal)
Returns
ggplot2 object
Examples
\donttest{
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add summary metrics
crm$addSummaryFromCms()
crm$plotSummaryMetrics(plot.geom = "point")
}
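The stat.test argument distinguishes between "non-parametric" and "parametric" comparisons. Which tests CRMetrics calls internally is not stated in this section, so the sketch below only illustrates the distinction for two groups. It uses stats::wilcox.test and stats::t.test as assumed stand-ins, on made-up values.

```r
# Toy summary metric for six samples in two groups (made-up values)
df <- data.frame(
  group = rep(c("a", "b"), each = 3),
  value = c(10, 12, 11, 20, 22, 21)
)

# Non-parametric, rank-based comparison; exact = FALSE requests the
# normal approximation instead of an exact p value
np <- wilcox.test(value ~ group, data = df, exact = FALSE)

# Parametric comparison (Welch's t-test is the default in R)
p <- t.test(value ~ group, data = df)

c(non.parametric = np$p.value, parametric = p$p.value)
```

With only three samples per group, the rank-based test has little power even for clearly separated groups, which is worth keeping in mind when reading the p values in the plots.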
Method plotDetailedMetrics()
Plot detailed metrics from the detailed.metrics object
Usage
CRMetrics$plotDetailedMetrics( comp.group = self$comp.group, detailed.metrics = self$detailed.metrics, metadata = self$metadata, metrics = NULL, plot.geom = "violin", hline = TRUE, pal = self$pal )
Arguments
comp.group: character. Comparison metric (default = self$comp.group).
detailed.metrics: data.frame. Object containing the count matrices (default = self$detailed.metrics).
metadata: data.frame. Metadata for samples (default = self$metadata).
metrics: character. Metrics to plot. NULL plots both plots (default = NULL).
plot.geom: character. How to plot the data (default = "violin").
hline: logical. Whether to show median as horizontal line (default = TRUE)
pal: character. Plotting palette (default = self$pal)
data.path: character. Path to Cell Ranger count data (default = self$data.path).
Returns
ggplot2 object
Examples
\donttest{
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add detailed metrics
crm$addDetailedMetrics()
# Plot
crm$plotDetailedMetrics()
}
Method plotEmbedding()
Plot cells in embedding using Conos and color by depth and doublets.
Usage
CRMetrics$plotEmbedding(
depth = FALSE,
doublet.method = NULL,
doublet.scores = FALSE,
depth.cutoff = 1000,
mito.frac = FALSE,
mito.cutoff = 0.05,
species = c("human", "mouse"),
size = 0.3,
sep = "!!",
pal = NULL,
...
)
Arguments
depth: logical. Plot depth or not (default = FALSE).
doublet.method: character. Doublet detection method (default = NULL).
doublet.scores: logical. Plot doublet scores or not (default = FALSE).
depth.cutoff: numeric. Depth cutoff (default = 1e3).
mito.frac: logical. Plot mitochondrial fraction or not (default = FALSE).
mito.cutoff: numeric. Mitochondrial fraction cutoff (default = 0.05).
species: character. Species to calculate the mitochondrial fraction for (default = c("human","mouse")).
size: numeric. Dot size (default = 0.3)
sep: character. Separator for creating unique cell names (default = "!!")
pal: character. Plotting palette (default = NULL)
...: Plotting parameters passed to sccore::embeddingPlot.
Returns
ggplot2 object
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
crm$plotEmbedding()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
}
Method plotDepth()
Plot the sequencing depth in histogram.
Usage
CRMetrics$plotDepth( cutoff = 1000, samples = self$metadata$sample, sep = "!!", keep.col = "#E7CDC2", filter.col = "#A65141" )
Arguments
cutoff: numeric. The depth cutoff to color the cells in the embedding (default = 1e3).
samples: character. Sample names to include for plotting (default = $metadata$sample).
sep: character. Separator for creating unique cell names (default = "!!")
keep.col: character. Color for density of cells that are kept (default = "#E7CDC2")
filter.col: character. Color for density of cells to be filtered (default = "#A65141")
Returns
ggplot2 object
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Plot
crm$plotDepth()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
}
Method plotMitoFraction()
Plot the mitochondrial fraction in histogram.
Usage
CRMetrics$plotMitoFraction(
cutoff = 0.05,
species = c("human", "mouse"),
samples = self$metadata$sample,
sep = "!!",
keep.col = "#E7CDC2",
filter.col = "#A65141"
)
Arguments
cutoff: numeric. The mito. fraction cutoff to color the embedding (default = 0.05)
species: character. Species to calculate the mitochondrial fraction for (default = "human")
samples: character. Sample names to include for plotting (default = $metadata$sample)
sep: character. Separator for creating unique cell names (default = "!!")
keep.col: character. Color for density of cells that are kept (default = "#E7CDC2")
filter.col: character. Color for density of cells to be filtered (default = "#A65141")
Returns
ggplot2 object
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Plot
crm$plotMitoFraction()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
}
Method detectDoublets()
Detect doublet cells.
Usage
CRMetrics$detectDoublets(
method = c("scrublet", "doubletdetection"),
cms = self$cms,
samples = self$metadata$sample,
env = "r-reticulate",
conda.path = system("whereis conda"),
n.cores = self$n.cores,
verbose = self$verbose,
args = list(),
export = FALSE,
data.path = self$data.path
)
Arguments
method: character. Which method to use, either scrublet or doubletdetection (default="scrublet").
cms: list. List containing the count matrices (default=self$cms).
samples: character. Vector of sample names. If NULL, samples are extracted from cms (default = self$metadata$sample)
env: character. Environment to run python in (default="r-reticulate").
conda.path: character. Path to conda environment (default=system("whereis conda")).
n.cores: integer. Number of cores to use (default = self$n.cores)
verbose: logical. Print messages or not (default = self$verbose)
args: list. A list with additional arguments for either DoubletDetection or scrublet. Please check the respective manuals.
export: boolean. Export CMs in order to detect doublets outside R (default = FALSE)
data.path: character. Path to write data, only relevant if export = TRUE. Last character must be / (default = self$data.path)
Returns
data.frame
Examples
\dontrun{
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Detect doublets
crm$detectDoublets(method = "scrublet",
conda.path = "/opt/software/miniconda/4.12.0/condabin/conda")
}
Method doPreprocessing()
Perform conos preprocessing.
Usage
CRMetrics$doPreprocessing(
cms = self$cms,
preprocess = c("pagoda2", "seurat"),
min.transcripts.per.cell = 100,
verbose = self$verbose,
n.cores = self$n.cores,
get.largevis = FALSE,
tsne = FALSE,
make.geneknn = FALSE,
cluster = FALSE,
...
)
Arguments
cms: list. List containing the count matrices (default = self$cms).
preprocess: character. Method to use for preprocessing (default = c("pagoda2","seurat")).
min.transcripts.per.cell: numeric. Minimal transcripts per cell (default = 100)
verbose: logical. Print messages or not (default = self$verbose).
n.cores: integer. Number of cores for the calculations (default = self$n.cores).
get.largevis: logical. For Pagoda2, create largeVis embedding (default = FALSE)
tsne: logical. Create tSNE embedding (default = FALSE)
make.geneknn: logical. For Pagoda2, estimate gene kNN (default = FALSE)
cluster: logical. For Seurat, estimate clusters (default = FALSE)
...: Additional arguments for pagoda2::basicP2proc or conos:::basicSeuratProc
Returns
Conos object
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Perform preprocessing
crm$doPreprocessing(preprocess = "pagoda2")
} else {
message("Package 'pagoda2' not available.")
}
}
Method createEmbedding()
Create Conos embedding.
Usage
CRMetrics$createEmbedding( cms = self$cms.preprocessed, verbose = self$verbose, n.cores = self$n.cores, arg.buildGraph = list(), arg.findCommunities = list(), arg.embedGraph = list(method = "UMAP") )
Arguments
cms: list. List containing the preprocessed count matrices (default = self$cms.preprocessed).
verbose: logical. Print messages or not (default = self$verbose).
n.cores: integer. Number of cores for the calculations (default = self$n.cores).
arg.buildGraph: list. A list with additional arguments for the buildGraph function in Conos (default = list())
arg.findCommunities: list. A list with additional arguments for the findCommunities function in Conos (default = list())
arg.embedGraph: list. A list with additional arguments for the embedGraph function in Conos (default = list(method = "UMAP"))
Returns
Conos object
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
}
Method filterCms()
Filter cells based on depth, mitochondrial fraction and doublets from the count matrix.
Usage
CRMetrics$filterCms(
depth.cutoff = NULL,
mito.cutoff = NULL,
doublets = NULL,
species = c("human", "mouse"),
samples.to.exclude = NULL,
verbose = self$verbose,
sep = "!!",
raw = FALSE
)
Arguments
depth.cutoff: numeric. Depth (transcripts per cell) cutoff (default = NULL).
mito.cutoff: numeric. Mitochondrial fraction cutoff (default = NULL).
doublets: character. Doublet detection method to use (default = NULL).
species: character. Species to calculate the mitochondrial fraction for (default = "human").
samples.to.exclude: character. Sample names to exclude (default = NULL)
verbose: logical. Show progress (default = self$verbose)
sep: character. Separator for creating unique cell names (default = "!!")
raw: boolean. Filter on raw, unfiltered count matrices. Usually not intended (default = FALSE)
Returns
list of filtered count matrices
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Filter CMs
crm$filterCms(depth.cutoff = 1e3, mito.cutoff = 0.05)
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
}
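Filtering on depth, mitochondrial fraction and doublets amounts to keeping only the columns (cells) of a count matrix that pass all three criteria. The base R sketch below illustrates that logic on a toy matrix. The doublet labels are made up, and the comparison directions (>= for depth, <= for mitochondrial fraction) are assumptions rather than the package's documented behaviour.

```r
# Toy count matrix: one nuclear and one mitochondrial gene, four cells
cm <- matrix(c(480, 1500, 2000, 3000,
                20,   50,  400,   60),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("ACTB", "MT-CO1"), paste0("cell", 1:4)))

depth <- colSums(cm)                   # transcripts per cell
mito.frac <- cm["MT-CO1", ] / depth    # mitochondrial fraction per cell
doublet <- c(cell1 = FALSE, cell2 = FALSE, cell3 = FALSE, cell4 = TRUE)  # made-up labels

depth.cutoff <- 1e3
mito.cutoff <- 0.05

# Assumed rule: keep cells with sufficient depth, low mito. fraction, and no doublet call
keep <- depth >= depth.cutoff & mito.frac <= mito.cutoff & !doublet
cm.filtered <- cm[, keep, drop = FALSE]
colnames(cm.filtered)
```

Here cell1 fails the depth cutoff, cell3 fails the mitochondrial cutoff, and cell4 is labelled a doublet, so only cell2 remains.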
Method selectMetrics()
Select metrics from summary.metrics
Usage
CRMetrics$selectMetrics(ids = NULL)
Arguments
ids: character. Metric id to select (default = NULL).
Returns
vector
Examples
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Select metrics
crm$selectMetrics()
selection.metrics <- crm$selectMetrics(c(1:4))
Method plotFilteredCells()
Plot filtered cells in an embedding, in a bar plot, on a tile or export the data frame
Usage
CRMetrics$plotFilteredCells(
type = c("embedding", "bar", "tile", "export"),
depth = TRUE,
depth.cutoff = 1000,
doublet.method = NULL,
mito.frac = TRUE,
mito.cutoff = 0.05,
species = c("human", "mouse"),
size = 0.3,
sep = "!!",
cols = c("grey80", "red", "blue", "green", "yellow", "black", "pink", "purple"),
...
)
Arguments
type: character. The type of plot to use: embedding, bar, tile or export (default = c("embedding","bar","tile","export")).
depth: logical. Plot the depth or not (default = TRUE).
depth.cutoff: numeric. Depth cutoff, either a single number or a vector with cutoff per sample and with sampleIDs as names (default = 1e3).
doublet.method: character. Method to detect doublets (default = NULL).
mito.frac: logical. Plot the mitochondrial fraction or not (default = TRUE).
mito.cutoff: numeric. Mitochondrial fraction cutoff, either a single number or a vector with cutoff per sample and with sampleIDs as names (default = 0.05).
species: character. Species to calculate the mitochondrial fraction for (default = c("human","mouse")).
size: numeric. Dot size (default = 0.3)
sep: character. Separator for creating unique cell names (default = "!!")
cols: character. Colors used for plotting (default = c("grey80","red","blue","green","yellow","black","pink","purple"))
...: Plotting parameters passed to sccore::embeddingPlot.
Returns
ggplot2 object or data frame
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Plot and extract result
crm$plotFilteredCells(type = "embedding")
filtered.cells <- crm$plotFilteredCells(type = "export")
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
}
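depth.cutoff and mito.cutoff accept either a single number or a vector named by sample ID. The sketch below shows one way a per-sample cutoff can be matched to cells whose names carry the sample prefix and the "!!" separator. The cell names and depth values are made up, and the lookup is an assumption about how such a vector could be applied, not the package's implementation.

```r
# Made-up depths for cells named <sample><sep><cell>
depth <- c("sample1!!cell1" = 800, "sample1!!cell2" = 1200,
           "sample2!!cell1" = 800, "sample2!!cell2" = 1200)
sep <- "!!"

# Recover the sample ID of each cell from its unique name
sample.of.cell <- vapply(strsplit(names(depth), sep, fixed = TRUE),
                         function(x) x[1], character(1))

# Per-sample cutoffs, named by sample ID
depth.cutoff <- c(sample1 = 1000, sample2 = 500)

# A cell is flagged when its depth is below the cutoff of its own sample
filtered <- depth < depth.cutoff[sample.of.cell]
filtered
```

The same depth of 800 is flagged in sample1 but not in sample2, which is the point of supplying cutoffs per sample.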
Method getDepth()
Extract sequencing depth from Conos object.
Usage
CRMetrics$getDepth(cms = self$cms)
Arguments
cms: list. List of (sparse) count matrices (default = self$cms)
Returns
data frame
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Get depth
crm$getDepth()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
}
Method getMitoFraction()
Calculate the fraction of mitochondrial genes.
Usage
CRMetrics$getMitoFraction(species = c("human", "mouse"), cms = self$cms)
Arguments
species: character. Species to calculate the mitochondrial fraction for (default = "human").
cms: list. List of (sparse) count matrices (default = self$cms)
Returns
data frame
Examples
\donttest{
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Get mito. fraction
crm$getMitoFraction(species = c("human", "mouse"))
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
}
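The mitochondrial fraction of a cell is the share of its transcripts that come from mitochondrial genes. The base R sketch below computes it for a toy matrix. It assumes that human mitochondrial gene symbols start with "MT-" and mouse symbols with "mt-", which is a common naming convention but is not confirmed here as the rule CRMetrics applies.

```r
# Toy count matrix with two mitochondrial genes and one nuclear gene
cm <- matrix(c(10,  0,
                5,  5,
               85, 15),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("MT-CO1", "MT-ND1", "ACTB"), c("cell1", "cell2")))

# Assumption: gene symbol prefixes identify mitochondrial genes
mito.pattern <- c(human = "^MT-", mouse = "^mt-")
species <- "human"

is.mito <- grepl(mito.pattern[[species]], rownames(cm))
mito.frac <- colSums(cm[is.mito, , drop = FALSE]) / colSums(cm)
mito.frac
```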
Method prepareCellbender()
Create plots and script call for CellBender
Usage
CRMetrics$prepareCellbender( shrinkage = 100, show.expected.cells = TRUE, show.total.droplets = TRUE, expected.cells = NULL, total.droplets = NULL, cms.raw = self$cms.raw, umi.counts = self$cellbender$umi.counts, data.path = self$data.path, samples = self$metadata$sample, verbose = self$verbose, n.cores = self$n.cores, unique.names = FALSE, sep = "!!" )
Arguments
shrinkage: integer. Select every nth UMI count per cell for plotting. Improves plotting speed drastically. To plot all cells, set to 1 (default = 100)
show.expected.cells: logical. Plot line depicting expected number of cells (default = TRUE)
show.total.droplets: logical. Plot line depicting total droplets included for CellBender run (default = TRUE)
expected.cells: named numeric. If NULL, expected cells will be deduced from the number of cells per sample identified by Cell Ranger. Otherwise, a named vector of expected cells with sample IDs as names. Sample IDs must match those in summary.metrics (default: stored named vector)
total.droplets: named numeric. If NULL, total droplets included will be deduced from expected cells multiplied by 3. Otherwise, a named vector of total droplets included with sample IDs as names. Sample IDs must match those in summary.metrics (default: stored named vector)
cms.raw: list. Raw count matrices from HDF5 Cell Ranger outputs (default = self$cms.raw)
umi.counts: list. UMI counts calculated as column sums of raw count matrices from HDF5 Cell Ranger outputs (default: stored list)
data.path: character. Path to Cell Ranger outputs (default = self$data.path)
samples: character. Sample names to include (default = self$metadata$sample)
verbose: logical. Show progress (default: stored vector)
n.cores: integer. Number of cores (default: stored vector)
unique.names: logical. Create unique cell names (default = FALSE)
sep: character. Separator for creating unique cell names (default = "!!")
Returns
ggplot2 object and bash script
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data")
crm$prepareCellbender()
}
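The shrinkage argument thins the data before plotting by selecting every nth UMI count. A minimal sketch of that idea follows, assuming the counts are first ranked in decreasing order. The simulated counts and the data frame plot.df are illustrative only and do not reproduce the package's internals.

```r
set.seed(1)

# Simulated UMI counts for 1000 droplets, ranked from highest to lowest
umi.counts <- sort(rpois(1000, lambda = 50), decreasing = TRUE)

# Keep every 100th value; shrinkage = 1 would keep all of them
shrinkage <- 100
idx <- seq(1, length(umi.counts), by = shrinkage)

plot.df <- data.frame(rank = idx, umi = umi.counts[idx])
nrow(plot.df)
```

The thinned data keep the overall shape of the rank curve while reducing the number of plotted points by a factor of shrinkage.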
Method saveCellbenderScript()
Usage
CRMetrics$saveCellbenderScript( file = "cellbender_script.sh", fpr = 0.01, epochs = 150, use.gpu = TRUE, expected.cells = NULL, total.droplets = NULL, data.path = self$data.path, samples = self$metadata$sample, args = NULL )
Arguments
file: character. File name for CellBender script. Will be stored in data.path (default: "cellbender_script.sh")
fpr: numeric. False positive rate for CellBender (default = 0.01)
epochs: integer. Number of epochs for CellBender (default = 150)
use.gpu: logical. Use CUDA capable GPU (default = TRUE)
expected.cells: named numeric. If NULL, expected cells will be deduced from the number of cells per sample identified by Cell Ranger. Otherwise, a named vector of expected cells with sample IDs as names. Sample IDs must match those in summary.metrics (default: stored named vector)
total.droplets: named numeric. If NULL, total droplets included will be deduced from expected cells multiplied by 3. Otherwise, a named vector of total droplets included with sample IDs as names. Sample IDs must match those in summary.metrics (default: stored named vector)
data.path: character. Path to Cell Ranger outputs (default = self$data.path)
samples: character. Sample names to include (default = self$metadata$sample)
args: character. (optional) Additional parameters for CellBender
Returns
bash script
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
}
Method getExpectedCells()
Extract the expected number of cells per sample based on the Cell Ranger summary metrics
Usage
CRMetrics$getExpectedCells(samples = self$metadata$sample)
Arguments
samples: character. Sample names to include (default = self$metadata$sample)
Returns
A numeric vector
Examples
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Get summary
crm$addSummaryFromCms()
# Get no. cells
crm$getExpectedCells()
Method getTotalDroplets()
Get the total number of droplets included in the CellBender estimations. Based on the Cell Ranger summary metrics and multiplied by a preset multiplier.
Usage
CRMetrics$getTotalDroplets(samples = self$metadata$sample, multiplier = 3)
Arguments
samples: character. Sample names to include (default = self$metadata$sample)
multiplier: numeric. Number to multiply expected number of cells with (default = 3)
Returns
A numeric vector
Examples
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add summary
crm$addSummaryFromCms()
# Get no. droplets
crm$getTotalDroplets()
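The relation between expected cells and total droplets described above can be sketched in base R. This is only an illustration and not the package's implementation: `total_droplets()` is a hypothetical helper, and the sample IDs and cell numbers are invented.

```r
# Hypothetical helper: scale a named vector of expected cells by a multiplier
total_droplets <- function(expected.cells, multiplier = 3) {
  stopifnot(is.numeric(expected.cells), !is.null(names(expected.cells)))
  expected.cells * multiplier
}

# Invented example values; the names are sample IDs
expected.cells <- c(sample1 = 1000, sample2 = 1500)
total.droplets <- total_droplets(expected.cells)
print(total.droplets)
```

Because the multiplication preserves names, the result can be matched back to samples in the same way as the input vector.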
Method addCms()
Add a list of count matrices to the CRMetrics object.
Usage
CRMetrics$addCms( cms = NULL, data.path = self$data.path, samples = self$metadata$sample, cellbender = FALSE, raw = FALSE, symbol = TRUE, unique.names = TRUE, sep = "!!", add.metadata = TRUE, n.cores = self$n.cores, verbose = self$verbose )
Arguments
cms list List of (sparse) count matrices (default = NULL)
data.path character Path to cellranger count data (default = self$data.path).
samples character Vector of sample names. If NULL, samples are extracted from cms (default = self$metadata$sample)
cellbender logical Add CellBender filtered count matrices in HDF5 format. Requires that "cellbender" is in the names of the files (default = FALSE)
raw logical Add raw count matrices from Cell Ranger output. Cannot be combined with cellbender=TRUE (default = FALSE)
symbol logical The type of gene IDs to use, SYMBOL (TRUE) or ENSEMBL (FALSE) (default = TRUE)
unique.names logical Make cell names unique based on sep parameter (default = TRUE)
sep character Separator used to create unique cell names (default = "!!")
add.metadata boolean Add metadata from cms or not (default = TRUE)
n.cores integer Number of cores to use (default = self$n.cores)
verbose boolean Print progress (default = self$verbose)
Returns
Add list of (sparse) count matrices to R6 class object
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
crm$addCms(cms = testdata.cms)
}
Method plotCbTraining()
Plot the results from the CellBender estimations
Usage
CRMetrics$plotCbTraining( data.path = self$data.path, samples = self$metadata$sample, pal = self$pal )
Arguments
data.path character Path to Cell Ranger outputs (default = self$data.path)
samples character Sample names to include (default = self$metadata$sample)
pal character Plotting palette (default = self$pal)
Returns
A ggplot2 object
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run CellBender script
crm$plotCbTraining()
}
Method plotCbCellProbs()
Plot the CellBender assigned cell probabilities
Usage
CRMetrics$plotCbCellProbs( data.path = self$data.path, samples = self$metadata$sample, low.col = "gray", high.col = "red" )
Arguments
data.path character Path to Cell Ranger outputs (default = self$data.path)
samples character Sample names to include (default = self$metadata$sample)
low.col character Color for low probabilities (default = "gray")
high.col character Color for high probabilities (default = "red")
Returns
A ggplot2 object
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run the CellBender script
crm$plotCbCellProbs()
}
Method plotCbAmbExp()
Plot the estimated ambient gene expression per sample from CellBender calculations
Usage
CRMetrics$plotCbAmbExp( cutoff = 0.005, data.path = self$data.path, samples = self$metadata$sample )
Arguments
cutoff numeric Horizontal line included in the plot to indicate highly expressed ambient genes (default = 0.005)
data.path character Path to Cell Ranger outputs (default = self$data.path)
samples character Sample names to include (default = self$metadata$sample)
Returns
A ggplot2 object
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run CellBender script
crm$plotCbAmbExp()
}
Method plotCbAmbGenes()
Plot the most abundant estimated ambient genes from the CellBender calculations
Usage
CRMetrics$plotCbAmbGenes( cutoff = 0.005, data.path = self$data.path, samples = self$metadata$sample, pal = self$pal )
Arguments
cutoff numeric Cutoff of ambient gene expression used to extract ambient genes per sample (default = 0.005)
data.path character Path to Cell Ranger outputs (default = self$data.path)
samples character Sample names to include (default = self$metadata$sample)
pal character Plotting palette (default = self$pal)
Returns
A ggplot2 object
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run CellBender script
crm$plotCbAmbGenes()
}
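The role of the cutoff argument can be sketched without running CellBender. The example below uses invented ambient expression values and placeholder gene names. It assumes that a gene counts as ambient when its estimate is at or above the cutoff; the exact comparison used by the package is not stated in this documentation.

```r
# Invented ambient expression estimates per sample (placeholder gene names)
ambient <- list(
  sample1 = c(geneA = 0.020, geneB = 0.004, geneC = 0.006),
  sample2 = c(geneA = 0.001, geneB = 0.009, geneC = 0.003)
)

cutoff <- 0.005

# Per sample, keep the names of genes at or above the cutoff (assumed rule)
ambient.genes <- lapply(ambient, function(x) names(x)[x >= cutoff])
print(ambient.genes)
```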
Method addSummaryFromCms()
Add summary metrics from a list of count matrices
Usage
CRMetrics$addSummaryFromCms( cms = self$cms, n.cores = self$n.cores, verbose = self$verbose )
Arguments
cms list A list of filtered count matrices (default = self$cms)
n.cores integer Number of cores to use (default = self$n.cores)
verbose logical Show progress (default = self$verbose)
Returns
data.frame
Examples
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add summary
crm$addSummaryFromCms()
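To illustrate what a summary derived from count matrices can look like, the sketch below computes a few per-sample statistics directly with the Matrix package. The chosen metrics (number of cells, median counts per cell, median detected genes per cell) and the column names are assumptions made for illustration; the columns produced by addSummaryFromCms() may differ.

```r
library(Matrix)

set.seed(1)
# Simulated counts: genes in rows, cells in columns, all non-zero entries >= 1
cms <- lapply(c(sample1 = 1, sample2 = 2), function(i) {
  m <- Matrix::rsparsematrix(200, 50, 0.1, rand.x = function(n) rpois(n, 2) + 1)
  dimnames(m) <- list(paste0("gene", seq_len(200)), paste0("cell", seq_len(50)))
  m
})

# One row per sample with simple summary statistics (assumed metrics)
summary.df <- do.call(rbind, lapply(names(cms), function(s) {
  m <- cms[[s]]
  data.frame(sample = s,
             cells = ncol(m),
             median.counts.per.cell = median(Matrix::colSums(m)),
             median.genes.per.cell = median(Matrix::colSums(m > 0)))
}))
print(summary.df)
```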
Method runSoupX()
Run SoupX ambient RNA estimation and correction
Usage
CRMetrics$runSoupX( data.path = self$data.path, samples = self$metadata$sample, n.cores = self$n.cores, verbose = self$verbose, arg.load10X = list(), arg.autoEstCont = list(), arg.adjustCounts = list() )
Arguments
data.path character Path to Cell Ranger outputs (default = self$data.path)
samples character Sample names to include (default = self$metadata$sample)
n.cores numeric Number of cores (default = self$n.cores)
verbose logical Show progress (default = self$verbose)
arg.load10X list A list with additional parameters for SoupX::load10X (default = list())
arg.autoEstCont list A list with additional parameters for SoupX::autoEstCont (default = list())
arg.adjustCounts list A list with additional parameters for SoupX::adjustCounts (default = list())
Returns
List containing a list with corrected counts, and a data.frame containing plotting estimations
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$runSoupX()
}
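The arg.load10X, arg.autoEstCont and arg.adjustCounts arguments each take a list of additional parameters. A common R pattern for forwarding such a list to another function is do.call(), sketched below. Whether runSoupX() uses exactly this mechanism is an assumption, and `estimate()` and `run_step()` are hypothetical stand-ins that are not part of SoupX or CRMetrics.

```r
# Stand-in for a downstream function; not part of SoupX
estimate <- function(x, scale = 1, offset = 0) x * scale + offset

# Hypothetical wrapper that forwards a list of extra arguments
run_step <- function(x, arg.estimate = list()) {
  do.call(estimate, c(list(x = x), arg.estimate))
}

# An empty list leaves the downstream defaults untouched
default.result <- run_step(2)

# Named list entries override the downstream defaults
custom.result <- run_step(2, arg.estimate = list(scale = 10, offset = 1))
```

With this pattern, an empty list() as default means the downstream function runs with its own defaults.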
Method plotSoupX()
Plot the results from the SoupX estimations
Usage
CRMetrics$plotSoupX(plot.df = self$soupx$plot.df)
Arguments
plot.df data.frame SoupX estimations (default = self$soupx$plot.df)
Returns
A ggplot2 object
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$runSoupX()
crm$plotSoupX()
}
Method plotCbCells()
Plot CellBender cell estimations against the estimated cell numbers from Cell Ranger
Usage
CRMetrics$plotCbCells( data.path = self$data.path, samples = self$metadata$sample, pal = self$pal )
Arguments
data.path character Path to Cell Ranger outputs (default = self$data.path)
samples character Sample names to include (default = self$metadata$sample)
pal character Plotting palette (default = self$pal)
Returns
A ggplot2 object
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run CellBender script
crm$plotCbCells()
}
Method addDoublets()
Add doublet results created from exported Python script
Usage
CRMetrics$addDoublets(
method = c("scrublet", "doubletdetection"),
data.path = self$data.path,
samples = self$metadata$sample,
cms = self$cms,
verbose = self$verbose
)
Arguments
method character Which method to use, either scrublet or doubletdetection (default is both).
data.path character Path to Cell Ranger outputs (default = self$data.path)
samples character Sample names to include (default = self$metadata$sample)
cms list List containing the count matrices (default = self$cms).
verbose boolean Print progress (default = self$verbose)
Returns
List of doublet results
Examples
\dontrun{
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$detectDoublets(export = TRUE)
## Run Python script
crm$addDoublets()
}
Method clone()
The objects of this class are cloneable with this method.
Usage
CRMetrics$clone(deep = FALSE)
Arguments
deep Whether to make a deep clone.
Examples
## ------------------------------------------------
## Method `CRMetrics$new`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$addDetailedMetrics`
## ------------------------------------------------
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Run function
crm$addDetailedMetrics()
## ------------------------------------------------
## Method `CRMetrics$addComparison`
## ------------------------------------------------
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add metadata
crm$metadata <- data.frame(sex = c("male","female"))
# Add comparison group
crm$addComparison(comp.group = "sex")
## ------------------------------------------------
## Method `CRMetrics$plotSamples`
## ------------------------------------------------
samples <- c("sample1", "sample2")
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
names(testdata.cms) <- samples
# Create metadata
metadata <- data.frame(sample = samples,
sex = c("male","female"),
condition = c("a","b"))
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, metadata = metadata, n.cores = 1)
# Plot
crm$plotSamples(comp.group = "sex", second.comp.group = "condition")
## ------------------------------------------------
## Method `CRMetrics$plotSummaryMetrics`
## ------------------------------------------------
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add summary metrics
crm$addSummaryFromCms()
crm$plotSummaryMetrics(plot.geom = "point")
## ------------------------------------------------
## Method `CRMetrics$plotDetailedMetrics`
## ------------------------------------------------
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add detailed metrics
crm$addDetailedMetrics()
# Plot
crm$plotDetailedMetrics()
## ------------------------------------------------
## Method `CRMetrics$plotEmbedding`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
crm$plotEmbedding()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$plotDepth`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Plot
crm$plotDepth()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$plotMitoFraction`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Plot
crm$plotMitoFraction()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$detectDoublets`
## ------------------------------------------------
## Not run:
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Detect doublets
crm$detectDoublets(method = "scrublet",
conda.path = "/opt/software/miniconda/4.12.0/condabin/conda")
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$doPreprocessing`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Perform preprocessing
crm$doPreprocessing(preprocess = "pagoda2")
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$createEmbedding`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$filterCms`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Filter CMs
crm$filterCms(depth.cutoff = 1e3, mito.cutoff = 0.05)
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$selectMetrics`
## ------------------------------------------------
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Select metrics
crm$selectMetrics()
selection.metrics <- crm$selectMetrics(c(1:4))
## ------------------------------------------------
## Method `CRMetrics$plotFilteredCells`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Plot and extract result
crm$plotFilteredCells(type = "embedding")
filtered.cells <- crm$plotFilteredCells(type = "export")
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$getDepth`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Get depth
crm$getDepth()
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$getMitoFraction`
## ------------------------------------------------
if (requireNamespace("pagoda2", quietly = TRUE)) {
if (requireNamespace("conos", quietly = TRUE)) {
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Create embedding
crm$doPreprocessing()
crm$createEmbedding()
# Get mito. fraction
crm$getMitoFraction(species = c("human", "mouse"))
} else {
message("Package 'conos' not available.")
}
} else {
message("Package 'pagoda2' not available.")
}
## ------------------------------------------------
## Method `CRMetrics$prepareCellbender`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data")
crm$prepareCellbender()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$saveCellbenderScript`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$getExpectedCells`
## ------------------------------------------------
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Get summary
crm$addSummaryFromCms()
# Get no. cells
crm$getExpectedCells()
## ------------------------------------------------
## Method `CRMetrics$getTotalDroplets`
## ------------------------------------------------
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add summary
crm$addSummaryFromCms()
# Get no. droplets
crm$getTotalDroplets()
## ------------------------------------------------
## Method `CRMetrics$addCms`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
crm$addCms(cms = testdata.cms)
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$plotCbTraining`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run CellBender script
crm$plotCbTraining()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$plotCbCellProbs`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run the CellBender script
crm$plotCbCellProbs()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$plotCbAmbExp`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run CellBender script
crm$plotCbAmbExp()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$plotCbAmbGenes`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run CellBender script
crm$plotCbAmbGenes()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$addSummaryFromCms`
## ------------------------------------------------
# Simulate data
testdata.cms <- lapply(seq_len(2), \(x) {
out <- Matrix::rsparsematrix(2e3, 1e3, 0.1)
out[out < 0] <- 1
dimnames(out) <- list(sapply(seq_len(2e3), \(x) paste0("gene",x)),
sapply(seq_len(1e3), \(x) paste0("cell",x)))
return(out)
})
# Initialize
crm <- CRMetrics$new(cms = testdata.cms, samples = c("sample1", "sample2"), n.cores = 1)
# Add summary
crm$addSummaryFromCms()
## ------------------------------------------------
## Method `CRMetrics$runSoupX`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$runSoupX()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$plotSoupX`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$runSoupX()
crm$plotSoupX()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$plotCbCells`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$prepareCellbender()
crm$saveCellbenderScript()
## Run CellBender script
crm$plotCbCells()
## End(Not run)
## ------------------------------------------------
## Method `CRMetrics$addDoublets`
## ------------------------------------------------
## Not run:
crm <- CRMetrics$new(data.path = "/path/to/count/data/")
crm$detectDoublets(export = TRUE)
## Run Python script
crm$addDoublets()
## End(Not run)
Add detailed metrics
Description
Add detailed metrics. Requires raw count matrices to be loaded using pagoda2.
Usage
addDetailedMetricsInner(cms, verbose = TRUE, n.cores = 1)
Arguments
| cms | List containing the count matrices. |
| verbose | Print messages (default = TRUE). |
| n.cores | Number of cores for the calculations (default = 1). |
Value
data frame
Add statistics to plot
Description
Use ggpubr to add statistics to plots.
Usage
addPlotStats(
p,
comp.group,
metadata,
h.adj = 0.05,
primary.test,
secondary.test,
exact = FALSE
)
Arguments
| p | Plot to add statistics to. |
| comp.group | Comparison metric. |
| metadata | Metadata for samples. |
| h.adj | Position of statistics test p value as % of max(y) (default = 0.05). |
| primary.test | Primary statistical test, e.g. "anova", "kruskal.test". |
| secondary.test | Secondary statistical test, e.g. "t-test", "wilcox.test". |
| exact | Whether to calculate exact p values (default = FALSE). |
Value
ggplot2 object
Add statistics to plot
Description
Use ggpubr to add statistics to samples or plot
Usage
addPlotStatsSamples(
p,
comp.group,
metadata,
h.adj = 0.05,
exact = FALSE,
second.comp.group
)
Arguments
| p | Plot to add statistics to. |
| comp.group | Comparison metric. |
| metadata | Metadata for samples. |
| h.adj | Position of statistics test p value as % of max(y) (default = 0.05). |
| exact | Whether to calculate exact p values (default = FALSE). |
| second.comp.group | Second comparison metric. |
Value
ggplot2 object
Add summary metrics
Description
Add summary metrics by reading Cell Ranger metrics summary files.
Usage
addSummaryMetrics(data.path, metadata, n.cores = 1, verbose = TRUE)
Arguments
| data.path | Path to cellranger count data. |
| metadata | Metadata for samples. |
| n.cores | Number of cores for the calculations (default = 1). |
| verbose | Print messages (default = TRUE). |
Value
data frame
Set correct 'comp.group' parameter
Description
Set comp.group to 'category' if null.
Usage
checkCompGroup(comp.group, category, verbose = TRUE)
Arguments
| comp.group | Comparison metric. |
| category | Comparison metric to use if comp.group is not provided. |
| verbose | Print messages (default = TRUE). |
Value
vector
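The fallback described above can be sketched as follows. `check_comp_group()` is a hypothetical re-implementation written for illustration; the message text and the exact behaviour of the package function are assumptions.

```r
# Hypothetical sketch: fall back to 'category' when comp.group is NULL
check_comp_group <- function(comp.group, category, verbose = TRUE) {
  if (is.null(comp.group)) {
    if (verbose) message("Using '", category, "' for 'comp.group'")
    comp.group <- category
  }
  comp.group
}

res.null <- check_comp_group(NULL, category = "sample", verbose = FALSE)
res.set <- check_comp_group("sex", category = "sample", verbose = FALSE)
```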
Check whether 'comp.group' is in metadata
Description
Checks whether 'comp.group' is any of the column names in metadata.
Usage
checkCompMeta(comp.group, metadata)
Arguments
| comp.group | Comparison metric. |
| metadata | Metadata for samples. |
Value
Nothing if 'comp.group' is a column name in metadata; otherwise the function stops with an error.
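A check of this kind can be sketched in a few lines of base R. `check_comp_meta()` is a hypothetical stand-in with an invented error message, not the package's code.

```r
# Hypothetical sketch: stop if comp.group is not a metadata column
check_comp_meta <- function(comp.group, metadata) {
  if (!is.null(comp.group) && !(comp.group %in% colnames(metadata))) {
    stop("'comp.group' must be one of: ",
         paste(colnames(metadata), collapse = ", "))
  }
  invisible(NULL)
}

metadata <- data.frame(sample = c("sample1", "sample2"),
                       sex = c("male", "female"))

# Valid column: returns nothing
ok <- check_comp_meta("sex", metadata)

# Missing column: the error message is captured instead of aborting
err <- tryCatch(check_comp_meta("age", metadata),
                error = function(e) conditionMessage(e))
```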
Check data path
Description
Helper function to check that data.path is not NULL
Usage
checkDataPath(data.path)
Arguments
| data.path | character Path to be checked |
Create unique cell names
Description
Create unique cell names from sample IDs and cell IDs
Usage
createUniqueCellNames(cms, samples, sep = "!!")
Arguments
| cms | list List of count matrices, should be named (optional) |
| samples | character Optional, list of sample names |
| sep | character Separator between sample IDs and cell IDs (default = "!!") |
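The idea of combining sample IDs and cell IDs can be sketched with small dense matrices. `make_unique_cell_names()` is a hypothetical helper for illustration; it assumes that the sample ID is placed before the cell ID, which this documentation does not state explicitly.

```r
# Hypothetical helper: prefix each cell ID with its sample ID
make_unique_cell_names <- function(cms, samples = names(cms), sep = "!!") {
  stopifnot(length(cms) == length(samples))
  out <- Map(function(cm, s) {
    colnames(cm) <- paste(s, colnames(cm), sep = sep)
    cm
  }, cms, samples)
  names(out) <- samples
  out
}

# Two toy matrices with identical cell IDs
dn <- list(c("gene1", "gene2"), c("cell1", "cell2"))
cms <- list(sample1 = matrix(1:4, 2, 2, dimnames = dn),
            sample2 = matrix(5:8, 2, 2, dimnames = dn))

cms.unique <- make_unique_cell_names(cms)
all.cells <- unlist(lapply(cms.unique, colnames), use.names = FALSE)
print(all.cells)
```

After renaming, cell IDs that were shared between samples no longer collide when the matrices are combined.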
Create filtering vector
Description
Create logical filtering vector based on a numeric vector and a (sample-wise) cutoff
Usage
filterVector(num.vec, name, filter, samples, sep = "!!")
Arguments
| num.vec | numeric Numeric vector to create filter on |
| name | character Name of filter |
| filter | numeric Either a single numeric value or a numeric vector with the same length as samples |
| samples | character Sample IDs |
| sep | character Separator to split cells by into sample-wise lists (default = "!!") |
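A sample-wise cutoff can be sketched as below. `above_cutoff()` is a hypothetical helper: it assumes cell names of the form sample ID, separator, cell ID, and it marks values at or above the cutoff as TRUE. The direction of the comparison in the package function is not documented here, so treat it as an assumption.

```r
# Hypothetical helper: compare each value with a global or per-sample cutoff
above_cutoff <- function(num.vec, filter, samples, sep = "!!") {
  # The sample ID is the part of each cell name before the separator
  cell.sample <- sapply(strsplit(names(num.vec), sep, fixed = TRUE), `[`, 1)
  if (length(filter) == 1) {
    cutoff <- rep(filter, length(num.vec))
  } else {
    stopifnot(length(filter) == length(samples))
    if (is.null(names(filter))) names(filter) <- samples
    cutoff <- unname(filter[cell.sample])
  }
  setNames(num.vec >= cutoff, names(num.vec))
}

# Invented depth values for two cells in each of two samples
depth <- c("sample1!!cell1" = 500, "sample1!!cell2" = 2000,
           "sample2!!cell1" = 500, "sample2!!cell2" = 2000)
samples <- c("sample1", "sample2")

# One cutoff for all samples
global <- above_cutoff(depth, 1000, samples)

# One cutoff per sample
per.sample <- above_cutoff(depth, c(sample1 = 400, sample2 = 3000), samples)
```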
Get H5 file paths
Description
Get file paths for H5 files
Usage
getH5Paths(data.path, samples = NULL, type = NULL)
Arguments
| data.path | character Path for directory containing sample-wise directories with Cell Ranger count outputs |
| samples | character Sample names to include (default = NULL) |
| type | character Type of H5 files to get paths for, one of "raw", "filtered" (Cell Ranger count outputs), "cellbender" (raw CellBender outputs), "cellbender_filtered" (CellBender filtered outputs) (default = NULL) |
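The mapping from type to file path can be sketched with file.path(). `h5_paths()` is a hypothetical helper. The raw and filtered file names follow the usual Cell Ranger count output layout; the CellBender file names and their location in the outs directory are assumptions that may not match what getH5Paths() expects.

```r
# Hypothetical helper: build one H5 path per sample
h5_paths <- function(data.path, samples,
                     type = c("raw", "filtered", "cellbender", "cellbender_filtered")) {
  type <- match.arg(type)
  file.name <- switch(type,
    raw = "raw_feature_bc_matrix.h5",
    filtered = "filtered_feature_bc_matrix.h5",
    cellbender = "cellbender.h5",                   # assumed file name
    cellbender_filtered = "cellbender_filtered.h5"  # assumed file name
  )
  setNames(file.path(data.path, samples, "outs", file.name), samples)
}

paths <- h5_paths("/path/to/count/data", c("sample1", "sample2"), type = "filtered")
print(paths)
```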
Get labels for percentage of filtered cells
Description
Labels the percentage of filtered cells based on mitochondrial fraction, sequencing depth and doublets as low, medium or high
Usage
labelsFilter(filter.data)
Arguments
| filter.data | Data frame containing the mitochondrial fraction, depth and doublets per sample. |
Value
data frame
Calculate percentage of filtered cells
Description
Calculate percentage of filtered cells based on the filter
Usage
percFilter(filter.data, filter = "mito", no.vars = 1)
Arguments
| filter.data | Data frame containing the mitochondrial fraction, depth and doublets per sample. |
| filter | The variable to filter (default = "mito") |
| no.vars | numeric Number of variables (default = 1) |
Value
vector
Plot the data as points, as bars, as a histogram, or as a violin
Description
Plot the data as points, barplot, histogram or violin
Usage
plotGeom(g, plot.geom, col, pal = NULL)
Arguments
| g | ggplot2 object |
| plot.geom | The plot.geom to use, "point", "bar", "histogram", or "violin". |
| pal | character Palette (default = NULL) |
Value
geom
Load 10x count matrices
Description
Load gene expression count data
Usage
read10x(
data.path,
samples = NULL,
raw = FALSE,
symbol = TRUE,
sep = "!!",
unique.names = TRUE,
n.cores = 1,
verbose = TRUE
)
Arguments
| data.path | Path to cellranger count data. |
| samples | Vector of sample names (default = NULL) |
| raw | logical Add raw count matrices (default = FALSE) |
| symbol | The type of gene IDs to use, SYMBOL (TRUE) or ENSEMBL (FALSE) (default = TRUE). |
| sep | Separator for cell names (default = "!!"). |
| unique.names | logical Make cell names unique based on sep parameter (default = TRUE) |
| n.cores | Number of cores for the calculations (default = 1). |
| verbose | Print messages (default = TRUE). |
Value
data frame
Examples
## Not run:
cms <- read10x(data.path = "/path/to/count/data",
samples = crm$metadata$sample,
raw = FALSE,
symbol = TRUE,
n.cores = crm$n.cores)
## End(Not run)
Read 10x HDF5 files
Description
Read 10x HDF5 files
Usage
read10xH5(
data.path,
samples = NULL,
type = c("raw", "filtered", "cellbender", "cellbender_filtered"),
symbol = TRUE,
sep = "!!",
n.cores = 1,
verbose = TRUE,
unique.names = FALSE
)
Arguments
| data.path | character Path to directory containing sample-wise directories with Cell Ranger count outputs |
| samples | character vector, select specific samples for processing (default = NULL) |
| type | name of H5 file to search for, "raw" and "filtered" are Cell Ranger count outputs, "cellbender" is output from CellBender after running script from saveCellbenderScript |
| symbol | logical Use gene SYMBOLs (TRUE) or ENSEMBL IDs (FALSE) (default = TRUE) |
| sep | character Separator for creating unique cell names from sample IDs and cell IDs (default = "!!") |
| n.cores | integer Number of cores (default = 1) |
| verbose | logical Print progress (default = TRUE) |
| unique.names | logical Create unique cell IDs (default = FALSE) |
Value
list with sparse count matrices
Examples
## Not run:
cms.h5 <- read10xH5(data.path = "/path/to/count/data")
## End(Not run)