MethScope-Tutorial

Generate cell/pixel/sample MRMP embeddings

The code below demonstrates how to generate the cell-by-MRMP matrix from our example.cg file; see our data repository (https://github.com/zhou-lab/methscope_data). Replace it with the path to your own .cg file. For details on how to create a .cg file, see the YAME documentation (https://zhou-lab.github.io/YAME/). We provide three reference MRMP definitions (mouse brain and human pan-tissue); see our MRMPs reference tutorial to create your own reference.

library(MethScope)

# Paths to your .cg (input data) and .cm (reference MRMP definition) files
example_file <- "example.cg"
reference_pattern <- "Liu2021_MouseBrain.cm"
# Build the cell-by-MRMP matrix
input_pattern <- GenerateInput(example_file, reference_pattern)
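Before moving on, it can help to sanity-check the matrix. The sketch below uses a small mock matrix in place of the real GenerateInput output, so the dimension names, the value range, and the presence of NA entries are all assumptions for illustration, not documented behavior of MethScope.

```r
# Mock stand-in for the GenerateInput() result
# (assumed layout: cells in rows, MRMPs in columns)
set.seed(1)
input_pattern <- matrix(
  runif(12), nrow = 4,
  dimnames = list(paste0("cell", 1:4), paste0("P", 1:3))
)
input_pattern[2, 3] <- NA  # simulate an MRMP without coverage in one cell

# Number of cells x number of MRMPs
dim(input_pattern)

# Fraction of missing MRMPs per cell; cells with many gaps may deserve a closer look
missing_per_cell <- rowMeans(is.na(input_pattern))
missing_per_cell

# Basic structural checks: numeric values and one name per cell
stopifnot(is.numeric(input_pattern), !is.null(rownames(input_pattern)))
```

The same checks apply unchanged to the real matrix once the mock assignment is removed.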

If your input data contains many cells, we recommend splitting your .cg file and running the above code in parallel to reduce the runtime. To split your .cg files, see (https://zhou-lab.github.io/YAME/docs/subset.html). To interpret each pattern, see our knowYourCG tool: https://www.bioconductor.org/packages/release/bioc/html/knowYourCG.html
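One possible way to organize the parallel run is sketched below with the base parallel package. The chunk file names are hypothetical, and process_chunk is a stand-in that returns a random matrix so the example runs without MethScope; in practice its body would be the GenerateInput call shown in the comment. Combining the per-chunk results with rbind assumes each chunk yields a matrix with the same MRMP columns.

```r
library(parallel)

# Hypothetical stand-in for GenerateInput(); replace the body with
#   GenerateInput(cg_file, reference_pattern)
# when MethScope is loaded.
process_chunk <- function(cg_file, reference_pattern) {
  matrix(
    runif(6), nrow = 2,
    dimnames = list(paste0(basename(cg_file), "_cell", 1:2), paste0("P", 1:3))
  )
}

# Hypothetical names of the split .cg files
chunk_files <- c("chunk_01.cg", "chunk_02.cg", "chunk_03.cg")
reference_pattern <- "Liu2021_MouseBrain.cm"

# mclapply forks processes, which is unavailable on Windows, so fall back to one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

chunk_results <- mclapply(
  chunk_files, process_chunk,
  reference_pattern = reference_pattern,
  mc.cores = n_cores
)

# Stack the per-chunk matrices into one cell-by-MRMP matrix
input_pattern <- do.call(rbind, chunk_results)
dim(input_pattern)
```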

Perform cell type annotation

To use a pre-trained model, call the PredictCellType function as shown below for fast cell type annotation. We provide two built-in pre-trained models: Liu2021_MouseBrain_P1000 for mouse brain and Zhou2025_HumanAtlas_P1000 for the human tissue atlas.

prediction_result <- PredictCellType(MethScope:::Liu2021_MouseBrain_P1000, input_pattern)

Train the classification model

To train your own model, use the Input_training function with a cell_type_label vector whose entries correspond to the rows of your input_pattern matrix. You can supply your own list of xgb model parameters, or set cross_validation = TRUE to search for the optimal parameters.

trained_model <- Input_training(input_pattern, cell_type_label)
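Because the labels must line up with the rows of input_pattern, it is worth building cell_type_label by matching on cell names rather than relying on file order. The sketch below assumes a hypothetical metadata table with cell_id and cell_type columns and uses a mock matrix; neither comes from MethScope itself.

```r
# Mock cell-by-MRMP matrix (stand-in for the GenerateInput() result)
set.seed(1)
input_pattern <- matrix(
  runif(12), nrow = 4,
  dimnames = list(paste0("cell", 1:4), paste0("P", 1:3))
)

# Hypothetical annotation table, deliberately in a different order than the matrix rows
metadata <- data.frame(
  cell_id   = c("cell3", "cell1", "cell4", "cell2"),
  cell_type = c("Astro", "Exc", "Inh", "Exc"),
  stringsAsFactors = FALSE
)

# Look up each matrix row in the metadata and fail loudly if any cell is unlabeled
row_index <- match(rownames(input_pattern), metadata$cell_id)
stopifnot(!anyNA(row_index))

# Labels are now in the same order as the rows of input_pattern
cell_type_label <- metadata$cell_type[row_index]
table(cell_type_label)
```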

Visualize the prediction result

We provide several built-in functions for visualizing the prediction results:

umap_plot <- PlotUMAP(input_pattern, prediction_result)
# cell_type_label holds the true cell type labels
PlotConfusion(prediction_result, cell_type_label)
PlotF1(prediction_result, cell_type_label)
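To make clear what a confusion matrix and per-class F1 score summarize, here is a base R sketch on toy labels. It assumes the predictions can be reduced to a character vector of cell types; the actual structure of prediction_result and the internals of PlotConfusion and PlotF1 may differ.

```r
# Toy true and predicted labels (illustrative only)
truth     <- c("Exc", "Exc", "Inh", "Inh", "Astro", "Astro")
predicted <- c("Exc", "Inh", "Inh", "Inh", "Astro", "Exc")

# Use a shared set of levels so the table is square even if a class is never predicted
classes <- sort(unique(c(truth, predicted)))
confusion <- table(
  truth     = factor(truth, levels = classes),
  predicted = factor(predicted, levels = classes)
)
confusion

# Per-class precision, recall, and F1 = 2 * P * R / (P + R)
true_positive <- diag(confusion)
precision <- true_positive / colSums(confusion)
recall    <- true_positive / rowSums(confusion)
f1 <- 2 * precision * recall / (precision + recall)
round(f1, 3)
```

A class that is never predicted would give a precision of 0/0 (NaN) here, so real data may need an explicit guard for that case.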

Cell type deconvolution

Cell type proportions can be estimated with our nnls_deconv function. To obtain the reference input, see our data repository (https://github.com/zhou-lab/methscope_data), which stores the reference patterns.

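# Reference MRMP definition file
reference_pattern <- "Liu2021_MouseBrain.cm"
# Reference patterns obtained from the methscope_data repository
reference_input <- readRDS("2021Liu_reference_pattern.rds")
# Estimate cell type proportions by non-negative least squares
cell_proportion <- nnls_deconv(reference_input, input_pattern)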
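For intuition about non-negative least squares deconvolution, the sketch below solves a toy problem in base R with projected gradient descent. This is a simple stand-in solver chosen for self-containment, not necessarily what nnls_deconv does internally, and the orientation of the reference matrix (MRMPs in rows, cell types in columns) is an assumption.

```r
# Toy reference: rows are MRMPs, columns are cell types (assumed orientation)
reference <- cbind(
  Exc = c(0.9, 0.1, 0.8, 0.2),
  Inh = c(0.1, 0.9, 0.3, 0.7)
)

# Simulate a mixed profile from known proportions
true_proportion <- c(Exc = 0.3, Inh = 0.7)
mixture <- as.vector(reference %*% true_proportion)

# Minimize 0.5 * ||A x - b||^2 subject to x >= 0 by projected gradient descent
nnls_sketch <- function(A, b, n_iter = 500) {
  AtA <- crossprod(A)
  Atb <- crossprod(A, b)
  # Step size 1/L, where L is the largest eigenvalue of t(A) %*% A
  step <- 1 / max(eigen(AtA, symmetric = TRUE, only.values = TRUE)$values)
  x <- rep(0, ncol(A))
  for (i in seq_len(n_iter)) {
    gradient <- AtA %*% x - Atb
    # Gradient step followed by projection onto the non-negative orthant
    x <- pmax(0, as.vector(x - step * gradient))
  }
  setNames(x, colnames(A))
}

estimate <- nnls_sketch(reference, mixture)

# Rescale so the estimated proportions sum to 1
cell_proportion <- estimate / sum(estimate)
round(cell_proportion, 3)
```

On this noise-free toy input the estimate should match the simulated proportions closely; real data will only be approximated.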

Unsupervised clustering

After obtaining the cell-by-MRMP matrix input_pattern, you can cluster cells with an existing pipeline. Below is a demonstration using Seurat for clustering and UMAP plotting.

library(Seurat)

# Seurat expects features in rows and cells in columns, so transpose the matrix
Pattern.obj <- CreateSeuratObject(counts = t(input_pattern), assay = "DNAm")
# Treat every MRMP as a variable feature
VariableFeatures(Pattern.obj) <- rownames(Pattern.obj[['DNAm']])
DefaultAssay(Pattern.obj) <- "DNAm"
Pattern.obj <- NormalizeData(Pattern.obj, assay = "DNAm", verbose = FALSE)
Pattern.obj <- ScaleData(Pattern.obj, assay = "DNAm", verbose = FALSE)
# Optional: use the initial counts matrix directly in place of the scaled data
Pattern.obj@assays$DNAm@layers$scale.data <- as.matrix(Pattern.obj@assays$DNAm@layers$counts)
# Dimensionality reduction, graph-based clustering, and UMAP embedding
Pattern.obj <- RunPCA(Pattern.obj, assay = "DNAm", reduction.name = "mpca", verbose = FALSE)
Pattern.obj <- FindNeighbors(Pattern.obj, reduction = "mpca", dims = 1:30)
Pattern.obj <- FindClusters(Pattern.obj, verbose = FALSE, resolution = 0.7)
Pattern.obj <- RunUMAP(Pattern.obj, reduction = "mpca", reduction.name = "meth.umap", dims = 1:30)