Building new deconvolution models

digitalDLSorteR implements all the tools needed to build new context-specific deconvolution models from previously characterized single-cell RNA-Seq data. Given that some cell types vary depending on the context in which they are found, we aim to generate a different model for each environment. In our view, this opens the door to more accurate and specific models, rather than relying on generic transcriptional profiles as a reference (e.g., using peripheral blood mononuclear cells (PBMCs) to estimate the proportions of tumor-infiltrating lymphocytes in oncological settings). In this vignette, we show the workflow required to build new context-specific deconvolution models. Simulated data are used in order to avoid long runtimes; the performance of a real model can be explored in the article Performance of a real model: deconvolution of colorectal cancer samples.

This workflow is computationally more expensive than using pre-trained models, so we recommend building a new model only if none of the models available in digitalDLSorteRmodels covers your samples, or if you think your scRNA-Seq data provide a better picture of the environment than the ones already offered. In any case, digitalDLSorteR provides a set of functionalities that make this process easier and cheaper in terms of RAM usage: batch processing of data and the use of the HDF5Array and DelayedArray packages (see the article HDF5 files as back-end for more information). Furthermore, all steps are centralized in the DigitalDLSorter S4 class, the core of digitalDLSorteR, to provide a good user experience and keep all the information tidy in the same object.

The main steps needed to build new models are summarized below, but you can find a visual summary in the following figure.

  1. Loading data into a DigitalDLSorter object
  2. Oversampling of single-cell profiles (optional)
  3. Generation of cell composition matrix for pseudo-bulk RNA-Seq samples
  4. Simulation of pseudo-bulk RNA-Seq samples using known cell composition
  5. Deep Neural Network training
  6. Evaluation of trained deconvolution model on test data: visualization of results
  7. Loading and deconvolution of new bulk RNA-Seq samples
  8. Saving DigitalDLSorter object and trained models

Loading data into a DigitalDLSorter object

First, we have to load scRNA-Seq data into a DigitalDLSorter object. This S4 class contains all the slots needed to store the data generated during the construction of new deconvolution models. The information needed consists of three elements:

  1. A matrix of raw counts with genes as rows and cells as columns.
  2. Cell-level metadata with at least cell IDs and cell types (the colData in the example below).
  3. Gene-level metadata with at least gene IDs matching the row names of the counts matrix (the rowData in the example below).

This information may come from a pre-loaded SingleCellExperiment object or from files stored on disk. For the latter, tsv, tsv.gz, sparse matrix (mtx) and HDF5 (h5) formats are accepted. Finally, the data will be stored as a SingleCellExperiment object in the single.cell.real slot of the new DigitalDLSorter object.

In addition, we recommend providing the bulk RNA-seq data to be deconvoluted at this point, so that only genes actually relevant for the deconvolution process are considered. The package filters genes according to different criteria so that only those important for the task are used in downstream steps.

In this tutorial, we will simulate both the scRNA-seq data used as a reference and the bulk RNA-seq data to be deconvoluted. This is done by randomly sampling from a Poisson distribution. Importantly, these simulated data serve only to demonstrate the functionalities of the package; they are not intended to resemble realistic transcriptomic data.

## loading the packages
suppressMessages(library("digitalDLSorteR"))
suppressMessages(library("SingleCellExperiment"))
suppressMessages(library("SummarizedExperiment"))

## set seed for reproducibility
set.seed(123)
sce <- SingleCellExperiment(
  matrix(
    stats::rpois(50000, lambda = 5), nrow = 500, ncol = 100, 
    dimnames = list(paste0("Gene", seq(500)), paste0("RHC", seq(100)))
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(100)),
    Cell_Type = sample(
      x = paste0("CellType", seq(5)), size = 100, replace = TRUE
    )
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(500))
  )
)

se <- SummarizedExperiment(
  matrix(
    stats::rpois(10000, lambda = 5), nrow = 500, ncol = 20, 
    dimnames = list(paste0("Gene", seq(500)), paste0("Sample_", seq(20)))
  ),
  colData = data.frame(
    Sample_ID = paste0("Sample_", seq(20))
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(500))
  )
)

Then, we create the DigitalDLSorter object as follows:

DDLSToy <- createDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE, 
  sc.log.FC = FALSE,
  bulk.data = se,
  bulk.sample.ID.column = "Sample_ID",
  bulk.gene.ID.column = "Gene_ID",
  project = "ToyExample"
)
## === Processing bulk transcriptomics data
## 'as(<dgCMatrix>, "dgTMatrix")' is deprecated.
## Use 'as(., "TsparseMatrix")' instead.
## See help("Deprecated") and help("Matrix-deprecated").
##       - Filtering features:
##          - Selected features: 500
##          - Discarded features: 0
## 
## === Processing single-cell data
##       - Filtering features:
##          - Selected features: 500
##          - Discarded features: 0
## 
## === No mitochondrial genes were found by using ^mt- as regrex
## 
## === Final number of dimensions for further analyses: 500
DDLSToy
## An object of class DigitalDLSorter 
## Real single-cell profiles:
##   500 features and 100 cells
##   rownames: Gene307 Gene379 Gene344 ... Gene157 Gene48 Gene462 
##   colnames: RHC54 RHC88 RHC80 ... RHC31 RHC61 RHC68 
## Bulk samples to deconvolute:
##   Bulk.DT bulk samples:
##     500 features and 20 samples
##     rownames: Gene324 Gene173 Gene332 ... Gene463 Gene422 Gene41 
##     colnames: Sample_19 Sample_4 Sample_9 ... Sample_5 Sample_10 Sample_8 
## Project: ToyExample

In the documentation, you can see all the parameters that createDDLSobject offers to process the loaded data, such as sc.min.counts and sc.min.cells. In this case, we set sc.log.FC = FALSE because these are simulated data and there is no biological signal in any cell type. However, this parameter should be set to TRUE when working with real data.

In addition, when working with very large scRNA-Seq datasets, digitalDLSorteR allows using HDF5 files as back-end to handle data that do not fit in RAM through the HDF5Array and DelayedArray packages. We only recommend this option for datasets that genuinely exceed the available RAM: HDF5 files, despite being very powerful for dealing with memory limitations, make processes much slower. As an example, the following code chunk would create an HDF5 file with the scRNA-seq data, allowing them to be used without being loaded into RAM. See the documentation for more details.

DDLSToy <- createDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE, 
  sc.log.FC = FALSE,
  sc.file.backend = "singlecell_data.h5",
  project = "ToyExample"
)

Oversampling of single-cell profiles

digitalDLSorteR offers the possibility to simulate new single-cell profiles from real ones to increase signal and variability in small datasets or when under-represented cell types are present. This step is optional but recommended in these situations. The estimateZinbwaveParams and simSCProfiles functions are used for this purpose.

Tuning of the ZINB-WaVE model to simulate new single-cell profiles

The first step is to estimate a set of parameters that fit the real single-cell data in order to simulate new, realistic single-cell profiles. We chose the ZINB-WaVE framework (Risso et al. 2018), which estimates the parameters of a ZINB (zero-inflated negative binomial) distribution, for its ability to accommodate not only the variability within a particular cell type, but also the variability across the entire experiment.

This process is performed by the estimateZinbwaveParams function, which makes use of the zinbwave package. You must specify the column corresponding to cell types in the cell metadata; other cell/gene covariates can be added based on your experimental design, such as patient, gender or gene length. This process may take a few minutes to run, so be patient. In any case, you can adjust the number of threads used in some steps of the estimation with the threads argument, which relies on the BiocParallel package, depending on your computational resources. For large datasets with some under-represented cell types, the subset.cells parameter allows taking a subset of cells to speed up the process. With the following code, a total of 40 cells will be taken from the original scRNA-Seq data and used to fit a ZINB-WaVE model:

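## reconstructed call (a sketch; see ?estimateZinbwaveParams for all arguments)
DDLSToy <- estimateZinbwaveParams(
  object = DDLSToy,
  cell.type.column = "Cell_Type",
  cell.ID.column = "Cell_ID",
  gene.ID.column = "Gene_ID",
  subset.cells = 40,
  threads = 1,
  verbose = TRUE
)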
## === Setting parallel environment to 1 thread(s)
## === Estimating parameters for all cell types in the experiment
## === Creating cell model matrix based on Cell_Type columns:
##  ~Cell_Type
## === Number of cells for each cell type:
##     - CellType1: 11
##     - CellType2: 6
##     - CellType3: 10
##     - CellType4: 7
##     - CellType5: 6
## === Creating gene model matrix without gene covariates
## === Running estimation process (Start time 10:41:35)
## === Removing genes without expression in any cell
## >>> Fitting ZINB-WaVE model
## Create model:
## ok
## Initialize parameters:
## ok
## Optimize parameters:
## Iteration 1
## penalized log-likelihood = -53964.80172306
## After dispersion optimization = -43181.4550995101
##    user  system elapsed 
##   1.409   0.028   1.437
## After right optimization = -42586.6075796502
## After orthogonalization = -42586.6075796502
##    user  system elapsed 
##   0.145   0.000   0.144
## After left optimization = -42577.0574792499
## After orthogonalization = -42577.0574792499
## Iteration 2
## penalized log-likelihood = -42577.0574792499
## After dispersion optimization = -42577.0574953121
##    user  system elapsed 
##   0.883   0.008   0.891
## After right optimization = -42575.0718800638
## After orthogonalization = -42575.0718800638
##    user  system elapsed 
##   0.113   0.000   0.113
## After left optimization = -42574.0730998069
## After orthogonalization = -42574.0730998069
## Iteration 3
## penalized log-likelihood = -42574.0730998069
## ok
## 
## DONE
## 
## Invested time: 5.09
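DDLSToy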
## An object of class DigitalDLSorter 
## Real single-cell profiles:
##   500 features and 100 cells
##   rownames: Gene346 Gene309 Gene385 ... Gene196 Gene93 Gene198 
##   colnames: RHC40 RHC56 RHC79 ... RHC26 RHC88 RHC78 
## ZinbModel object:
##   40 samples;   500 genes.
##   5 sample-level covariate(s) (mu);   5 sample-level covariate(s) (pi);
##   1 gene-level covariate(s) (mu);   1 gene-level covariate(s) (pi);
##   0 latent factor(s).
## Bulk samples to deconvolute:
##   Bulk.DT bulk samples:
##     500 features and 20 samples
##     rownames: Gene6 Gene261 Gene7 ... Gene379 Gene182 Gene71 
##     colnames: Sample_11 Sample_6 Sample_4 ... Sample_1 Sample_18 Sample_17 
## Project: ToyExample

Simulating new single-cell profiles

Once the ZINB-WaVE parameters have been estimated, the simSCProfiles function uses them to simulate new single-cell profiles based on the real ones. This is done by randomly sampling from a negative binomial distribution with the estimated ZINB parameters \(\mu\) and \(\theta\), and introducing dropouts by sampling from a binomial distribution with the estimated probability \(\pi\). You must specify the number of cell profiles per cell type to be generated (n.cells). For example, if your dataset is composed of 5 cell types and n.cells is equal to 10, the number of simulated profiles will be 50:

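## simulate 10 profiles per cell type (a sketch; see ?simSCProfiles)
DDLSToy <- simSCProfiles(
  object = DDLSToy,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  n.cells = 10,
  verbose = TRUE
)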
## === Getting parameters from model:
##     - mu: 40, 500
##     - pi: 40, 500
##     - Theta: 500
## === Selected cell type(s) from ZINB-WaVE model (5 cell type(s)):
##     - CellType2
##     - CellType3
##     - CellType4
##     - CellType5
##     - CellType1
## === Simulated matrix dimensions:
##     - n (cells): 50
##     - J (genes): 500
##     - i (# entries): 25000
## 
## DONE

These simulated single-cell profiles are stored in the single.cell.simul slot, ready to be used to simulate new bulk RNA-Seq profiles with known cell composition.

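DDLSToy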
## An object of class DigitalDLSorter 
## Real single-cell profiles:
##   500 features and 100 cells
##   rownames: Gene179 Gene248 Gene230 ... Gene160 Gene97 Gene240 
##   colnames: RHC41 RHC64 RHC85 ... RHC98 RHC29 RHC17 
## ZinbModel object:
##   40 samples;   500 genes.
##   5 sample-level covariate(s) (mu);   5 sample-level covariate(s) (pi);
##   1 gene-level covariate(s) (mu);   1 gene-level covariate(s) (pi);
##   0 latent factor(s).
## Simulated single-cell profiles:
##   500 features and 50 cells
##   rownames: Gene366 Gene318 Gene375 ... Gene82 Gene140 Gene346 
##   colnames: CellType3_Simul20 CellType5_Simul32 CellType5_Simul31 ... CellType4_Simul27 CellType5_Simul40 CellType1_Simul48 
## Bulk samples to deconvolute:
##   Bulk.DT bulk samples:
##     500 features and 20 samples
##     rownames: Gene438 Gene4 Gene30 ... Gene216 Gene278 Gene246 
##     colnames: Sample_15 Sample_2 Sample_6 ... Sample_3 Sample_7 Sample_1 
## Project: ToyExample

In this step, it is also possible to store the new simulated single-cell profiles in an HDF5 file. Indeed, they can be simulated in batches, avoiding loading all the data into RAM. The code would be as follows:
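
## this code will not be run: simulate profiles in batches backed by an HDF5
## file (a sketch; file.backend, block.processing and block.size are assumed
## to behave as in simBulkProfiles below)
DDLSToy <- simSCProfiles(
  object = DDLSToy,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  n.cells = 10,
  file.backend = "simulated_singlecell_data.h5",
  block.processing = TRUE,
  block.size = 10
)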

Generation of cell composition matrix for pseudo-bulk RNA-Seq samples

To simulate pseudobulk samples with known cell composition, it is necessary to generate a cell composition matrix that determines the proportion of every cell type in every sample. This is carried out by the generateBulkCellMatrix function, which stores the results in the prob.cell.types slot as a ProbMatrixCellTypes object.

This process starts by dividing the single-cell profiles into training and test subsets (see the train.freq.cells argument in the documentation). Each subset is used exclusively to generate the corresponding set of pseudobulk samples (training or test) in order to avoid any distortion of the results during model evaluation. Then, proportions are generated using six different methods to avoid biases during training due to the cellular composition of the simulated bulk RNA-Seq samples:

  1. Cell proportions are randomly sampled from a truncated uniform distribution with predefined limits according to a priori knowledge of the abundance of each cell type (see prob.design argument). This information can be inferred from the single cell analysis itself or from the literature.
  2. A second set is generated by randomly permuting cell type labels from a distribution generated by the previous method.
  3. Cell proportions are randomly sampled as in method 1, but without replacement.
  4. Using the previous method to generate proportions, cell type labels are randomly sampled as well.
  5. Cell proportions are randomly sampled from a Dirichlet distribution.
  6. Pseudo-bulk RNA-Seq samples composed of the same cell type are generated in order to provide ‘pure’ pseudobulk samples.

The proportion of samples generated by each method can be set with the proportion.method argument. Moreover, prob.sparsity controls the level of sparsity (the number of cell types that will be zero in each sample) produced by each method; this parameter was introduced to increase the variety of cell composition scenarios the model is trained on. Finally, other important parameters are n.cells, which determines the number of cells that make up each pseudobulk sample, and num.bulk.samples, which defines the total number of pseudobulk samples generated (training + test subsets). The code would be as follows:

## for reproducibility
set.seed(123)

## prior knowledge for prob.design argument
probMatrix <- data.frame(
  Cell_Type = paste0("CellType", seq(5)),
  from = c(rep(1, 2), 1, rep(30, 2)),
  to = c(rep(15, 2), 50, rep(70, 2))
)

DDLSToy <- generateBulkCellMatrix(
  object = DDLSToy,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  prob.design = probMatrix,
  num.bulk.samples = 250,
  n.cells = 100,
  verbose = TRUE
)
## 
## === The number of bulk RNA-Seq samples that will be generated is equal to 250
## 
## === Training set cells by type:
##     - CellType2: 18
##     - CellType3: 29
##     - CellType4: 20
##     - CellType5: 18
##     - CellType1: 27
## === Test set cells by type:
##     - CellType2: 5
##     - CellType3: 6
##     - CellType4: 6
##     - CellType5: 11
##     - CellType1: 10
## === Probability matrix for training data:
##     - Bulk RNA-Seq samples: 188
##     - Cell types: 5
## === Probability matrix for test data:
##     - Bulk RNA-Seq samples: 62
##     - Cell types: 5
## DONE
DDLSToy
## An object of class DigitalDLSorter 
## Real single-cell profiles:
##   500 features and 100 cells
##   rownames: Gene112 Gene101 Gene73 ... Gene127 Gene284 Gene381 
##   colnames: RHC18 RHC60 RHC68 ... RHC92 RHC70 RHC43 
## ZinbModel object:
##   40 samples;   500 genes.
##   5 sample-level covariate(s) (mu);   5 sample-level covariate(s) (pi);
##   1 gene-level covariate(s) (mu);   1 gene-level covariate(s) (pi);
##   0 latent factor(s).
## Simulated single-cell profiles:
##   500 features and 50 cells
##   rownames: Gene108 Gene259 Gene89 ... Gene330 Gene431 Gene452 
##   colnames: CellType3_Simul16 CellType2_Simul1 CellType5_Simul36 ... CellType1_Simul43 CellType4_Simul23 CellType4_Simul26 
## Cell type composition matrices:
##   Cell type matrix for traindata: 188 bulk samples and 5 cell types 
##   Cell type matrix for testdata: 62 bulk samples and 5 cell types 
## Bulk samples to deconvolute:
##   Bulk.DT bulk samples:
##     500 features and 20 samples
##     rownames: Gene256 Gene391 Gene69 ... Gene137 Gene327 Gene277 
##     colnames: Sample_16 Sample_14 Sample_7 ... Sample_2 Sample_8 Sample_1 
## Project: ToyExample

Remember that this is a simulated example. In real circumstances, depending on the number of single-cell profiles loaded/simulated at the beginning and on the available computational resources, about 15,000-20,000 samples would be recommended. You can inspect the cell composition matrix created in this step with the getter function getProbMatrix:

head(getProbMatrix(DDLSToy, type.data = "train"))
##        CellType1 CellType2 CellType3 CellType4 CellType5
## Bulk_1         0         0         0        52        48
## Bulk_2        74        26         0         0         0
## Bulk_3        16         0         9         0        75
## Bulk_4         0       100         0         0         0
## Bulk_5         8         1        25        43        23
## Bulk_6         8         7        17        35        33
tail(getProbMatrix(DDLSToy, type.data = "train"))
##          CellType1 CellType2 CellType3 CellType4 CellType5
## Bulk_183         0         0         0         0       100
## Bulk_184         0         0         0         0       100
## Bulk_185         0         0         0         0       100
## Bulk_186         0         0         0         0       100
## Bulk_187         0         0         0         0       100
## Bulk_188         0         0         0         0       100

Moreover, distributions can be plotted using the showProbPlot function:

lapply(
  1:6, function(x) {
    showProbPlot(
      DDLSToy, type.data = "train", set = x, type.plot = "boxplot"
    )
  }
)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

Simulation of pseudo-bulk RNA-Seq samples with known cell composition

Now, the simulated cell proportions are used to create the pseudobulk samples, which are built by aggregating single-cell profiles of each cell type according to these proportions. The idea is to mimic real bulk RNA-Seq data, in which the gene expression levels of all the cells in a sample are aggregated into a single profile. Therefore, the expression matrix is generated according to the following equation:

\[\begin{equation} T_{ij} = \sum_{k = 1}^{K} \sum_{z = 1}^{Z_k} C_{izk} \end{equation}\]

\[\begin{equation*} \textrm{such that} \left\{ \begin{array}{l} i = 1 \ldots M \\ j = 1 \ldots N \\ Z_k = \textrm{n.cells} \cdot P_{kj} \\ \sum_{k = 1}^K Z_k = \textrm{n.cells} \end{array} \right. \end{equation*}\]

where \(T_{ij}\) is the expression level of gene \(i\) in bulk sample \(j\); \(C_{izk}\) is the expression level of gene \(i\) in cell \(z\) of cell type \(k\); and \(P_{kj}\) is the proportion of cell type \(k\) in bulk sample \(j\) (the cell composition matrix generated in the previous step). \(Z_k\) is the number of cells of cell type \(k\) in bulk sample \(j\), determined by \(P_{kj}\) and the n.cells parameter of the generateBulkCellMatrix function (the total number of cells per pseudobulk sample). Cells are randomly sampled based on their cell type and on the training/test subset they were assigned to. This step is performed by simBulkProfiles as follows:

DDLSToy <- simBulkProfiles(
  object = DDLSToy, type.data = "both"
)
## === Setting parallel environment to 1 thread(s)
## 
## === Generating train bulk samples:
## 
## === Generating test bulk samples:
## 
## DONE

These samples are stored as a SummarizedExperiment object in the bulk.simul slot where they can be inspected at any time:

DDLSToy
## An object of class DigitalDLSorter 
## Real single-cell profiles:
##   500 features and 100 cells
##   rownames: Gene482 Gene459 Gene193 ... Gene130 Gene233 Gene470 
##   colnames: RHC46 RHC7 RHC52 ... RHC89 RHC84 RHC39 
## ZinbModel object:
##   40 samples;   500 genes.
##   5 sample-level covariate(s) (mu);   5 sample-level covariate(s) (pi);
##   1 gene-level covariate(s) (mu);   1 gene-level covariate(s) (pi);
##   0 latent factor(s).
## Simulated single-cell profiles:
##   500 features and 50 cells
##   rownames: Gene167 Gene21 Gene69 ... Gene88 Gene127 Gene397 
##   colnames: CellType3_Simul11 CellType5_Simul34 CellType2_Simul5 ... CellType4_Simul21 CellType2_Simul4 CellType4_Simul30 
## Cell type composition matrices:
##   Cell type matrix for traindata: 188 bulk samples and 5 cell types 
##   Cell type matrix for testdata: 62 bulk samples and 5 cell types 
## Simulated bulk samples:
##   train bulk samples:
##     500 features and 188 samples
##     rownames: Gene266 Gene347 Gene316 ... Gene268 Gene500 Gene5 
##     colnames: Bulk_156 Bulk_133 Bulk_71 ... Bulk_105 Bulk_29 Bulk_94 
##   test bulk samples:
##     500 features and 62 samples
##     rownames: Gene464 Gene82 Gene277 ... Gene232 Gene364 Gene167 
##     colnames: Bulk_48 Bulk_9 Bulk_58 ... Bulk_59 Bulk_52 Bulk_35 
## Bulk samples to deconvolute:
##   Bulk.DT bulk samples:
##     500 features and 20 samples
##     rownames: Gene300 Gene113 Gene274 ... Gene283 Gene400 Gene130 
##     colnames: Sample_11 Sample_10 Sample_1 ... Sample_8 Sample_14 Sample_17 
## Project: ToyExample

The simBulkProfiles function offers different ways to simulate these pseudobulk samples but, in our experience, the best option is to aggregate raw counts and normalize them afterwards (pseudobulk.function = "AddRawCount"), as in the sketch below.
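
## this code will not be run: setting the aggregation strategy explicitly
## (pseudobulk.function is named in the text above; other accepted values are
## described in ?simBulkProfiles)
DDLSToy <- simBulkProfiles(
  object = DDLSToy, type.data = "both", pseudobulk.function = "AddRawCount"
)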

Again, these pseudobulk samples can be stored in an HDF5 file. This is the step where we most recommend using this functionality, as it is the most computationally expensive part of the package and these samples will only be accessed during training and evaluation of the Deep Neural Network (DNN) model. As in simSCProfiles, samples can be simulated in batches, and a desired number of threads can also be set:

DDLSToy <- simBulkProfiles(
  object = DDLSToy, 
  type.data = "both", 
  file.backend = "pseudobulk_samples.h5",
  block.processing = TRUE,
  block.size = 1000, 
  threads = 2
)

Deep Neural Network training

Once the pseudobulk samples have been generated, a deep neural network can be trained and evaluated. trainDDLSModel is the function in charge of both steps and uses the keras package with tensorflow as back-end. If you want more information about keras or run into problems during its installation, please see Keras/TensorFlow installation and configuration for more details. In any case, the installTFpython function automates this process, so we recommend using it.

In terms of architecture and model parameters, trainDDLSModel implements by default two hidden layers with 200 neurons each, although any of these parameters can be modified through its arguments. In addition, for a more customized model, it is possible to provide a pre-built keras model via the custom.model parameter, as sketched below. See the documentation for more details.
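
## this code will not be run: a minimal sketch of a custom architecture
## (assumptions: the input layer must match the number of genes, 500 in this
## toy example, and the output layer the number of cell types; check
## ?trainDDLSModel for the exact requirements on custom.model)
library(keras)
customModel <- keras_model_sequential(name = "CustomDNN") %>%
  layer_dense(units = 300, input_shape = 500, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 5, activation = "softmax")

DDLSToy <- trainDDLSModel(object = DDLSToy, custom.model = customModel)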

The code with default parameters is as follows:

DDLSToy <- trainDDLSModel(
  object = DDLSToy, scaling = "standardize", batch.size = 12
)
## === Training and test from stored data
##     Using only simulated bulk samples
## 
##     Using only simulated bulk samples
## Model: "DigitalDLSorter"
## _____________________________________________________________________
## Layer (type)                   Output Shape               Param #    
## =====================================================================
## Dense1 (Dense)                 (None, 200)                100200     
## _____________________________________________________________________
## BatchNormalization1 (BatchNorm (None, 200)                800        
## _____________________________________________________________________
## Activation1 (Activation)       (None, 200)                0          
## _____________________________________________________________________
## Dropout1 (Dropout)             (None, 200)                0          
## _____________________________________________________________________
## Dense2 (Dense)                 (None, 200)                40200      
## _____________________________________________________________________
## BatchNormalization2 (BatchNorm (None, 200)                800        
## _____________________________________________________________________
## Activation2 (Activation)       (None, 200)                0          
## _____________________________________________________________________
## Dropout2 (Dropout)             (None, 200)                0          
## _____________________________________________________________________
## Dense3 (Dense)                 (None, 5)                  1005       
## _____________________________________________________________________
## BatchNormalization3 (BatchNorm (None, 5)                  20         
## _____________________________________________________________________
## ActivationSoftmax (Activation) (None, 5)                  0          
## =====================================================================
## Total params: 143,025
## Trainable params: 142,215
## Non-trainable params: 810
## _____________________________________________________________________
## 
## === Training DNN with 188 samples:
## Epoch 1/60
## 
 1/16 [>.............................] - ETA: 10s - loss: 0.9629 - accuracy: 0.4167 - mean_absolute_error: 0.1953 - categorical_accuracy: 0.4167
16/16 [==============================] - 1s 2ms/step - loss: 0.3896 - accuracy: 0.6809 - mean_absolute_error: 0.1227 - categorical_accuracy: 0.6809
## Epoch 2/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.2886 - accuracy: 0.7500 - mean_absolute_error: 0.1089 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.2132 - accuracy: 0.7766 - mean_absolute_error: 0.0878 - categorical_accuracy: 0.7766
## Epoch 3/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.3014 - accuracy: 0.7500 - mean_absolute_error: 0.1058 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 1ms/step - loss: 0.2016 - accuracy: 0.7340 - mean_absolute_error: 0.0857 - categorical_accuracy: 0.7340
## Epoch 4/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.2132 - accuracy: 0.7500 - mean_absolute_error: 0.0914 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 1ms/step - loss: 0.1901 - accuracy: 0.7872 - mean_absolute_error: 0.0819 - categorical_accuracy: 0.7872
## Epoch 5/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.2105 - accuracy: 0.9167 - mean_absolute_error: 0.0817 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 1ms/step - loss: 0.1565 - accuracy: 0.8617 - mean_absolute_error: 0.0729 - categorical_accuracy: 0.8617
## Epoch 6/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1350 - accuracy: 0.8333 - mean_absolute_error: 0.0620 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 2ms/step - loss: 0.1756 - accuracy: 0.7447 - mean_absolute_error: 0.0792 - categorical_accuracy: 0.7447
## Epoch 7/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.2695 - accuracy: 0.5833 - mean_absolute_error: 0.1144 - categorical_accuracy: 0.5833
16/16 [==============================] - 0s 1ms/step - loss: 0.1974 - accuracy: 0.7287 - mean_absolute_error: 0.0855 - categorical_accuracy: 0.7287
## Epoch 8/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0941 - accuracy: 0.7500 - mean_absolute_error: 0.0588 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 1ms/step - loss: 0.1745 - accuracy: 0.8245 - mean_absolute_error: 0.0790 - categorical_accuracy: 0.8245
## Epoch 9/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1302 - accuracy: 0.7500 - mean_absolute_error: 0.0691 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 1ms/step - loss: 0.1596 - accuracy: 0.7660 - mean_absolute_error: 0.0761 - categorical_accuracy: 0.7660
## Epoch 10/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0769 - accuracy: 0.8333 - mean_absolute_error: 0.0473 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 1ms/step - loss: 0.1463 - accuracy: 0.7819 - mean_absolute_error: 0.0682 - categorical_accuracy: 0.7819
## Epoch 11/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1358 - accuracy: 0.8333 - mean_absolute_error: 0.0656 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 1ms/step - loss: 0.1376 - accuracy: 0.8245 - mean_absolute_error: 0.0679 - categorical_accuracy: 0.8245
## Epoch 12/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1465 - accuracy: 0.8333 - mean_absolute_error: 0.0639 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 1ms/step - loss: 0.1396 - accuracy: 0.7234 - mean_absolute_error: 0.0691 - categorical_accuracy: 0.7234
## Epoch 13/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1402 - accuracy: 0.7500 - mean_absolute_error: 0.0643 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 1ms/step - loss: 0.1441 - accuracy: 0.7713 - mean_absolute_error: 0.0702 - categorical_accuracy: 0.7713
## Epoch 14/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0914 - accuracy: 0.7500 - mean_absolute_error: 0.0464 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 1ms/step - loss: 0.1408 - accuracy: 0.7819 - mean_absolute_error: 0.0695 - categorical_accuracy: 0.7819
## Epoch 15/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1293 - accuracy: 0.7500 - mean_absolute_error: 0.0627 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1369 - accuracy: 0.7766 - mean_absolute_error: 0.0684 - categorical_accuracy: 0.7766
## Epoch 16/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1234 - accuracy: 0.7500 - mean_absolute_error: 0.0649 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1374 - accuracy: 0.7660 - mean_absolute_error: 0.0684 - categorical_accuracy: 0.7660
## Epoch 17/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1323 - accuracy: 0.5833 - mean_absolute_error: 0.0700 - categorical_accuracy: 0.5833
16/16 [==============================] - 0s 2ms/step - loss: 0.1445 - accuracy: 0.7021 - mean_absolute_error: 0.0711 - categorical_accuracy: 0.7021
## Epoch 18/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1323 - accuracy: 0.9167 - mean_absolute_error: 0.0690 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 2ms/step - loss: 0.1289 - accuracy: 0.8351 - mean_absolute_error: 0.0658 - categorical_accuracy: 0.8351
## Epoch 19/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1238 - accuracy: 0.8333 - mean_absolute_error: 0.0612 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 2ms/step - loss: 0.1260 - accuracy: 0.7500 - mean_absolute_error: 0.0645 - categorical_accuracy: 0.7500
## Epoch 20/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1846 - accuracy: 0.9167 - mean_absolute_error: 0.0811 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 2ms/step - loss: 0.1367 - accuracy: 0.7819 - mean_absolute_error: 0.0677 - categorical_accuracy: 0.7819
## Epoch 21/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0540 - accuracy: 0.8333 - mean_absolute_error: 0.0389 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 2ms/step - loss: 0.1321 - accuracy: 0.7766 - mean_absolute_error: 0.0665 - categorical_accuracy: 0.7766
## Epoch 22/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1275 - accuracy: 0.8333 - mean_absolute_error: 0.0680 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 2ms/step - loss: 0.1193 - accuracy: 0.7872 - mean_absolute_error: 0.0628 - categorical_accuracy: 0.7872
## Epoch 23/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1168 - accuracy: 1.0000 - mean_absolute_error: 0.0533 - categorical_accuracy: 1.0000
16/16 [==============================] - 0s 2ms/step - loss: 0.1410 - accuracy: 0.7553 - mean_absolute_error: 0.0700 - categorical_accuracy: 0.7553
## Epoch 24/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1146 - accuracy: 0.8333 - mean_absolute_error: 0.0583 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 2ms/step - loss: 0.1333 - accuracy: 0.7979 - mean_absolute_error: 0.0667 - categorical_accuracy: 0.7979
## Epoch 25/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1746 - accuracy: 0.5833 - mean_absolute_error: 0.0843 - categorical_accuracy: 0.5833
16/16 [==============================] - 0s 2ms/step - loss: 0.1238 - accuracy: 0.7819 - mean_absolute_error: 0.0639 - categorical_accuracy: 0.7819
## Epoch 26/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0735 - accuracy: 0.9167 - mean_absolute_error: 0.0471 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 1ms/step - loss: 0.1220 - accuracy: 0.7872 - mean_absolute_error: 0.0636 - categorical_accuracy: 0.7872
## Epoch 27/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1567 - accuracy: 0.9167 - mean_absolute_error: 0.0755 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 1ms/step - loss: 0.1237 - accuracy: 0.7926 - mean_absolute_error: 0.0637 - categorical_accuracy: 0.7926
## Epoch 28/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1334 - accuracy: 0.8333 - mean_absolute_error: 0.0707 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 1ms/step - loss: 0.1135 - accuracy: 0.7979 - mean_absolute_error: 0.0597 - categorical_accuracy: 0.7979
## Epoch 29/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1248 - accuracy: 0.5833 - mean_absolute_error: 0.0677 - categorical_accuracy: 0.5833
16/16 [==============================] - 0s 1ms/step - loss: 0.1318 - accuracy: 0.7340 - mean_absolute_error: 0.0660 - categorical_accuracy: 0.7340
## Epoch 30/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1227 - accuracy: 0.6667 - mean_absolute_error: 0.0665 - categorical_accuracy: 0.6667
16/16 [==============================] - 0s 1ms/step - loss: 0.1244 - accuracy: 0.7766 - mean_absolute_error: 0.0631 - categorical_accuracy: 0.7766
## Epoch 31/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1201 - accuracy: 0.6667 - mean_absolute_error: 0.0657 - categorical_accuracy: 0.6667
16/16 [==============================] - 0s 2ms/step - loss: 0.1131 - accuracy: 0.7819 - mean_absolute_error: 0.0604 - categorical_accuracy: 0.7819
## Epoch 32/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.2057 - accuracy: 0.7500 - mean_absolute_error: 0.0878 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1110 - accuracy: 0.7500 - mean_absolute_error: 0.0597 - categorical_accuracy: 0.7500
## Epoch 33/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1626 - accuracy: 0.5833 - mean_absolute_error: 0.0800 - categorical_accuracy: 0.5833
16/16 [==============================] - 0s 1ms/step - loss: 0.1146 - accuracy: 0.7128 - mean_absolute_error: 0.0624 - categorical_accuracy: 0.7128
## Epoch 34/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1155 - accuracy: 0.9167 - mean_absolute_error: 0.0549 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 2ms/step - loss: 0.1175 - accuracy: 0.8564 - mean_absolute_error: 0.0618 - categorical_accuracy: 0.8564
## Epoch 35/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1188 - accuracy: 0.7500 - mean_absolute_error: 0.0628 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1265 - accuracy: 0.7287 - mean_absolute_error: 0.0655 - categorical_accuracy: 0.7287
## Epoch 36/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0914 - accuracy: 0.7500 - mean_absolute_error: 0.0478 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1207 - accuracy: 0.7606 - mean_absolute_error: 0.0643 - categorical_accuracy: 0.7606
## Epoch 37/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0782 - accuracy: 1.0000 - mean_absolute_error: 0.0465 - categorical_accuracy: 1.0000
16/16 [==============================] - 0s 1ms/step - loss: 0.1069 - accuracy: 0.7447 - mean_absolute_error: 0.0584 - categorical_accuracy: 0.7447
## Epoch 38/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1257 - accuracy: 0.6667 - mean_absolute_error: 0.0678 - categorical_accuracy: 0.6667
16/16 [==============================] - 0s 1ms/step - loss: 0.1057 - accuracy: 0.7606 - mean_absolute_error: 0.0582 - categorical_accuracy: 0.7606
## Epoch 39/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0880 - accuracy: 0.4167 - mean_absolute_error: 0.0566 - categorical_accuracy: 0.4167
16/16 [==============================] - 0s 1ms/step - loss: 0.1129 - accuracy: 0.7606 - mean_absolute_error: 0.0589 - categorical_accuracy: 0.7606
## Epoch 40/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1115 - accuracy: 0.9167 - mean_absolute_error: 0.0640 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 1ms/step - loss: 0.1241 - accuracy: 0.7926 - mean_absolute_error: 0.0642 - categorical_accuracy: 0.7926
## Epoch 41/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1228 - accuracy: 0.8333 - mean_absolute_error: 0.0559 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 1ms/step - loss: 0.1116 - accuracy: 0.7553 - mean_absolute_error: 0.0609 - categorical_accuracy: 0.7553
## Epoch 42/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1125 - accuracy: 0.7500 - mean_absolute_error: 0.0661 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1181 - accuracy: 0.7606 - mean_absolute_error: 0.0620 - categorical_accuracy: 0.7606
## Epoch 43/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0752 - accuracy: 0.9167 - mean_absolute_error: 0.0449 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 2ms/step - loss: 0.1217 - accuracy: 0.7926 - mean_absolute_error: 0.0623 - categorical_accuracy: 0.7926
## Epoch 44/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1432 - accuracy: 0.5833 - mean_absolute_error: 0.0727 - categorical_accuracy: 0.5833
16/16 [==============================] - 0s 2ms/step - loss: 0.1089 - accuracy: 0.8191 - mean_absolute_error: 0.0585 - categorical_accuracy: 0.8191
## Epoch 45/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0471 - accuracy: 0.7500 - mean_absolute_error: 0.0391 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1170 - accuracy: 0.7713 - mean_absolute_error: 0.0623 - categorical_accuracy: 0.7713
## Epoch 46/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1493 - accuracy: 0.6667 - mean_absolute_error: 0.0740 - categorical_accuracy: 0.6667
16/16 [==============================] - 0s 2ms/step - loss: 0.1253 - accuracy: 0.7287 - mean_absolute_error: 0.0664 - categorical_accuracy: 0.7287
## Epoch 47/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0699 - accuracy: 0.7500 - mean_absolute_error: 0.0506 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1002 - accuracy: 0.8085 - mean_absolute_error: 0.0552 - categorical_accuracy: 0.8085
## Epoch 48/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0897 - accuracy: 0.8333 - mean_absolute_error: 0.0596 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 2ms/step - loss: 0.1183 - accuracy: 0.7713 - mean_absolute_error: 0.0620 - categorical_accuracy: 0.7713
## Epoch 49/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0650 - accuracy: 0.7500 - mean_absolute_error: 0.0375 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1095 - accuracy: 0.7500 - mean_absolute_error: 0.0593 - categorical_accuracy: 0.7500
## Epoch 50/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0777 - accuracy: 0.7500 - mean_absolute_error: 0.0497 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1201 - accuracy: 0.7553 - mean_absolute_error: 0.0637 - categorical_accuracy: 0.7553
## Epoch 51/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1105 - accuracy: 0.9167 - mean_absolute_error: 0.0515 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 2ms/step - loss: 0.1002 - accuracy: 0.8191 - mean_absolute_error: 0.0541 - categorical_accuracy: 0.8191
## Epoch 52/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0507 - accuracy: 0.8333 - mean_absolute_error: 0.0380 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 2ms/step - loss: 0.1012 - accuracy: 0.8298 - mean_absolute_error: 0.0559 - categorical_accuracy: 0.8298
## Epoch 53/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0737 - accuracy: 0.5000 - mean_absolute_error: 0.0500 - categorical_accuracy: 0.5000
16/16 [==============================] - 0s 2ms/step - loss: 0.1058 - accuracy: 0.8032 - mean_absolute_error: 0.0578 - categorical_accuracy: 0.8032
## Epoch 54/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.2556 - accuracy: 0.7500 - mean_absolute_error: 0.1024 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1196 - accuracy: 0.7926 - mean_absolute_error: 0.0623 - categorical_accuracy: 0.7926
## Epoch 55/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.2113 - accuracy: 0.6667 - mean_absolute_error: 0.0936 - categorical_accuracy: 0.6667
16/16 [==============================] - 0s 2ms/step - loss: 0.1011 - accuracy: 0.7660 - mean_absolute_error: 0.0570 - categorical_accuracy: 0.7660
## Epoch 56/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1566 - accuracy: 0.7500 - mean_absolute_error: 0.0833 - categorical_accuracy: 0.7500
16/16 [==============================] - 0s 2ms/step - loss: 0.1073 - accuracy: 0.7819 - mean_absolute_error: 0.0601 - categorical_accuracy: 0.7819
## Epoch 57/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1200 - accuracy: 0.9167 - mean_absolute_error: 0.0558 - categorical_accuracy: 0.9167
16/16 [==============================] - 0s 2ms/step - loss: 0.1269 - accuracy: 0.7819 - mean_absolute_error: 0.0644 - categorical_accuracy: 0.7819
## Epoch 58/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0935 - accuracy: 0.5833 - mean_absolute_error: 0.0570 - categorical_accuracy: 0.5833
16/16 [==============================] - 0s 2ms/step - loss: 0.0954 - accuracy: 0.7766 - mean_absolute_error: 0.0540 - categorical_accuracy: 0.7766
## Epoch 59/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.0849 - accuracy: 0.8333 - mean_absolute_error: 0.0501 - categorical_accuracy: 0.8333
16/16 [==============================] - 0s 1ms/step - loss: 0.1125 - accuracy: 0.7713 - mean_absolute_error: 0.0608 - categorical_accuracy: 0.7713
## Epoch 60/60
## 
 1/16 [>.............................] - ETA: 0s - loss: 0.1114 - accuracy: 0.6667 - mean_absolute_error: 0.0614 - categorical_accuracy: 0.6667
16/16 [==============================] - 0s 2ms/step - loss: 0.0891 - accuracy: 0.7713 - mean_absolute_error: 0.0521 - categorical_accuracy: 0.7713
## 
## === Evaluating DNN in test data (62 samples)
## 
1/2 [==============>...............] - ETA: 0s - loss: 0.3366 - accuracy: 0.7812 - mean_absolute_error: 0.1253 - categorical_accuracy: 0.7812
2/2 [==============================] - 0s 2ms/step - loss: 0.3797 - accuracy: 0.7742 - mean_absolute_error: 0.1356 - categorical_accuracy: 0.7742
##    - loss: 0.3797
##    - accuracy: 0.7742
##    - mean_absolute_error: 0.1356
##    - categorical_accuracy: 0.7742
## 
## === Generating prediction results using test data
## 
1/2 [==============>...............] - ETA: 0s
2/2 [==============================] - 0s 1ms/step
## DONE

DDLSToy now contains a DigitalDLSorterDNN object with all the information associated with the model in the trained.model slot: a keras.engine.sequential.Sequential object with the trained model, the metrics and loss function histories during training, and the prediction results on test data.

DDLSToy
## An object of class DigitalDLSorter 
## Real single-cell profiles:
##   500 features and 100 cells
##   rownames: Gene470 Gene322 Gene69 ... Gene366 Gene196 Gene176 
##   colnames: RHC10 RHC71 RHC40 ... RHC59 RHC84 RHC44 
## ZinbModel object:
##   40 samples;   500 genes.
##   5 sample-level covariate(s) (mu);   5 sample-level covariate(s) (pi);
##   1 gene-level covariate(s) (mu);   1 gene-level covariate(s) (pi);
##   0 latent factor(s).
## Simulated single-cell profiles:
##   500 features and 50 cells
##   rownames: Gene321 Gene242 Gene291 ... Gene497 Gene431 Gene56 
##   colnames: CellType2_Simul8 CellType5_Simul37 CellType1_Simul45 ... CellType4_Simul22 CellType3_Simul11 CellType5_Simul31 
## Cell type composition matrices:
##   Cell type matrix for traindata: 188 bulk samples and 5 cell types 
##   Cell type matrix for testdata: 62 bulk samples and 5 cell types 
## Simulated bulk samples:
##   train bulk samples:
##     500 features and 188 samples
##     rownames: Gene488 Gene183 Gene91 ... Gene21 Gene53 Gene484 
##     colnames: Bulk_159 Bulk_46 Bulk_106 ... Bulk_139 Bulk_72 Bulk_2 
##   test bulk samples:
##     500 features and 62 samples
##     rownames: Gene259 Gene346 Gene26 ... Gene460 Gene257 Gene431 
##     colnames: Bulk_32 Bulk_29 Bulk_8 ... Bulk_54 Bulk_62 Bulk_31 
## Trained model: 60 epochs
##   Training metrics (last epoch):
##     loss: 0.0891
##     accuracy: 0.7713
##     mean_absolute_error: 0.0521
##     categorical_accuracy: 0.7713
##   Evaluation metrics on test data:
##     loss: 0.3797
##     accuracy: 0.7742
##     mean_absolute_error: 0.1356
##     categorical_accuracy: 0.7742 
## Bulk samples to deconvolute:
##   Bulk.DT bulk samples:
##     500 features and 20 samples
##     rownames: Gene281 Gene417 Gene328 ... Gene421 Gene491 Gene341 
##     colnames: Sample_5 Sample_11 Sample_10 ... Sample_2 Sample_17 Sample_4 
## Project: ToyExample

Since this is a ‘toy’ example, results are not very accurate. For a real example of a well trained model, see the Performance of a real model: deconvolution of colorectal cancer samples vignette.

on.the.fly argument

The on.the.fly argument of trainDDLSModel allows generating pseudobulk samples 'on the fly': the simulation step performed by the simBulkProfiles function can be skipped, and samples are created at the same time the neural network is being trained. Of course, running times during training will increase, but data are neither loaded into RAM nor stored in large HDF5 files. To use this functionality, it is only necessary to set on.the.fly = TRUE as follows:
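
## this code will not be run: pseudobulk samples are simulated during training
DDLSToy <- trainDDLSModel(
  object = DDLSToy,
  on.the.fly = TRUE,
  scaling = "standardize",
  batch.size = 12
)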

Evaluation of trained deconvolution model on test data: visualization of results

While the prediction results on test data are informative about the performance of the model, a more comprehensive analysis is needed. For this task, digitalDLSorteR provides a set of visualization functions to represent a variety of error metrics in different ways.

First, calculateEvalMetrics is needed to calculate the error metrics to be plotted. By default, the absolute error (AbsErr), proportional absolute error (ppAbsErr), squared error (SqrErr) and proportional squared error (ppSqrErr) are calculated for every sample in the test data. Furthermore, they are all aggregated using their average values according to three criteria: each cell type (CellType), proportion bins of 0.1 (pBin) and the number of different cell types (nCellTypes).

DDLSToy <- calculateEvalMetrics(object = DDLSToy)

Now, these results can be plotted with the following battery of functions.

distErrorPlot and barErrorPlot: error distributions

The distErrorPlot function allows plotting how errors are distributed in different ways. Moreover, it can split the plots into panels showing how errors are distributed by a given variable; available variables are cell types (CellType) and the number of cell types in the samples (nCellTypes). In the following example, we represent the overall errors by cell type:
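
## overall absolute errors by cell type (a sketch; the x.by and color.by
## arguments are assumed from the package documentation)
distErrorPlot(
  DDLSToy,
  error = "AbsErr",
  x.by = "CellType",
  color.by = "CellType",
  error.labels = TRUE
)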

Now, if you want to know whether there is a bias towards a specific cell type, you can use the facet.by parameter to split the plots by cell type:
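
## error distributions faceted by cell type (a sketch)
distErrorPlot(
  DDLSToy,
  error = "AbsErr",
  facet.by = "CellType",
  color.by = "CellType"
)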

It is also possible to represent the errors by the number of different cell types in the samples:
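
## errors grouped by number of cell types per sample (a sketch)
distErrorPlot(
  DDLSToy,
  error = "AbsErr",
  x.by = "nCellTypes",
  color.by = "nCellTypes"
)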

Finally, with barErrorPlot, the mean error values with their corresponding dispersion ranges can be plotted as follows:
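
## mean errors with dispersion ranges (a sketch; the error and by arguments
## are assumed from the package documentation)
barErrorPlot(DDLSToy, error = "MAE", by = "CellType")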

corrExpPredPlot: correlation plots between predicted and expected proportions

Ideally, the model should provide predictions that linearly match the actual proportions. Therefore, you can generate correlation plots to assess model performance. By default, Pearson's correlation coefficient (\(R\)) and the concordance correlation coefficient (\(CCC\)) are shown as annotations on the plots. The latter is a more realistic measure, as it decreases as the points move away from the identity line:

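## correlation between expected and predicted proportions (a sketch;
## corr = "both" is assumed to display both R and CCC)
corrExpPredPlot(DDLSToy, color.by = "CellType", corr = "both")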
## `geom_smooth()` using formula = 'y ~ x'

As in the previous case, plots can be split according to different variables. Now, let’s split the results by CellType and nCellTypes as an example:

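## split by cell type (a sketch)
corrExpPredPlot(
  DDLSToy,
  facet.by = "CellType",
  color.by = "CellType",
  corr = "both"
)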
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 190 rows containing non-finite values (`stat_cor()`).

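## split by number of cell types per sample (a sketch)
corrExpPredPlot(
  DDLSToy,
  facet.by = "nCellTypes",
  color.by = "CellType",
  corr = "both"
)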
## `geom_smooth()` using formula = 'y ~ x'

blandAltmanLehPlot: Bland-Altman agreement plots

The blandAltmanLehPlot function displays Bland-Altman agreement plots, a graphical method for comparing the level of agreement between two different sets of values. The differences between predictions and actual proportions are plotted against their averages. The central dashed line represents the mean difference, while the two red dashed lines are the limits of agreement, defined as the mean difference plus and minus 1.96 times the standard deviation of the differences. 95% of the differences are expected to fall between these two limits, so the wider the margins, the worse the performance. It is also possible to display the plot in \(log_2\) space:
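
## Bland-Altman agreement plot colored by cell type (a sketch; argument names
## are assumed from the package documentation)
blandAltmanLehPlot(DDLSToy, color.by = "CellType", density = TRUE)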

In addition, this function behaves like the previous ones in that the plots can be split:
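
## split by number of cell types, in log2 space (a sketch; the log.2 argument
## is assumed from the package documentation)
blandAltmanLehPlot(
  DDLSToy, facet.by = "nCellTypes", log.2 = TRUE, density = FALSE
)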

Loading and deconvolution of new bulk RNA-Seq samples

Once the model has been evaluated, we can deconvolute the bulk RNA-seq data already loaded at the beginning of the pipeline. New data could also be loaded with the loadDeconvData function, although we recommend doing it through the createDDLSobject function, since it keeps only the genes shared between the two kinds of data, so that only features relevant for the deconvolution process are used. In any case, to load new data, the call would be as follows:

## this code will not be run: countsBulk stands for a matrix of raw counts
suppressMessages(library(SummarizedExperiment, quietly = TRUE))
seExample <- SummarizedExperiment(assays = list(counts = countsBulk))

DDLSToy <- loadDeconvData(
  object = DDLSToy,
  data = seExample, 
  name.data = "Simulated.example"
)

In our case, we can directly use the deconvDDLSObj function, which will estimate the proportions of the cell types considered by the model. The predicted proportions can then be represented using the barPlotCellTypes function. The cell composition matrix is stored in the deconv.results slot.

DDLSToy <- deconvDDLSObj(
  object = DDLSToy, 
  name.data = "Bulk.DT",
  normalize = TRUE,
  scaling = "standardize",
  verbose = FALSE
)
## plot results
barPlotCellTypes(
  DDLSToy, name.data = "Bulk.DT", 
  rm.x.text = TRUE, color.line = "black"
)

Saving DigitalDLSorter object and trained models

digitalDLSorteR provides different ways to save models on disk and to retrieve them into the DigitalDLSorter object. First, you can save DigitalDLSorter objects as RDS files. Since this file format only accepts native R objects, it cannot store complex data structures such as keras Python objects (keras.engine.sequential.Sequential class). To make this possible, digitalDLSorteR implements a saveRDS generic function that converts the keras model into a list containing the network architecture and the weights after training. These two pieces of information are the minimum needed to perform new predictions. When the model is to be used, it is compiled back into a keras.engine.sequential.Sequential object.

## this code will not be run
saveRDS(object = DDLSToy, file = "valid/path")

However, the optimizer state is not saved this way. To offer the possibility of saving the optimizer as well, digitalDLSorteR provides the saveTrainedModelAsH5 function to save the whole neural network to disk, and loadTrainedModelFromH5 to load models back into DigitalDLSorter objects. Note that in this case just the keras model is saved as an HDF5 file.

## this code will not be run
saveTrainedModelAsH5(DDLSToy, file.path = "valid/path")
DDLSToy <- loadTrainedModelFromH5(DDLSToy, file.path = "valid/path")

References

Risso, D., F. Perraudeau, S. Gribkova, S. Dudoit, and J. P. Vert. 2018. “A general and flexible method for signal extraction from single-cell RNA-seq data.” Nat Commun 9 (1): 284.