dreamlet()
evaluates precision-weighted linear (mixed) models on each gene that passes standard filters. The linear mixed model used by dream()
can be a little fragile for small sample sizes and correlated covariates. dreamlet()
runs variancePartition::dream()
in the backend for each cell cluster. dream()
reports model failures for each cell cluster and dreamlet()
reports these failures to the user. dreamlet()
returns all successful model fits to be used for downstream analysis.
See details from variancePartition
error page.
Due to a recent bug in the dependency Matrix
package, all random effects models may fail for technical reasons. If your random effects analysis is failing for all genes in cases with no good explanation, this bug may be responsible. This case can be detected and resolved as follows:
library(lme4)
# Fit simple mixed model
lmer(Reaction ~ (1 | Subject), sleepstudy)
# Error in initializePtr() :
# function 'chm_factor_ldetL2' not provided by package 'Matrix'
This error indicates incompatible installs of Matrix
and lme4
. This can be solved with
install.packages("lme4", type = "source")
followed by restarting R.
The most common issue is when dreamlet()
analysis succeeds for most genes, but a handful of genes fail in each cell cluster. These genes can fail if the iterative process of fitting the linear mixed model does not converge, or if the estimated covariance matrix that is supposed be positive definite has an eigen-value that is negative or too close to zero due to rounding errors in floating point arithmetic.
In these cases, dreamlet()
stores a summary of these failures for all cell clusters that is accessible with details()
. Specific failure messages for each cell cluster and gene can be extracted using seeErrors()
Here we demonstrate how dreamlet()
handles model failures:
library(dreamlet)
library(muscat)
library(SingleCellExperiment)
data(example_sce)
# create pseudobulk for each sample and cell cluster
pb <- aggregateToPseudoBulk(example_sce,
assay = "counts",
cluster_id = "cluster_id",
sample_id = "sample_id",
verbose = FALSE
)
# voom-style normalization for each cell cluster
res.proc <- processAssays(
pb[1:300, ],
~group_id
)
# Redundant formula
# This example is an extreme example of redundancy
# but more subtle cases often show up in real data
form <- ~ group_id + (1 | group_id)
# fit dreamlet model
res.dl <- dreamlet(res.proc, form)
## B cells...7.9 secs
## CD14+ Monocytes...10 secs
## CD4 T cells...9 secs
## CD8 T cells...4.4 secs
## FCGR3A+ Monocytes...11 secs
##
## Of 1,062 models fit across all assays, 96.2% failed
# summary of models
res.dl
## class: dreamletResult
## assays(5): B cells CD14+ Monocytes CD4 T cells CD8 T cells FCGR3A+ Monocytes
## Genes:
## min: 3
## max: 11
## details(7): assay n_retain ... n_errors error_initial
## coefNames(2): (Intercept) group_idstim
##
## Of 1,062 models fit across all assays, 96.2% failed
# summary of models for each cell cluster
details(res.dl)
## assay n_retain formula formDropsTerms n_genes n_errors error_initial
## 1 B cells 4 ~group_id + (1 | group_id) FALSE 201 190 FALSE
## 2 CD14+ Monocytes 4 ~group_id + (1 | group_id) FALSE 269 263 FALSE
## 3 CD4 T cells 4 ~group_id + (1 | group_id) FALSE 216 207 FALSE
## 4 CD8 T cells 4 ~group_id + (1 | group_id) FALSE 118 115 FALSE
## 5 FCGR3A+ Monocytes 4 ~group_id + (1 | group_id) FALSE 258 247 FALSE
assay
: cell typen_retain
: number of samples retainedformula
: regression formula used after variable filteringformDropsTerms
: whether a variable was dropped from the formula for having zero variance following filteringn_genes
: number of genes analyzedn_errors
: number of genes with errorserror_initial
: indicator for assay-level errorBefore the full dataset is analyzed, dreamlet()
runs a test for each assay to see if the model succeeds. If the model fails, its does not continue analysis for that assay. These assay-level errors are reported above in the error_initial
column, and details are returned here.
# Extract errors as a tibble
res.err = seeErrors(res.dl)
## Assay-level errors: 0
## Gene-level errors: 1038
# No errors at the assay level
res.err$assayLevel
# the most common error is:
"Some predictor variables are on very different scales: consider rescaling"
This indicates that the scale of the predictor variables are very different and can affect the numerical stability of the iterative algorithm. This can be solved by running scale()
on each variable in the formula:
form = ~ scale(x) + scale(y) + ...
A model can fail for a single gene if covariates are too correlated, or for other numerical issues. Failed models are reported here and are not included in downstream analysis.
# See gene-level errors for each assay
res.err$geneLevel[1:2,]
## # A tibble: 2 × 3
## assay feature errorText
## <chr> <chr> <chr>
## B cells ISG15 "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol):…
## B cells AURKAIP1 "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol):…
# See full error message text
res.err$geneLevel$errorText[1]
"Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol): (converted from warning)
Model may not have converged with 1 eigenvalue close to zero: 1.4e-09\n"
This message indicates that the model was numerically unstable likely because the variables are closely correlated.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /media/volume/teran2_disk/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] GO.db_3.20.0 org.Hs.eg.db_3.20.0
## [3] AnnotationDbi_1.68.0 zenith_1.8.0
## [5] muscData_1.20.0 scater_1.34.0
## [7] scuttle_1.16.0 ExperimentHub_2.14.0
## [9] AnnotationHub_3.14.0 BiocFileCache_2.14.0
## [11] dbplyr_2.5.0 muscat_1.20.0
## [13] dreamlet_1.4.1 SingleCellExperiment_1.28.0
## [15] SummarizedExperiment_1.36.0 Biobase_2.66.0
## [17] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
## [19] IRanges_2.40.0 S4Vectors_0.44.0
## [21] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
## [23] matrixStats_1.4.1 variancePartition_1.36.1
## [25] BiocParallel_1.40.0 limma_3.62.1
## [27] ggplot2_3.5.1 BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-9 httr_1.4.7
## [3] RColorBrewer_1.1-3 doParallel_1.0.17
## [5] Rgraphviz_2.50.0 numDeriv_2016.8-1.1
## [7] sctransform_0.4.1 tools_4.4.1
## [9] backports_1.5.0 utf8_1.2.4
## [11] R6_2.5.1 metafor_4.6-0
## [13] mgcv_1.9-1 GetoptLong_1.0.5
## [15] withr_3.0.2 prettyunits_1.2.0
## [17] gridExtra_2.3 cli_3.6.3
## [19] sandwich_3.1-1 labeling_0.4.3
## [21] sass_0.4.9 KEGGgraph_1.66.0
## [23] SQUAREM_2021.1 mvtnorm_1.3-2
## [25] blme_1.0-6 mixsqp_0.3-54
## [27] parallelly_1.38.0 invgamma_1.1
## [29] RSQLite_2.3.7 generics_0.1.3
## [31] shape_1.4.6.1 gtools_3.9.5
## [33] dplyr_1.1.4 Matrix_1.7-1
## [35] metadat_1.2-0 ggbeeswarm_0.7.2
## [37] fansi_1.0.6 abind_1.4-8
## [39] lifecycle_1.0.4 multcomp_1.4-26
## [41] yaml_2.3.10 edgeR_4.4.0
## [43] mathjaxr_1.6-0 gplots_3.2.0
## [45] SparseArray_1.6.0 grid_4.4.1
## [47] blob_1.2.4 crayon_1.5.3
## [49] lattice_0.22-6 beachmat_2.22.0
## [51] msigdbr_7.5.1 annotate_1.84.0
## [53] KEGGREST_1.46.0 magick_2.8.5
## [55] pillar_1.9.0 knitr_1.48
## [57] ComplexHeatmap_2.22.0 rjson_0.2.23
## [59] boot_1.3-31 estimability_1.5.1
## [61] corpcor_1.6.10 future.apply_1.11.3
## [63] codetools_0.2-20 glue_1.8.0
## [65] data.table_1.16.2 vctrs_0.6.5
## [67] png_0.1-8 Rdpack_2.6.1
## [69] gtable_0.3.6 assertthat_0.2.1
## [71] cachem_1.1.0 xfun_0.49
## [73] mime_0.12 rbibutils_2.3
## [75] S4Arrays_1.6.0 Rfast_2.1.0
## [77] coda_0.19-4.1 reformulas_0.4.0
## [79] survival_3.7-0 iterators_1.0.14
## [81] tinytex_0.54 statmod_1.5.0
## [83] TH.data_1.1-2 nlme_3.1-166
## [85] pbkrtest_0.5.3 bit64_4.5.2
## [87] filelock_1.0.3 progress_1.2.3
## [89] EnvStats_3.0.0 bslib_0.8.0
## [91] TMB_1.9.15 irlba_2.3.5.1
## [93] vipor_0.4.7 KernSmooth_2.23-24
## [95] colorspace_2.1-1 rmeta_3.0
## [97] DBI_1.2.3 DESeq2_1.46.0
## [99] tidyselect_1.2.1 emmeans_1.10.5
## [101] curl_6.0.0 bit_4.5.0
## [103] compiler_4.4.1 graph_1.84.0
## [105] BiocNeighbors_2.0.0 DelayedArray_0.32.0
## [107] bookdown_0.41 scales_1.3.0
## [109] caTools_1.18.3 remaCor_0.0.18
## [111] rappdirs_0.3.3 stringr_1.5.1
## [113] digest_0.6.37 minqa_1.2.8
## [115] rmarkdown_2.29 aod_1.3.3
## [117] XVector_0.46.0 RhpcBLASctl_0.23-42
## [119] htmltools_0.5.8.1 pkgconfig_2.0.3
## [121] lme4_1.1-35.5 sparseMatrixStats_1.18.0
## [123] highr_0.11 mashr_0.2.79
## [125] fastmap_1.2.0 rlang_1.1.4
## [127] GlobalOptions_0.1.2 UCSC.utils_1.2.0
## [129] DelayedMatrixStats_1.28.0 farver_2.1.2
## [131] jquerylib_0.1.4 zoo_1.8-12
## [133] jsonlite_1.8.9 BiocSingular_1.22.0
## [135] RCurl_1.98-1.16 magrittr_2.0.3
## [137] GenomeInfoDbData_1.2.13 munsell_0.5.1
## [139] Rcpp_1.0.13-1 babelgene_22.9
## [141] viridis_0.6.5 EnrichmentBrowser_2.36.0
## [143] RcppZiggurat_0.1.6 stringi_1.8.4
## [145] zlibbioc_1.52.0 MASS_7.3-61
## [147] plyr_1.8.9 listenv_0.9.1
## [149] parallel_4.4.1 ggrepel_0.9.6
## [151] Biostrings_2.74.0 splines_4.4.1
## [153] hms_1.1.3 circlize_0.4.16
## [155] locfit_1.5-9.10 reshape2_1.4.4
## [157] ScaledMatrix_1.14.0 BiocVersion_3.20.0
## [159] XML_3.99-0.17 evaluate_1.0.1
## [161] RcppParallel_5.1.9 BiocManager_1.30.25
## [163] nloptr_2.1.1 foreach_1.5.2
## [165] tidyr_1.3.1 purrr_1.0.2
## [167] future_1.34.0 clue_0.3-65
## [169] scattermore_1.2 ashr_2.2-63
## [171] rsvd_1.0.5 broom_1.0.7
## [173] xtable_1.8-4 fANCOVA_0.6-1
## [175] viridisLite_0.4.2 truncnorm_1.0-9
## [177] tibble_3.2.1 lmerTest_3.1-3
## [179] glmmTMB_1.1.10 memoise_2.0.1
## [181] beeswarm_0.4.0 cluster_2.1.6
## [183] globals_0.16.3 GSEABase_1.68.0