Tool: Scater
A high number of detected genes can indicate doublets. However, depending on the cell-type composition of your sample, it may also reflect true biological variation among cell types.
Low-quality or dying cells often show extensive mitochondrial contamination, i.e. a high proportion of reads from mitochondrial genes. Cells with a high proportion of mitochondrial reads will be removed.
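As an illustration, the per-cell metrics discussed above (library size, number of detected genes and percentage of mitochondrial reads) can be computed from a raw count matrix roughly as follows. This is a minimal sketch that assumes a plain dict layout (gene name → list of per-cell counts) and mitochondrial genes marked by the prefix `mt-`. The function `per_cell_qc` and the toy data are hypothetical. scater/scuttle report the equivalent metrics as `sum`, `detected` and `subsets_Mito_percent`.

```python
from typing import Dict, List


def per_cell_qc(counts: Dict[str, List[int]], mito_prefix: str = "mt-") -> List[dict]:
    """Library size, detected genes and % mitochondrial reads for every cell.

    `counts` maps gene name -> list of counts, one entry per cell
    (hypothetical layout chosen for this sketch).
    """
    n_cells = len(next(iter(counts.values())))
    metrics = []
    for c in range(n_cells):
        # Library size: total reads in the cell.
        lib_size = sum(gene_counts[c] for gene_counts in counts.values())
        # Detected genes: genes with at least one read in the cell.
        detected = sum(1 for gene_counts in counts.values() if gene_counts[c] > 0)
        # Reads from genes whose name starts with the mitochondrial prefix.
        mito = sum(
            gene_counts[c]
            for gene, gene_counts in counts.items()
            if gene.lower().startswith(mito_prefix.lower())
        )
        pct_mito = 100.0 * mito / lib_size if lib_size > 0 else 0.0
        metrics.append({"sum": lib_size, "detected": detected, "pct_mito": pct_mito})
    return metrics


# Toy example: 3 genes x 2 cells (hypothetical data).
toy = {"mt-co1": [5, 40], "actb": [90, 50], "malat1": [5, 0]}
for m in per_cell_qc(toy):
    print(m)
```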
Table: Median proportion of mitochondrial reads in each sample
Sample | Median [%] |
---|---|
C_F | 1.948843 |
C_M_2 | 1.831965 |
H_F_2_2 | 2.043343 |
H_Fi_2 | 2.026529 |
H_M2 | 2.004468 |
H_MI_2 | 1.952511 |
The fraction of reads from ribosomal protein genes varies with cell type and overall cell health. Higher levels of RNA degradation may lead to a higher proportion of reads from ribosomal protein transcripts.
Table: Median proportion of reads from ribosomal protein genes in each sample
Sample | Median [%] |
---|---|
C_F | 40.95304 |
C_M_2 | 35.67937 |
H_F_2_2 | 37.04388 |
H_Fi_2 | 32.13078 |
H_M2 | 36.05118 |
H_MI_2 | 41.86038 |
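The per-sample medians in the two tables above summarise a per-cell percentage for each sample. Below is a minimal sketch of that summary for ribosomal reads. It assumes ribosomal protein genes can be recognised by the case-insensitive name pattern `^rp[sl]` (rps*/rpl*), and that each cell carries a sample label. The pattern, the helpers `pct_ribo`/`median_by_sample` and the sample names `A`/`B` are hypothetical.

```python
import re
from statistics import median
from typing import Dict, List

# Hypothetical pattern: ribosomal protein genes named rps*/rpl* (case-insensitive).
RIBO_PATTERN = re.compile(r"^rp[sl]", re.IGNORECASE)


def pct_ribo(cell_counts: Dict[str, int]) -> float:
    """Percentage of a cell's reads that come from ribosomal protein genes."""
    total = sum(cell_counts.values())
    ribo = sum(n for gene, n in cell_counts.items() if RIBO_PATTERN.match(gene))
    return 100.0 * ribo / total if total > 0 else 0.0


def median_by_sample(cells: List[dict]) -> Dict[str, float]:
    """Group cells by their 'sample' label and take the median ribosomal percentage."""
    per_sample: Dict[str, List[float]] = {}
    for cell in cells:
        per_sample.setdefault(cell["sample"], []).append(pct_ribo(cell["counts"]))
    return {sample: median(values) for sample, values in per_sample.items()}


# Hypothetical cells from two samples.
cells = [
    {"sample": "A", "counts": {"rps3": 30, "rpl7": 10, "actb": 60}},
    {"sample": "A", "counts": {"rps3": 20, "actb": 80}},
    {"sample": "B", "counts": {"rpl7": 50, "actb": 50}},
]
print(median_by_sample(cells))
```

The median rather than the mean keeps the per-sample summary robust to a handful of extreme cells, which is exactly the situation QC is trying to detect.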
Tool: Scater
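scater can automatically identify low-quality cells based on experiment-specific aspects of the data. Instead of fixed thresholds, a cell is flagged as an outlier when one of its QC metrics lies far from the median of its sample (by default, more than 3 median absolute deviations).
Table: Number of cells removed due to low library size (low_lib_size), low number of expressed genes (low_n_features), high proportion of mitochondrial reads (high_subsets_Mito_percent) and in total (discard) in each sample. A cell can fail several criteria at once, so discard counts the cells failing at least one criterion and need not equal the row sum.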
Sample | low_lib_size | low_n_features | high_subsets_Mito_percent | discard |
---|---|---|---|---|
C_F | 0 | 0 | 73 | 73 |
C_M_2 | 0 | 0 | 70 | 70 |
H_F_2_2 | 0 | 0 | 54 | 54 |
H_Fi_2 | 0 | 2 | 27 | 27 |
H_M2 | 0 | 0 | 118 | 118 |
H_MI_2 | 0 | 0 | 39 | 39 |
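The MAD-based outlier logic behind such a table can be sketched as follows. The sketch is modelled on the approach of scater/scuttle's `isOutlier()`, but it is not that implementation. Library size and detected genes are tested on the log scale for unusually low values, and the mitochondrial percentage is tested for unusually high values. A cell is discarded if it fails any test, which is why, for example, the two low_n_features cells of H_Fi_2 can also be among its 27 high-mitochondria cells. The helpers `mad_outliers`/`qc_filters`, the 3-MAD cutoff and the toy data are assumptions for illustration.

```python
import math
from statistics import median
from typing import Dict, List


def mad_outliers(values: List[float], nmads: float = 3.0,
                 direction: str = "both", log: bool = False) -> List[bool]:
    """Flag values lying more than `nmads` scaled MADs from the median.

    direction: "lower", "higher" or "both". With log=True the test is done on
    log-transformed values (assumes all values are > 0).
    """
    x = [math.log(v) for v in values] if log else list(values)
    center = median(x)
    # 1.4826 scales the MAD to estimate the SD under normality (R's mad() default).
    mad = 1.4826 * median(abs(v - center) for v in x)
    lower, upper = center - nmads * mad, center + nmads * mad
    return [
        (direction in ("lower", "both") and v < lower)
        or (direction in ("higher", "both") and v > upper)
        for v in x
    ]


def qc_filters(lib_size: List[float], n_features: List[float],
               pct_mito: List[float], nmads: float = 3.0) -> Dict[str, int]:
    """Count cells failing each criterion and in total (hypothetical helper)."""
    low_lib = mad_outliers(lib_size, nmads, "lower", log=True)
    low_feat = mad_outliers(n_features, nmads, "lower", log=True)
    high_mito = mad_outliers(pct_mito, nmads, "higher")
    # A cell is discarded if it fails ANY criterion, so "discard" is a union,
    # not the sum of the individual columns.
    discard = [a or b or c for a, b, c in zip(low_lib, low_feat, high_mito)]
    return {
        "low_lib_size": sum(low_lib),
        "low_n_features": sum(low_feat),
        "high_subsets_Mito_percent": sum(high_mito),
        "discard": sum(discard),
    }


# Toy data: the last cell has few detected genes AND a high mitochondrial share.
lib_size = [2000, 2100, 1900, 2050, 1950, 2000, 2020, 1980]
n_features = [500, 520, 480, 510, 490, 505, 495, 100]
pct_mito = [2, 3, 2.5, 2, 3, 2.5, 2, 40]
print(qc_filters(lib_size, n_features, pct_mito))
```

Because the thresholds are derived from each sample's own distribution, the same code adapts to samples with different sequencing depth or baseline mitochondrial content.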
Plots showing the top 50 most highly expressed genes in each sample. Each row corresponds to a gene, and its boxplot shows the distribution of that gene's expression (i.e. number of reads) across individual cells. The vertical line within the box indicates the median expression of the gene across all cells. Genes are sorted in decreasing order of median expression.
Individual genes sometimes have very high expression and should be removed to avoid problems at the normalization step. In particular, look out for MALAT1 and other nuclear lincRNAs, mitochondrial genes (prefix mt-), ribosomal protein genes (prefix rp), actin and hemoglobin genes.
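A rough way to screen for such genes outside the plots is to rank genes by the median share of each cell's reads they account for, and to mark names matching the classes listed above. This is a sketch only. The name pattern (`malat1`, `mt-`, `rps`/`rpl`, `actb`/`actg`, `hba`/`hbb`), the per-cell-percentage ranking, the helper `top_genes` and the toy counts are assumptions, and gene naming conventions differ between organisms.

```python
import re
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical name patterns for the gene classes listed above
# (adjust to the naming convention of your organism's annotation).
SUSPECT = re.compile(r"^(malat1|mt-|rp[sl]|act[bg]|hb[ab])", re.IGNORECASE)


def top_genes(counts: Dict[str, List[int]], n: int = 50) -> List[Tuple[str, float, bool]]:
    """Rank genes by their median per-cell percentage of total reads.

    Returns (gene, median %, matches a suspect pattern) for the top `n` genes.
    """
    genes = list(counts)
    n_cells = len(counts[genes[0]])
    totals = [sum(counts[g][c] for g in genes) for c in range(n_cells)]
    ranked = []
    for g in genes:
        # Share of each (non-empty) cell's reads that comes from gene g.
        shares = [100.0 * counts[g][c] / totals[c] for c in range(n_cells) if totals[c] > 0]
        ranked.append((g, median(shares) if shares else 0.0, bool(SUSPECT.match(g))))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Toy counts: 4 genes x 2 cells (hypothetical data).
toy = {"Malat1": [60, 50], "Actb": [20, 30], "Gapdh": [15, 15], "mt-Co1": [5, 5]}
for gene, pct, suspect in top_genes(toy, n=3):
    print(gene, pct, "check" if suspect else "")
```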
Plots after removing specific highly expressed genes. The following genes were removed: CR383676.1, CR936442.1.
Table: Number of cells in each sample before and after removing low-quality cells
Sample | Before filtering | After filtering |
---|---|---|
C_F | 1183 | 1110 |
C_M_2 | 1779 | 1709 |
H_F_2_2 | 846 | 792 |
H_Fi_2 | 515 | 488 |
H_M2 | 1400 | 1282 |
H_MI_2 | 750 | 711 |
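The before/after cell numbers can be cross-checked against the discard column of the scater filtering table. The short script below uses the values copied from both tables. For every sample, the difference between the before and after counts equals the number of discarded cells.

```python
# Cell numbers copied from the report's tables: (before, after, discard).
counts = {
    "C_F": (1183, 1110, 73),
    "C_M_2": (1779, 1709, 70),
    "H_F_2_2": (846, 792, 54),
    "H_Fi_2": (515, 488, 27),
    "H_M2": (1400, 1282, 118),
    "H_MI_2": (750, 711, 39),
}

for sample, (before, after, discard) in counts.items():
    removed = before - after
    status = "OK" if removed == discard else "MISMATCH"
    print(f"{sample}: {before} - {after} = {removed}, discard = {discard} -> {status}")
```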
Tool: Seurat
Normalization is applied to correct for differences in library size between cells. Biological heterogeneity in single-cell RNA-seq data is often confounded by technical factors such as sequencing depth: the number of molecules detected per cell can vary substantially, even between cells of the same cell type.
Therefore, we apply sctransform normalization (Hafemeister and Satija, Genome Biology 2019), which fits regularized negative binomial models of gene expression to account for technical artifacts while preserving biological variance. During normalization, we also regress out confounding sources of variation (the percentages of reads mapping to mitochondrial and ribosomal protein genes).
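sctransform uses Pearson residuals from a negative binomial model as normalized values: for an observed count x with expected value μ and overdispersion θ, the residual is (x − μ) / sqrt(μ + μ²/θ). The sketch below computes such residuals but simplifies the model in ways that are not part of sctransform. The expected count is the analytic estimate μ_gc = (gene total × cell total) / grand total, a single fixed θ = 100 is used for every gene (sctransform instead fits per-gene parameters and regularizes them), and no mitochondrial/ribosomal covariates are regressed out. The function name `pearson_residuals` is hypothetical.

```python
import math
from typing import List


def pearson_residuals(x: List[List[float]], theta: float = 100.0) -> List[List[float]]:
    """NB Pearson residuals for a genes x cells count matrix `x`.

    Simplifications (assumptions, not sctransform's actual fit):
      * mu_gc = (gene total * cell total) / grand total
      * one fixed overdispersion `theta` for every gene
    """
    gene_tot = [sum(row) for row in x]
    cell_tot = [sum(col) for col in zip(*x)]
    grand = sum(gene_tot)
    res = []
    for g, row in enumerate(x):
        out = []
        for c, count in enumerate(row):
            # Expected count scales with the cell's sequencing depth.
            mu = gene_tot[g] * cell_tot[c] / grand
            var = mu + mu * mu / theta  # negative binomial variance
            out.append((count - mu) / math.sqrt(var) if var > 0 else 0.0)
        res.append(out)
    return res


# A gene whose counts simply track sequencing depth gets residuals of 0;
# a gene concentrated in one cell gets large positive/negative residuals.
print(pearson_residuals([[2, 4], [8, 16]]))
print(pearson_residuals([[10, 0], [0, 10]]))
```

Dividing by the model's standard deviation, rather than only scaling by library size, is what lets depth-driven variation shrink towards zero while genuinely variable genes keep large residuals.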
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bullseye/sid

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] knitr_1.33 ggpubr_0.4.0
[3] kableExtra_1.3.4 scran_1.20.1
[5] umap_0.2.7.0 scater_1.20.0
[7] ggplot2_3.3.3 scuttle_1.2.0
[9] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
[11] Biobase_2.52.0 GenomicRanges_1.44.0
[13] GenomeInfoDb_1.28.0 IRanges_2.26.0
[15] S4Vectors_0.30.0 BiocGenerics_0.38.0
[17] MatrixGenerics_1.4.0 matrixStats_0.59.0
[19] SeuratObject_4.0.2 Seurat_4.0.5

loaded via a namespace (and not attached):
[1] utf8_1.2.1 reticulate_1.20
[3] tidyselect_1.1.1 htmlwidgets_1.5.3
[5] grid_4.1.0 BiocParallel_1.26.0
[7] Rtsne_0.15 munsell_0.5.0
[9] ScaledMatrix_1.0.0 codetools_0.2-18
[11] ica_1.0-2 statmod_1.4.36
[13] future_1.23.0 miniUI_0.1.1.1
[15] withr_2.4.2 colorspace_2.0-1
[17] highr_0.9 rstudioapi_0.13
[19] ROCR_1.0-11 ggsignif_0.6.1
[21] tensor_1.5 listenv_0.8.0
[23] labeling_0.4.2 GenomeInfoDbData_1.2.6
[25] polyclip_1.10-0 farver_2.1.0
[27] parallelly_1.28.1 vctrs_0.3.8
[29] generics_0.1.0 xfun_0.23
[31] R6_2.5.0 ggbeeswarm_0.6.0
[33] rsvd_1.0.5 locfit_1.5-9.4
[35] bitops_1.0-7 spatstat.utils_2.1-0
[37] DelayedArray_0.18.0 assertthat_0.2.1
[39] promises_1.2.0.1 scales_1.1.1
[41] beeswarm_0.4.0 gtable_0.3.0
[43] beachmat_2.8.0 globals_0.14.0
[45] goftest_1.2-2 rlang_0.4.11
[47] systemfonts_1.0.2 splines_4.1.0
[49] rstatix_0.7.0 lazyeval_0.2.2
[51] spatstat.geom_2.1-0 broom_0.7.6
[53] yaml_2.2.1 reshape2_1.4.4
[55] abind_1.4-5 backports_1.2.1
[57] httpuv_1.6.1 tools_4.1.0
[59] ellipsis_0.3.2 spatstat.core_2.1-2
[61] jquerylib_0.1.4 RColorBrewer_1.1-2
[63] ggridges_0.5.3 Rcpp_1.0.7
[65] plyr_1.8.6 sparseMatrixStats_1.4.0
[67] zlibbioc_1.38.0 purrr_0.3.4
[69] RCurl_1.98-1.3 rpart_4.1-15
[71] openssl_1.4.4 deldir_0.2-10
[73] pbapply_1.4-3 viridis_0.6.1
[75] cowplot_1.1.1 zoo_1.8-9
[77] haven_2.4.1 ggrepel_0.9.1
[79] cluster_2.1.2 magrittr_2.0.1
[81] data.table_1.14.0 RSpectra_0.16-0
[83] scattermore_0.7 openxlsx_4.2.3
[85] lmtest_0.9-38 RANN_2.6.1
[87] fitdistrplus_1.1-5 hms_1.1.0
[89] patchwork_1.1.1 mime_0.10
[91] evaluate_0.14 xtable_1.8-4
[93] rio_0.5.26 readxl_1.3.1
[95] gridExtra_2.3 compiler_4.1.0
[97] tibble_3.1.2 KernSmooth_2.23-20
[99] crayon_1.4.1 htmltools_0.5.1.1
[101] mgcv_1.8-36 later_1.2.0
[103] tidyr_1.1.3 DBI_1.1.1
[105] MASS_7.3-54 Matrix_1.3-4
[107] car_3.0-10 metapod_1.0.0
[109] igraph_1.2.6 forcats_0.5.1
[111] pkgconfig_2.0.3 foreign_0.8-81
[113] plotly_4.9.3 spatstat.sparse_2.0-0
[115] xml2_1.3.2 svglite_2.0.0
[117] vipor_0.4.5 bslib_0.2.5.1
[119] dqrng_0.3.0 webshot_0.5.2
[121] XVector_0.32.0 rvest_1.0.0
[123] stringr_1.4.0 digest_0.6.27
[125] sctransform_0.3.2 RcppAnnoy_0.0.18
[127] spatstat.data_2.1-0 cellranger_1.1.0
[129] rmarkdown_2.8 leiden_0.3.8
[131] uwot_0.1.10 edgeR_3.34.0
[133] DelayedMatrixStats_1.14.0 curl_4.3.1
[135] shiny_1.6.0 lifecycle_1.0.0
[137] nlme_3.1-152 jsonlite_1.7.2
[139] carData_3.0-4 BiocNeighbors_1.10.0
[141] viridisLite_0.4.0 askpass_1.1
[143] limma_3.48.0 fansi_0.5.0
[145] pillar_1.6.1 lattice_0.20-44
[147] fastmap_1.1.0 httr_1.4.2
[149] survival_3.2-11 glue_1.4.2
[151] zip_2.2.0 png_0.1-7
[153] bluster_1.2.1 stringi_1.6.2
[155] sass_0.4.0 BiocSingular_1.8.0
[157] dplyr_1.0.6 irlba_2.3.3
[159] future.apply_1.7.0