1. Guidelines for authorship in collaborations

Service Mode

Service-mode projects usually consist of straightforward analyses with simple experimental designs, run through one or more of our semi-automated and highly reproducible analysis pipelines. This mode also includes standard assistance with manuscript writing (i.e. the related Materials & Methods section). In such cases, an acknowledgement of the IBU in general and, potentially but not necessarily, of the responsible bioinformatician(s) is usually appropriate.

Example sentence for acknowledgements: The Interfaculty Bioinformatics Unit (IBU), University of Bern provided computational infrastructure and support with bioinformatic analyses.

Research Mode

Research mode applies to all analyses not covered in the previous section. It includes i) custom pipelines required by specific data or analysis needs, ii) analysis of data from emerging technologies not covered in our bundles, which requires the design and implementation of a workflow, and iii) custom downstream analysis and visualization of data from service mode. Such work is usually considered a substantial contribution to the study and therefore leads to co-authorship upon publication. Exceptions to this rule need to be discussed on a per-project basis.

Example Affiliation: Interfaculty Bioinformatics Unit (IBU) and Swiss Institute of Bioinformatics (SIB), University of Bern, Bern, Switzerland

→ For more details please have a look at our website https://www.bioinformatics.unibe.ch

2. Quality control

Tool: Scater

Library size versus detected genes

A high number of detected genes can indicate doublets. However, depending on the cell type composition of your sample, it may also reflect true biological variation among cell types.

sum: total number of counts for a cell (i.e. library size)
detected: total number of genes expressed in a cell
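The two metrics defined above can be illustrated with a minimal Python sketch. It is not the scater implementation used for this report; the function name `per_cell_qc` and the toy count matrix are hypothetical, and a gene is assumed to count as "detected" when its count is greater than zero.

```python
def per_cell_qc(counts):
    """Compute per-cell QC metrics from a genes x cells count matrix.

    counts: list of rows, one row per gene, one column per cell.
    Returns one dict per cell with the keys "sum" and "detected".
    """
    n_cells = len(counts[0])
    metrics = []
    for j in range(n_cells):
        column = [row[j] for row in counts]
        metrics.append({
            "sum": sum(column),                            # library size
            "detected": sum(1 for c in column if c > 0),   # genes with non-zero count
        })
    return metrics


# Toy matrix (hypothetical values): 4 genes x 3 cells
toy_counts = [
    [5, 0, 2],
    [0, 0, 7],
    [3, 1, 0],
    [0, 4, 1],
]
qc = per_cell_qc(toy_counts)
for cell_index, cell_metrics in enumerate(qc):
    print(cell_index, cell_metrics)
```

Plotting `sum` against `detected` for every cell gives the kind of scatter plot shown in this section.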

Percent of mitochondrial reads

Low-quality or dying cells often exhibit extensive mitochondrial contamination. Cells with a high proportion of mitochondrial reads will be removed.

detected: total number of genes expressed in a cell
subsets_Mito_percent: percentage of all counts that come from the mitochondrial genome
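As a rough illustration of how `subsets_Mito_percent` is defined, the sketch below computes the percentage for one cell. It assumes that mitochondrial genes can be recognised by the `mt-` prefix mentioned in the gene-level QC section of this report; the function name, gene names and counts are hypothetical, and returning 0.0 for an empty cell is a convention chosen here, not necessarily what the pipeline does.

```python
def mito_percent(cell_counts, gene_names, prefix="mt-"):
    """Percentage of a cell's counts that come from mitochondrial genes.

    cell_counts: counts for one cell, in the same order as gene_names.
    prefix: assumed naming convention for mitochondrial genes.
    """
    total = sum(cell_counts)
    if total == 0:
        return 0.0  # convention chosen for this sketch
    mito = sum(
        count
        for count, name in zip(cell_counts, gene_names)
        if name.lower().startswith(prefix)
    )
    return 100.0 * mito / total


# Hypothetical cell with 100 counts, 5 of them mitochondrial
genes = ["mt-co1", "mt-nd1", "actb1", "rpl13"]
counts = [3, 2, 80, 15]
pct = mito_percent(counts, genes)
print(pct)
```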

Table: Median of the proportion of mitochondrial reads in each sample

	median [%]
C_F	1.948843
C_M_2	1.831965
H_F_2_2	2.043343
H_Fi_2	2.026529
H_M2	2.004468
H_MI_2	1.952511

Percent of ribosomal protein reads

The fraction of reads from ribosomal proteins varies based on cell type and overall cell health. Higher levels of RNA degradation may lead to more templating of ribosomal proteins.

Table: Median of the proportion of reads from ribosomal proteins in each sample

	median [%]
C_F	40.95304
C_M_2	35.67937
H_F_2_2	37.04388
H_Fi_2	32.13078
H_M2	36.05118
H_MI_2	41.86038

3. Filtering

Tool: Scater

Automatically identifying low-quality cells

Scater provides an approach to automatically remove low-quality cells based on experiment-specific aspects of the data.

Table: Number of cells removed in each sample due to low library size (low_lib_size), low number of expressed genes (low_n_features), or high proportion of mitochondrial reads (high_subsets_Mito_percent), and in total (discard).

	low_n_features	high_subsets_Mito_percent	discard
C_F	0	73	73
C_M_2	0	70	70
H_F_2_2	0	54	54
H_Fi_2	2	27	27
H_M2	0	118	118
H_MI_2	0	39	39
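The sketch below illustrates one common way such data-driven thresholds are derived: a cell is flagged when a QC metric lies more than a fixed number of median absolute deviations (MADs) from the median. This report does not state the thresholds that were used, so the cutoff of 3 MADs, the log-transformation of library size and gene counts, and all names and values in the example are assumptions. A cell is discarded if any single criterion flags it, which is why `discard` can be smaller than the sum of the individual columns (see H_Fi_2 in the table above).

```python
import math
from statistics import median


def mad_outliers(values, nmads=3.0, direction="higher", log=False):
    """Flag values lying more than nmads MADs from the median.

    direction: "lower" flags unusually small values, "higher" unusually large ones.
    log: compare on the log2 scale (requires strictly positive values).
    """
    xs = [math.log2(v) for v in values] if log else list(values)
    med = median(xs)
    # 1.4826 scales the MAD to match the standard deviation of a normal distribution
    mad = 1.4826 * median([abs(x - med) for x in xs])
    if direction == "lower":
        return [x < med - nmads * mad for x in xs]
    if direction == "higher":
        return [x > med + nmads * mad for x in xs]
    raise ValueError("direction must be 'lower' or 'higher'")


def discard_flags(lib_sizes, n_detected, mito_pct):
    """Combine the three criteria: discard a cell if any of them flags it."""
    low_lib_size = mad_outliers(lib_sizes, direction="lower", log=True)
    low_n_features = mad_outliers(n_detected, direction="lower", log=True)
    high_mito = mad_outliers(mito_pct, direction="higher")
    return [a or b or c for a, b, c in zip(low_lib_size, low_n_features, high_mito)]


# Hypothetical metrics for seven cells; only the last has an unusual mitochondrial share
lib_sizes = [1000, 1200, 900, 1100, 1050, 950, 1000]
n_detected = [500, 520, 480, 510, 490, 505, 495]
mito_pct = [1.8, 2.0, 2.1, 1.9, 2.2, 2.0, 15.0]
discard = discard_flags(lib_sizes, n_detected, mito_pct)
print(discard)
```

Because the thresholds are computed from each data set, they adapt to the experiment instead of relying on fixed cutoffs.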

Gene-level QC

Plots showing the 50 most highly expressed genes in each sample. Each row corresponds to a gene, and each boxplot shows the distribution of that gene's expression (i.e. number of reads) across individual cells. The vertical line within the box indicates the median expression of the gene across all cells. Genes are sorted in decreasing order of median expression.
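The ordering used in these plots can be sketched as follows. This is only an illustration of ranking genes by their median count across cells; the function name and the toy values are hypothetical, and the plots themselves were produced with the tools named in this report.

```python
from statistics import median


def top_genes_by_median(counts_by_gene, n=50):
    """Return the n gene names with the highest median count across cells."""
    medians = {gene: median(counts) for gene, counts in counts_by_gene.items()}
    return sorted(medians, key=medians.get, reverse=True)[:n]


# Hypothetical counts for three genes in three cells
toy = {
    "geneA": [10, 12, 11],
    "geneB": [0, 1, 0],
    "geneC": [100, 90, 95],
}
top_two = top_genes_by_median(toy, n=2)
print(top_two)
```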

Sometimes individual genes may have very high expression and should be removed to avoid problems at the normalization step. In particular, look out for MALAT1 and other nuclear lincRNAs, mitochondrial genes (prefix mt-), ribosomal proteins (starting with rp), actin and hemoglobin.

Plots after removing specific highly expressed genes. The following genes were removed: CR383676.1, CR936442.1
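A simple name-based screen for the gene classes listed above might look like the sketch below. The prefixes `mt-` and `rp` and the gene MALAT1 come from the text; the function name and the example gene list are hypothetical. Prefix matching is crude: in the example, `rpe65` is flagged although it is not a ribosomal protein, so the result is a list of candidates to review by eye, not an automatic removal list. Actin and hemoglobin genes are not covered because no naming rule is given for them here.

```python
def genes_to_review(gene_names, prefixes=("mt-", "rp"), exact=("malat1",)):
    """Return the gene names that match a watched prefix or an exact name.

    Matching is case-insensitive. The result is meant for manual review.
    """
    flagged = []
    for name in gene_names:
        lowered = name.lower()
        if lowered in exact or lowered.startswith(prefixes):
            flagged.append(name)
    return flagged


# Hypothetical gene list
example_genes = ["mt-co1", "rpl13", "MALAT1", "actb1", "rpe65", "gapdh"]
candidates = genes_to_review(example_genes)
print(candidates)
```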

Statistics after filtering

Number of cells

	Before filtering	After filtering
C_F	1183	1110
C_M_2	1779	1709
H_F_2_2	846	792
H_Fi_2	515	488
H_M2	1400	1282
H_MI_2	750	711
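As a consistency check, the "After filtering" counts can be reproduced by subtracting the `discard` column of the filtering table from the "Before filtering" counts. The short sketch below does this with the numbers reported in the two tables and also derives the percentage of cells removed per sample; the variable names are illustrative only.

```python
# Values copied from the tables in this report
before = {"C_F": 1183, "C_M_2": 1779, "H_F_2_2": 846,
          "H_Fi_2": 515, "H_M2": 1400, "H_MI_2": 750}
discarded = {"C_F": 73, "C_M_2": 70, "H_F_2_2": 54,
             "H_Fi_2": 27, "H_M2": 118, "H_MI_2": 39}
reported_after = {"C_F": 1110, "C_M_2": 1709, "H_F_2_2": 792,
                  "H_Fi_2": 488, "H_M2": 1282, "H_MI_2": 711}

# Cells remaining after filtering and percentage of cells removed
after = {sample: before[sample] - discarded[sample] for sample in before}
pct_removed = {sample: round(100 * discarded[sample] / before[sample], 1)
               for sample in before}

for sample in before:
    print(sample, after[sample], after[sample] == reported_after[sample],
          pct_removed[sample])
```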

Library size versus detected genes after filtering

Percent of mitochondrial reads after filtering

Percent of ribosomal protein reads after filtering

4. Normalization

Tool: Seurat

Normalization is applied to correct for differences in library size between cells. Biological heterogeneity in single-cell RNA-seq data is often confounded by technical factors, including sequencing depth. The number of molecules detected in each cell can vary significantly between cells, even within the same cell type.

Therefore, we apply sctransform normalization (Hafemeister and Satija, Genome Biology 2019), which builds regularized negative binomial models of gene expression in order to account for technical artifacts while preserving biological variance. During the normalization, we also remove confounding sources of variation (mitochondrial and ribosomal mapping percentage).
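To give an intuition for the quantity that sctransform reports, the sketch below computes a Pearson residual under a negative binomial model: the observed count minus the expected count, divided by the model's standard deviation. It is a deliberately simplified illustration and not the sctransform procedure. The expected count is taken here as sequencing depth times a fixed gene fraction instead of being estimated by regularized regression, the dispersion `theta` is a hypothetical fixed value, and the removal of mitochondrial and ribosomal percentages is not shown.

```python
import math


def nb_pearson_residual(count, mu, theta):
    """Pearson residual of an observed count under a negative binomial model.

    mu: expected count; theta: dispersion (variance = mu + mu^2 / theta).
    """
    variance = mu + mu * mu / theta
    return (count - mu) / math.sqrt(variance)


# Hypothetical gene that makes up 1% of the transcripts in every cell
gene_fraction = 0.01
theta = 100.0  # hypothetical dispersion

# (sequencing depth, observed count) for three hypothetical cells
cells = [(1000, 10), (5000, 50), (1000, 30)]
residuals = [
    nb_pearson_residual(count, depth * gene_fraction, theta)
    for depth, count in cells
]
print(residuals)
```

The first two cells have raw counts that differ five-fold purely because of sequencing depth, yet both residuals are zero. The third cell has more counts than its depth predicts and receives a positive residual, which is the kind of signal that should survive normalization.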

before normalization:

after normalization:

Session Information

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bullseye/sid

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] knitr_1.33 ggpubr_0.4.0
[3] kableExtra_1.3.4 scran_1.20.1
[5] umap_0.2.7.0 scater_1.20.0
[7] ggplot2_3.3.3 scuttle_1.2.0
[9] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
[11] Biobase_2.52.0 GenomicRanges_1.44.0
[13] GenomeInfoDb_1.28.0 IRanges_2.26.0
[15] S4Vectors_0.30.0 BiocGenerics_0.38.0
[17] MatrixGenerics_1.4.0 matrixStats_0.59.0
[19] SeuratObject_4.0.2 Seurat_4.0.5

loaded via a namespace (and not attached):
[1] utf8_1.2.1 reticulate_1.20
[3] tidyselect_1.1.1 htmlwidgets_1.5.3
[5] grid_4.1.0 BiocParallel_1.26.0
[7] Rtsne_0.15 munsell_0.5.0
[9] ScaledMatrix_1.0.0 codetools_0.2-18
[11] ica_1.0-2 statmod_1.4.36
[13] future_1.23.0 miniUI_0.1.1.1
[15] withr_2.4.2 colorspace_2.0-1
[17] highr_0.9 rstudioapi_0.13
[19] ROCR_1.0-11 ggsignif_0.6.1
[21] tensor_1.5 listenv_0.8.0
[23] labeling_0.4.2 GenomeInfoDbData_1.2.6
[25] polyclip_1.10-0 farver_2.1.0
[27] parallelly_1.28.1 vctrs_0.3.8
[29] generics_0.1.0 xfun_0.23
[31] R6_2.5.0 ggbeeswarm_0.6.0
[33] rsvd_1.0.5 locfit_1.5-9.4
[35] bitops_1.0-7 spatstat.utils_2.1-0
[37] DelayedArray_0.18.0 assertthat_0.2.1
[39] promises_1.2.0.1 scales_1.1.1
[41] beeswarm_0.4.0 gtable_0.3.0
[43] beachmat_2.8.0 globals_0.14.0
[45] goftest_1.2-2 rlang_0.4.11
[47] systemfonts_1.0.2 splines_4.1.0
[49] rstatix_0.7.0 lazyeval_0.2.2
[51] spatstat.geom_2.1-0 broom_0.7.6
[53] yaml_2.2.1 reshape2_1.4.4
[55] abind_1.4-5 backports_1.2.1
[57] httpuv_1.6.1 tools_4.1.0
[59] ellipsis_0.3.2 spatstat.core_2.1-2
[61] jquerylib_0.1.4 RColorBrewer_1.1-2
[63] ggridges_0.5.3 Rcpp_1.0.7
[65] plyr_1.8.6 sparseMatrixStats_1.4.0
[67] zlibbioc_1.38.0 purrr_0.3.4
[69] RCurl_1.98-1.3 rpart_4.1-15
[71] openssl_1.4.4 deldir_0.2-10
[73] pbapply_1.4-3 viridis_0.6.1
[75] cowplot_1.1.1 zoo_1.8-9
[77] haven_2.4.1 ggrepel_0.9.1
[79] cluster_2.1.2 magrittr_2.0.1
[81] data.table_1.14.0 RSpectra_0.16-0
[83] scattermore_0.7 openxlsx_4.2.3
[85] lmtest_0.9-38 RANN_2.6.1
[87] fitdistrplus_1.1-5 hms_1.1.0
[89] patchwork_1.1.1 mime_0.10
[91] evaluate_0.14 xtable_1.8-4
[93] rio_0.5.26 readxl_1.3.1
[95] gridExtra_2.3 compiler_4.1.0
[97] tibble_3.1.2 KernSmooth_2.23-20
[99] crayon_1.4.1 htmltools_0.5.1.1
[101] mgcv_1.8-36 later_1.2.0
[103] tidyr_1.1.3 DBI_1.1.1
[105] MASS_7.3-54 Matrix_1.3-4
[107] car_3.0-10 metapod_1.0.0
[109] igraph_1.2.6 forcats_0.5.1
[111] pkgconfig_2.0.3 foreign_0.8-81
[113] plotly_4.9.3 spatstat.sparse_2.0-0
[115] xml2_1.3.2 svglite_2.0.0
[117] vipor_0.4.5 bslib_0.2.5.1
[119] dqrng_0.3.0 webshot_0.5.2
[121] XVector_0.32.0 rvest_1.0.0
[123] stringr_1.4.0 digest_0.6.27
[125] sctransform_0.3.2 RcppAnnoy_0.0.18
[127] spatstat.data_2.1-0 cellranger_1.1.0
[129] rmarkdown_2.8 leiden_0.3.8
[131] uwot_0.1.10 edgeR_3.34.0
[133] DelayedMatrixStats_1.14.0 curl_4.3.1
[135] shiny_1.6.0 lifecycle_1.0.0
[137] nlme_3.1-152 jsonlite_1.7.2
[139] carData_3.0-4 BiocNeighbors_1.10.0
[141] viridisLite_0.4.0 askpass_1.1
[143] limma_3.48.0 fansi_0.5.0
[145] pillar_1.6.1 lattice_0.20-44
[147] fastmap_1.1.0 httr_1.4.2
[149] survival_3.2-11 glue_1.4.2
[151] zip_2.2.0 png_0.1-7
[153] bluster_1.2.1 stringi_1.6.2
[155] sass_0.4.0 BiocSingular_1.8.0
[157] dplyr_1.0.6 irlba_2.3.3
[159] future.apply_1.7.0

scRNAseq_qc-filtering-normalization

Heidi Tschanz-Lischer

04.01.2022