1. Guidelines for authorship in collaborations

Service Mode

Service-mode projects usually consist of straightforward analyses with simple experimental designs, run with one or more of our semi-automated and highly reproducible analysis pipelines. This mode also includes standard assistance with manuscript writing (i.e., the related Materials & Methods section). In such cases, an acknowledgement of the IBU in general and, potentially but not necessarily, of the responsible bioinformatician(s) is usually appropriate.

Example sentence for acknowledgements: The Interfaculty Bioinformatics Unit (IBU), University of Bern provided computational infrastructure and support with bioinformatic analyses.

Research Mode

Research mode applies to all analyses not covered in the previous section. It includes i) custom pipelines required by specific data or analysis needs, ii) analysis of data from emerging technologies not covered in our bundles, which requires the design and implementation of a workflow, and iii) custom downstream analysis and visualization of data from service mode. Usually, this is considered a substantial contribution to the study and as such leads to co-authorship upon publication. Exceptions to this rule need to be discussed on a per-project basis.

Example Affiliation: Interfaculty Bioinformatics Unit (IBU) and Swiss Institute of Bioinformatics (SIB), University of Bern, Bern, Switzerland

→ For more details please have a look at our website https://www.bioinformatics.unibe.ch


2. Quality control

Tool: Scater

Library size versus detected genes

A high number of detected genes can indicate doublets. However, depending on the cell type composition of your sample, it may also reflect true biological variation among cell types.

  • sum: total number of counts for a cell (i.e. library size)
  • detected: total number of genes expressed in a cell
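As a minimal sketch of what these two per-cell metrics mean, the following Python snippet computes them from a toy count matrix. The gene names and counts are hypothetical, and "detected" is assumed here to mean a count greater than zero; this is an illustration and not the code used to produce the report.

```python
# Toy count matrix: keys are genes, list positions are cells (hypothetical values).
counts = {
    "geneA": [5, 0, 2],
    "geneB": [0, 0, 7],
    "mt-nd1": [1, 3, 0],
}


def per_cell_metrics(counts):
    """Return (lib_size, detected), each with one entry per cell."""
    n_cells = len(next(iter(counts.values())))
    lib_size = [0] * n_cells   # "sum": total counts per cell
    detected = [0] * n_cells   # "detected": genes with a non-zero count
    for gene_counts in counts.values():
        for cell, value in enumerate(gene_counts):
            lib_size[cell] += value
            if value > 0:
                detected[cell] += 1
    return lib_size, detected


lib_size, detected = per_cell_metrics(counts)
print(lib_size, detected)
```

Plotting lib_size against detected for every cell gives the kind of scatter plot described in this section.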

Percent of mitochondrial reads

Low-quality or dying cells often show a high proportion of mitochondrial reads. Cells with a high proportion of mitochondrial reads will be removed.

  • detected: total number of genes expressed in a cell
  • subsets_Mito_percent: percentage of all counts that come from the mitochondrial genome
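The percentage itself is simple arithmetic: mitochondrial counts divided by all counts of a cell. The sketch below assumes that mitochondrial genes can be recognised by the name prefix "mt-" (as mentioned in the filtering section); the count matrix is hypothetical.

```python
# Toy count matrix: keys are genes, list positions are cells (hypothetical values).
counts = {
    "geneA": [5, 0, 2],
    "geneB": [0, 0, 7],
    "mt-nd1": [1, 3, 0],
}


def mito_percent(counts, prefix="mt-"):
    """Percentage of each cell's counts that come from genes named prefix*."""
    n_cells = len(next(iter(counts.values())))
    total = [0] * n_cells
    mito = [0] * n_cells
    for gene, values in counts.items():
        is_mito = gene.lower().startswith(prefix)
        for cell, value in enumerate(values):
            total[cell] += value
            if is_mito:
                mito[cell] += value
    # Cells without any counts get 0.0 to avoid a division by zero.
    return [100.0 * m / t if t > 0 else 0.0 for m, t in zip(mito, total)]


pct = mito_percent(counts)
print(pct)
```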

Table: Median of the proportion of mitochondrial reads in each sample

Sample     Median [%]
C_F        1.948843
C_M_2      1.831965
H_F_2_2    2.043343
H_Fi_2     2.026529
H_M2       2.004468
H_MI_2     1.952511

Percent of ribosomal protein reads

The fraction of reads from ribosomal proteins varies based on cell type and overall cell health. Higher levels of RNA degradation may lead to more templating of ribosomal proteins.

Table: Median of the proportion of reads from ribosomal proteins in each sample

Sample     Median [%]
C_F        40.95304
C_M_2      35.67937
H_F_2_2    37.04388
H_Fi_2     32.13078
H_M2       36.05118
H_MI_2     41.86038


3. Filtering

Tool: Scater

Automatically identifying low-quality cells

Scater provides an approach to automatically remove low-quality cells based on experiment-specific aspects of the data.
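One common way to derive such experiment-specific thresholds is to flag cells that lie several median absolute deviations (MADs) away from the median of a QC metric. The sketch below illustrates that idea. It assumes a cutoff of 3 MADs and the usual scaling constant 1.4826; the function name and the example values are hypothetical, and this is not Scater's own code.

```python
import math
import statistics


def mad_outliers(values, nmads=3.0, side="higher", log=False):
    """Flag values more than `nmads` MADs above or below the median.

    With log=True the values are log-transformed first, which is useful for
    metrics such as library size (all values must then be positive).
    """
    data = [math.log(v) for v in values] if log else list(values)
    med = statistics.median(data)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    mad = 1.4826 * statistics.median([abs(v - med) for v in data])
    if side == "higher":
        limit = med + nmads * mad
        return [v > limit for v in data]
    limit = med - nmads * mad
    return [v < limit for v in data]


# Hypothetical mitochondrial percentages for six cells.
mito = [1.8, 2.0, 2.1, 1.9, 2.2, 9.5]
flags = mad_outliers(mito, side="higher")
print(flags)
```

Because the threshold is computed from the data of each sample, the same rule adapts to samples with different overall quality.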

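Table: Number of cells removed in each sample due to low library size (low_lib_size), low number of expressed genes (low_n_features), or high proportion of mitochondrial reads (high_subsets_Mito_percent), and in total (discard). A cell can fail more than one criterion, so discard may be smaller than the sum of the other columns (see H_Fi_2).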

Sample     low_lib_size  low_n_features  high_subsets_Mito_percent  discard
C_F        0             0               73                         73
C_M_2      0             0               70                         70
H_F_2_2    0             0               54                         54
H_Fi_2     0             2               27                         27
H_M2       0             0               118                        118
H_MI_2     0             0               39                         39
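The discard column behaves like a logical union of the individual criteria, which is why H_Fi_2 shows 2 + 27 flagged cells but only 27 discarded ones. A minimal sketch with hypothetical per-cell flags:

```python
# Hypothetical per-cell flags for four cells.
low_lib_size = [False, False, False, False]
low_n_features = [False, True, True, False]
high_mito = [False, True, True, True]

# A cell is discarded if it fails at least one criterion.
discard = [a or b or c for a, b, c in zip(low_lib_size, low_n_features, high_mito)]

n_flags = sum(low_lib_size) + sum(low_n_features) + sum(high_mito)
n_discard = sum(discard)
print(n_flags, n_discard)
```

Here five flags are raised in total, but only three cells are discarded because two cells fail two criteria at once.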

Gene-level QC

Plots showing the top 50 most highly expressed genes in each sample. Each row corresponds to a gene, and each boxplot corresponds to the expression of a gene (i.e., number of reads) in a single cell. The vertical line within the box indicates the median expression of each gene across all cells. Genes are sorted in decreasing order based on median expression.

Sometimes individual genes may have very high expression and should be removed to avoid problems at the normalization step. In particular, look out for MALAT1 and other nuclear lincRNAs, mitochondrial genes (prefix mt-), ribosomal proteins (starting with rp), actin and hemoglobin.
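A simple way to screen the top-ranked genes is to match their names against a few patterns. The sketch below ranks genes by median count and labels those that match. The patterns are assumptions for illustration: "^rp[sl]" is deliberately stricter than "starting with rp" to avoid unrelated genes, and the gene names and counts are hypothetical.

```python
import re
import statistics

# Hypothetical name patterns; adapt them to the annotation of your organism.
PATTERNS = {
    "mitochondrial": re.compile(r"^mt-", re.IGNORECASE),
    "ribosomal": re.compile(r"^rp[sl]", re.IGNORECASE),
    "MALAT1": re.compile(r"^malat1$", re.IGNORECASE),
}


def flag_gene(name):
    """Return the first matching category for a gene name, or None."""
    for category, pattern in PATTERNS.items():
        if pattern.search(name):
            return category
    return None


def top_expressed(gene_counts, n=50):
    """Return the n genes with the highest median count across cells."""
    medians = {gene: statistics.median(values) for gene, values in gene_counts.items()}
    return sorted(medians, key=medians.get, reverse=True)[:n]


# Toy data: per-cell counts for four genes (hypothetical values).
gene_counts = {
    "mt-nd1": [9, 8, 10],
    "rpl13": [15, 14, 16],
    "actb1": [7, 6, 8],
    "sox2": [0, 1, 0],
}
top = top_expressed(gene_counts, n=3)
candidates = {gene: flag_gene(gene) for gene in top}
print(candidates)
```

Genes labelled this way are only candidates; whether to remove them remains a per-project decision.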

Plots after removing specific highly expressed genes.

Removed genes: CR383676.1, CR936442.1

Statistics after filtering

Number of cells

Sample     Before filtering  After filtering
C_F        1183              1110
C_M_2      1779              1709
H_F_2_2    846               792
H_Fi_2     515               488
H_M2       1400              1282
H_MI_2     750               711
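The numbers in this table can be cross-checked against the discard column of the previous table: the difference between the cells before and after filtering should equal the number of discarded cells. A short sketch using the values reported above:

```python
# (before filtering, after filtering, discard) per sample, copied from the tables above.
cells = {
    "C_F": (1183, 1110, 73),
    "C_M_2": (1779, 1709, 70),
    "H_F_2_2": (846, 792, 54),
    "H_Fi_2": (515, 488, 27),
    "H_M2": (1400, 1282, 118),
    "H_MI_2": (750, 711, 39),
}

removed = {sample: before - after for sample, (before, after, _) in cells.items()}
consistent = all(removed[sample] == cells[sample][2] for sample in cells)

for sample, (before, _, _) in cells.items():
    share = 100.0 * removed[sample] / before
    print(f"{sample}: removed {removed[sample]} of {before} cells ({share:.1f}%)")
print("tables consistent:", consistent)
```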

Library size versus detected genes after filtering

Percent of mitochondrial reads after filtering

Percent of ribosomal protein reads after filtering


4. Normalization

Tool: Seurat

Normalization is applied to correct for differences in library size between samples. Biological heterogeneity in single-cell RNA-seq data is often confounded by technical factors, including sequencing depth. The number of molecules detected in each cell can vary significantly between cells, even within the same cell type.

Therefore, we apply sctransform normalization (Hafemeister and Satija, Genome Biology 2019), which builds regularized negative binomial models of gene expression in order to account for technical artifacts while preserving biological variance. During the normalization, we also remove confounding sources of variation (mitochondrial and ribosomal mapping percentage).
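To give an intuition for what the normalized values represent, the sketch below computes a Pearson residual under a negative binomial model, where the expected count of a gene depends on the library size of the cell. The coefficients b0 and b1 and the dispersion theta are hypothetical placeholders; in the actual method they are estimated per gene and regularized across genes, which is not shown here.

```python
import math


def expected_count(lib_size, b0, b1):
    """Expected count from a log-linear model in log10 library size."""
    return math.exp(b0 + b1 * math.log10(lib_size))


def pearson_residual(count, mu, theta):
    """Residual scaled by the negative binomial standard deviation."""
    variance = mu + mu * mu / theta  # NB variance with dispersion theta
    return (count - mu) / math.sqrt(variance)


# Hypothetical parameters for one gene and one cell.
mu = expected_count(lib_size=10_000, b0=-8.0, b1=2.5)
residual = pearson_residual(count=10, mu=mu, theta=2.0)
print(round(mu, 3), round(residual, 3))
```

A residual near zero means the observed count is what sequencing depth alone would predict; large positive or negative residuals point to variation beyond that technical effect.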

before normalization:

after normalization:


Session Information

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bullseye/sid

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] knitr_1.33 ggpubr_0.4.0
[3] kableExtra_1.3.4 scran_1.20.1
[5] umap_0.2.7.0 scater_1.20.0
[7] ggplot2_3.3.3 scuttle_1.2.0
[9] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
[11] Biobase_2.52.0 GenomicRanges_1.44.0
[13] GenomeInfoDb_1.28.0 IRanges_2.26.0
[15] S4Vectors_0.30.0 BiocGenerics_0.38.0
[17] MatrixGenerics_1.4.0 matrixStats_0.59.0
[19] SeuratObject_4.0.2 Seurat_4.0.5

loaded via a namespace (and not attached):
[1] utf8_1.2.1 reticulate_1.20
[3] tidyselect_1.1.1 htmlwidgets_1.5.3
[5] grid_4.1.0 BiocParallel_1.26.0
[7] Rtsne_0.15 munsell_0.5.0
[9] ScaledMatrix_1.0.0 codetools_0.2-18
[11] ica_1.0-2 statmod_1.4.36
[13] future_1.23.0 miniUI_0.1.1.1
[15] withr_2.4.2 colorspace_2.0-1
[17] highr_0.9 rstudioapi_0.13
[19] ROCR_1.0-11 ggsignif_0.6.1
[21] tensor_1.5 listenv_0.8.0
[23] labeling_0.4.2 GenomeInfoDbData_1.2.6
[25] polyclip_1.10-0 farver_2.1.0
[27] parallelly_1.28.1 vctrs_0.3.8
[29] generics_0.1.0 xfun_0.23
[31] R6_2.5.0 ggbeeswarm_0.6.0
[33] rsvd_1.0.5 locfit_1.5-9.4
[35] bitops_1.0-7 spatstat.utils_2.1-0
[37] DelayedArray_0.18.0 assertthat_0.2.1
[39] promises_1.2.0.1 scales_1.1.1
[41] beeswarm_0.4.0 gtable_0.3.0
[43] beachmat_2.8.0 globals_0.14.0
[45] goftest_1.2-2 rlang_0.4.11
[47] systemfonts_1.0.2 splines_4.1.0
[49] rstatix_0.7.0 lazyeval_0.2.2
[51] spatstat.geom_2.1-0 broom_0.7.6
[53] yaml_2.2.1 reshape2_1.4.4
[55] abind_1.4-5 backports_1.2.1
[57] httpuv_1.6.1 tools_4.1.0
[59] ellipsis_0.3.2 spatstat.core_2.1-2
[61] jquerylib_0.1.4 RColorBrewer_1.1-2
[63] ggridges_0.5.3 Rcpp_1.0.7
[65] plyr_1.8.6 sparseMatrixStats_1.4.0
[67] zlibbioc_1.38.0 purrr_0.3.4
[69] RCurl_1.98-1.3 rpart_4.1-15
[71] openssl_1.4.4 deldir_0.2-10
[73] pbapply_1.4-3 viridis_0.6.1
[75] cowplot_1.1.1 zoo_1.8-9
[77] haven_2.4.1 ggrepel_0.9.1
[79] cluster_2.1.2 magrittr_2.0.1
[81] data.table_1.14.0 RSpectra_0.16-0
[83] scattermore_0.7 openxlsx_4.2.3
[85] lmtest_0.9-38 RANN_2.6.1
[87] fitdistrplus_1.1-5 hms_1.1.0
[89] patchwork_1.1.1 mime_0.10
[91] evaluate_0.14 xtable_1.8-4
[93] rio_0.5.26 readxl_1.3.1
[95] gridExtra_2.3 compiler_4.1.0
[97] tibble_3.1.2 KernSmooth_2.23-20
[99] crayon_1.4.1 htmltools_0.5.1.1
[101] mgcv_1.8-36 later_1.2.0
[103] tidyr_1.1.3 DBI_1.1.1
[105] MASS_7.3-54 Matrix_1.3-4
[107] car_3.0-10 metapod_1.0.0
[109] igraph_1.2.6 forcats_0.5.1
[111] pkgconfig_2.0.3 foreign_0.8-81
[113] plotly_4.9.3 spatstat.sparse_2.0-0
[115] xml2_1.3.2 svglite_2.0.0
[117] vipor_0.4.5 bslib_0.2.5.1
[119] dqrng_0.3.0 webshot_0.5.2
[121] XVector_0.32.0 rvest_1.0.0
[123] stringr_1.4.0 digest_0.6.27
[125] sctransform_0.3.2 RcppAnnoy_0.0.18
[127] spatstat.data_2.1-0 cellranger_1.1.0
[129] rmarkdown_2.8 leiden_0.3.8
[131] uwot_0.1.10 edgeR_3.34.0
[133] DelayedMatrixStats_1.14.0 curl_4.3.1
[135] shiny_1.6.0 lifecycle_1.0.0
[137] nlme_3.1-152 jsonlite_1.7.2
[139] carData_3.0-4 BiocNeighbors_1.10.0
[141] viridisLite_0.4.0 askpass_1.1
[143] limma_3.48.0 fansi_0.5.0
[145] pillar_1.6.1 lattice_0.20-44
[147] fastmap_1.1.0 httr_1.4.2
[149] survival_3.2-11 glue_1.4.2
[151] zip_2.2.0 png_0.1-7
[153] bluster_1.2.1 stringi_1.6.2
[155] sass_0.4.0 BiocSingular_1.8.0
[157] dplyr_1.0.6 irlba_2.3.3
[159] future.apply_1.7.0