Statistical analysis and visualisation

Understanding a principal component analysis (PCA) plot

In omics applications, we often collect data for a large number of variables per sample. PCA is a mathematical technique that reduces the dimensionality of the data while retaining as much of the variation as possible. This allows us to represent our samples in fewer dimensions and to visualize similarities and differences, for example in a 2-D scatterplot. Such PCA plots typically show the first principal component (PC1) along the x-axis and the second principal component (PC2) along the y-axis. Because the principal components are ordered by decreasing amount of variance explained, the spread of samples along PC1 reflects more of the total variation in the data than the spread along PC2.
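To make the idea concrete, here is a minimal sketch of PCA in Python using NumPy's singular value decomposition. The function name pca_scores and the toy data (two made-up groups of three samples, four variables each) are assumptions chosen for illustration only; a real omics analysis would usually rely on an established package and on appropriately normalised data.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Return sample coordinates on the first principal components and
    the fraction of total variance each of those components explains."""
    x = np.asarray(data, dtype=float)
    # Centre each variable (column) so the components describe variation
    # around the mean rather than the mean itself.
    centred = x - x.mean(axis=0)
    # SVD of the centred matrix: rows of vt are the principal axes and
    # the singular values s are sorted from largest to smallest.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Hypothetical toy data: 6 samples (rows) x 4 variables (columns).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(3, 4))
group_b = rng.normal(loc=5.0, scale=1.0, size=(3, 4))
toy = np.vstack([group_a, group_b])

scores, explained = pca_scores(toy)
print(scores)     # column 0 = PC1 (x-axis), column 1 = PC2 (y-axis)
print(explained)  # fraction of variance explained by PC1 and PC2
```

Plotting the first column of scores against the second would give the kind of 2-D PCA scatterplot described above, with the explained fractions commonly shown as percentages in the axis labels.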

If you want to learn more about PCA, good places to start are this 5-minute video or this primer.

P-values and their interpretation

This video from StatQuest uses a drug trial as an example to explain hypothesis testing and P-values.

The why and how of false discovery rate (FDR) adjustment

This video by Josh Starmer at StatQuest provides an excellent overview of why we apply a false discovery rate adjustment when we perform many statistical tests. This is the case, for example, in RNA-seq experiments, where we perform tens of thousands of tests for differential expression, i.e. one for each gene.
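The scale of the problem can be illustrated with a small simulation. The sketch below assumes a hypothetical experiment with 20,000 genes, none of which is truly differentially expressed, so that every P-value is drawn uniformly between 0 and 1. The function name, the gene count and the random seed are all illustrative assumptions.

```python
import random

def count_false_positives(n_tests, alpha=0.05, seed=1):
    """Count how many tests fall below alpha when no effect is real."""
    rng = random.Random(seed)
    # Under the null hypothesis, P-values are uniformly distributed on [0, 1).
    null_pvalues = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in null_pvalues)

n_tests = 20000  # hypothetical number of genes tested
hits = count_false_positives(n_tests)
print(hits, "of", n_tests, "tests have P < 0.05 although no gene truly differs")
```

On average, about 5% of the tests (roughly 1,000 here) come out as "significant" purely by chance, which is why unadjusted P-values are not a reliable basis for calling genes differentially expressed.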

In our standard RNA-seq analysis, we use the Benjamini-Hochberg procedure for FDR adjustment, which is explained about 14 min 45 s into the video. The video also shows why many adjusted P-values can end up being identical.

To calculate Benjamini-Hochberg adjusted P-values, you can use, for example, this online tool or the function p.adjust() in R.
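For readers who want to see the arithmetic, the following is a minimal Python sketch of the Benjamini-Hochberg adjustment. It is intended to give the same kind of result as p.adjust() in R with method = "BH", but the function name bh_adjust and the example P-values are assumptions made for illustration, not part of any library.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Visit the P-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        candidate = pvalues[idx] * m / rank
        # An adjusted value may never exceed that of a larger P-value,
        # so carry the smallest candidate seen so far downwards.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted

pvals = [0.001, 0.02, 0.021, 0.5]  # hypothetical raw P-values
adjusted = bh_adjust(pvals)
print([round(p, 4) for p in adjusted])  # [0.004, 0.028, 0.028, 0.5]
```

In this example the second P-value would be scaled to 0.04 on its own, but it is lowered to 0.028 because the next larger P-value adjusts to 0.028. This monotonicity step is the reason why several adjusted P-values often end up identical.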