Why PCA on bulk RNA-Seq and t-SNE on scRNA-Seq?

t-SNE is becoming popular for single-cell RNA-Seq data analysis. But the usual explanations of why t-SNE works well on scRNA-Seq and PCA works well on bulk RNA-Seq make no sense to me at all.

How to Use t-SNE Effectively is the best site for learning what t-SNE is. Let's go through its points.

1. Those hyperparameters really matter

You can see that t-SNE produces very different outputs depending on the choice of parameters. The following points are consequences of this.
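To make the parameter sensitivity concrete, here is a minimal sketch that runs t-SNE twice on the same data with only the perplexity changed. It assumes scikit-learn is installed; the toy matrix, the helper name `run_tsne`, and the two perplexity values are made up for illustration and are not taken from any real scRNA-Seq data set.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 60 "cells" x 20 "genes", two groups.
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 20))
group_b = rng.normal(loc=3.0, scale=1.0, size=(30, 20))
X = np.vstack([group_a, group_b])


def run_tsne(data, perplexity):
    """Embed `data` in 2-D with a fixed seed, so only the perplexity varies."""
    model = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0)
    return model.fit_transform(data)


emb_low = run_tsne(X, perplexity=5)
emb_high = run_tsne(X, perplexity=25)

# Same data and same seed; compare the two sets of coordinates.
print("shapes:", emb_low.shape, emb_high.shape)
print("embeddings identical:", np.allclose(emb_low, emb_high))
```

If you plot `emb_low` and `emb_high` side by side, you can check for yourself how much the apparent cluster sizes and gaps depend on that one parameter.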

2. Cluster sizes in a t-SNE plot mean nothing
3. Distances between clusters might not mean anything

This information is essential for interpreting a PCA result, and you should know that it is lost in t-SNE output. The direction information, which is also very important in a biological context, is lost as well. Without knowing these points, you will almost certainly misread charts generated by t-SNE.
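The reason distances stay interpretable in PCA is that PCA is only a rotation of the centered data: with all components kept, pairwise distances are unchanged, and dropping components can only shrink them. The sketch below checks this with plain NumPy on a random toy matrix. The data and the helper name `pairwise_distances` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))  # toy matrix: 8 samples x 5 features (illustrative only)

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # sample coordinates on all principal components


def pairwise_distances(M):
    """Euclidean distance between every pair of rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


d_original = pairwise_distances(Xc)
d_all_pcs = pairwise_distances(scores)
d_two_pcs = pairwise_distances(scores[:, :2])

# Fraction of total variance carried by each component.
explained_ratio = S ** 2 / np.sum(S ** 2)

print("distances preserved with all PCs:", np.allclose(d_original, d_all_pcs))
print("2-PC distances never exceed the originals:", bool(np.all(d_two_pcs <= d_original + 1e-9)))
print("explained variance ratio:", np.round(explained_ratio, 3))
```

The explained variance ratio tells you how much of the distance structure the first two axes carry. t-SNE gives you no comparable number.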

By the way, you can plot additional samples in a fixed PCA space, but you can't do that with t-SNE, because it does not give you a mapping that can be applied to new samples. This is another disadvantage.
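Plotting additional samples in a fixed PCA space only requires keeping the mean and the component vectors from the original fit. Here is a minimal NumPy sketch. The function names `fit_pca` and `project` and the random matrices are hypothetical, chosen only for this example.

```python
import numpy as np


def fit_pca(X, n_components=2):
    """Return the feature means and the top principal axes of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X_new, mean, components):
    """Place samples in an already fitted PCA space."""
    return (X_new - mean) @ components.T


rng = np.random.default_rng(2)
reference = rng.normal(size=(20, 6))    # samples used to define the space
new_samples = rng.normal(size=(3, 6))   # samples added later

mean, components = fit_pca(reference)
ref_scores = project(reference, mean, components)
new_scores = project(new_samples, mean, components)

print("reference scores:", ref_scores.shape)
print("new sample scores:", new_scores.shape)
```

The reference coordinates do not move when new samples are added, because the axes are fixed once `fit_pca` has run.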

4. Random noise doesn’t always look random

Researchers reading papers that contain t-SNE figures are often unaware of this characteristic. Without that knowledge, they naturally assume the clusters have some biological meaning. But the clusters can actually be generated from noise.

Now you can see that if PCA works well on a data set, t-SNE will work well on it too. But the converse is not true.

So why do people prefer t-SNE for scRNA-Seq data? I guess the reason is that scRNA-Seq data is very noisy because of the intrinsic difficulty of the measurement, so PCA doesn't produce a good-looking chart, while t-SNE can still produce a perfect-looking figure. If you see a t-SNE chart in a paper, I think you had better question the quality of the raw data.
