Why PCA on bulk RNA-Seq and t-SNE on scRNA-Seq?

t-SNE is now widely used in single-cell RNA-Seq data analysis. However, a convincing explanation of why PCA is the common choice for bulk RNA-Seq, while t-SNE or UMAP is the common choice for single-cell RNA-Seq, is not always easy to find.

"How to Use t-SNE Effectively" (Wattenberg et al., Distill, 2016) is a useful article for understanding common pitfalls in interpreting t-SNE plots. Here, we consider how PCA and t-SNE serve different purposes in RNA-Seq data analysis, based on the characteristics discussed in that article. The numbered points below follow that article.

1. Hyperparameters really matter

The output of t-SNE can change substantially depending on parameter settings such as perplexity. The next two points are consequences of this sensitivity.

2. Cluster sizes in a t-SNE plot may not be meaningful
3. Distances between clusters may not mean anything

When interpreting PCA results, the distances between samples, the directions of the axes, and the amount of variation along each axis all carry important information. In t-SNE, however, these types of information are not necessarily preserved in a straightforward way. If this point is overlooked, it is easy to read too much meaning into the distances or arrangement of clusters in a t-SNE plot.
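As a brief sketch of why these quantities are interpretable in PCA, the relations can be written out. The symbols are introduced here for illustration and do not come from the cited article: x_i is the expression vector of sample i, \mu is the mean over samples, W_k is the matrix whose orthonormal columns are the first k principal axes, and \lambda_m is the variance along the m-th axis.

```latex
z_i = W_k^{\top}(x_i - \mu), \qquad
\lVert z_i - z_j \rVert \le \lVert x_i - x_j \rVert, \qquad
\text{explained variance ratio of PC}_m = \frac{\lambda_m}{\sum_{l} \lambda_l}
```

Because the projection is linear and orthogonal, distances in the plot never exceed the original distances, and they match them exactly when all components with nonzero variance are kept. The explained variance ratio states how much of the total variation each axis accounts for. t-SNE offers no comparable guarantee.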

In addition, PCA defines an explicit linear mapping, so new samples can be projected onto an existing PCA space. Standard t-SNE does not learn a reusable mapping from expression values to plot coordinates, so placing new samples usually means re-running the embedding, which can also move the existing points. In analyses such as bulk RNA-Seq, where we often want to examine relationships among samples or see where new samples fall, this difference can be important.
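The projection of a new sample onto an existing PCA space can be sketched with NumPy alone. This is a minimal illustration, not the procedure of any particular tool: the function names `fit_pca` and `project` are hypothetical, and the data are simulated values standing in for a normalized, log-transformed samples-by-genes matrix.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit PCA on a samples x genes matrix via SVD of the centered data."""
    mean = X.mean(axis=0)
    centered = X - mean
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]                 # principal axes (rows)
    variance = singular_values ** 2
    explained_ratio = (variance / variance.sum())[:n_components]
    return mean, components, explained_ratio

def project(X_new, mean, components):
    """Place samples in an existing PCA space using the stored mean and axes."""
    return (X_new - mean) @ components.T

# Simulated stand-in data (an assumption): 6 samples x 100 genes, two groups of 3.
rng = np.random.default_rng(0)
reference = rng.normal(size=(6, 100))
reference[3:] += 2.0                               # second group is shifted

mean, components, explained_ratio = fit_pca(reference)
ref_scores = project(reference, mean, components)

# A new sample drawn like the second group, projected without refitting.
new_sample = rng.normal(size=(1, 100)) + 2.0
new_scores = project(new_sample, mean, components)
```

The existing coordinates in `ref_scores` are unchanged by adding the new sample, because only the stored `mean` and `components` are reused. `explained_ratio` gives the share of total variance captured by each retained axis.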

4. Random noise may not always look random

With t-SNE, even when the underlying data structure is not clear, cluster-like patterns may appear visually. Therefore, clusters observed in a t-SNE plot should not be interpreted immediately as biologically meaningful groups. They should be checked together with the original data, expression patterns, sample information, and experimental design.
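One way to check this on your own machine is to embed pure random noise at more than one perplexity and compare the pictures. The sketch below assumes scikit-learn is installed and uses its `TSNE` class; the function name `embed_noise`, the data sizes, and the perplexity values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_noise(perplexities=(5, 30), n_points=200, n_dims=50, seed=0):
    """Embed structureless Gaussian noise with t-SNE at several perplexities."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n_points, n_dims))    # no real clusters by construction
    embeddings = {}
    for perplexity in perplexities:
        tsne = TSNE(n_components=2, perplexity=perplexity,
                    init="pca", random_state=seed)
        embeddings[perplexity] = tsne.fit_transform(noise)
    return noise, embeddings

noise, embeddings = embed_noise()
```

Plotting `embeddings[5]` and `embeddings[30]` side by side (for example with matplotlib) shows how the apparent structure depends on the setting, even though the input contains no groups. Any clumps that appear are a product of the method and its parameters, not of the data.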

Understanding these characteristics helps clarify that PCA and t-SNE are not simply better or worse than each other. Rather, they are visualization methods suited to different purposes.

In bulk RNA-Seq, the number of samples is usually relatively small, and each sample reflects experimental conditions, individual differences, possible batch effects, and other sample-level factors. For this reason, PCA, which allows us to examine distances, directions, and the size of variation among samples, is highly useful for understanding the overall state of the data. With PCA, potential outliers, differences between conditions, and possible batch effects can often be assessed in a relatively intuitive way.

In contrast, in single-cell RNA-Seq, the number of cells is very large, and the main purpose is often to explore cell population structures and subclusters. In such cases, nonlinear dimensionality reduction methods such as t-SNE and UMAP are useful for visually capturing local structures among cell populations. However, in a t-SNE plot, distances between clusters, directions, and cluster sizes do not necessarily have direct biological interpretations.

The important point is not to judge a plot only by whether the groups appear to be clearly separated. It is necessary to understand what each visualization method preserves and what it may lose, and then interpret the result together with expression patterns, sample information, batch information, and the experimental design.

In RNA-Seq data analysis, PCA is not just a figure. It is an important step for checking the state of the data and deciding how to proceed with the analysis. The following tutorial also explains how PCA can be used within the RNA-Seq data analysis workflow in Subio Platform.

RNA-Seq Data Analysis Tutorial: Checking the Overall Data Structure with PCA and Clustering

Data Analysis Service

Subio also provides reanalysis of public datasets and analysis support for your own data. Rather than accepting published conclusions or analysis results as they are, we help you visualize the data so that the results can be critically examined and interpreted.