Comparing edgeR, DESeq2, and the t-test for RNA-Seq Differential Expression Analysis (4)｜What Happens in Low-Input RNA-Seq?

In the previous articles, we compared the differences in genes identified by edgeR, DESeq2, and the t-test in RNA-Seq differential expression analysis. In Case Study 1, we examined low-variance, small-sample in vitro data. In Case Study 2, we looked at a medium-sized biopsy-derived dataset. In Case Study 3, we examined the effect of accounting for a paired design.

In this article, we use low-input RNA-Seq data to examine how the data structure underlying differential expression analysis changes as the number of input cells decreases.

In low-input RNA-Seq, ultra-low-input RNA-Seq, and pico-input RNA-Seq, protocols and kits such as SMART-Seq, SMART-Seq2, SMART-Seq v4, SMARTer-based kits, and the NEBNext Low Input RNA Library Prep Kit are used to prepare libraries from very small amounts of RNA.

These methods have made it easier to obtain RNA-Seq data from small numbers of cells or very small amounts of RNA. However, being able to obtain data is not the same as being able to perform stable differential expression analysis from that data. In this article, we compare what kinds of genes are identified when low-input RNA-Seq data are analyzed using edgeR, DESeq2, and the t-test, and consider how the resulting DEG lists should be interpreted.

Dataset used in this analysis

In this article, we used the Gene Counts data from GSE130882. This dataset includes RNA-Seq data prepared from different numbers of input cells. Here, we used the SMART_Nxt data and compared CD3+B7-1 Fc stimulation against CD3 stimulation as the control, using input cell numbers of 100 cells, 1k cells, and 100k cells. Each condition has two replicates.

Raw Gene Counts were used to calculate p-values with edgeR and DESeq2. On the other hand, normalized Gene Counts, adjusted for differences in data amount among samples, were used in Subio Platform to display scatter plots, check fold change criteria, and visualize signal regions.

In 100 cells, replicate reproducibility is already disrupted before comparing conditions

First, we examined how well the replicate samples within the same condition agreed with each other in 100 cells and 1k cells (Fig1 left and middle columns).

CaseStudy428 Fig1: 100cells 1kcells replicates

Fig1: Replicate comparisons and condition mean comparisons in 100 cells and 1k cells. The left column compares the CD3 replicates, the middle column compares the CD3+B7-1 Fc replicates, and the right column compares the mean values of CD3 and CD3+B7-1 Fc. The scatter plots are displayed using normalized Gene Counts (Processed Signal).

In 100 cells, large variability is observed in the low-to-middle signal range even when comparing CD3 replicates with each other or CD3+B7-1 Fc replicates with each other. In particular, many genes remain at low values in one sample but appear as values of several hundred to several thousand in the other sample.

This distribution suggests that the observed values are strongly affected by whether a small number of RNA molecules are captured in the early steps of reverse transcription and PCR amplification, rather than being measured as continuous expression levels.

In contrast, in 1k cells, the agreement between replicates is greatly improved compared with 100 cells. Although variability still remains in the low-signal range, a continuous distribution begins to form in the signal region, and some regions appear more suitable for comparative analysis.

100k cells can be used as a high-input reference pattern

Next, we examined the 100k-cell data. In 100k cells, replicate comparisons within the same condition fall almost along the diagonal line, showing a data structure similar to the low-variance, small-sample data in Case Study (1).

CaseStudy428 Fig2: 100kcells reference

Fig2: Replicate comparisons, condition mean comparison, and a three-method Venn diagram in 100k cells. The upper-right scatter plot compares the mean values of CD3 and CD3+B7-1 Fc. The Venn diagram shows the overlap of genes with p < 0.05 by edgeR, DESeq2, and t-test in 100k cells. Among the 6039 genes in the central overlap, genes that additionally showed a fold change of 1.4 or greater are shown as black dots in the upper-right scatter plot.

In 100k cells, the common set of genes with p < 0.05 by edgeR, DESeq2, and t-test is large. By combining this with a fold change criterion, we can define a set of stable differential expression candidates under the high-input condition (Fig2 right).

Here, we do not treat this gene set as the “true DEG” set. Instead, we use it as a high-input reference for interpreting the results from 100 cells and 1k cells. Because ordinary experiments usually do not allow comparison with such a high-input condition, datasets such as GSE130882 are very useful for understanding the behavior of low-input RNA-Seq.

How do the reference genes from 100k cells appear in 100 cells and 1k cells?

In Fig3, the differential expression candidates defined in 100k cells are shown as black dots, and we examined how they appear on the scatter plots for 100 cells and 1k cells.

CaseStudy428 Fig3: reference genes in 100cells 1kcells

Fig3: High-input reference genes defined in 100k cells displayed on the CD3/CD3+B7-1 Fc comparisons for 100 cells and 1k cells. Black dots indicate genes that were common to all three methods in 100k cells and also showed a fold change of 1.4 or greater. The orange lines indicate empirical fold change and signal thresholds set to extract the black-dot genes with relatively high purity under low-input conditions.

In 100 cells, even genes defined as differential expression candidates in 100k cells are widely scattered, from regions with almost no expression change to regions showing large expression changes. If DEG candidates are to be extracted from this type of data while minimizing false positives, it is effective to use an ON/OFF-like extraction criterion that combines a larger fold change threshold with a higher signal threshold (Fig3 left).

In contrast, in 1k cells, the region converging toward the diagonal line expands overall. Compared with 100 cells, the structure reflecting expression changes induced by CD3+B7-1 Fc becomes visible in 1k cells, especially in the region above 5000 (Fig3 right).

However, in most regions below that level, it is difficult to stably extract genes as ordinary DEGs. Even in 1k cells, it is useful to combine broader fold change criteria and higher signal thresholds with an ON/OFF-like extraction strategy.

When viewed only by p-values, 100 cells produce unstable gene sets that differ by method

Next, for 100 cells and 1k cells, we displayed genes with p < 0.05 by edgeR, DESeq2, and t-test as black dots.

CaseStudy428 Fig4: pvalue genes 100cells 1kcells

Fig4: Distribution of genes with p < 0.05 by edgeR, DESeq2, and t-test in 100 cells and 1k cells. Black dots indicate genes with p < 0.05 by each method. The title of each panel shows the number of genes judged significant. The scatter plots are displayed using normalized Gene Counts (Processed Signal).

CaseStudy428 Fig5: overlap with 100k reference

Fig5: Overlap between genes with p < 0.05 in 100 cells or 1k cells and the high-input reference gene set defined in 100k cells. The high-input reference gene set consists of genes that were commonly detected with p < 0.05 by edgeR, DESeq2, and t-test in 100k cells, and that additionally satisfied a fold change of 1.4 or greater.

In 100 cells, many genes with p < 0.05 by edgeR or DESeq2 are found not in a continuous signal region, but in regions where one condition appears to be amplified while the other condition is almost undetected (Fig4 upper row, left and middle panels).
In contrast, with the t-test, a relatively broad range of genes is judged significant even in 100 cells (Fig4 upper row, right panel).
However, for all three methods, the overlap with the high-input reference gene set defined in 100k cells was only about 30% (Fig5 upper row).

In 1k cells, the region that appears to behave like a signal region expands much more clearly than in 100 cells. At first glance, this may suggest that ordinary DEG analysis is possible. Indeed, if we look only at the calculated p-values, the differential expression analysis appears to work (Fig4 lower row).
However, compared with the high-input reference in Fig3, the overlap between the p-value-based gene sets and the reference gene set was only about 40–60% (Fig5 lower row).

In real research datasets, there is usually no answer-checking dataset equivalent to 100k cells. Therefore, the fact that the results “look plausible” is itself an interpretive risk. In other words, low-input RNA-Seq data require more cautious interpretation than the appearance of the data may suggest.

Based on these observations, applying statistical models to low-input RNA-Seq data for differential expression analysis carries substantial risk. Possible analysis strategies include the following.

Combine a broader fold change criterion with a higher signal threshold using an ON/OFF-like extraction strategy. However, the number of false negatives will become extremely large.
Relax the thresholds further toward the inside of the scatter plot. In exchange, one must accept that many false positives may be included.

The problem is not the statistical method, but the data structure

These results do not show that edgeR, DESeq2, or t-test is inappropriate as a method. The problem is that, especially in 100 cells, the data structure itself is not suitable for applying and interpreting ordinary differential expression tests.

In 100 cells, values in the low-to-middle signal range do not appear to be measured as stable, continuous expression levels. Rather, they appear to change greatly depending on whether a small number of molecules enter the amplification process. Therefore, in ultra-low input RNA-Seq such as 100 cells, even if ordinary DEG extraction is performed based on p-values, it is very difficult to interpret the results as stable expression differences.

A related issue in scRNA-Seq analysis

These results also have implications for statistical methods used in scRNA-Seq data analysis. In scRNA-Seq, the amount of RNA per cell is even smaller, and the observed values are strongly affected by detection or non-detection, capture efficiency, amplification, library complexity, cell state, and cell-type composition.

Statistical models and workflows developed specifically for scRNA-Seq are designed with this type of data structure in mind. However, they do not eliminate the instability of the data generation process itself. Using a single-cell-specific method does not turn single-cell data into stable expression-level data like bulk RNA-Seq. Therefore, scRNA-Seq requires even more cautious interpretation than the low-input RNA-Seq data examined in this article.

Support

Help - Theory & Case Study

Comparing edgeR, DESeq2, and the t-test for RNA-Seq Differential Expression Analysis (4)｜What Happens in Low-Input RNA-Seq?