In the previous article, “A Case Study Comparing edgeR, DESeq2, and t-test in RNA-Seq Differential Expression Analysis (2) | A Medium-Sized Biopsy Dataset,” we compared the genes extracted by edgeR, DESeq2, and t-test. The results showed that many of the differences among the methods were concentrated in the low-count region. In regions with sufficient signal, the results of the three methods in the independent two-group comparison did not differ greatly. In particular, the intersection of the three methods showed clear expression differences. However, in that analysis, the non-lesional and lesional groups were treated as two independent groups.
On the other hand, by taking into account the paired structure of non-lesional and lesional samples obtained from the same patient, we would expect to subtract individual differences and evaluate the change from non-lesional to lesional more directly. In general statistical analysis, this corresponds to a paired t-test. In edgeR and DESeq2, patient-to-patient differences can be treated as a blocking factor by including patient in the design formula. Intuitively, using patient as a blocking factor seems likely to increase detection power, because individual differences are subtracted and the analysis focuses on the change from non-lesional to lesional. However, the statistical models used by edgeR and DESeq2 are not necessarily that simple.
Let us compare DEG lists calculated while taking the paired structure into account.
Scope of this article: comparison limited to the signal region
As shown in the previous article, in RNA-Seq differential expression analysis, ordinary p-value-based test results are mainly easier to interpret for genes in the signal region. In the low-count region, unstable measurements and ON/OFF-type expression patterns are mixed together, making ordinary test results difficult to interpret directly.
In this article, genes with Gene Counts of 20 or higher in at least half of the samples in both the non-lesional and lesional groups were defined as the QC1 list. This QC1 list is treated as the gene set representing the signal region.
However, when running edgeR and DESeq2, p-values were calculated using all genes. The resulting p-value lists were then imported into Subio Platform and combined with the QC1 list to extract only the significant genes that belonged to the signal region.
The reason for this procedure is that filtering the gene set before calculating p-values with edgeR or DESeq2 changes the population of genes included in the analysis, which can also change the assumptions for dispersion estimation and multiple testing. Filtering low-count genes is a useful preprocessing step in practice, but it is also a process that changes the set of genes being analyzed.
The purpose of this article is not to create a standard final DEG list using edgeR or DESeq2. Rather, the purpose is to compare which genes each method calls significant when applied to the same GSE121212 data. For this reason, p-values for edgeR and DESeq2 were calculated using all genes, and the interpretation was then limited to the signal region.
In addition, to compare the tendency of each method to call genes significant, this article uses the p<0.05 lists from each method. FDR is important when narrowing down final DEG candidates, but here the first goal is to visualize and compare how the significance calls differ among methods.
Paired t-test almost included the significant genes from edgeR and DESeq2
We compared the results of paired t-test, edgeR including patient, and DESeq2 including patient. Genes with p<0.05 in edgeR and DESeq2 were almost all also significant in the paired t-test. On the other hand, the paired t-test extracted many genes that were not significant in edgeR or DESeq2.
Fig1: Comparison of edgeR, DESeq2, and paired t-test while accounting for patient. When the analysis was limited to the QC1 signal region, genes that were significant in edgeR and DESeq2 were almost all also significant in the paired t-test. In contrast, the paired t-test called a broader set of genes significant, including genes with smaller changes.
In the scatter plot, genes significant by paired t-test were widely distributed even near the diagonal line. This indicates that the paired t-test is sensitive not only to large fold changes, but also to small changes that are consistent within patients.
However, this result should not be interpreted simply as showing that the paired t-test is superior to edgeR/DESeq2, or that edgeR/DESeq2 missed those genes. Both paired t-test and edgeR/DESeq2 can account for paired structure, but they evaluate different data types and use different noise models.
With paired design, t-test and edgeR/DESeq2 changed in opposite directions
To understand the difference described above, we compared the two-group comparison from the previous article with the paired design used in this article.
In the t-test, changing from a two-group t-test to a paired t-test greatly increased the number of significant genes. This is likely because subtracting patient-to-patient baseline expression differences increased sensitivity to genes whose changes from non-lesional to lesional were consistent within patients.
In contrast, in edgeR and DESeq2, including patient in the design formula for a paired design reduced the number of significant genes compared with the two-group comparison.
Fig2: Comparison between the two-group comparison and the paired design. In the t-test, using a paired t-test greatly increased the number of significant genes. In contrast, in edgeR and DESeq2, including patient in the design formula reduced the number of significant genes compared with the two-group comparison.
This result shows that paired design is not simply a procedure that increases detection power. edgeR and DESeq2 model Gene Counts using a negative binomial distribution and perform testing while considering mean counts, dispersion, library size, gene-specific variability, and the uncertainty of a GLM model that includes the patient term. Therefore, even when the direction of change is consistent within patients, genes may not become significant if the difference is not evaluated as sufficiently stable count-based evidence.
Are paired t-test only genes merely noise?
Genes that were significant only in the paired t-test were not called significant by edgeR or DESeq2. Therefore, they should be treated with caution as final strong DEG candidates.
However, this does not necessarily mean that these genes should be completely discarded. As shown in Fig1, although these genes have small changes, they belong to the QC1 signal region and were detected by the paired t-test as consistent within-patient changes.
We therefore calculated the log2 ratio of lesional to non-lesional for each matched patient and examined the result using a heatmap. Each column represents a patient, and each cell shows the lesional/non-lesional log2 ratio for that patient.
Fig3: Heatmap using the log2 ratio of lesional to non-lesional samples from the same patient. The color represents the lesional/non-lesional log2 ratio for each patient. The intersection genes showed clear upregulated and downregulated patterns. The paired t-test only genes showed smaller changes, but the direction of change from non-lesional to lesional was consistent.
For the intersection genes that were significant in edgeR, DESeq2, and paired t-test, the within-patient changes were large, and clear upregulated and downregulated patterns were visible in the heatmap.
In contrast, for the paired t-test only genes, the magnitude of change was small, but many genes showed a consistent direction of change from non-lesional to lesional. This suggests that the paired t-test only genes are not merely a random set of noisy genes, but include genes with small but consistent expression changes.
Subpatterns within the paired t-test only genes
When the paired t-test only genes were examined in more detail, they appeared to separate into several subclusters, including broad upregulated and downregulated patterns.
Fig4: Clustering patterns of the paired t-test only genes. Although the magnitude of change was small, the genes separated into subpatterns including upregulation, downregulation, and patient-to-patient differences in the magnitude of change.
Cluster 1 and Cluster 2 appeared to be gene groups upregulated in AD lesional samples. In contrast, Cluster 3, 4, and 5 appeared to be gene groups downregulated in AD lesional samples. In particular, the downregulated genes did not appear to form a single homogeneous pattern, but instead separated into multiple subclusters.
This classification is not intended as strict cell-type identification. It is an exploratory organization of representative expression patterns within the paired t-test only genes, based on the clustering results in the heatmap.
Paired t-test only genes also showed functional coherence
The table below is a direct reproduction of the result obtained by inputting the representative genes for each cluster into ChatGPT, together with the information that, in the comparison between AD non-lesional and lesional samples, Cluster 1 and Cluster 2 were upregulated in lesional samples, whereas Cluster 3, 4, and 5 were downregulated in lesional samples.
This table summarizes functional tendencies based on representative gene names. It does not directly identify cell types. In bulk RNA-Seq, changes in cell-type composition, cell state, and expression programs are observed together. Therefore, the table should be treated here as a set of candidate interpretations for the clusters.
| Cluster | Change in AD lesional | Main representative genes | Suggested functional tendency | Suggested related cellular components or states |
|---|---|---|---|---|
| 1 | ↑ | HNRNPC, HNRNPK, U2AF1, SRSF2, PSMA/PSMB group, NUP group, ATP5F1 group, NDUFS group | Increased RNA processing, translation, nuclear transport, protein degradation, and energy metabolism | Activated keratinocytes, proliferative epidermis |
| 2 | ↑ | Gene group close to Cluster 1 | Housekeeping and metabolic activation program similar to Cluster 1 | Activated keratinocytes, proliferative epidermis |
| 3 | ↓ | IL18, KIT, SOX10, EPAS1, EFNB2, FILIP1L, SASH1, PLXDC2 | Vascular-related programs, hypoxia response, immune regulation, and specialized mesenchymal programs | Possible perivascular cells, specialized stroma, and melanocyte-related components |
| 4 | ↓ | YY1, APC, FBXW7, AGO1, SMAD2, LATS1, ATM, SETD2, CREBBP, PCDH group | Programs related to differentiation maintenance, transcriptional regulation, cell adhesion, and tissue homeostasis | Differentiated dermal cells, neural-related signaling, homeostasis-maintenance programs |
| 5 | ↓ | EGFL7, PROX1, AQP1, ACKR3, PDGFRA, HGF, VCAN, SVEP1, ABI3BP, PTGIS | Vascular and lymphatic function, ECM maintenance, and tissue homeostasis | Lymphatic endothelial cells, vascular endothelial cells, fibroblasts, and perivascular stroma |
In the upregulated Cluster 1 and Cluster 2, many genes were related to RNA processing, translation, protein degradation, and energy metabolism. This can be interpreted as a change related to activated keratinocytes and proliferative epidermis in AD lesional samples.
In contrast, in the downregulated Cluster 3, 4, and 5, genes related to blood vessels, lymphatic vessels, ECM, tissue homeostasis, and differentiation maintenance were prominent. These changes may reflect alterations in tissue structure and cell states in lesional skin.
Thus, although the paired t-test only genes showed small fold changes, they had a structure that would be difficult to dismiss completely as noise. Even though they were not called significant by edgeR or DESeq2, they showed consistent within-patient changes in the signal region, and functional coherence was visible at the cluster level. Therefore, they are worth considering when interpreting the analysis results.