Why TPM, FPKM, and RPKM Should Not Be Used for RNA-Seq Differential Expression Analysis - DEG Analysis Should Start from Gene Counts

When working with RNA-Seq data, you may wonder whether expression values such as TPM, FPKM, or RPKM can be used directly for differential expression analysis.

Many people have questions such as:

Can TPM be used for differential expression analysis?
Can t-tests be applied to FPKM or RPKM?
Can TPM or FPKM be used as input for DESeq2?
How should we understand the differences among TPM, FPKM, RPKM, and Gene Counts?
Can TPM or FPKM be used for visualization such as PCA or clustering?

Older RNA-Seq papers and public datasets often report expression values as RPKM, FPKM, or TPM. Seeing those values, it is easy to assume that performing differential expression analysis on RPKM, FPKM, or TPM is a standard RNA-Seq workflow.

However, the fact that these values were widely used in the past does not mean that they are appropriate input data for differential expression analysis.
DESeq2 and edgeR, which are widely used for RNA-Seq differential expression analysis today, are designed to use Gene Counts as input, not TPM, FPKM, or RPKM.

In other words, at least in current standard differential expression analysis using DESeq2 or edgeR, Gene Counts should be used as the starting point.

If there is a reason to use TPM or FPKM at all, it is mainly to compare relative expression levels among genes within the same sample. When sample-to-sample comparison, DEG analysis, and downstream interpretation are treated as one consistent workflow, there is little reason to use TPM or FPKM.
The reasons are explained below.

TPM, FPKM, and RPKM Are Values for Viewing Expression Levels, Not Standard Input for Differential Expression Analysis

TPM, FPKM, and RPKM are all expression measures calculated from RNA-Seq read counts while taking gene length and library size into account.
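As a rough illustration of how these measures relate to raw counts, the following Python sketch computes CPM, RPKM, and TPM for a single sample using their commonly cited definitions. The function names, counts, and gene lengths are invented for this example, and the sketch ignores details such as effective gene length that real pipelines may apply.

```python
def cpm(counts):
    """Counts per million: scale by library size only."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads.

    FPKM uses the same formula with fragments in place of reads.
    """
    total = sum(counts)
    return [c / (length / 1e3) / (total / 1e6)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-correct first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r / rate_sum * 1e6 for r in rates]


# Hypothetical toy sample: three genes, values chosen only for illustration.
counts = [10, 200, 790]
lengths_bp = [500, 2000, 10000]

print("CPM ", cpm(counts))
print("RPKM", rpkm(counts, lengths_bp))
print("TPM ", tpm(counts, lengths_bp))
```

Note that CPM keeps the genes in the same order as the raw counts, whereas RPKM and TPM reorder them once gene length enters the calculation.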

In raw Gene Counts, genes with low counts have small values, so it is relatively easy to recognize that they are more affected by measurement variability and sampling noise (see the left panel of the figure below).

However, TPM, FPKM, and RPKM include a correction step for gene length.
As a result, especially for short genes, even a small original count can become a relatively large transformed value. Conversely, for long genes, even a moderately large count can become a relatively small value. In scatter plots, this spreads low-count genes along the diagonal, so that values with low reliability look like ordinary expression data (see the middle and right panels of the figure below).
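A small worked example makes this concrete. The sketch below uses a hypothetical library size and two hypothetical genes to show that a short gene with 5 reads and a long gene with 100 reads can end up with the same RPKM value.

```python
def rpkm(count, length_bp, library_size):
    """RPKM for a single gene (illustrative helper, not from any library)."""
    return count / (length_bp / 1e3) / (library_size / 1e6)


library_size = 10_000_000  # hypothetical total number of mapped reads

short_gene = rpkm(count=5, length_bp=500, library_size=library_size)
long_gene = rpkm(count=100, length_bp=10_000, library_size=library_size)

print(short_gene, long_gene)
```

Both genes receive the same value, although one rests on 5 observed reads and the other on 100. Once the data are expressed as RPKM, that difference in supporting evidence is no longer visible.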

You may have seen the explanation that FPKM or TPM should not be used with edgeR or DESeq2 because they “disrupt the variance structure” of count data. As the figure below shows, in Gene Counts, low-count regions tend to show larger relative variability, and the relationship between expression level and variance is relatively clear. In FPKM and TPM, however, unstable values derived from low counts are shifted to higher values by gene-length normalization, which makes this relationship much harder to see. For this reason, when FPKM or TPM is used as input for edgeR or DESeq2, the data no longer follow the count-based variance structure assumed by these methods.
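The relationship between count size and relative variability can be sketched under a simplifying assumption: if read counts were subject only to Poisson sampling noise, the coefficient of variation (standard deviation divided by mean) would be 1 / sqrt(mean). Real RNA-Seq data usually show additional variability on top of this, so the numbers below are a lower bound rather than a realistic estimate. The helper names are hypothetical.

```python
import math


def poisson_cv(expected_count):
    """CV (sd / mean) of a Poisson-distributed count with the given mean."""
    return math.sqrt(expected_count) / expected_count


def scaled_cv(expected_count, scale):
    """CV after multiplying the count by a constant such as 1 / gene length."""
    mean = expected_count * scale
    sd = math.sqrt(expected_count) * scale
    return sd / mean


for expected in (5, 100, 2000):
    print(expected, round(poisson_cv(expected), 3))

# Dividing by gene length changes the magnitude, not the relative noise.
print(round(scaled_cv(5, 1 / 500), 3), round(scaled_cv(100, 1 / 10_000), 3))
```

Under this assumption a gene with 5 reads carries far more relative noise than a gene with 100 reads, and rescaling by gene length does not reduce that noise. It only changes where the value appears on the expression axis.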

[Figure: Gene Counts vs FPKM vs TPM]

[Figure: same amount looks larger]

Can TPM Be Used for Differential Expression Analysis?

TPM is sometimes described as an expression measure that is easier to compare across samples than FPKM or RPKM. For this reason, one might think that TPM, unlike FPKM or RPKM, could be used for differential expression analysis.

However, TPM does not fundamentally solve this problem. TPM values are first corrected for gene length and then scaled so that the total within each sample is constant. Neither step removes the instability derived from low counts.

In addition, TPM has another important limitation. Because the total within each sample is fixed, when the TPM value of one gene becomes larger, the TPM values of other genes become relatively smaller.

In other words, TPM represents relative proportions within a sample. It cannot be treated as a value in which the expression level of each gene changes independently. This property is known as a compositional data problem.
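The compositional effect can be seen in a minimal sketch. Here only the count of the first gene changes between two hypothetical conditions, while the counts of the other two genes are held fixed. All values and the helper function are invented for illustration.

```python
def tpm(counts, lengths_bp):
    """TPM for one sample: length-correct, then rescale so the sum is 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


lengths_bp = [1000, 1000, 1000]  # hypothetical genes of equal length

before = tpm([100, 100, 100], lengths_bp)
after = tpm([1000, 100, 100], lengths_bp)  # only the first gene changes

print([round(v) for v in before])
print([round(v) for v in after])
```

The second and third genes have identical counts in both conditions, yet their TPM values drop because the first gene now takes a larger share of the fixed total.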

The Main Reason to Use TPM, FPKM, or RPKM Is to Compare Genes Within the Same Sample

So what should TPM, FPKM, and RPKM be used for?

The main reason to actively use these values is to compare relative expression levels among genes within the same sample. For example, if you want to see whether gene A or gene B is relatively more highly expressed within a single sample, TPM or FPKM, which account for gene length and library size, can be useful.

However, this does not mean that TPM, FPKM, or RPKM makes gene-to-gene comparison completely reliable. These values are still observed values affected by measurement conditions, mapping, annotation, gene-specific mappability, and data processing. Therefore, even if a value is displayed as TPM or FPKM, it should not be treated as an absolute value that directly represents the true expression level.

TPM or FPKM Should Not Be Used as Input for DESeq2 or edgeR

DESeq2 and edgeR are widely used statistical methods for RNA-Seq differential expression analysis. They are designed based on RNA-Seq Gene Counts. Therefore, if TPM or FPKM is used as input for DESeq2 or edgeR, the data no longer match the count-data properties assumed by the model.
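Before running a count-based method, it can help to confirm that a table actually contains raw counts. The following is a minimal sketch of such a check. The function name is hypothetical, and the test is necessary rather than sufficient: values that were rounded after normalization would also pass.

```python
def looks_like_raw_counts(matrix):
    """Return True if every value is a non-negative whole number."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in matrix
        for value in row
    )


# Hypothetical tables: rows are genes, columns are samples.
count_table = [[0, 12, 7], [153, 140, 201]]
tpm_like_table = [[0.0, 12.7, 3.4], [153.2, 140.9, 201.1]]

print(looks_like_raw_counts(count_table))
print(looks_like_raw_counts(tpm_like_table))
```

A table with fractional values is a strong hint that it holds TPM, FPKM, or another transformed measure rather than Gene Counts.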

Can TPM or FPKM Be Used with t-tests?

TPM and FPKM values still carry the instability of low counts, so they are also unsuitable for t-tests. With Gene Counts, this instability is mainly concentrated in the low-count region, so preprocessing and filtering can be used to prepare the data for statistical testing.
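Because the unstable genes are easy to identify in Gene Counts, a simple filter is often enough to remove them. The sketch below keeps genes that reach a minimum count in a minimum number of samples. The function name, thresholds, and data are assumptions chosen for illustration, not recommended defaults.

```python
def filter_low_counts(count_table, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    return {
        gene: counts
        for gene, counts in count_table.items()
        if sum(c >= min_count for c in counts) >= min_samples
    }


# Hypothetical counts for three genes across four samples.
count_table = {
    "geneA": [0, 2, 1, 3],
    "geneB": [15, 22, 9, 30],
    "geneC": [500, 480, 650, 700],
}

kept = filter_low_counts(count_table)
print(sorted(kept))
```

The same threshold cannot be applied cleanly to TPM or FPKM, because a given value may correspond to very different underlying counts depending on gene length.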

However, in TPM or FPKM, instability derived from low counts becomes dispersed because of gene-length correction, making it difficult to handle through preprocessing or filtering. As a result, fluctuations near the detection limit or unstable differences derived from low counts may be detected as statistically meaningful differences.

Can TPM or FPKM Be Used for PCA or Clustering?

TPM or FPKM is sometimes used for displaying expression levels or for visualization such as PCA and clustering. However, this does not mean that the recommended practice is to use Gene Counts for differential expression analysis and TPM or FPKM for PCA and clustering.

As described above, differential expression analysis is performed based on Gene Counts. If PCA or clustering used to validate and interpret those results is performed using TPM or FPKM, it becomes more difficult to integrate and interpret the analysis results as a whole. In other words, it is more consistent to use Gene Counts as the starting point from differential expression analysis through result validation.
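One way to keep exploratory analysis on the same footing is to derive its input directly from Gene Counts. The sketch below assumes a log2(CPM + 1) transform, which is one common choice and not necessarily the normalization used by any particular tool, and then computes sample-to-sample distances of the kind that hierarchical clustering would use. Sample names and counts are invented.

```python
import math


def log_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount) for one sample, computed from raw counts."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical counts for three genes in three samples.
samples = {
    "ctrl_1": [100, 200, 700],
    "ctrl_2": [210, 390, 1400],  # similar composition at about twice the depth
    "treat_1": [400, 200, 400],
}

profiles = {name: log_cpm(c) for name, c in samples.items()}
d_ctrl = euclidean(profiles["ctrl_1"], profiles["ctrl_2"])
d_treat = euclidean(profiles["ctrl_1"], profiles["treat_1"])

print(round(d_ctrl, 3), round(d_treat, 3))
```

The two control samples end up close together despite their different sequencing depths, while the treated sample is clearly separated, and every value can be traced back to the same Gene Counts used for differential expression analysis.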

For those who want to analyze RNA-Seq data consistently starting from Gene Counts

Subio Platform allows you to import Gene Counts and proceed through normalization appropriate for count-based data, filtering, PCA, clustering, expression pattern review, and differential expression analysis, while visually checking the data throughout the workflow.

A step-by-step workflow using real GEO data is explained in our RNA-Seq Data Analysis Tutorial. For an overview of the software, please visit the Subio Platform page.

What If Gene Counts Are Not Available?

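CPM (counts per million) is corrected only for library size and involves no gene-length correction. Within a sample, every CPM value is the original count multiplied by the same constant, so low counts remain low values. CPM values also sum to a fixed total per sample, but the scaling factor is simply the library size rather than a sum of length-corrected rates as in TPM. Therefore, although CPM cannot fully reproduce the statistical model of Gene Counts, it retains properties that are closer to the original Gene Counts than TPM, FPKM, or RPKM.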

CPM is not recommended as input for DESeq2 or edgeR. However, with appropriate normalization, preprocessing, and filtering, it can be easier to use for exploratory analyses such as t-tests, PCA, and clustering.

Therefore, if Gene Counts are not available, starting from CPM is a more appropriate option than starting from TPM, FPKM, or RPKM.

Summary

TPM, FPKM, and RPKM are expression measures that are often encountered when working with RNA-Seq data. However, they should not be used as standard input for differential expression analysis.

TPM, FPKM, and RPKM are mainly values for comparing relative expression levels among genes within the same sample.
TPM does not fundamentally solve problems related to low-count instability or variance.
Differential expression analysis should be based on Gene Counts.
For PCA and clustering, it is also more consistent to use values derived from Gene Counts.
If Gene Counts are not available, CPM is a more appropriate option than TPM, FPKM, or RPKM.
