RNA-Seq Is Not a Simple Upgrade from Microarrays in Gene Expression Analysis - Understanding the Strengths and Limitations of Each Approach

In gene expression analysis, RNA-Seq and microarray are sometimes compared in terms of “which method is better.” RNA-Seq is also often described as a newer and more powerful method than microarray. Indeed, RNA-Seq has many advantages, such as not depending on probe design, covering a broader range of transcripts, and allowing greater flexibility for re-analysis.

However, RNA-Seq is not simply an upgraded replacement for microarray. Because RNA-Seq and microarray are based on different measurement principles, they differ in which genes are easier or harder to observe, and in which ranges can be compared and interpreted with confidence.

Therefore, simply asking “which method is better” is not enough to correctly understand the differences between the two in gene expression analysis. On this page, we review the differences in measurement principles between RNA-Seq and microarray, and consider how to use each appropriately and interpret the results based on their respective characteristics.

There are multiple microarray platforms. On this page, we use Agilent microarrays as an example, as they are still used as of 2026 in some studies and when comparing new data with existing data assets. Unless otherwise noted, the term “microarray” on this page refers to Agilent microarrays.

Figure: Illustration of carefully comparing RNA-Seq and microarray on a balance scale

The Difference Between RNA-Seq and Microarray Lies in Measurement Principles

In RNA-Seq, a library prepared from RNA is sequenced, and the resulting reads are assigned to a genome or transcriptome to estimate expression levels for each gene. As a result, read depth, library preparation, gene length, sequence mappability, the handling of multi-mapping reads, annotation, and expression estimation methods can all affect the results.

In contrast, microarrays measure signals obtained when nucleic acids from a sample hybridize to pre-designed probes. The targets are limited to genes or regions for which probes have been designed. However, within the same platform, the fixed probe set can make comparisons relatively stable and easy to interpret.

In other words, although RNA-Seq and microarray both appear to measure “gene expression,” they estimate expression levels based on different measurement principles. Therefore, when interpreting differences between results, it is necessary to check the target sequence, read mappability, the possibility of multi-mapping, and the region measured by the probe.

Early RNA-Seq Comparison Papers Should Be Read Carefully Today

From the early to mid-2010s, many papers compared RNA-Seq and microarray. These papers often stated that RNA-Seq could detect more genes, was better for low-expression genes, and had a wider dynamic range than microarray.

However, many comparisons during that period focused on demonstrating the advantages of RNA-Seq as a new technology. They tended to emphasize the number of detected genes and the broad range of expression values seen with FPKM or RPKM. From today’s perspective, “genes that are detected” and “genes whose measurements are stable enough to be trusted for analysis” were not always clearly distinguished.

In RNA-Seq, even genes assigned only a small number of reads may be counted as detected genes. However, genes with only a few reads cannot necessarily be interpreted as having stable expression differences or expression patterns across samples.

What matters in gene expression analysis is not simply how many genes are detected, but which genes can be used for sample-to-sample comparison and with what level of reliability. It is risky to judge that RNA-Seq always provides more useful information based only on the number of detected genes.
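The gap between “detected” and “usable for comparison” can be illustrated with a minimal sketch. The gene names, counts, and thresholds (`min_count=10`, `min_samples=3`) are hypothetical values chosen for illustration, not recommended settings.

```python
# Hypothetical raw read counts: gene -> counts in six samples.
counts = {
    "geneA": [520, 498, 610, 455, 530, 580],
    "geneB": [3, 0, 1, 0, 2, 0],
    "geneC": [0, 0, 1, 0, 0, 0],
    "geneD": [14, 22, 9, 18, 25, 11],
}

def detected(values):
    """A gene counts as 'detected' if any sample has at least one read."""
    return any(v > 0 for v in values)

def comparable(values, min_count=10, min_samples=3):
    """Keep a gene only if enough samples reach min_count reads.

    Both thresholds are illustrative assumptions.
    """
    return sum(v >= min_count for v in values) >= min_samples

detected_genes = [g for g, v in counts.items() if detected(v)]
comparable_genes = [g for g, v in counts.items() if comparable(v)]

print("detected:  ", detected_genes)
print("comparable:", comparable_genes)
```

In this toy data, all four genes are detected, but only geneA and geneD pass the count filter. A report of “number of detected genes” would include geneB and geneC even though their counts are too small to compare across samples.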

Be Careful When Interpreting the Wide Dynamic Range Seen with FPKM/RPKM

RNA-Seq is often described as having a wider dynamic range than microarray. However, in early RNA-Seq comparison papers, the wide dynamic range of RNA-Seq was sometimes emphasized even though the number of reads obtained by sequencers at that time was lower than it is today.

In particular, RNA-Seq data generated using platforms such as the Genome Analyzer II (GAII) often had limited read depth compared with current bulk RNA-Seq data. In such data, low-expression genes may not have had enough read counts. Therefore, claims that RNA-Seq can broadly quantify low-expression regions need to be read with caution.

Today, deeper sequencing and methods using unique molecular identifiers (UMIs) can reduce the effects of PCR bias and duplicate reads in some cases. However, even when UMIs are used, the number of observed molecules itself is small in low-expression regions. Therefore, caution is still needed when deciding whether differences between samples can be interpreted as stable expression differences.
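One way to see why small molecule counts remain fragile is to assume that observed counts follow simple Poisson sampling. This is a simplifying assumption made only for this sketch; real data usually show additional biological and technical variation on top of it.

```python
import math

def poisson_cv(mean_count):
    """Coefficient of variation (sd / mean) of a Poisson-distributed count."""
    return math.sqrt(mean_count) / mean_count

for mean_count in [2, 10, 100, 1000]:
    cv = poisson_cv(mean_count)
    print(f"mean of {mean_count:>4} molecules -> CV about {cv:.0%}")
```

Under this assumption, a gene observed at about 2 molecules per sample fluctuates by roughly 70% from sampling alone, while a gene observed at about 1,000 molecules fluctuates by roughly 3%. Removing PCR duplicates does not change this, because the limit comes from the number of molecules observed.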

Earlier comparison papers often discussed the wide dynamic range of RNA-Seq using normalized values such as FPKM or RPKM. FPKM and RPKM are values in which read counts are adjusted for library size and gene length. As a result, even low-count genes can appear as continuous expression values after normalization.

However, a broad distribution of normalized values does not necessarily mean that the measurements are stable and reliable. In low-expression regions, the original read counts are small. If dynamic range is evaluated only from FPKM or RPKM values, the practically reliable quantitative range may be overestimated.

Therefore, conclusions in early RNA-Seq comparison papers stating that “RNA-Seq has a wider dynamic range than microarray” should not be accepted without checking the actual read counts, the original count distribution, the handling of low-expression genes, and the apparent spread created by FPKM or RPKM.
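The adjustment behind FPKM can be sketched as follows, using the standard definition (fragments per kilobase of transcript per million mapped fragments). The library size, gene names, counts, and lengths are hypothetical.

```python
def fpkm(count, gene_length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (gene_length_bp * total_mapped)

total_mapped = 10_000_000  # hypothetical library size

# Hypothetical genes: (name, raw fragment count, gene length in bp)
genes = [
    ("short_gene_sample1", 1, 500),
    ("short_gene_sample2", 2, 500),
    ("long_gene", 200, 4000),
]

for name, count, length in genes:
    value = fpkm(count, length, total_mapped)
    print(f"{name:20s} count={count:4d}  FPKM={value:.2f}")
```

Here a single additional fragment moves the short gene from an FPKM of 0.2 to 0.4. The values look like a smooth two-fold difference on a continuous scale, but they rest on counts of 1 and 2, which is why the original counts should be checked alongside the normalized values.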

Genes and Cases Where RNA-Seq Tends to Be Advantageous

RNA-Seq has advantages that microarrays do not have. For example, it does not depend on pre-existing probe sets, can be re-analyzed using updated annotations, and can be extended to analyses based on sequence information.

Even at the same expression level, longer genes may be more likely to produce reads in RNA-Seq, making them less prone to insufficient counts in low-expression regions. Therefore, even for low-expression genes, RNA-Seq may make it easier to examine expression patterns when the gene is long and has high sequence mappability.
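The length effect can be illustrated with a simplified model in which fragments are sampled in proportion to transcript copy number times transcript length. This ignores effective length, fragment size, and sequence bias, and all names and numbers below are hypothetical.

```python
def expected_counts(transcripts, total_fragments):
    """Split fragments in proportion to copies x length (simplified model)."""
    weights = {
        name: copies * length_bp
        for name, (copies, length_bp) in transcripts.items()
    }
    total_weight = sum(weights.values())
    return {
        name: total_fragments * weight / total_weight
        for name, weight in weights.items()
    }

# Hypothetical transcriptome: name -> (copies, length in bp)
transcripts = {
    "short_gene": (5, 600),
    "long_gene": (5, 6000),
    "rest_of_transcriptome": (1_000_000, 2000),
}

result = expected_counts(transcripts, total_fragments=20_000_000)
for name in ("short_gene", "long_gene"):
    print(f"{name:10s} expected fragments: {result[name]:.0f}")
```

With the same copy number, the 6,000 bp gene is expected to yield about ten times as many fragments as the 600 bp gene (roughly 300 versus 30 in this example), so it is less likely to fall into the range where counts are too small to compare.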

RNA-Seq is also often described as useful for examining transcript structures and isoforms. However, transcript-level expression values obtained from typical short-read RNA-Seq are estimates based on fragmented reads, not direct observations of full-length transcripts. They should not be treated in the same way as data obtained from long-read RNA-Seq, which can read full-length transcripts more directly.

In addition, when expression is estimated at the transcript level, reads are divided among multiple isoforms, so the number of reads assigned to each transcript becomes smaller. Therefore, in typical RNA-Seq data that were not designed specifically for isoform analysis, transcript-level values will still be output, but they are not always reliable enough to interpret with confidence.

Genes or cases where RNA-Seq tends to be advantageous	Reason
Genes without designed probes	They are outside the measurement target of microarrays
Genes to be reviewed using updated annotations	They may become analyzable through re-mapping or re-counting
Low-expression genes that are long and have high mappability	They may still obtain enough read counts to examine expression patterns
Cases where transcript structures or isoforms need to be examined	Sequence information can be used, although short-read RNA-Seq depends on estimation and requires sufficient read depth

Detecting Genes Without Probes Is Not the Same as Interpreting Them Reliably

One important advantage of RNA-Seq is that, unlike microarray, it does not require probes to be designed in advance. As a result, genes without probes, genes included in updated annotations, and new transcripts can become analysis targets.

This is an important feature of RNA-Seq. However, being able to detect a gene is not the same as being able to interpret it with the same confidence as well-characterized known genes.

For genes with insufficient annotation, genes with highly similar sequences, and low-expression genes, read assignment and expression estimation may be unstable. In particular, genes whose intron-exon structures are not supported experimentally by cDNA libraries or ESTs may not have the same reliability as genes with such support.

Read assignment and expression estimation are performed automatically by software. However, the resulting numerical values cannot all be treated as measurements with the same level of reliability. The reliability of expression values can be affected by the support level of gene models, read mappability, the possibility of multi-mapping, and similar sequences in nearby regions.

Furthermore, even when expression values are obtained, it is difficult to interpret the results biologically if the function of the gene or transcript, the cell types in which it is expressed, or its known biological role is unclear. In practice, even if many uncertain, unannotated transcripts are detected, they are often difficult to place at the center of analysis or interpretation because their relationship to the research question cannot be clearly explained.

In other words, RNA-Seq expands the range of measurable targets. However, it does not expand the range of interpretable genes to the same extent. In general gene expression analysis, detecting unknown genes or transcripts is not always a major practical advantage.

Genes and Cases Where Microarray Tends to Be Advantageous

Microarrays depend on pre-designed probes, so their measurement targets are limited. This is a constraint, but for certain genes, it can also become an advantage.

For low-expression and short genes, RNA-Seq may not obtain enough read counts, or the counts may be zero to only a few reads, making sample-to-sample variation appear large. In contrast, when a highly specific probe exists and the signal is sufficiently above background noise, microarray may allow more stable comparison.

However, short genes are not always difficult for RNA-Seq. In current deep bulk RNA-Seq, short genes can still be analyzed stably if they are sufficiently expressed, contain unique exons, and have high mappability. The essential issue is not gene length itself, but whether enough effective counts are obtained for sample-to-sample comparison.

In gene families that share highly homologous regions, short-read RNA-Seq reads may correspond to multiple genes. Depending on the analysis method, multi-mapping reads may be excluded, or they may be distributed among multiple candidate genes to estimate expression levels.

In such cases, the estimated expression values may appear closer to each other even when the actual expression levels differ among genes. In other words, for highly similar gene families, RNA-Seq values may not sufficiently reflect gene-specific expression differences.

On the other hand, when probes are designed in regions with high specificity, microarray may provide signals that make gene-specific differences easier to interpret.
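The way distributed multi-mapping reads can pull estimates together can be shown with a deliberately simple sketch. It assumes two hypothetical paralogs, a fixed fraction of reads falling in shared regions, and an equal split of those reads between the genes. Equal splitting is only one possible strategy, chosen here for illustration; methods that allocate reads probabilistically behave differently.

```python
def unique_only(true_reads, shared_fraction):
    """Counts when multi-mapping reads are discarded."""
    return {g: n * (1 - shared_fraction) for g, n in true_reads.items()}

def equal_split(true_reads, shared_fraction):
    """Counts when multi-mapping reads are split equally among the genes."""
    unique = unique_only(true_reads, shared_fraction)
    shared_pool = sum(n * shared_fraction for n in true_reads.values())
    per_gene_share = shared_pool / len(true_reads)
    return {g: unique[g] + per_gene_share for g in true_reads}

# Hypothetical true read origins for two highly similar genes.
true_reads = {"paralog1": 900, "paralog2": 100}
shared_fraction = 0.8  # assumed share of reads that map to both genes

discarded = unique_only(true_reads, shared_fraction)
split = equal_split(true_reads, shared_fraction)

for label, est in (("true", true_reads), ("unique only", discarded), ("equal split", split)):
    ratio = est["paralog1"] / est["paralog2"]
    print(f"{label:12s} paralog1={est['paralog1']:.0f} paralog2={est['paralog2']:.0f} ratio={ratio:.2f}")
```

With these made-up numbers, the true ratio is 9 to 1. Discarding multi-mapping reads keeps the ratio but leaves only 180 and 20 reads, while the equal split gives about 580 and 420, a ratio of about 1.4. Either way, the gene-specific difference becomes harder to read from the output values.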

Measuring short and highly homologous microRNAs is a challenge for both RNA-Seq and microarray. For microarrays, one potential advantage is that the measurement system can be designed not only based on probe sequences, but also based on the physical properties of hybridization.

Reference: Agilent miRNA microarray probe design

Genes or cases where microarray tends to be advantageous	Reason
Low-expression and short genes	RNA-Seq may not obtain enough effective counts
Highly similar gene families	Read assignment can be ambiguous in RNA-Seq, and estimated expression values may appear closer depending on the method
Genes with well-designed probes	They may be easier to compare as stable continuous signals
Relative comparison of known genes	Stable comparison is easier within the probe-covered range
Comparison with accumulated historical data	Data from the same or similar platforms can be reused more easily

For Long-Term Diagnostic Use, the Stability of the Measurement System Also Matters

One advantage of microarray is that it is based on a physical measurement platform with fixed probe sequences and positions. This is a constraint, but it can also be an advantage because the same measurement system can be maintained more easily over time.

In diagnostic tools and long-term clinical testing, it is important to keep the measurement targets, measurement methods, and decision criteria as stable as possible. In microarrays, each probe position and the gene or region it measures are fixed in advance. Therefore, as long as the same platform is used, the measurement system itself can be kept relatively fixed.

In contrast, RNA-Seq involves many steps before expression values are obtained: RNA extraction, library preparation, rRNA depletion or poly(A) selection, fragmentation, PCR, sequencing conditions, read mapping, and expression estimation. If the reagents, protocols, instruments, or analysis pipeline change at any step, the resulting expression values may also be affected.

In research, flexibility is valuable. In diagnostics, stability is valuable. RNA-Seq is a highly flexible and powerful method for research, but when it is used as a diagnostic tool operated under the same criteria over many years, fixing, validating, and maintaining the reproducibility of the entire measurement system becomes a major challenge.
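As a rough sketch of what “fixing the measurement system” could mean on the analysis side, the settings of a validated pipeline can be recorded and summarized as a fingerprint, so that any change is at least visible. The setting names and values below are hypothetical placeholders, not real products or versions, and this covers only the recorded settings, not wet-lab variation.

```python
import hashlib
import json

def pipeline_fingerprint(settings):
    """Return a short hash of the settings, independent of key order."""
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

# Hypothetical validated configuration (all values are placeholders).
validated = {
    "rna_selection": "poly(A)",
    "library_kit": "example-kit-v1",
    "aligner": "example-aligner 1.0",
    "annotation": "example-annotation-2024",
    "quantifier": "example-counter 2.3",
}

# The same pipeline after an annotation update.
updated = dict(validated, annotation="example-annotation-2026")

same = pipeline_fingerprint(validated) == pipeline_fingerprint(updated)
print("validated:", pipeline_fingerprint(validated))
print("updated:  ", pipeline_fingerprint(updated))
print("same measurement system:", same)
```

A changed fingerprint does not say whether expression values actually shifted. It only signals that the pipeline is no longer the one that was validated and that re-validation may be needed.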

Look at the Performance You Can Actually Rely On, Not Just Theoretical Capabilities

RNA-Seq is a technology that, in theory, can capture many types of information that are difficult to obtain with microarrays. However, what is described as “possible” in papers or technical explanations is not always the same as what many researchers can use stably in routine data analysis.

On this page, we do not compare RNA-Seq and microarrays based on their theoretical maximum capabilities. Instead, we compare them from the perspective of practical performance in gene-level expression profiling: whether many researchers can use the data, check reproducibility, and interpret the results in a realistic workflow. From this perspective, mature Agilent microarrays can be comparable to RNA-Seq in terms of dynamic range and quantitative stability.

The difference between RNA-Seq and microarrays is not simply a difference between “new technology” and “old technology.” Because they are based on different measurement principles, the types of genes they handle well, the types of genes they handle less well, and the conditions under which results can be compared reliably are different. Therefore, even when RNA-Seq results and microarray results differ, we should not immediately assume that one is correct and the other is wrong.

What matters is not judging the reliability of results based only on the name of the technology. Instead, it is important to consider the original data distribution, read counts, signal intensity, gene length, mappability, probe design, annotation, and sample-to-sample variation together, and to choose the analysis method and way of handling the data that best fit the research purpose.

Accumulating Old and New Gene Expression Data as Research Assets

Even now that RNA-Seq is widely used, the value of previously measured microarray data does not disappear. Microarrays have advantages such as long-term accumulated data, validated measurement systems, and ease of comparison with previous studies.

On the other hand, RNA-Seq has the advantage of being independent of probes and allowing more flexible re-analysis. What matters is not choosing one and discarding the other. It is important to understand the measurement principles and limitations of both, and to have an environment where data can be accumulated, compared, and re-analyzed when needed.

By revisiting not only new data but also data accumulated in the past, research continuity and reproducibility can be improved. Gene expression data are not results that should be analyzed once and forgotten; they are research assets that can be reviewed later from different perspectives.

Use Subio Platform to Examine RNA-Seq and Microarray Data in the Same Environment

Subio Platform supports both RNA-Seq and microarray data. You can import Gene Counts, normalized expression data, microarray signal data, and other expression datasets, then proceed to visualization, filtering, PCA, clustering, differential expression analysis, and enrichment analysis in the same environment.

For specific analysis procedures, please see the Subio Platform tutorials.

Because Subio Platform supports both RNA-Seq and microarray data, you can visualize, compare, and understand both newly generated data and previously accumulated data. Why not start by using Subio Platform to examine actual RNA-Seq and microarray data and see the differences for yourself?
