RNA-Seq vs. Microarray

You might vaguely assume that RNA-Seq is better than microarrays, but is it true? The best way to find out is to measure the same sample set with both technologies and compare the results. The result of this direct comparison of RNA-Seq and microarray data is shown in the movie at right.

The movie makes clear that the sensitivity of RNA-Seq (HiSeq2000) is inferior to that of Agilent microarrays. The dynamic range of RNA-Seq is narrower than Agilent's, and its sensitivity in detecting differential expression is weaker.

So why is the groundless myth that RNA-Seq is better than microarrays so widespread? Let's look at what Illumina claims in "Benefits of RNA-Seq vs. Microarray Technology," and where those claims go wrong.

What Illumina Claims

Unbiased detection of novel transcripts

Unlike arrays, RNA-Seq technology does not require species- or transcript-specific probes. It can detect novel transcripts, gene fusions, single nucleotide variants, indels (small insertions and deletions), and other previously unknown changes that arrays cannot detect.

(from Illumina's website)

This is true in theory but very difficult in practice, because standard methods map sequence reads to predefined gene models. In general, you will not find novel transcripts this way.

Broader dynamic range

With array hybridization technology, gene expression measurement is limited by background at the low end and signal saturation at the high end. RNA-Seq technology quantifies discrete, digital sequencing read counts, offering a broader dynamic range.

(from Illumina's website)

This is true when you compare RNA-Seq with Illumina's BeadChip data. However, it is unfair to generalize from that comparison, because BeadChip is not the best microarray technology. As you saw in the movie above, the result is the opposite when you compare with Agilent's microarrays.

Increased specificity and sensitivity

Compared to microarrays, RNA-Seq technology offers increased specificity and sensitivity, for enhanced detection of genes, transcripts, and differential expression.

(from Illumina's website)

There are some clear mistakes in the way these comparisons are made. Please read the details in the next section.

Easier detection of rare and low-abundance transcripts

Sequencing coverage depth can easily be increased to detect rare transcripts, single transcripts per cell, or weakly expressed genes.

(from Illumina's website)

This is also true in theory but not in practice. A sequencer rarely captures RNAs expressed at low levels, and you need many replicates to capture all transcripts. Compared with Agilent microarrays, this is extremely hard and expensive.
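As a rough illustration of why rare transcripts are hard to capture, the sketch below assumes that read sampling follows a simple Poisson model. That model, the sequencing depth, and the transcript fraction are all assumptions for illustration and are not taken from the data discussed here.

```python
import math

def prob_at_least(k, expected_reads):
    """P(X >= k) for X ~ Poisson(expected_reads)."""
    cumulative = sum(
        math.exp(-expected_reads) * expected_reads**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - cumulative

# Hypothetical numbers, chosen only to illustrate the argument.
depth = 30_000_000          # assumed number of mapped reads in one run
transcript_fraction = 1e-8  # assumed share of reads from one rare transcript

expected = depth * transcript_fraction   # expected reads for this transcript
p_detect = prob_at_least(1, expected)    # chance of seeing at least one read
p_reliable = prob_at_least(30, expected) # chance of reaching 30 reads

print(f"expected reads: {expected:.2f}")
print(f"P(at least 1 read):   {p_detect:.3f}")
print(f"P(at least 30 reads): {p_reliable:.3g}")
```

Under these assumed numbers the transcript is missed in most runs, and reaching a count of around 30 would require far deeper sequencing or many more replicates.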

Consequently, even though these claims are true in theory, they are not always true in reality. Some experiments are probably possible only with sequencing, so the technology is worth using for those purposes. However, you should know that such experiments are onerous and costly.

The Tricks in Comparing

We know that some papers claim RNA-Seq is superior to microarrays in sensitivity for detecting lowly expressed and differentially expressed genes. We read those papers and found the following common problems in their logic. If you do not understand why they are problematic, please read the explanation of the dynamic ranges of microarray and RNA-Seq technologies.

They directly compare numbers in different units.

They often plot microarray signal intensity against RNA-Seq FPKM on a scatterplot, or they compare the ranges of these values directly, describing them as "5 orders" or "5 logs." You can do this informally because it is fast and easy. However, directly comparing numbers in different units is improper, indeed a fundamental mistake, in a formal scientific statement.

They use FPKM/RPKM instead of counts.

They use FPKM or RPKM rather than count data. It sounds biologically reasonable that longer genes yield more sequence reads, so read counts must be normalized by length. So what is the problem?

If you carefully compare count and FPKM values, you will notice that the range of FPKM is wider than that of counts. However, this only reflects the variation in gene size, so it is doubtful to conclude "wide dynamic range." Moreover, the average gene is longer than 1 kb, which means FPKM values tend to be smaller. So it is also debatable to conclude "better sensitivity at low levels."
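The effect of gene size on the FPKM range can be sketched with a few made-up genes. The gene names, lengths, counts, and library size below are hypothetical. The FPKM formula is the standard one: count × 10^9 / (gene length in bp × total mapped reads).

```python
import math

def fpkm(count, length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)

def log10_span(values):
    """Width of the value range in orders of magnitude (log10 of max/min)."""
    positive = [v for v in values if v > 0]
    return math.log10(max(positive) / min(positive))

# Hypothetical genes: (name, length in bp, read count). Not real data.
genes = [
    ("short_high", 500, 100_000),
    ("long_high", 10_000, 100_000),
    ("short_low", 500, 10),
    ("long_low", 10_000, 10),
]
total_reads = 20_000_000  # assumed library size

counts = [count for _, _, count in genes]
fpkms = [fpkm(count, length, total_reads) for _, length, count in genes]

count_span = log10_span(counts)  # range of counts, in log10 units
fpkm_span = log10_span(fpkms)    # range of FPKM, in log10 units

print(f"count range: {count_span:.2f} orders of magnitude")
print(f"FPKM range:  {fpkm_span:.2f} orders of magnitude")
print(f"extra width from gene length alone: {fpkm_span - count_span:.2f}")
```

In this toy set the counts span the same range for short and long genes, so the extra width of the FPKM range equals log10 of the ratio between the longest and shortest gene. It comes from gene length, not from the measurement.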

Additionally, FPKM does not make much sense even in the biological context. If your question is "which gene is expressed more in this sample?" you have to convert counts to FPKM. However, the usual question is "which sample expresses this gene more or less?" Converting to FPKM is not necessary to answer such questions.
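A small sketch shows why length normalization is unnecessary when comparing one gene between samples: the gene length cancels out of the ratio. The counts and library sizes are hypothetical. Scaling counts by library size (counts per million) is assumed here as the between-sample normalization and is not prescribed by the text.

```python
def cpm(count, total_mapped_reads):
    """Counts per million mapped reads (no gene length term)."""
    return count * 1e6 / total_mapped_reads

def fpkm(count, length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)

# Hypothetical measurements of the same gene in two samples.
gene_length_bp = 4_000
sample_a = {"count": 800, "total_reads": 20_000_000}
sample_b = {"count": 300, "total_reads": 15_000_000}

ratio_cpm = (
    cpm(sample_a["count"], sample_a["total_reads"])
    / cpm(sample_b["count"], sample_b["total_reads"])
)
ratio_fpkm = (
    fpkm(sample_a["count"], gene_length_bp, sample_a["total_reads"])
    / fpkm(sample_b["count"], gene_length_bp, sample_b["total_reads"])
)

print(f"fold change from CPM:  {ratio_cpm:.3f}")
print(f"fold change from FPKM: {ratio_fpkm:.3f}")
```

Both ratios are identical because the same gene length appears in the numerator and the denominator of the FPKM ratio.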

They misuse the term "dynamic range."

If you compare replicate samples, you expect the values to be similar. In the scatterplot of counts in the movie above, values below 30 look random. This is the noise range of count values. You cannot trust values in the noise range, even though measurements exist there.

In contrast, the noise range is not apparent in the FPKM scatterplots. This is only a side effect of gene size normalization, and of course it does not mean FPKM data is noise-free. As you know, the dynamic range is not the whole signal range but the range of values that the system measures reliably.

They wrongly call the entire range the dynamic range.
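The difference between the entire range and the dynamic range can be put in numbers with a minimal sketch. The noise floor of 30 counts comes from the scatterplot described above. The minimum and maximum counts are hypothetical values chosen for illustration.

```python
import math

def orders_of_magnitude(low, high):
    """Width of the interval [low, high] in log10 units."""
    return math.log10(high / low)

noise_floor = 30     # counts below this look random between replicates
min_count = 1        # smallest nonzero count (assumed)
max_count = 300_000  # largest observed count (hypothetical)

# Entire signal range: everything the instrument reports.
entire_range = orders_of_magnitude(min_count, max_count)

# Dynamic range: only the part above the noise floor can be trusted.
dynamic_range = orders_of_magnitude(noise_floor, max_count)

print(f"entire range:  {entire_range:.2f} orders of magnitude")
print(f"dynamic range: {dynamic_range:.2f} orders of magnitude")
```

With these assumed values, reporting the entire range overstates the trustworthy range by about one and a half orders of magnitude.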

Please also read what we describe about the dynamic range of RNA-Seq.

They unfairly generalize the conclusion.

There are a variety of microarrays, and their performance (precision, sensitivity, ability to detect differentially expressed genes) varies.

If they say "RNA-Seq is better than Illumina's or Affymetrix's microarrays," we agree. However, Agilent's microarrays are superior to RNA-Seq, as the movie shows.

It is a cheap trick to generalize the conclusion from an unfair comparison with a technology that is not the best.

Please read more about how dynamic range varies among microarray technologies.

Why has the myth spread so widely?

A limited number of figures in a paper cannot represent the complex aspects of omics data, so they can give the wrong impression. Authors do publish the raw data as evidence, but re-analyzing raw data is time-consuming, and nobody takes the time to challenge the analysis and conclusions. This is the root of the problem.

We realized the importance of making omics data more visible and tangible to all researchers. We provide Subio Platform to free omics data from the black box.

Services for Assessment and Experimental Planning

Tons of omics data are now available on the internet, but using them for technology assessment is laborious and time-consuming.

So we provide the "Data Analysis Service" to help with your assessment. We help you understand the actual abilities and limitations of methods and technologies, which significantly reduces the risk of failure.

We also provide the "Premium Analysis Service." It is more than a service that runs experiments on your behalf: we support you from the experimental design stage to minimize the risk of failure. This service is especially valuable when the research is difficult to reproduce, such as large-scale studies or studies using rare and sporadic samples.
