RNA-Seq vs. Microarray

You might vaguely assume that RNA-Seq is better than microarrays, but is that true? The best way to find out is to measure the same sample set with both technologies and compare the results. The movie at right shows a direct comparison of RNA-Seq and microarray data.

The comparison shows that the sensitivity of RNA-Seq (HiSeq2000) is inferior to that of Agilent microarrays. The dynamic range of RNA-Seq is narrower than that of Agilent microarrays, and its sensitivity for detecting differential expression is weaker.

So why is the groundless myth that RNA-Seq is better than microarrays so widespread? The following section examines what Illumina claims in "Benefits of RNA-Seq vs. Microarray Technology."

RNA-Seq data analysis is trickier than microarray data analysis, so we especially recommend our Data Analysis Service to people who have RNA-Seq data files. We also provide an RNA-Seq Data Analysis Support Service for those who have difficulty converting raw data (FASTQ) to gene expression levels.

What Illumina Claims

Unbiased detection of novel transcripts

Unlike arrays, RNA-Seq technology does not require species- or transcript-specific probes. It can detect novel transcripts, gene fusions, single nucleotide variants, indels (small insertions and deletions), and other previously unknown changes that arrays cannot detect.

(from Illumina's website)

This is true in theory but very difficult in practice, because standard methods map sequence reads to predetermined gene models. In general, you will not find novel transcripts that way.

Broader dynamic range

With array hybridization technology, gene expression measurement is limited by background at the low end and signal saturation at the high end. RNA-Seq technology quantifies discrete, digital sequencing read counts, offering a broader dynamic range.

(from Illumina's website)

This is true when you compare RNA-Seq with Illumina's BeadChip data. However, it is unfair to generalize the conclusion, because BeadChip is not the best microarray technology. As the movie above shows, the result is the opposite when you compare with Agilent's microarrays.

Increased specificity and sensitivity

Compared to microarrays, RNA-Seq technology offers increased specificity and sensitivity, for enhanced detection of genes, transcripts, and differential expression.

(from Illumina's website)

There are some clear mistakes in the way they make this comparison. Please read the details in the next section.

Easier detection of rare and low-abundance transcripts

Sequencing coverage depth can easily be increased to detect rare transcripts, single transcripts per cell, or weakly expressed genes.

(from Illumina's website)

This is also true in theory but not in practice. A sequencer rarely captures weakly expressed RNAs, and you need many replicates to capture all transcripts. This is extremely hard and expensive compared with Agilent microarrays.

In short, although these claims are true in theory, they are not always true in reality. Some experiments are probably possible only with sequencing, so the technology is worth using for those purposes. However, you should know that such experiments are onerous and costly.

The Tricks in Comparing

We know that some papers claim RNA-Seq is superior to microarrays in sensitivity for detecting weakly and differentially expressed genes. We read those papers and found the following common problems in their logic. If you do not understand why they are problems, please read our explanation of the dynamic ranges of microarray and RNA-Seq technologies.

They directly compare numbers in different units.

They often plot microarray signal intensity against RNA-Seq FPKM on a scatterplot, or directly compare the ranges of these values, saying something like "5 orders of magnitude" or "5 logs." You can do this informally because it is fast and easy. However, directly comparing numbers in different units is improper, and in fact a fundamental mistake, in a formal scientific statement.

They use FPKM/RPKM instead of counts.

They use FPKM or RPKM rather than count data. It sounds biologically reasonable that larger genes yield more sequence reads, so read counts must be normalized by length. So what is the problem?

If you carefully compare count and FPKM values, you will notice that the range of FPKM is wider than that of counts. However, this only reflects the variety of gene sizes, so it is doubtful to conclude that RNA-Seq has a "wide dynamic range." Moreover, the average gene is longer than 1 kb, which means the per-kilobase normalization tends to make FPKM values smaller. So it is also debatable to conclude "better sensitivity at low levels."
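To make the effect of gene size concrete, here is a minimal sketch. It assumes the common definition of FPKM (fragments per kilobase of transcript per million mapped fragments); the gene lengths, counts, and library size are invented for illustration and are not taken from the data in the movie.

```python
import math

def fpkm(count, gene_length_bp, total_mapped_fragments):
    """FPKM under the common definition: count * 1e9 / (length_bp * library size)."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)

def log10_span(values):
    """Orders of magnitude between the smallest and largest positive value."""
    positive = [v for v in values if v > 0]
    return math.log10(max(positive) / min(positive))

# Hypothetical genes as (count, length in bp); all numbers are made up.
genes = [(30, 10_000), (30, 500), (30_000, 500), (30_000, 10_000)]
total = 20_000_000  # assumed number of mapped fragments

counts = [c for c, _ in genes]
fpkms = [fpkm(c, length, total) for c, length in genes]

count_span = log10_span(counts)  # spread of the raw counts
fpkm_span = log10_span(fpkms)    # spread after length normalization

print(f"count range: {count_span:.2f} orders of magnitude")
print(f"FPKM range:  {fpkm_span:.2f} orders of magnitude")
```

In this toy set the counts span 3 orders of magnitude, while the FPKM values span about 4.3. The extra 1.3 orders come entirely from the 20-fold difference in gene length, not from any gain in measurement ability.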

Additionally, FPKM does not make sense even in a biological context. If your question is "which gene is expressed more in this sample?" you do have to convert counts to FPKM. However, the usual question is "which sample expresses this gene more or less?" Converting to FPKM is not necessary to answer such questions.
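The following sketch illustrates why length normalization adds nothing when you compare the same gene between samples. It assumes counts are scaled by library size (counts per million, CPM), which is our choice for this illustration; the counts and library sizes are hypothetical.

```python
def cpm(count, total_mapped_fragments):
    """Counts per million mapped fragments (library-size scaling only)."""
    return count * 1e6 / total_mapped_fragments

def fpkm(count, gene_length_bp, total_mapped_fragments):
    """FPKM under the common definition: CPM divided by gene length in kb."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)

# One hypothetical gene measured in two hypothetical samples.
gene_length_bp = 2_500
sample_a = {"count": 120, "total": 20_000_000}
sample_b = {"count": 480, "total": 25_000_000}

ratio_cpm = cpm(sample_b["count"], sample_b["total"]) / cpm(sample_a["count"], sample_a["total"])
ratio_fpkm = (
    fpkm(sample_b["count"], gene_length_bp, sample_b["total"])
    / fpkm(sample_a["count"], gene_length_bp, sample_a["total"])
)

print(f"B/A ratio from CPM:  {ratio_cpm:.3f}")
print(f"B/A ratio from FPKM: {ratio_fpkm:.3f}")
```

Both ratios are identical because the gene length appears in the numerator and the denominator and cancels out. Dividing by length changes the scale of the values but not the between-sample comparison.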

They misuse the term "dynamic range."

Values from replicate samples are expected to be similar. In the scatterplot of counts in the movie above, however, values below 30 look random. This is the noise range of count values. You cannot trust values in the noise range, even though measurements exist there.
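One simple way to check replicate agreement numerically is sketched below. This is only an illustration, not the method behind the movie: the 2-fold criterion, the pseudocount, and the replicate counts are all our assumptions, and only the threshold of 30 comes from the text above.

```python
def fraction_within_fold(rep1, rep2, low, high, max_fold=2.0, pseudo=1.0):
    """Among genes whose mean count lies in [low, high), return the fraction
    whose replicate counts differ by at most max_fold.
    Returns None when no gene falls in the bin.
    The pseudocount avoids division by zero (an assumption of this sketch)."""
    hits = total = 0
    for a, b in zip(rep1, rep2):
        mean = (a + b) / 2
        if low <= mean < high:
            total += 1
            fold = (max(a, b) + pseudo) / (min(a, b) + pseudo)
            if fold <= max_fold:
                hits += 1
    return hits / total if total else None

# Made-up counts for eight genes in two replicate samples.
rep1 = [2, 15, 8, 25, 120, 400, 950, 3000]
rep2 = [9, 3, 22, 6, 110, 430, 900, 3100]

low_bin = fraction_within_fold(rep1, rep2, 0, 30)
high_bin = fraction_within_fold(rep1, rep2, 30, float("inf"))

print(f"mean count < 30:  {low_bin:.0%} of genes agree within 2-fold")
print(f"mean count >= 30: {high_bin:.0%} of genes agree within 2-fold")
```

With real data, you would run the same check over several count bins and look for the level at which replicate agreement breaks down.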

By contrast, the noise range is not apparent in the FPKM scatterplots. This is only a side effect of gene size normalization; of course, it does not mean FPKM data is noise-free. Remember that the dynamic range is not the whole signal range but the range of values that the system measures reliably.

They wrongly call the entire signal range the dynamic range.
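The difference between the two ranges can be put into numbers with a short sketch. The counts below are hypothetical; the noise floor of 30 follows the scatterplot discussion above, and we assume the largest observed count is still measured reliably (no saturation).

```python
import math

def orders_of_magnitude(low, high):
    """Width of the interval [low, high] in log10 units."""
    return math.log10(high / low)

# Hypothetical count values from one sample.
counts = [1, 4, 12, 30, 250, 8_000, 100_000]
noise_floor = 30  # values below this look random between replicates

# Entire signal range: smallest to largest nonzero value.
entire_range = orders_of_magnitude(min(c for c in counts if c > 0), max(counts))

# Dynamic range: only the part above the noise floor counts.
dynamic_range = orders_of_magnitude(noise_floor, max(counts))

print(f"entire signal range: {entire_range:.2f} orders of magnitude")
print(f"dynamic range:       {dynamic_range:.2f} orders of magnitude")
```

Here the entire signal range is 5 orders of magnitude, but the range that can be trusted is only about 3.5. Reporting the first number as the "dynamic range" overstates the system's performance.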

Please also read our description of the dynamic range of RNA-Seq.

They unfairly generalize the conclusion.

There are many kinds of microarrays, and their performance (precision, sensitivity, ability to detect differentially expressed genes) varies.

If they said "RNA-Seq is better than Illumina's or Affymetrix's microarray," we would agree. However, Agilent's microarray is superior to RNA-Seq, as the movie shows.

It is a cheap trick to generalize the conclusion from an unfair comparison with a microarray that is not the best one.

Please read more about the variety of dynamic ranges among microarray technologies.

Why has the myth spread so widely?

A limited number of figures in a paper cannot represent the complex aspects of omics data, so they can give the wrong impression. Authors do publish the raw data as evidence; however, re-analyzing raw data is time-consuming, and nobody takes the time to challenge the analysis and conclusions. This is the root of the problem.

We realized the importance of making omics data more visible and tangible to all researchers. We provide Subio Platform to free omics data from the black box.

This is the frontier, you know.

You asked a lot of people and finally found that nobody knows the way. There is no guide map, of course, because it is a frontier. Remember, what you need is the frontier spirit.

We provide "Experimental Planning Support Service" for better experimental design, a better choice of technology, and a lower risk of failure. We also help you assess the methods or systems you are going to adopt. We recommend not believing blindly what somebody says. Why not prepare as well as you can before you go?

We also provide high-quality experiment outsourcing services. Even if they look pricey compared to other service providers, we recommend choosing them in the following cases: (1) you have more than 12 samples; (2) re-sampling is extremely difficult; (3) you plan a prospective study; and (4) you have samples from FFPE or sorted cells, which require highly sophisticated skills.
