This page provides a clear explanation of the limitations and data quality issues in single-cell RNA-Seq (scRNA-Seq) analysis.
________________________________________
Introduction: Have You Ever Looked Behind the “Beautiful Figures”?
Single-cell RNA-Seq (scRNA-Seq) has become a standard approach in omics analysis. The colorful visualizations generated by tools like Loupe Browser often resemble immunostaining or FISH images, giving the impression that they faithfully represent the true state of cells.
However, one must be careful not to be misled by this visual appeal. When data are examined in detail using Subio Platform, it becomes evident that beneath these images lie fragile data that cannot be fully corrected computationally, along with detection events that occur purely by chance.
________________________________________
1. The Problem of Sequencing Depth in scRNA-Seq and the Illusion of “Cellular Identity”
This is an issue we raised as early as 2019, when single-cell RNA-Seq was just beginning to gain traction.
In protocols such as those from 10x Genomics, the number of reads per cell is only on the order of tens of thousands. Compared to conventional bulk RNA-Seq, where a sample typically receives tens of millions of reads, this is dramatically low.
A low read count implies an extremely limited dynamic range. When visualizing histograms of read counts across cells in Subio Platform, it becomes immediately clear that there are substantial differences in sequencing depth among cells.

The figure above visualizes scatter plots comparing cells with the lowest sequencing depth and those with the highest depth (generated from GSE164898). The black dots represent ribosomal protein genes, which are expected to be highly expressed in all cells. From this figure, the following observations can be made:
- Low-depth cells: Even ribosomal protein genes are only barely detectable.
- High-depth cells: At best, only a few genes show expression levels comparable to ribosomal proteins.
This leads to an important conclusion: the technology itself carries a structural risk of misinterpreting simple “depth variation” as “cellular identity.”
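The confusion between depth and identity can be illustrated with a minimal Python sketch. The counts below are invented toy values, not taken from GSE164898, and GeneX and GeneY are hypothetical gene names. The point is only that the number of detected genes rises with total counts, even though the three columns were written as roughly the same expression profile at different depths.

```python
# Toy count matrix (genes as rows, cells as columns).
# All values are invented for illustration only.
cells = ["cell_low", "cell_mid", "cell_high"]
counts = {
    "RPL13": [1, 12, 40],  # ribosomal protein gene
    "RPS6": [0, 9, 35],    # ribosomal protein gene
    "GeneX": [0, 0, 6],    # hypothetical gene
    "GeneY": [0, 1, 3],    # hypothetical gene
}


def per_cell_summary(counts, cells):
    """Return {cell: (total_counts, genes_detected)} for each column."""
    summary = {}
    for j, cell in enumerate(cells):
        column = [row[j] for row in counts.values()]
        total = sum(column)
        detected = sum(1 for value in column if value > 0)
        summary[cell] = (total, detected)
    return summary


summary = per_cell_summary(counts, cells)
for cell, (depth, detected) in summary.items():
    print(f"{cell}: total counts = {depth}, genes detected = {detected}")
```

In this toy matrix GeneX looks specific to cell_high, yet nothing distinguishes the columns except depth. A clustering step that sees only the detected genes could easily read that difference as a cell type.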
Another fundamental principle must not be forgotten:
“Not detected” does not mean “not expressed.”
It may simply mean that the gene was not detected by chance.
The “beautiful figures” produced by analysis tools are, in reality, constructed from such highly unstable data. We must recognize this inherent uncertainty.
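The statement that a gene may go undetected purely by chance can be made concrete with a small calculation. The sketch below assumes a simple Poisson sampling model, which is an idealization introduced here for illustration and not a claim about any specific protocol. The function name and the example expression fraction are likewise assumptions.

```python
import math


def prob_not_detected(expression_fraction, total_counts):
    """Probability of observing zero counts for a gene under Poisson sampling.

    expression_fraction: share of the cell's transcripts that come from the gene
    total_counts: total counts obtained for the cell (its depth)
    """
    expected_counts = expression_fraction * total_counts
    return math.exp(-expected_counts)


# Assumed example: the gene accounts for 1 in 10,000 transcripts.
fraction = 1e-4
for depth in (1_000, 10_000, 50_000):
    p_zero = prob_not_detected(fraction, depth)
    print(f"depth {depth:>6}: P(not detected) = {p_zero:.3f}")
```

Under this model, the same expressed gene is missed most of the time in a shallow cell and almost never in a deep one, so a zero in the matrix says more about depth than about expression.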
________________________________________
2. Do Not Ask AI Only for “Solutions” (A Perspective from 2026)
Now, let us shift to the perspective of 2026. The key difference from 2019 is that anyone can now consult AI for analysis.
If you ask AI, “What are the limitations of scRNA-Seq?”, you will likely receive a reasonable answer. However, try the following: ask AI, “How can batch effects in scRNA-Seq be corrected?” You will quickly be presented with sophisticated methods such as MNN or Harmony.
If you are unaware of the fundamental limitations of data quality, you may accept these answers without question. However, once you understand the underlying fragility of the data, a critical question naturally arises:
“Can data with missing signals truly be ‘corrected’ through computation alone?”
Once this question emerges, you can begin to interrogate AI more deeply, asking about the characteristics and limitations of each method. Instead of focusing only on solutions (How), you examine the underlying problem structure (What). Recognizing contradictions and maintaining a healthy skepticism, the sense that “something feels off,” is essential for analysts to survive in the age of AI. It is the essence of critical thinking.
________________________________________
The Mirage Created by Advanced Normalization
Between 2019 and 2026, there have been advances in normalization methods for scRNA-Seq. CPM (Counts Per Million) is commonly used to correct for depth variation, but in cells with extremely low read counts, the detection of even a single read can disproportionately influence expression values, amplifying variability.
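The effect of a single read on CPM values is easy to check with arithmetic. The following sketch uses two assumed total counts (500 and 50,000) chosen only to contrast a shallow cell with a deep one.

```python
def cpm(count, total_counts):
    """Counts per million: scale a raw count by the cell's total counts."""
    return count * 1_000_000 / total_counts


# Assumed toy depths: a shallow cell and a deep cell.
shallow_total, deep_total = 500, 50_000

for reads in (0, 1, 2):
    print(
        f"{reads} read(s): shallow cell = {cpm(reads, shallow_total):.0f} CPM, "
        f"deep cell = {cpm(reads, deep_total):.0f} CPM"
    )
```

In the shallow cell, the difference between detecting zero reads and one read is a jump of 2,000 CPM, while the same event in the deep cell moves the value by only 20 CPM.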
This phenomenon—noise amplification in low-input data—is not unique to single-cell analysis. As shown in Case Study 403, it is a universal challenge in data analysis, also encountered when analyzing lowly expressed genes in bulk RNA-Seq.
More advanced methods such as TMM and median-of-ratios (both originally designed for bulk RNA-Seq) are now recommended in some contexts. However, applying them to scRNA-Seq data is risky, because it is inherently difficult to determine whether the “corrected data” produced by sophisticated algorithms are actually closer to the truth than simple CPM.
These methods rely on a mathematical assumption: that most genes are not differentially expressed. But recall that in scRNA-Seq, only a small subset of highly expressed genes can be reliably detected. If we are effectively observing only this limited subset, the validity of that assumption becomes questionable.
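To see why the assumption becomes fragile, consider a minimal pure-Python sketch of a median-of-ratios style size factor calculation. It follows the common formulation in which only genes with nonzero counts in every cell contribute to the estimate. The toy matrix and the function name are illustrative assumptions, and this is not the implementation of any particular package.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Estimate per-cell size factors from a genes x cells list of lists.

    Only genes with nonzero counts in every cell are usable, because the
    geometric mean of a row containing a zero is zero.
    Returns (size_factors, number_of_usable_genes).
    """
    usable = [row for row in counts if all(value > 0 for value in row)]
    if not usable:
        raise ValueError("no gene is detected in every cell")
    # Log of the geometric mean of each usable gene across cells.
    log_geo_means = [sum(math.log(v) for v in row) / len(row) for row in usable]
    factors = []
    for j in range(len(counts[0])):
        log_ratios = [
            math.log(row[j]) - log_gm for row, log_gm in zip(usable, log_geo_means)
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors, len(usable)


# Invented toy matrix: 6 genes x 3 cells, sparse like single-cell data.
toy_counts = [
    [10, 20, 40],
    [4, 8, 16],
    [0, 3, 5],
    [1, 0, 0],
    [0, 0, 2],
    [0, 1, 0],
]

factors, n_usable = median_of_ratios_size_factors(toy_counts)
print(f"usable genes: {n_usable} of {len(toy_counts)}")
print("size factors:", [round(f, 3) for f in factors])
```

Here only 2 of 6 genes are detected in every cell, so the size factors rest entirely on those two rows. With real scRNA-Seq data the usable set can shrink to a handful of highly expressed genes, or to none at all.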
In practice, rather than blindly trusting a single method, analysts should compare results across multiple normalization approaches. Examining normalized and raw count data side by side helps determine which representation introduces the least distortion, or at least supports the most reasonable interpretation.
________________________________________
If you wish to explore raw scRNA-Seq data in Subio Platform, you can download a matrix file from GEO and ask ChatGPT to generate a Python script to convert it into a dense matrix TSV format (genes as rows, cells as columns). Running the script in Jupyter will produce a file that can be directly imported into Subio. Thanks to AI, such “tasks” have become remarkably easy. For more details, see the separate article: “Generating Code with ChatGPT and Executing It in Jupyter.”
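As a rough idea of what such a conversion script might look like, here is a minimal sketch using only the Python standard library. It assumes the GEO download is a Matrix Market coordinate file with genes as rows and cells as columns (the layout of 10x-style matrix.mtx files). The function names, the "gene" header label, and the toy input are assumptions for illustration, and the exact header Subio Platform expects should be checked before import. For real data, a library reader such as scipy.io.mmread is usually more convenient.

```python
import io


def read_mtx_coordinate(lines):
    """Parse Matrix Market coordinate text into a dense genes x cells matrix."""
    dense = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip the banner and comment lines
        parts = line.split()
        if dense is None:
            # First non-comment line: number of rows, columns, nonzero entries.
            n_genes, n_cells = int(parts[0]), int(parts[1])
            dense = [[0] * n_cells for _ in range(n_genes)]
            continue
        # Entries are 1-indexed: row (gene), column (cell), value.
        gene_idx, cell_idx = int(parts[0]) - 1, int(parts[1]) - 1
        dense[gene_idx][cell_idx] = int(float(parts[2]))
    if dense is None:
        raise ValueError("no size line found in Matrix Market input")
    return dense


def write_dense_tsv(dense, gene_names, cell_names, out):
    """Write the matrix as TSV with genes as rows and cells as columns."""
    out.write("gene\t" + "\t".join(cell_names) + "\n")
    for name, row in zip(gene_names, dense):
        out.write(name + "\t" + "\t".join(str(value) for value in row) + "\n")


# Toy input standing in for a downloaded matrix file (values are invented).
demo_mtx = """%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 5
3 1 2
2 2 7
"""

dense = read_mtx_coordinate(io.StringIO(demo_mtx))
buffer = io.StringIO()
write_dense_tsv(dense, ["GeneA", "GeneB", "GeneC"], ["cell1", "cell2"], buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

For a real gzipped file, the same functions could be fed a handle from gzip.open(path, "rt"), with gene and cell names read from the accompanying feature and barcode files. Note that a dense matrix for tens of thousands of genes and cells can be very large, so it may be worth filtering cells or genes before writing.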
Even as of 2026, scRNA-Seq remains a developing technology, and its reliability is far from absolute. Rather than assuming that “advanced methods guarantee correctness,” we should remain aware of current limitations while anticipating future improvements—particularly in measurement sensitivity and stability (rather than merely in correction algorithms). Above all, we must maintain a mindset of critical thinking.
________________________________________
Limitations of Single-Cell RNA-Seq — and How to Work with Them
Single-cell RNA-Seq is a powerful technique. However, due to its inherent characteristics, it is highly affected by noise and bias, and interpreting the results requires careful judgment.
Clustering and marker gene identification are commonly performed, but whether these results truly reflect biological meaning must be critically evaluated.
The key issue is not that limitations exist. The problem is proceeding with analysis without recognizing them. That is why it is essential to examine your data carefully and make informed decisions.
Subio Platform provides an environment designed for this kind of validation.
- Check how clusters relate to sample conditions
- Visualize relationships between clusters and batch effects
- Detect distortions and biases in data distributions
Instead of accepting results at face value, it enables you to verify them for yourself.
If you want to learn how to perform RNA-Seq analysis in practice, please refer to our RNA-Seq analysis tutorial.
________________________________________
Conclusion: Distinguishing Truth with Your Own Eyes
This is precisely why you need to examine your data directly using Subio Platform.
If you rely on black-box workflows and simply accept “plausible” results generated by AI, you may never even notice these underlying issues.
What truly matters is recognizing the distortions and limitations in raw data and verifying them yourself.
Only through this process can you move beyond being a passive user of tools and become an analyst who draws conclusions based on your own judgment.
________________________________________
Related Topics
- RNA-Seq data quality depends on RNA input amount
(Low-input RNA-Seq does not achieve the same data quality as standard bulk RNA-Seq)