The P-value is not so important in biology.

  • Gene Expression
  • Microarray
  • High-Throughput Sequencing

Inexperienced analysts tend to rely too heavily on P-values. You need a balanced view: the P-value is only one of several indicators.

First, you need to understand precisely what a P-value is. Statistical significance has nothing to do with biological significance. Please read the following statement carefully.


1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
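Point 5 is easy to check numerically. The sketch below is only an illustration, not part of the statement: it assumes a two-sample z-test with a known standard deviation of 1 in each group, and the effect sizes and sample sizes are made-up numbers. It shows a negligible difference reaching a very small P-value with enough samples, while a large difference with few samples does not.

```python
import math

def two_sample_z_p_value(mean_diff: float, n_per_group: int, sd: float = 1.0) -> float:
    """Two-sided P-value of a two-sample z-test.

    Assumes equal group sizes and a known, common standard deviation
    (a simplifying assumption made for this illustration only).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical case 1: negligible difference (0.05 log2 units), huge sample size.
p_tiny_effect = two_sample_z_p_value(mean_diff=0.05, n_per_group=10_000)

# Hypothetical case 2: large difference (1.0 log2 units, a 2-fold change), few samples.
p_large_effect = two_sample_z_p_value(mean_diff=1.0, n_per_group=4)

print(f"tiny effect,  n=10000: p = {p_tiny_effect:.4g}")   # roughly 0.0004
print(f"large effect, n=4:     p = {p_large_effect:.4g}")  # roughly 0.157
```

The smaller P-value belongs to the biologically less interesting difference, which is why the P-value cannot stand in for the effect size.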

Second, a large-scale study has already shown that ranking by P-value alone is inferior to ranking by fold change combined with a non-stringent P-value cutoff for obtaining reproducible lists of differentially expressed genes (DEGs).

Investigating the concordance of Gene Ontology terms reveals the intra- and inter-platform reproducibility of enrichment analysis

Handbook of Statistical Bioinformatics; 9.4.7 The Rat Toxicogenomics Study; A Validation of Reproducibility of Microarray Results. Page 188.

Most of the previous studies questioning the reproducibility and reliability of microarrays for gene expression analysis are based on the statistical significance (P value) alone instead of the actual measured quantity of differential expression (fold change or ratio) for selecting DEGs. The reliance on only P value to create DEG lists has resulted in the apparent irreproducibility between test sites and between microarray platforms. Our results from analyzing data sets from the MAQC human reference RNA samples and the rat toxicogenomics study samples indicate that a straightforward approach of fold change ranking combined with a non-stringent P value cutoff can successfully generate reliable DEG lists. Furthermore, compared to P value ranking, this joint method can minimize the impact of normalization methods on the reproducibility of DEG lists. That is, the DEG list from P value ranking based gene selection methods is more susceptible to the choice of normalization methods. We recommend a straightforward approach of fold change ranking combined with a non-stringent P value cutoff as a baseline practice for microarray data analysis to generate reproducible lists of DEGs. The fold change criterion ensures the reproducibility of DEGs and the P value criterion controls false positives.
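The recommended selection rule can be sketched in a few lines. This is a minimal illustration and not the code used in the study: the gene names, fold changes, and P-values are invented, and the cutoff of 0.05 and the list size of 3 are assumptions chosen for the example.

```python
from typing import List, NamedTuple

class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float
    p_value: float

def select_by_fold_change(results: List[GeneResult],
                          p_cutoff: float = 0.05,
                          top_n: int = 3) -> List[str]:
    """Filter with a non-stringent P-value cutoff, then rank by |log2 fold change|."""
    passing = [r for r in results if r.p_value < p_cutoff]
    ranked = sorted(passing, key=lambda r: abs(r.log2_fold_change), reverse=True)
    return [r.gene for r in ranked[:top_n]]

def select_by_p_value(results: List[GeneResult], top_n: int = 3) -> List[str]:
    """Rank by P-value alone (the approach the study argues against)."""
    ranked = sorted(results, key=lambda r: r.p_value)
    return [r.gene for r in ranked[:top_n]]

# Hypothetical results table; every value here is made up for illustration.
toy_results = [
    GeneResult("gene_A", 2.5, 0.01),
    GeneResult("gene_B", 0.1, 1e-8),   # tiny change, extremely low P-value
    GeneResult("gene_C", -1.8, 0.03),
    GeneResult("gene_D", 3.0, 0.20),   # large change, but fails the P cutoff
    GeneResult("gene_E", 0.2, 1e-6),
    GeneResult("gene_F", 1.2, 0.04),
]

fold_change_list = select_by_fold_change(toy_results)
p_value_list = select_by_p_value(toy_results)

print(fold_change_list)  # ['gene_A', 'gene_C', 'gene_F']
print(p_value_list)      # ['gene_B', 'gene_E', 'gene_A']
```

In this toy table the two rankings share only one gene. The P-value ranking is dominated by genes whose expression barely changes, while the joint rule still discards gene_D, whose large change is not supported by the test.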

What matters scientifically is concordant output that a third party can reproduce. Overusing stringent P-value cutoffs leads to an over-fitting problem.

Third, consider the overall logical structure of the manuscript. The purpose of using P-values is to limit false positives. But omics is usually used in the discovery phase, where false negatives matter more than false positives. You then examine the hypotheses generated from the omics results from multiple angles. For example, confirming an expression pattern by RT-PCR proves it with an independent method, which is much more substantial evidence than adding more samples from the same technology. P-values become useful in the phase just before development, such as a clinical trial.

Lastly, under general biological models, the biologically essential genes are likely to have only moderately low P-values. Genes with very low P-values are likely to lie downstream in pathways; they are helpful for diagnostic or predictive purposes, but you can hardly expect them to be causal genes.