A New Style of RNA-Seq Data Analysis | Analyze with R/Python × Visualize and Manage Data with Subio

R & Python × Subio Platform

Subio Platform allows you to perform data visualization and preprocessing intuitively.

At the same time, R and Python are useful for more advanced statistical analysis and custom analysis.

Combining them lets you build a workflow in which you inspect and understand the data at each stage of the analysis.

This page explains the data formats and practical workflow for integrating Subio Platform with R / Python.


Why Integrate Subio Platform with R / Python?

Subio Platform is strong in data visualization and in accumulating and reusing analysis results.

On the other hand, R and Python are well suited for various types of advanced statistical processing.

By combining the two, you can:

  • Check the state of the data while visualizing it
  • Perform statistical analysis flexibly
  • Bring the results back into Subio Platform for visualization and interpretation
  • Accumulate analysis results and retrieve them later for re-analysis

This makes it possible to choose analysis methods flexibly and turn analysis results into reusable research assets.


Data That Can Be Exported from Subio Platform

Subio Platform can export the following types of data for use in R / Python.

| Data | Format | Main Uses |
|---|---|---|
| Gene Counts / TPM / FPKM (raw data or preprocessed data) | TSV | Differential expression analysis (DESeq2 / edgeR), PCA / UMAP, clustering, classification and prediction using machine learning, etc. |
| Sample information (groups, conditions, etc.) | TSV | Group settings for differential expression analysis, classification and labeling (such as supervised machine learning), and batch information management |
| Annotated gene lists | TSV | Enrichment analysis and network analysis |
| Genomic region lists with values (ChIP-Seq, methylation, CNV, etc.) | BED | Genomic position-based analysis and multi-omics integration analysis |

These data can be read directly into R or Python.
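As a minimal sketch of reading such exports in Python, the example below parses a count table and a sample information table using only the standard library. The column names (GeneID, Sample, Group) and the layout (gene IDs in the first column, one column per sample) are assumptions made for illustration. An actual export may contain different or additional columns.

```python
import csv
import io

# Hypothetical stand-ins for two exported TSV files. Real exports may use
# different column names and may carry extra annotation columns.
COUNTS_TSV = (
    "GeneID\tS1\tS2\tS3\n"
    "GeneA\t10\t0\t25\n"
    "GeneB\t300\t280\t310\n"
)
SAMPLES_TSV = (
    "Sample\tGroup\n"
    "S1\tcontrol\n"
    "S2\tcontrol\n"
    "S3\ttreated\n"
)


def read_counts(handle):
    """Return (sample_names, {gene_id: [values as float]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[1:]  # assumes column 1 holds the gene ID
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = [float(v) for v in row[1:]]
    return sample_names, table


def read_sample_info(handle):
    """Return {sample_name: {attribute: value}}, keyed by the first column."""
    reader = csv.DictReader(handle, delimiter="\t")
    key = reader.fieldnames[0]
    return {
        row[key]: {k: v for k, v in row.items() if k != key}
        for row in reader
    }


samples, counts = read_counts(io.StringIO(COUNTS_TSV))
sample_info = read_sample_info(io.StringIO(SAMPLES_TSV))
print(samples)
print(counts["GeneA"])
print(sample_info["S3"])
```

With real files, the same functions can be given a handle from open(path, newline="", encoding="utf-8") instead of io.StringIO.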


Data That Can Be Imported into Subio Platform

Results analyzed in R or Python can be imported into Subio Platform for visualization and interpretation.

For example, the following types of data can be imported:

  • Differential expression analysis results from statistical models (P values, FDR, etc.)
    By importing them as a Measurement List, you can visualize genes with differential expression and compare shared or condition-specific genes using Venn diagrams.
  • Normalized or corrected data
    By importing them as another set of samples, you can compare them with the original data and visually examine differences among correction methods.
  • Classification or characterization results from clustering or machine learning
    By importing them as sample attributes, you can use them for visualization, comparison, validation, survival curve analysis, and other analyses.
  • Genomic regions with associated numerical values
    These can be visualized as bar graphs or heat maps in the genome browser, and used to examine relationships or correlations with gene expression.
  • Component information obtained from advanced analyses
    PCA loadings, component vectors from NMF or ICA, and similar outputs can be imported and used to visualize and examine sample or gene features in Subio Platform.

Note: For linear methods such as PCA and NMF, the imported coefficients (loadings or component vectors) can be used to calculate sample scores or positions.
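For PCA specifically, the calculation can be sketched as follows: a sample's score on a component is the dot product of its centered expression values and that component's loading vector. The data and the loading vector below are hypothetical, and the sketch assumes the PCA was fitted on mean-centered, unscaled data. NMF scores are usually obtained differently (for example by non-negative least squares) and are not covered here.

```python
import math


def column_means(matrix):
    """Mean of each column (gene) across rows (samples)."""
    n_rows = len(matrix)
    return [sum(row[j] for row in matrix) / n_rows for j in range(len(matrix[0]))]


def pca_scores(matrix, loadings, means):
    """Score of each sample on one component: dot(centered row, loadings)."""
    return [
        sum((x - m) * w for x, m, w in zip(row, means, loadings))
        for row in matrix
    ]


# Rows are samples, columns are genes (hypothetical values).
expression = [
    [2.0, 4.0],
    [4.0, 8.0],
    [6.0, 12.0],
]

# Hypothetical PC1 loading vector. For this toy data all variation lies
# along the direction (1, 2), so the unit vector in that direction is PC1.
pc1_loadings = [1 / math.sqrt(5), 2 / math.sqrt(5)]

means = column_means(expression)
scores = pca_scores(expression, pc1_loadings, means)
for name, score in zip(["S1", "S2", "S3"], scores):
    print(name, round(score, 3))
```

The middle sample sits at the mean of the data, so its score is zero, and the other two samples fall symmetrically on either side.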

Note: Nonlinear methods such as UMAP and t-SNE do not define coefficients in the same way, but their sample coordinates can be used for visualization.

By importing these data, you can visualize and interpret analysis results from multiple perspectives.
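As an illustration of preparing results for import, the sketch below writes a small differential expression table as TSV. The column names (GeneID, PValue, FDR) are assumptions, and the exact layout that Subio Platform expects for a Measurement List should be checked against its documentation. DESeq2 and edgeR already report adjusted P values. The Benjamini-Hochberg function is included only for cases where you have raw P values alone.

```python
import csv
import io


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def write_results_tsv(handle, gene_ids, pvalues, fdr):
    """Write one row per gene using hypothetical column names."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneID", "PValue", "FDR"])
    for gene, p, q in zip(gene_ids, pvalues, fdr):
        writer.writerow([gene, f"{p:.6g}", f"{q:.6g}"])


# Hypothetical gene IDs and raw p-values.
gene_ids = ["GeneA", "GeneB", "GeneC", "GeneD"]
pvalues = [0.01, 0.04, 0.03, 0.005]

fdr = benjamini_hochberg(pvalues)
buffer = io.StringIO()
write_results_tsv(buffer, gene_ids, pvalues, fdr)
print(buffer.getvalue())
```

Keeping the gene IDs identical to those in the exported data lets the imported values be matched back to the original genes.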


Common Points to Note

When integrating Subio Platform with R or Python, keep the following points in mind:

  • When handling tabular data in Subio Platform, use TSV format. CSV files should be converted to TSV format when necessary.
  • Gene IDs and sample names can be handled flexibly, but it is recommended to use consistent IDs and naming rules so that the correspondence among samples, genes, and analysis results remains clear.

If these points are not handled properly, samples or genes may be matched to the wrong rows or columns, leading to misinterpretation of the analysis results.
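Both points can be checked with a few lines of Python. The sketch below converts CSV text to TSV and compares two lists of IDs, for example the sample names in a count table header against those in a sample information table. The function names and example data are hypothetical.

```python
import csv
import io


def csv_to_tsv(src, dst):
    """Copy rows from a CSV handle to a TSV handle, honoring CSV quoting."""
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


def compare_ids(expected, observed):
    """Report IDs that are missing, unexpected, or in a different order."""
    expected_set, observed_set = set(expected), set(observed)
    return {
        "missing": sorted(expected_set - observed_set),
        "unexpected": sorted(observed_set - expected_set),
        "same_order": list(expected) == list(observed),
    }


# A quoted CSV field containing a comma must stay a single TSV field.
csv_text = 'GeneID,Description,S1\nGeneA,"kinase, putative",12\n'
tsv_out = io.StringIO()
csv_to_tsv(io.StringIO(csv_text), tsv_out)
print(tsv_out.getvalue())

# Sample names in the count table header vs. the sample information table.
# The lowercase "s2" stands in for a typical naming inconsistency.
report = compare_ids(["S1", "S2", "S3"], ["S1", "S3", "s2"])
print(report)
```

Running a check like this before every export and import catches case differences, stray whitespace, and reordered columns before they affect the results.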


Using AI such as ChatGPT for Analysis

In recent years, more researchers have started using tools such as ChatGPT to create R or Python code and support specific analysis tasks.

By organizing data in Subio Platform and generating R/Python code only for the necessary parts, you can:

  • Minimize the amount of code you need to handle
  • Make it easier to identify the cause of errors
  • Manage the analysis workflow as a set of smaller steps

Asking AI to build a large analysis pipeline all at once is risky, because errors become hard to trace. Combining small programs, each of which can be checked on its own, is a more practical and efficient approach.

The Analyst Must Ultimately Check the Results

Generative AI can be used to create R/Python scripts, but only the analyst can judge whether the results are appropriate.

Subio Platform lets you manage data and run analyses, including those that integrate R and Python, entirely within your local environment. It gives you a place to visually check AI-generated output yourself.

[Preparation] Use Subio Platform to understand the overall data structure and export data for analysis
[Computation] Run AI-generated code in R/Python
[Interpretation] Bring the results back into Subio Platform, visually examine them, and interpret the findings
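The three steps above can be sketched as one small script, with one function per step. The computation here, log2 of counts per million plus one, is only a placeholder for whatever analysis the generated code performs. The file layout and column names are assumptions rather than a documented Subio Platform format.

```python
import csv
import io
import math


def load_counts(handle):
    """[Preparation] Read an exported count table with gene IDs in column 1."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]
    counts = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return samples, counts


def log2_cpm(counts, n_samples):
    """[Computation] log2(counts per million + 1), computed per sample."""
    totals = [sum(values[j] for values in counts.values()) for j in range(n_samples)]
    if any(total == 0 for total in totals):
        raise ValueError("a sample has zero total counts")
    return {
        gene: [math.log2(values[j] / totals[j] * 1e6 + 1) for j in range(n_samples)]
        for gene, values in counts.items()
    }


def save_table(handle, samples, table):
    """[Interpretation] Write a TSV to bring back for visual checking."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneID"] + samples)
    for gene, values in table.items():
        writer.writerow([gene] + [f"{v:.4f}" for v in values])


# Hypothetical exported data standing in for a real file.
exported = io.StringIO("GeneID\tS1\tS2\nGeneA\t1\t3\nGeneB\t3\t1\n")
samples, counts = load_counts(exported)
normalized = log2_cpm(counts, len(samples))
result = io.StringIO()
save_table(result, samples, normalized)
print(result.getvalue())
```

Keeping each step in its own short function makes it easier to see where an error came from, which matters most when the code was generated by AI.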

The following articles explain the actual operations in detail.


Next Steps

With Subio Platform, analysis results obtained in R or Python can be accumulated in a form that can be visualized, compared, managed, and reused. Download the free version and try the workflow with actual data.

Download Subio Platform for Free

The RNA-Seq Data Analysis Tutorial walks through Gene Counts import, normalization, filtering, PCA, clustering, differential expression analysis, and enrichment analysis, while checking the state of the data at each step.

It also introduces a workflow for differential expression analysis using DESeq2 and edgeR, where R scripts are generated with ChatGPT and the results are visualized and checked in Subio Platform.

The experience of manually checking data helps you understand the nature of RNA-Seq data and the characteristics of different analysis methods, and provides a foundation for designing future automation strategies. This perspective is useful not only for experimental researchers, but also for bioinformaticians who design analysis pipelines or systems.