When FASTQ Processing Takes Too Long in Subio Platform - Considering Salmon / kallisto for Large Datasets

Subio Platform can use FASTP, HISAT2, and StringTie to calculate Gene Counts and TPM from FASTQ files, and then import the results directly into Subio Platform for further analysis.

To use this workflow, you first need to set up an environment where FASTP, HISAT2, and StringTie can be executed. On Windows, this means setting up WSL so that Linux programs can run. On macOS, this means installing the necessary tools using Anaconda or a similar environment. The setup and execution procedures are explained in detail on separate pages.

Once this environment has been set up, you can process FASTQ files from within Subio Platform, from calculating Gene Counts and TPM through to data import. From then on, in most cases you only need to select the FASTQ files and run the process, so this workflow is simple and convenient when the number of samples is small.

On the other hand, when the number of samples is large, processing time can become a problem. Because the FASTP, HISAT2, and StringTie workflow includes read mapping, it can take several hours per sample, depending on the performance of the PC and the size of the FASTQ files.

For example, in an environment where processing takes 5 to 6 hours per sample, only about 4 to 5 samples can be processed in 24 hours of continuous running. This may not be a major problem for a few samples. With several dozen or more FASTQ files, however, it is worth considering fast quantification tools such as Salmon or kallisto: use them to create a gene-level table comparable to Gene Counts first, and then import that table into Subio Platform.
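The arithmetic behind this estimate can be sketched in a few lines of Python. The function names and the figure of 36 samples (standing in for "several dozen") are assumptions made for illustration, and the sketch assumes samples are processed one at a time without interruption.

```python
def samples_per_day(hours_per_sample: float, hours_available: float = 24.0) -> float:
    """Average number of samples processed in the available hours."""
    return hours_available / hours_per_sample


def total_days(n_samples: int, hours_per_sample: float) -> float:
    """Days of continuous, one-at-a-time processing needed for n_samples."""
    return n_samples * hours_per_sample / 24.0


# 36 samples is a hypothetical stand-in for "several dozen".
for hours in (5, 6):
    print(f"{hours} h/sample: {samples_per_day(hours):.1f} samples per day, "
          f"{total_days(36, hours):.1f} days for 36 samples")
```

Under these assumptions, a few dozen samples already occupy the PC for more than a week, which is the point at which a faster quantification tool becomes attractive.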

For processing many FASTQ files, a local environment is more practical

When running the process in a local environment, using Jupyter Notebook or a similar tool can help you keep commands and notes together in one notebook. This makes it easier to record which command was run for each sample, which reference data was used, and which output files were created.

However, processing FASTQ files on a local PC requires a certain level of computing resources. As a guideline, you should have about 16 GB of RAM or more, plus enough free disk space to store the FASTQ files, index files, and output results.
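Before starting a long batch, it can help to compare the total size of the FASTQ files with the free space on the working drive. The following is a minimal sketch using only the Python standard library. The function names and the `headroom` factor are assumptions for illustration, and the space actually needed for index files and outputs depends on the tool and the reference data.

```python
import shutil
from pathlib import Path

FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def fastq_bytes(fastq_dir: str) -> int:
    """Total size in bytes of the FASTQ files directly inside fastq_dir."""
    return sum(
        p.stat().st_size
        for p in Path(fastq_dir).iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )


def has_enough_space(fastq_dir: str, work_dir: str, headroom: float = 1.0) -> bool:
    """True if free space in work_dir is at least headroom x the FASTQ total.

    headroom is a hypothetical safety factor; choose it for your own setup.
    """
    free = shutil.disk_usage(work_dir).free
    return free >= headroom * fastq_bytes(fastq_dir)


if __name__ == "__main__":
    # "." is a placeholder; point these at your FASTQ and output folders.
    print(f"FASTQ total: {fastq_bytes('.') / 1e9:.2f} GB")
    print("Enough space:", has_enough_space(".", "."))
```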

Ask AI for the details based on your own environment

The specific way to use Salmon or kallisto depends on the OS, PC environment, single-end / paired-end settings, reference data, FASTQ file names, and output format.
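To illustrate how much of this depends on your own setup, here is a minimal Python sketch that pairs paired-end FASTQ files by name and builds one `salmon quant` command per sample. It assumes gzipped files named with `_R1` / `_R2` tags, a Salmon index that has already been built with `salmon index`, and hypothetical folder names. The helper names are made up for this example, and the commands are only constructed, not executed.

```python
from pathlib import Path


def pair_fastq(fastq_dir, r1_tag="_R1", r2_tag="_R2"):
    """Pair R1/R2 files by file name; returns {sample: (r1_path, r2_path)}.

    Assumes names like sampleA_R1.fastq.gz / sampleA_R2.fastq.gz.
    R1 files without a matching R2 file are skipped.
    """
    pairs = {}
    for r1 in sorted(Path(fastq_dir).glob(f"*{r1_tag}*.fastq.gz")):
        r2 = r1.with_name(r1.name.replace(r1_tag, r2_tag, 1))
        if r2.exists():
            sample = r1.name.split(r1_tag)[0]
            pairs[sample] = (r1, r2)
    return pairs


def salmon_quant_cmd(index_dir, r1, out_dir, r2=None, threads=4):
    """Build a salmon quant command as an argument list.

    -l A lets Salmon infer the library type; -r is used for single-end reads,
    -1 / -2 for paired-end reads.
    """
    cmd = ["salmon", "quant", "-i", str(index_dir), "-l", "A"]
    if r2 is None:
        cmd += ["-r", str(r1)]
    else:
        cmd += ["-1", str(r1), "-2", str(r2)]
    cmd += ["-p", str(threads), "-o", str(out_dir)]
    return cmd
```

In a notebook, you could loop over `pair_fastq("fastq").items()`, print each command to keep a record, and pass it to `subprocess.run(cmd, check=True)` once the commands look right. If your files follow a different naming pattern, the tags and the glob pattern need to be adjusted.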

Rather than trying to explain one fixed procedure that works for every environment, it is more practical to tell AI about your own environment and ask it to guide you through installation, index creation, running each sample, and summarizing the result files.

The overall workflow is to first quantify expression using Salmon or kallisto, then use R packages such as tximport to summarize the output into a gene-level Gene Counts-like table. After that, you can import the table into Subio Platform and proceed with visualization and analysis in the same way as with ordinary expression data.
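The gene-level summarization step can be pictured with a short Python sketch. This is not tximport, which is an R package and offers more options; it is a simplified stand-in that sums Salmon's estimated read counts (the `NumReads` column of each `quant.sf` file) over the transcripts of each gene. The function names, the two-column tab-separated transcript-to-gene file without a header, and the `gene_id` output column are assumptions for illustration.

```python
import csv
from collections import defaultdict


def read_tx2gene(path):
    """Read a two-column, tab-separated transcript-to-gene table (no header)."""
    with open(path, newline="") as fh:
        return {row[0]: row[1]
                for row in csv.reader(fh, delimiter="\t") if len(row) >= 2}


def gene_counts(quant_sf, tx2gene):
    """Sum Salmon NumReads per gene for one sample's quant.sf file.

    Transcripts missing from tx2gene are skipped.
    """
    totals = defaultdict(float)
    with open(quant_sf, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            gene = tx2gene.get(row["Name"])
            if gene is not None:
                totals[gene] += float(row["NumReads"])
    return dict(totals)


def write_table(samples, out_path):
    """Write a genes x samples table; samples is {sample_name: {gene: count}}."""
    genes = sorted({g for counts in samples.values() for g in counts})
    names = list(samples)
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["gene_id"] + names)
        for gene in genes:
            writer.writerow([gene] + [f"{samples[n].get(gene, 0.0):.3f}"
                                      for n in names])
```

For real analyses, tximport or a comparable tool remains the safer choice, because it also handles details such as transcript length information that this sketch ignores.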

Example prompt for AI:

I have paired-end RNA-Seq FASTQ files.
I want to use Salmon in Jupyter Notebook on a Windows PC to quantify expression,
then use R packages such as tximport to create
a gene-level counts table that can be imported into Subio Platform.
Please explain the steps for beginners, including setting up a conda environment,
installing Salmon, preparing the transcriptome index,
running many FASTQ files in batch,
and creating a gene-level counts table with tximport.

If you ask in this way, AI can organize the steps to match your current environment. If an error occurs during the process, you can show the error message to AI and ask for help identifying the cause and how to fix it.

Summary

No matter which method you use to process FASTQ files, the most important point is how you check and interpret the expression table you created. Check the expression data visually in Subio Platform, and then proceed to normalization, PCA, clustering, differential expression analysis, and GO / Pathway analysis.
