How to set up the RNA-Seq FASTQ file processing pipeline (Windows version)

This guide explains how to prepare a Windows computer for importing RNA-Seq FASTQ files into Subio Platform. If you use a Mac, please go to the guide for macOS.

You are about to run Linux programs on a Windows computer. To make them work, you must activate Windows Subsystem for Linux (WSL) and install Ubuntu 18.04 or 20.04. WSL is available on Windows 10 or later. If you use Windows 11, you can choose either WSL1 or WSL2; both work for this pipeline.

The movie below covers installing Ubuntu on Windows 10. If you work with Windows 11, please refer to the instructions on other websites. You don't need to install the Linux GUI package.

Note that the fastp installation process has changed. It is now easy to get the fastp executable file from the Anaconda archive. Please see the last paragraph of this page for details.

Subio Platform utilizes the following tools to process the RNA-Seq FASTQ files.

  • fastp to trim adapters and filter low-quality reads.
  • HISAT2 to align reads on the reference genome.
  • StringTie to assemble alignments and to estimate gene expression levels.

Instructions:

  1. Activate WSL.
  2. Get Ubuntu from the Microsoft App Store.
  3. Launch Ubuntu to initialize it. Set the username and password for Linux.
  4. Download the binary files of fastp, HISAT2, and StringTie for Linux systems.
  5. Set the path to these binary files on the Subio Platform's setting panel.
  6. Download the HISAT2 indexes of the suitable organism.
  7. Download the GTF file of the target organism, for the same genome version as the HISAT2 indexes.
  8. Set the path to the indexes and GTF file on the Subio Platform's setting panel.
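
Before entering the paths in the setting panel, it can help to confirm that the files from steps 4, 6, and 7 are where you expect them. The following is only a minimal Python sketch, not part of Subio Platform. The function name check_setup and its parameters are hypothetical, and it assumes the HISAT2 index is given as a prefix (for example "genome" for files such as genome.1.ht2).

```python
from pathlib import Path


def check_setup(tool_paths, index_prefix, gtf_path):
    """Return a list of problems found; an empty list means all checks passed.

    tool_paths   : dict mapping a tool name to the path of its binary
    index_prefix : HISAT2 index prefix (assumption: prefix, not a directory)
    gtf_path     : path to the GTF annotation file
    """
    problems = []

    # Each tool path should point at an existing file.
    for name, path in tool_paths.items():
        if not Path(path).is_file():
            problems.append(f"{name}: binary not found at {path}")

    # HISAT2 index files share a prefix and end in .ht2 (.ht2l for large indexes).
    prefix = Path(index_prefix)
    index_files = []
    if prefix.parent.is_dir():
        index_files = list(prefix.parent.glob(prefix.name + ".*.ht2*"))
    if not index_files:
        problems.append(f"no HISAT2 index files found for prefix {index_prefix}")

    if not Path(gtf_path).is_file():
        problems.append(f"GTF file not found at {gtf_path}")

    return problems
```

If the returned list is empty, the paths are at least present on disk. The sketch does not verify that the GTF and the indexes come from the same genome version, so that still has to be checked by hand.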

The preparation of RNA-Seq data processing on Windows 10

FASTQ file processing takes a long time, so add the following option to the fastp setting for a trial run. It limits the number of reads to process, so you can tell within a couple of minutes whether the pipeline works. Once you have confirmed that it does, delete this option to process the whole files.

--reads_to_process=100000

Fastp Option
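
To illustrate what the option does, here is a small Python sketch that builds a fastp argument list with and without the read limit. Subio Platform builds the real command itself from the setting panel; the function fastp_args and its test_reads parameter are hypothetical and exist only for this example. The -i and -o flags are fastp's input and output options for single-end reads.

```python
def fastp_args(fastq_in, fastq_out, test_reads=None):
    """Build a fastp argument list for single-end input.

    test_reads is a hypothetical parameter of this sketch: when it is set,
    fastp is asked to process only that many reads, so a trial run is quick.
    """
    args = ["fastp", "-i", fastq_in, "-o", fastq_out]
    if test_reads is not None:
        # Same option as shown above; leaving it out processes every read.
        args.append(f"--reads_to_process={test_reads}")
    return args


# Trial run: only the first 100000 reads are processed.
trial = fastp_args("sample.fastq.gz", "sample.trimmed.fastq.gz", test_reads=100000)

# Full run: the option is removed again.
full = fastp_args("sample.fastq.gz", "sample.trimmed.fastq.gz")
```

The file names here are placeholders. The only difference between the two lists is the extra --reads_to_process entry, which is why deleting the option is all it takes to switch back to a full run.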

We confirmed that the pipeline works correctly with the following versions. Other versions of the tools might cause errors.

  • fastp 0.22.0
  • HISAT2 2.1.0
  • StringTie 2.1.1
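
If you are not sure which versions you downloaded, you can run each tool with --version in Ubuntu and compare the result with the list above. The Python sketch below does that comparison on text you paste in. It is an illustration only: the names CONFIRMED, extract_version, and unconfirmed_tools are hypothetical, and it assumes that each tool's output contains a three-part version number such as 2.1.0.

```python
import re

# Versions confirmed to work, copied from the list above.
CONFIRMED = {"fastp": "0.22.0", "HISAT2": "2.1.0", "StringTie": "2.1.1"}


def extract_version(text):
    """Return the first three-part number (e.g. '2.1.0') in text, or None."""
    match = re.search(r"\d+\.\d+\.\d+", text)
    return match.group(0) if match else None


def unconfirmed_tools(version_outputs):
    """Return {tool: found_version} for tools that differ from CONFIRMED.

    version_outputs maps a tool name to the text printed by its --version.
    A tool with no output, or no recognizable version, is reported as None.
    """
    mismatches = {}
    for tool, expected in CONFIRMED.items():
        found = extract_version(version_outputs.get(tool, ""))
        if found != expected:
            mismatches[tool] = found
    return mismatches
```

An empty result means all three tools match the confirmed versions. Because the sketch takes the first version-like number it sees, a number embedded in an install path could be picked up instead, so read the raw output as well if the result looks odd.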

Note that HISAT2 2.2.0 is a bit trickier to get working because you have to remove all spaces from the paths to the index and GTF files.
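
A quick way to spot such paths is to test each one for a space character. This is a minimal Python sketch; the function name paths_with_spaces and the example paths are hypothetical.

```python
def paths_with_spaces(paths):
    """Return the paths that contain at least one space character."""
    return [p for p in paths if " " in p]


# Example paths for illustration only; the first one contains a space.
offending = paths_with_spaces([
    "/mnt/c/My Data/index/genome",
    "/home/user/annotation/genes.gtf",
])
```

Any path returned here needs its folders or files renamed or moved so that the full path contains no spaces.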

The version of fastp should be 0.22.0. Other versions might have problems with the Subio Platform's pipeline. If you see any problems with your fastp, please download linux-64/fastp-0.22.0-h2e03b76_0.tar.bz2 from Anaconda's fastp archive site and replace the executable file.
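
Once the archive is downloaded, the executable has to be taken out of it. The following Python sketch shows one way to do so with the standard tarfile module. It is not an official procedure: the function name extract_fastp is hypothetical, and the member path bin/fastp inside the archive is an assumption, so list the archive contents first if extraction fails.

```python
import os
import tarfile


def extract_fastp(archive_path, dest_dir, member="bin/fastp"):
    """Extract one file from a .tar.bz2 archive and mark it executable.

    member is the path of the executable inside the archive. The default
    'bin/fastp' is an assumption about the package layout.
    """
    with tarfile.open(archive_path, "r:bz2") as tar:
        src = tar.extractfile(member)  # raises KeyError if member is missing
        if src is None:
            raise ValueError(f"{member} is not a regular file in the archive")
        os.makedirs(dest_dir, exist_ok=True)
        out_path = os.path.join(dest_dir, os.path.basename(member))
        with src, open(out_path, "wb") as out:
            out.write(src.read())
    # Give the owner execute permission so the binary can be run.
    os.chmod(out_path, 0o755)
    return out_path
```

The function returns the path of the extracted file, which is the path to use when replacing the executable file in the Subio Platform's setting panel.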