How to set up the RNA-Seq FASTQ file processing pipeline (Windows 10 version)

This guide explains how to prepare for importing RNA-Seq FASTQ files into Subio Platform on a Windows computer. If you use a Mac, please go to the guide for macOS.

You are about to run Linux programs on a Windows computer. To make them work, you have to activate Windows Subsystem for Linux (WSL) and install Ubuntu 18.04 or 20.04. WSL works on Windows 10. HISAT2 and StringTie run significantly slower on WSL2 than on WSL1, so we currently recommend that you work with WSL1.
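To see which WSL version a distribution is using, `wsl --list --verbose` prints a table with a VERSION column. The following is a minimal sketch of reading that table in Python. The function name is hypothetical, and the sample text is an illustration of the assumed layout, not captured output.

```python
def parse_wsl_list(text):
    """Map each distribution name to its WSL version (1 or 2).

    Expects the table printed by `wsl --list --verbose`: a header row,
    then one row per distribution, the default one marked with '*'.
    """
    versions = {}
    for line in text.splitlines():
        # wsl.exe output may be UTF-16 encoded; drop NUL characters that
        # appear if it was decoded byte by byte, and the default marker.
        fields = line.replace("\x00", "").replace("*", " ").split()
        # Data rows end with the numeric version; the header row does not.
        if len(fields) >= 3 and fields[-1].isdigit():
            versions[fields[0]] = int(fields[-1])
    return versions


# Illustrative input (assumed layout, not real captured output).
sample = """\
  NAME            STATE           VERSION
* Ubuntu-20.04    Running         2
  Ubuntu-18.04    Stopped         1
"""

for name, version in parse_wsl_list(sample).items():
    hint = "ok" if version == 1 else f"consider: wsl --set-version {name} 1"
    print(f"{name}: WSL{version} ({hint})")
```

A distribution that reports version 2 can be converted with `wsl --set-version <distribution> 1`, run from a Windows command prompt.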

Subio Platform utilizes the following tools to process the RNA-Seq FASTQ files.

  • fastp to trim adapters and filter low-quality reads.
  • HISAT2 to align reads to the reference genome.
  • StringTie to assemble alignments and to estimate gene expression levels.

Instructions:

  1. Activate WSL.
  2. Get Ubuntu from the Microsoft Store.
  3. Launch Ubuntu to initialize it. Set the username and password for the Linux account.
  4. Download the binary files of fastp, HISAT2, and StringTie for Linux systems.
  5. Set the paths to these binary files in Subio Platform's setting panel.
  6. Download the HISAT2 indexes for your target organism.
  7. Download the GTF file for the target organism, from the same genome version as the HISAT2 indexes.
  8. Set the paths to the indexes and the GTF file in Subio Platform's setting panel.
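Before entering the index and GTF paths in step 8, it can help to confirm that the downloads are complete. The sketch below is not part of Subio Platform; the function names are hypothetical. It relies on the fact that a HISAT2 index is a set of eight files sharing one prefix. It cannot tell whether the GTF and the index come from the same genome version, so that still has to be checked by hand.

```python
from pathlib import Path


def missing_index_files(index_prefix, suffix="ht2"):
    """List the HISAT2 index files that are absent for a given prefix.

    A HISAT2 index is a set of eight files, <prefix>.1.ht2 to <prefix>.8.ht2
    (pass suffix="ht2l" for large indexes).
    """
    expected = [Path(f"{index_prefix}.{n}.{suffix}") for n in range(1, 9)]
    return [p for p in expected if not p.is_file()]


def check_setup(index_prefix, gtf_path):
    """Return readable problems; an empty list means both look usable."""
    problems = [f"missing index file: {p}"
                for p in missing_index_files(index_prefix)]
    if not Path(gtf_path).is_file():
        problems.append(f"missing GTF file: {gtf_path}")
    return problems
```

For example, `check_setup("/mnt/c/rnaseq/grch38/genome", "/mnt/c/rnaseq/grch38/genes.gtf")` (both paths are made up for illustration) returns one line per missing file.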

The preparation of RNA-Seq data processing on Windows 10

FASTQ file processing takes a long time, so for a first test, add the following option to the fastp settings. It limits the number of reads to process, so you can tell within a couple of minutes whether the pipeline works. Once you have confirmed that it does, delete this option so that the whole data set is processed.

--reads_to_process=100000

Fastp Option
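To illustrate what the option changes, here is a minimal sketch of a single-end fastp command line with and without the read limit. Subio Platform builds the real command itself, so the function name and file names below are hypothetical; only the fastp flags `-i`, `-o`, and `--reads_to_process` are taken from fastp.

```python
import shlex


def build_fastp_command(fastp, in1, out1, reads_to_process=None):
    """Build a single-end fastp command as an argument list.

    reads_to_process=None means "process every read" (the full run).
    """
    cmd = [fastp, "-i", in1, "-o", out1]
    if reads_to_process is not None:
        # Quick test run: stop after this many reads.
        cmd.append(f"--reads_to_process={reads_to_process}")
    return cmd


# Hypothetical file names, for illustration only.
test_run = build_fastp_command("fastp", "sample.fastq.gz",
                               "sample.trimmed.fastq.gz", 100000)
full_run = build_fastp_command("fastp", "sample.fastq.gz",
                               "sample.trimmed.fastq.gz")
print(shlex.join(test_run))
print(shlex.join(full_run))
```

The only difference between the two commands is the trailing option, which is why removing it from the settings is enough to switch back to a full run.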

We have confirmed that the pipeline works correctly with the following versions. Other versions of the tools might cause errors.

  • fastp 0.22.0
  • HISAT2 2.1.0
  • StringTie 2.1.1
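One way to compare installed tools against the versions listed above is to run each with `--version` inside Ubuntu and pick the version number out of the text it prints. This is a rough sketch with hypothetical function names. It assumes only that the printed text contains a dotted version such as 2.1.0, and it does not depend on the exact wording of each tool's output.

```python
import re
import subprocess

# Versions confirmed to work, as listed above.
TESTED = {"fastp": "0.22.0", "hisat2": "2.1.0", "stringtie": "2.1.1"}


def version_output(tool_path):
    """Run `<tool> --version` and return everything it printed."""
    result = subprocess.run([tool_path, "--version"],
                            capture_output=True, text=True)
    # Some tools print the version to stderr, so keep both streams.
    return result.stdout + result.stderr


def first_version(text):
    """Return the first x.y.z number found in text, or None."""
    match = re.search(r"\d+\.\d+\.\d+", text)
    return match.group(0) if match else None


def untested_tools(reported):
    """Map tool name -> found version for tools that differ from TESTED.

    `reported` maps a tool name to the text printed by `--version`.
    """
    found = {tool: first_version(reported.get(tool, "")) for tool in TESTED}
    return {tool: v for tool, v in found.items() if v != TESTED[tool]}
```

An empty result from `untested_tools` means all three tools match the tested versions; a value of `None` means no version number could be found for that tool.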

We have noticed that HISAT2 2.2.0 is a bit trickier to get working, because you have to remove all spaces from the paths to the index and the GTF file.
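A quick way to spot offending paths before running the pipeline is sketched below. The function name and the example paths are hypothetical.

```python
def paths_with_spaces(paths):
    """Return the paths that contain at least one space character."""
    return [str(p) for p in paths if " " in str(p)]


# Made-up paths, for illustration only.
candidates = [
    "/mnt/c/Users/me/RNA Seq/grch38/genome",
    "/mnt/c/rnaseq/grch38/genes.gtf",
]
for path in paths_with_spaces(candidates):
    print("rename or move:", path)
```

Any path it reports needs its folders or file renamed, or the files moved to a location without spaces.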

The version of fastp should be 0.22.0. Other versions might have problems with Subio Platform's pipeline. If you see any problems with your fastp, please download linux-64/fastp-0.22.0-h2e03b76_0.tar.bz2 from Anaconda's fastp archive site and replace your fastp executable file with the one in the archive.
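The downloaded file is a bzip2-compressed tar archive, so the executable has to be pulled out of it before it can replace the old one. Here is a minimal sketch using Python's standard `tarfile` module. The function name is hypothetical, and it assumes only that the archive contains a regular file named `fastp` somewhere inside; it does not rely on a particular folder layout.

```python
import tarfile
from pathlib import Path


def extract_fastp(archive_path, dest_dir):
    """Copy the file named 'fastp' out of a .tar.bz2 archive.

    Returns the path of the extracted executable.
    """
    dest = Path(dest_dir) / "fastp"
    with tarfile.open(archive_path, "r:bz2") as tar:
        for member in tar.getmembers():
            # Match on the file name only, wherever it sits in the archive.
            if member.isfile() and Path(member.name).name == "fastp":
                with tar.extractfile(member) as src:
                    dest.write_bytes(src.read())
                dest.chmod(0o755)  # make it executable
                return dest
    raise FileNotFoundError(f"no file named 'fastp' in {archive_path}")
```

After extraction, point the fastp path in Subio Platform's setting panel at the new file, or copy it over the old executable.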