Getting .fastq.gz files from Gene Expression Omnibus (GEO)

When downloading the FASTQ files of a GEO series (GSE), you have to fetch them one sample at a time, which is quite tedious. To speed things up, run several downloads at the same time in separate Command Prompt or Terminal windows.
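The "several windows at once" idea can also be scripted. Below is a minimal bash sketch, assuming fastq-dump is on your PATH and every run is paired-end. The function name download_parallel is hypothetical. Each download runs as a background job (&), and wait collects the jobs and reports whether any of them failed.

```bash
#!/usr/bin/env bash
# Run several fastq-dump downloads at once as background jobs, the scripted
# equivalent of opening one window per download.
# Assumptions: fastq-dump is on PATH and every run is paired-end.
download_parallel() {
  local outdir=$1
  shift
  local srr pid status=0
  local -a pids=()
  mkdir -p "$outdir"
  for srr in "$@"; do
    fastq-dump --gzip --split-files --outdir "$outdir" "$srr" &
    pids+=("$!")
  done
  # wait for every job; report failure if any single download failed
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example: download_parallel ~/Documents/fastqdump SRR1234 SRR1235 SRR1236. Keeping the number of simultaneous jobs to a handful is probably wise, so the downloads don't saturate your connection.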

The procedure

1. Open the GSM (sample) record on the GEO website.
2. At the bottom of the page, there is a link to the SRX (SRA experiment) record. Click it.
3. Check the information about the sequencing reads, especially whether the library layout is SINGLE or PAIRED.
4. The SRR (run) numbers are listed at the bottom of the SRX page. Pass each one to the fastq-dump command of the SRA Toolkit. If there are multiple SRR numbers, download all of their FASTQ files one by one.
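If NCBI's Entrez Direct utilities (esearch, efetch) are installed, the SRR numbers and the layout from steps 3 and 4 can also be read on the command line. Below is a minimal bash sketch under stated assumptions: the runinfo table from efetch is comma-separated, its header names the columns Run and LibraryLayout, and no field contains a quoted comma. The function name runinfo_runs is hypothetical. Check its output against the SRX page before relying on it.

```bash
#!/usr/bin/env bash
# Print one "SRR<TAB>LAYOUT" line per run from an SRA runinfo CSV on stdin.
# Assumption: the header row names the columns "Run" and "LibraryLayout",
# and no field contains a quoted comma.
runinfo_runs() {
  awk -F',' '
    NR == 1 {
      # locate the two columns of interest by their header names
      for (i = 1; i <= NF; i++) {
        if ($i == "Run") run = i
        if ($i == "LibraryLayout") layout = i
      }
      next
    }
    # skip blank lines and any repeated header rows
    NF == 0 || $run == "Run" { next }
    { print $run "\t" $layout }
  '
}
```

For example: esearch -db sra -query GSM5678 | efetch -format runinfo | runinfo_runs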

Open a Command Prompt (or Terminal on macOS) window and move to the bin folder of the SRA Toolkit with the cd command. On macOS, type ./fastq-dump instead of fastq-dump when running the tool from that folder.

For example, create a fastqdump folder in your Documents folder, and type a command like this:

fastq-dump --gzip --split-files --outdir "C:\Users\[user name]\Documents\fastqdump" SRR12345

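  • --gzip compresses the output as it is written, which saves disk space and spares you a separate compression step.
  • --split-files is needed only for paired-end reads; it writes the two mates to separate _1 and _2 files. Remove it for single-end experiments.
  • --outdir sets the folder the files are saved to; any folder can be used.
  • Replace SRR12345 with the SRR number of the sample you want to download.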
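Since the only option that depends on the sample is --split-files, the choice can be made from the layout shown on the SRX page. Below is a minimal bash sketch, assuming fastq-dump is on your PATH. The function name fetch_run and its argument order (SRR, layout, output folder) are hypothetical.

```bash
#!/usr/bin/env bash
# Download one run, adding --split-files only when the layout is PAIRED.
# Assumption: fastq-dump is on PATH; the layout argument is the SINGLE/PAIRED
# value read from the SRX page.
fetch_run() {
  local srr=$1 layout=$2 outdir=$3
  local -a args=(--gzip)
  if [ "$layout" = "PAIRED" ]; then
    # mates go to separate SRR_1 and SRR_2 files
    args+=(--split-files)
  fi
  args+=(--outdir "$outdir")
  mkdir -p "$outdir"
  fastq-dump "${args[@]}" "$srr"
}
```

For example: fetch_run SRR12345 PAIRED ~/Documents/fastqdump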

If you get SRR1234.fastq.gz, it's a good idea to rename it after its sample, e.g. GSM5678.fastq.gz. If it's a paired-end sample, rename the pair GSM5678_1.fastq.gz and GSM5678_2.fastq.gz.
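The renaming can be sketched in bash as follows. This is a minimal version, assuming the files sit in one folder under the names fastq-dump gives them. The function name rename_to_gsm is hypothetical. Note that mv overwrites an existing file of the same name without asking.

```bash
#!/usr/bin/env bash
# Rename SRR files after their GSM sample: SRR.fastq.gz -> GSM.fastq.gz, or
# SRR_1/_2.fastq.gz -> GSM_1/_2.fastq.gz for paired-end data.
# Assumption: files are in one folder (default: current folder).
rename_to_gsm() {
  local srr=$1 gsm=$2 dir=${3:-.}
  if [ -f "$dir/${srr}_1.fastq.gz" ]; then
    # paired-end: keep the _1/_2 suffixes so the mates stay matched
    mv "$dir/${srr}_1.fastq.gz" "$dir/${gsm}_1.fastq.gz"
    mv "$dir/${srr}_2.fastq.gz" "$dir/${gsm}_2.fastq.gz"
  else
    mv "$dir/${srr}.fastq.gz" "$dir/${gsm}.fastq.gz"
  fi
}
```

For example: rename_to_gsm SRR1234 GSM5678 ~/Documents/fastqdump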

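If there are multiple SRRs for a single GSM, concatenate them into one file. For paired-end data, concatenate the _1 files and the _2 files separately, in the same SRR order, so that the read pairs stay in sync.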

Concatenate with the Windows Command Prompt

copy /b SRR1234.fastq.gz  + SRR1235.fastq.gz  + SRR1236.fastq.gz  GSM5678.fastq.gz

Concatenate with the macOS Terminal

cat SRR1234.fastq.gz  SRR1235.fastq.gz  SRR1236.fastq.gz > GSM5678.fastq.gz
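The same concatenation can be scripted for a whole GSM, including the paired-end case. Below is a minimal bash sketch. The function name merge_runs is hypothetical, and it assumes the downloaded files are in the current folder under the names fastq-dump produces. Plain concatenation works because a gzip file may consist of several compressed members back to back. gzip -t then checks that the merged file decompresses cleanly.

```bash
#!/usr/bin/env bash
# Merge the runs of one GSM, in the order given, into GSM.fastq.gz
# (single-end) or GSM_1.fastq.gz and GSM_2.fastq.gz (paired-end).
# Assumption: files are in the current folder, named as fastq-dump writes them.
merge_runs() {
  local gsm=$1
  shift
  local srr suffix
  local -a files
  local -a suffixes=("")
  # paired-end if the first run has an _1 file
  if [ -f "${1}_1.fastq.gz" ]; then
    suffixes=(_1 _2)
  fi
  for suffix in "${suffixes[@]}"; do
    files=()
    for srr in "$@"; do
      files+=("${srr}${suffix}.fastq.gz")
    done
    # same SRR order for both mates keeps the read pairs in sync
    cat "${files[@]}" > "${gsm}${suffix}.fastq.gz" || return 1
    gzip -t "${gsm}${suffix}.fastq.gz" || return 1
  done
}
```

For example: merge_runs GSM5678 SRR1234 SRR1235 SRR1236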