How to set up the environment for the FASTQ to VCF (RNA-Seq) pipeline tool (macOS version)

This is a preparation guide for using the FASTQ to VCF tool.

Install GATK4

Please finish setting up the pipeline for RNA-Seq FASTQ file processing on macOS first, because this pipeline uses the same tools, fastp and HISAT2. The only additional software you need is GATK4, which you can install from bioconda. Run the following command in the terminal.

$ conda install -c bioconda gatk4
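After the installation, it can help to confirm that the three tools are visible from your shell. The following is only a minimal sketch: check_tools is a hypothetical helper name, and the command names fastp, hisat2, and gatk are assumed to be the executables that the bioconda packages place on the PATH.

```shell
# Report which of the given command names are missing from the PATH.
# Returns 0 when every command is found, 1 otherwise.
# (check_tools is a hypothetical helper, not part of any of the tools.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (run in the conda environment where the tools were installed):
#   check_tools fastp hisat2 gatk
```

If a tool is reported as missing, the conda environment is probably not activated in that terminal.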

The Environment Set-up Panel

Setting up this pipeline is similar to setting up RNA-Seq FASTQ file processing. The settings for the fastp and HISAT2 executables are the same. The GATK Resource Bundle is available only for human (b37/hg19 and GRCh38/hg38), not for other organisms.

However, the indexes provided on the HISAT2 website cause an error when GATK runs, because their chromosome names look like "1", not "chr1". So, download the "chr"-added version of the GRCh38 HISAT2 indexes and the GTF file in which we added "chr" to the chromosome names. Please uncompress the zip files before you use them.
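To illustrate what the "chr"-added naming means for a GTF file, here is a minimal sketch that prefixes the first column of each record. It is not necessarily how the provided files were made, and add_chr_prefix and the file names in the usage line are hypothetical. You do not need to run it if you download the prepared files.

```shell
# Prefix the sequence name (column 1) of every GTF record with "chr".
# Comment lines starting with "#", empty lines, and names that already
# begin with "chr" are left unchanged.
# (add_chr_prefix is a hypothetical helper for illustration only.)
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF == 0 { print; next }
         $1 !~ /^chr/ { $1 = "chr" $1 }
         { print }'
}

# Usage (file names are placeholders):
#   add_chr_prefix < original.gtf > chr_added.gtf
```

Please note that this simple rule turns the Ensembl mitochondrial name "MT" into "chrMT", while hg38-style references use "chrM", so such names may need separate handling.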

FASTQ to VCF (RNA-Seq) Environment Set-up Panel (macOS)

1. Java 8 Executable

Go to the anaconda3 install directory and select bin/java

2. GATK jar Executable

Go to the anaconda3 install directory, and select pkgs/gatk[version]/share/gatk[version]/gatk-package[version]-local.jar
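Because the version is part of the directory and file names, searching for the jar can be easier than browsing. This is a minimal sketch: find_gatk_jar is a hypothetical helper, and $HOME/anaconda3 in the usage line is only an assumed install location, so replace it with your own.

```shell
# Print every GATK "local" jar found under the given directory.
# (find_gatk_jar is a hypothetical helper for illustration only.)
find_gatk_jar() {
    find "$1" -type f -name 'gatk-package*-local.jar' 2>/dev/null
}

# Usage (the install directory is an assumption, adjust it to yours):
#   find_gatk_jar "$HOME/anaconda3"
```

The search may print more than one path if several GATK versions or copies exist. Select the one under pkgs as described above.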

3. GTF/GFF3 File

You can download the GTF file for GRCh38 from here. This GTF file is a modified version of the original provided by EBI: the chromosome names have the "chr" prefix. For it to work, you need to use it together with the "chr"-added version of the HISAT2 index.

4. dbSNP VCF File

Get it from gcp-public-data--broad-references on Google Cloud. For GRCh38 data, please download Homo_sapiens_assembly38.dbsnp.vcf under the hg38/v0 folder.

5. Known Indels VCF File

You can get them from the same directory as in step 4. Download Homo_sapiens_assembly38.known_indels.vcf and Mills_and_1000G_gold_standard.indels.hg38.vcf under the hg38/v0 folder. Please note that you have to go to the second page to see them.

Troubleshooting the files for steps 4 and 5

Although the names of those files end with ".vcf.gz" on Google Cloud, the downloaded files might end with ".vcf" or something else. Both raw VCF and gzip-compressed files are accepted, but you may need to correct the extensions.
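If you are not sure whether a downloaded file is compressed, you can look at its first two bytes instead of trusting the extension. The following is a minimal sketch: vcf_compression is a hypothetical helper name, and the file names in the usage lines are placeholders.

```shell
# Print "gzip" if the file starts with the gzip magic bytes (1f 8b),
# otherwise print "plain".
# (vcf_compression is a hypothetical helper for illustration only.)
vcf_compression() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \t\n')
    if [ "$magic" = "1f8b" ]; then
        echo "gzip"
    else
        echo "plain"
    fi
}

# Usage (file names are placeholders):
#   vcf_compression downloaded_file.vcf
#   mv downloaded_file.vcf downloaded_file.vcf.gz   # only if it printed "gzip"
```

When the result is "gzip", the file name should end with ".vcf.gz". When it is "plain", the name should end with ".vcf".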

A workaround for the error "env: python: No such file or directory" on recent macOS

When hisat2-inspect is executed, it halts with an error saying that it can't find Python. If you see this error, please close the Subio Platform software, and then reopen it with the following command from Terminal.

open /Applications/Subio/Subio.app

If you want to analyze on hg19

If you analyze on hg19, please download the original HISAT2 index (grch37_genome.tar.gz) and the GRCh37 GTF file, and use the following dbSNP and indels files.