How to get a proper GTF file from the Ensembl FTP site.

When setting up an RNA-Seq FASTQ file processing pipeline, you need a suitable set of HISAT2 indexes and a GTF file. Even if the files belong to a different organism, or the HISAT2 indexes and the GTF do not match each other, the run still appears to complete normally, without any warning. So you must check that you have selected the right files before you run the pipeline.
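One quick sanity check is to compare the sequence names (column 1) used in the GTF with the reference names stored in the HISAT2 index. The Python sketch below is a minimal illustration under our own assumptions: the function names are hypothetical helpers, not part of HISAT2 or Ensembl tooling, and the index names are assumed to come from a listing such as the output of `hisat2-inspect -n` (check this against your HISAT2 version). Only the first whitespace-separated word of each index name is kept, because names derived from FASTA headers may carry a description.

```python
import gzip
from typing import Iterable, Set


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect the sequence names (column 1) used by the feature lines of a GTF."""
    names: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#!' header / comment lines
        names.add(line.split("\t", 1)[0])
    return names


def read_gtf_seqnames(path: str) -> Set[str]:
    """Read the sequence names from a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gtf_seqnames(handle)


def parse_index_names(text: str) -> Set[str]:
    """Parse reference names printed one per line (assumed: `hisat2-inspect -n`),
    keeping only the first whitespace-separated word of each line."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def seqname_overlap(gtf_names: Set[str], index_names: Set[str]) -> float:
    """Fraction of GTF sequence names that also exist in the HISAT2 index."""
    if not gtf_names:
        raise ValueError("the GTF contains no feature lines")
    return len(gtf_names & index_names) / len(gtf_names)


# Possible usage (the paths are placeholders):
#   import subprocess
#   out = subprocess.run(["hisat2-inspect", "-n", "index/genome"],
#                        capture_output=True, text=True, check=True).stdout
#   print(seqname_overlap(read_gtf_seqnames("annotation.gtf.gz"), parse_index_names(out)))
```

An overlap well below 1.0, for example GTF names `1`, `2`, `MT` against index names `chr1`, `chr2`, `chrM`, is a strong hint that the two files do not belong together. A high overlap does not prove that the genome versions match, though, because different assemblies of the same organism usually share chromosome names.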

The provided HISAT2 indexes are updated much less often than the GTF files on the Ensembl FTP site. In other words, the latest (or current) GTF may already be built on a newer genome version than the one used for the HISAT2 indexes. In such cases, you must look for a GTF file built on the matching genome version.
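To check the genome version itself rather than guess, you can read it from the GTF header. The Python sketch below assumes the file starts with Ensembl-style header lines such as `#!genome-build GRCm39`, which recent Ensembl GTFs carry but which you should confirm in your own file. The build of the HISAT2 indexes is a string you supply yourself (for example, taken from the name of the index you downloaded); the sketch does not read it from the index. All function names are hypothetical.

```python
import gzip
from typing import Iterable, Optional


def genome_build_from_header(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the '#!genome-build' header line, or None if absent."""
    for line in lines:
        if not line.startswith("#"):
            break  # header lines come first; stop at the first feature line
        # The trailing space avoids matching '#!genome-build-accession'.
        if line.startswith("#!genome-build "):
            return line.split(maxsplit=1)[1].strip()
    return None


def gtf_genome_build(path: str) -> Optional[str]:
    """Read the genome build from a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return genome_build_from_header(handle)


def assert_same_build(gtf_build: Optional[str], index_build: str) -> None:
    """Fail loudly instead of letting the pipeline finish with mismatched files."""
    if gtf_build != index_build:
        raise ValueError(
            f"GTF genome build {gtf_build!r} does not match "
            f"HISAT2 index build {index_build!r}"
        )


# Possible usage (path and build name are placeholders):
#   assert_same_build(gtf_genome_build("Mus_musculus.gtf.gz"), "GRCm38")
```

Raising an error before the pipeline starts is the point here: as noted above, a mismatched run otherwise finishes without complaint.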

You can easily find the “current GTF” directory. However, if its genome version doesn’t match that of the HISAT2 indexes, you must search the folders of former releases, e.g., release-99 or release-100.

Here is a tip for traversing former releases:
1. Open the latest numbered release folder (not the “current GTF” directory) and find the GTF of the target organism.
2. For example, if its address is
   https://ftp.ensembl.org/pub/release-104/gtf/mus_musculus/
   you can edit the release number in the URL field like this:
   https://ftp.ensembl.org/pub/release-103/gtf/mus_musculus/
   https://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/
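The URL-editing steps above can be sketched in Python. This is a minimal illustration under stated assumptions: it generates the folder URLs by counting the release number down, and it picks the newest release whose GTF file name carries the assembly you want. It assumes Ensembl GTF names follow the pattern `<Species>.<Assembly>.<release>[.<suffix>].gtf.gz` (e.g. `Mus_musculus.GRCm39.104.gtf.gz`); fetching the folder listings is left to you, and the listing in the usage example is made-up data, not a snapshot of the FTP site. The helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional

BASE = "https://ftp.ensembl.org/pub"

# Assumed naming pattern, e.g. Mus_musculus.GRCm39.104.gtf.gz; the optional
# suffix covers variants such as ".chr"; assemblies may contain dots (Sscrofa11.1).
GTF_NAME = re.compile(
    r"^(?P<species>[^.]+)\.(?P<assembly>.+)\.(?P<release>\d+)"
    r"(?:\.[a-z_]+)?\.gtf(?:\.gz)?$"
)


def release_gtf_url(release: int, species: str) -> str:
    """URL of the GTF folder of one numbered Ensembl release."""
    return f"{BASE}/release-{release}/gtf/{species}/"


def former_release_urls(latest: int, species: str, count: int) -> List[str]:
    """Folder URLs for latest, latest - 1, ..., stepping back `count` releases."""
    return [release_gtf_url(r, species) for r in range(latest, latest - count, -1)]


def parse_gtf_name(filename: str) -> Optional[Dict[str, str]]:
    """Split a GTF file name into species, assembly and release, or return None."""
    m = GTF_NAME.match(filename)
    return m.groupdict() if m else None


def newest_matching_release(
    listings: Dict[int, Iterable[str]], assembly: str
) -> Optional[int]:
    """Given {release: file names in that folder}, return the newest release
    whose GTF was built on `assembly`, or None if no release matches."""
    for release in sorted(listings, reverse=True):
        for name in listings[release]:
            info = parse_gtf_name(name)
            if info and info["assembly"] == assembly:
                return release
    return None
```

For example, with the made-up listing `{104: ["Mus_musculus.GRCm39.104.gtf.gz"], 102: ["Mus_musculus.GRCm38.102.gtf.gz"]}`, asking for `"GRCm38"` returns `102`. Walking from the newest release downward mirrors the manual procedure: the first match is the latest GTF for that genome version.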

In this way, you can quickly find the latest GTF built on the former genome version.
