If you can't get HISAT2 indexes for your organism from the website, you have to build them yourself. These instructions are for readers who are not familiar with Linux or Unix commands.
Preparation for Windows Users Only
Windows users run bioinformatics tools made for Linux through WSL, and WSL needs some setup before you can build indexes. Mac users can skip this section.
First, start WSL by typing "wsl" at the Command Prompt, then update the package list with the following command.
apt update
If this command fails with a "permission denied" error, put "sudo" in front of it, like this:
sudo apt update
Enter the password you chose when you set up Ubuntu to run the command. The same workaround applies if you get this error in any of the following steps.
Install python2 with the following command.
apt -y install python2
Check the location of python2 and python3 with the following commands.
which python2
which python3
If the paths are "/usr/bin/python2" and "/usr/bin/python3", type the following commands to check their minor versions.
ll /usr/bin/python2*
ll /usr/bin/python3*
If they are "/usr/bin/python2.7" and "/usr/bin/python3.8", register both with the following commands.
update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1
update-alternatives --install /usr/bin/python python /usr/bin/python2.7 2
Type the following command to select which version of Python runs when you simply call "python".
update-alternatives --config python
Then enter the number shown next to python3.8.
Finally, check that the setup worked with the following command.
python -V
If it prints "Python 3.8.x", where x can be any number, this section is complete.
Get The Genome Sequence Files
You need the genome sequence as FASTA files, one file per chromosome. For example, on Ensembl's chicken page, click "Download DNA sequence (FASTA)" to download the fa.gz files of all chromosomes.
I recommend renaming the files to something as short as possible, like "chr1.fa.gz" or "chrZ.fa.gz". This is not required, but it keeps the command you will type later short. Then decompress all the .gz files.
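The decompression step can be done in one go from the terminal. The following is a minimal sketch, assuming the downloaded files end in ".fa.gz"; the function name decompress_fasta is made up for this illustration and is not part of HISAT2.

```shell
# Hypothetical helper: decompress every .fa.gz file in a directory.
# gunzip replaces each chrN.fa.gz with the plain chrN.fa file.
decompress_fasta() {
    local dir="${1:-.}"
    local f
    for f in "$dir"/*.fa.gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        gunzip "$f"
    done
}

# Usage (inside the directory that holds the downloads):
#   decompress_fasta .
```

Note that gunzip removes the original .gz files once they are decompressed, so keep a copy elsewhere if you want to retain them.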
Building Indexes
Windows users need to start WSL by typing "wsl" at the Command Prompt. Mac users need to open Terminal.
Let's say you have the .fa files in a directory named "genomeseq" in the Documents directory of your account. Change the current directory to "genomeseq" with the "cd" command.
For Windows Users:
cd /mnt/c/Users/XXXXX/Documents/genomeseq
For Mac Users:
cd /Users/XXXXX/Documents/genomeseq
Replace XXXXX with your account name. Let's also say you have the hisat2-2.1.0 folder under your account's Documents directory. You can then build the indexes for the organism with the hisat2-build command given for your platform.
For hisat2-2.2.0 or later, if the genome is longer than 4 billion nucleotides, use the hisat2-build-l command instead. Otherwise, either hisat2-build or hisat2-build-s is fine.
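If you are not sure how long your genome is, you can count the nucleotides in the .fa files before choosing a command. This is only a rough sketch: it counts every character on lines that are not FASTA headers, and the function name genome_length is made up for this illustration.

```shell
# Hypothetical helper: print the total number of sequence characters
# in the given FASTA files, ignoring header lines that start with ">".
genome_length() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$@"
}

# Usage (inside the directory that holds the .fa files):
#   genome_length *.fa
```

Compare the printed number with the 4 billion limit mentioned above.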
For Windows Users:
/mnt/c/Users/XXXXX/Documents/hisat2-2.1.0/hisat2-build -f chr1.fa,chr2.fa,chr3.fa,,,chrZ.fa Gallus_gallus_GRCg6a
For Mac Users:
[PATH]/hisat2-2.1.0/hisat2-build -f chr1.fa,chr2.fa,chr3.fa,,,chrZ.fa Gallus_gallus_GRCg6a
Mac users have to replace [PATH] with the location of the hisat2 folder on their system. Replace the ",,," part with the names of all remaining .fa files, separated by commas. Replace "Gallus_gallus_GRCg6a" at the end with any name, without spaces, that indicates the organism and genome version.
Wait until the command finishes. You will then find the .ht2 files in the "genomeseq" folder.
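Typing every file name by hand is error-prone when there are many chromosomes. As a sketch, the comma-separated list can be generated instead, assuming every file ending in ".fa" in the directory belongs to the genome. The function name fasta_list is made up for this illustration.

```shell
# Hypothetical helper: print the .fa files in a directory as one
# comma-separated list, the form used in the hisat2-build command.
fasta_list() {
    local dir="${1:-.}"
    # The subshell keeps the caller's current directory unchanged.
    ( cd "$dir" && printf '%s\n' *.fa | paste -sd, - )
}

# Usage (inside "genomeseq"; put the path to hisat2-build in front as before):
#   hisat2-build -f "$(fasta_list .)" Gallus_gallus_GRCg6a
```

The names come out in alphabetical order, so chr10.fa is listed before chr2.fa. Check the printed list before you use it in the build command.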
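To confirm the result, you can count the index files that were written. hisat2-build typically writes eight files ending in ".1.ht2" through ".8.ht2" (or ".ht2l" for a large index). Treat that number as an assumption and check the HISAT2 manual for your version. The function name count_index_files is made up for this illustration.

```shell
# Hypothetical helper: count the index files that share a base name.
count_index_files() {
    local base="$1"
    local n=0
    local f
    for f in "$base".*.ht2 "$base".*.ht2l; do
        # Unmatched patterns stay as literal text and do not exist as files.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Usage (inside "genomeseq"):
#   count_index_files Gallus_gallus_GRCg6a
```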