A Tutorial for Variant Analysis of RNA-Seq FASTQ Files

This is one of the easiest ways to run GATK on a set of RNA-Seq FASTQ files, especially for Windows users. If you want to extract variants that are found more often in disease samples than in control samples, or genes that carry more variants in disease samples than in controls, this is the tool for you.

Detecting genomic variants (SNPs and indels) from RNA-Seq FASTQ files.

The pipeline follows GATK's workflow for RNAseq short variant discovery (SNPs + Indels). Note that we use HISAT2 instead of STAR for alignment to reduce memory consumption. Note also that the pipeline works only with human data; it will not work with any other organism.
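To make the order of steps concrete, here is a minimal sketch that only builds the per-sample command lines of a HISAT2 + GATK RNA-seq pipeline; it does not run anything. It is not the pipeline Subio Platform actually executes. It assumes paired-end reads, an already built HISAT2 index, a reference FASTA that has its .fai and .dict files, and that `hisat2`, `samtools`, and the `gatk` wrapper are on the PATH. The function name `build_pipeline_commands` and all file paths are hypothetical. Base quality recalibration and variant filtering, which GATK's workflow also describes, are left out for brevity.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(sample, r1, r2, hisat2_index, reference_fasta, outdir="out"):
    """Return the per-sample command lines, in execution order, as argument lists.

    Hypothetical sketch: the output directory is assumed to exist already.
    """
    out = Path(outdir)
    sam = str(out / f"{sample}.sam")
    sorted_bam = str(out / f"{sample}.sorted.bam")
    dedup_bam = str(out / f"{sample}.dedup.bam")
    metrics = str(out / f"{sample}.dup_metrics.txt")
    split_bam = str(out / f"{sample}.split.bam")
    vcf = str(out / f"{sample}.vcf.gz")
    return [
        # 1. Spliced alignment with HISAT2; GATK requires read-group (@RG) fields.
        ["hisat2", "-x", hisat2_index, "-1", r1, "-2", r2,
         "--rg-id", sample, "--rg", f"SM:{sample}", "--rg", "PL:ILLUMINA",
         "-S", sam],
        # 2. Coordinate-sort the alignments into a BAM file.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 3. Mark duplicate reads, then index the result.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam, "-M", metrics],
        ["samtools", "index", dedup_bam],
        # 4. Split reads that span introns (N operators in the CIGAR string).
        ["gatk", "SplitNCigarReads", "-R", reference_fasta, "-I", dedup_bam, "-O", split_bam],
        # 5. Call SNPs and indels.
        ["gatk", "HaplotypeCaller", "-R", reference_fasta, "-I", split_bam, "-O", vcf,
         "--dont-use-soft-clipped-bases"],
    ]


if __name__ == "__main__":
    # Print the commands; to execute them, pass each list to subprocess.run(cmd, check=True).
    for cmd in build_pipeline_commands("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz",
                                       "grch38/genome", "GRCh38.fa"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list rather than a single string avoids shell-quoting problems, which matters on Windows paths containing spaces.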

Please set up the pipeline to run GATK on the RNA-Seq data.

If you already have VCF files, you can skip the previous part and start from here.

Predicting the effect of detected SNPs or indels.

SnpEff is a variant annotation and effect prediction tool. The Annotate VCF tool runs this program from Subio Platform's GUI. You must run this tool before using the aggregation tools. SnpEff is said to accept both .vcf and .vcf.gz files, but it may work reliably only with .vcf files, so I recommend uncompressing any .gz files first.
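If you have many .vcf.gz files, a short script can uncompress them in one go. The sketch below uses only the Python standard library; the function name `gunzip_vcf` is hypothetical. Python's `gzip` module also reads bgzip-compressed files, because BGZF is a series of ordinary gzip members.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_vcf(gz_path):
    """Decompress 'sample.vcf.gz' to 'sample.vcf' next to it and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"not a .gz file: {gz_path}")
    out_path = gz_path.with_suffix("")  # drops only the final ".gz"
    # Stream the data so large VCFs are never loaded into memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__":
    # Hypothetical folder name; adjust to where your VCF files are.
    for path in Path("vcf_files").glob("*.vcf.gz"):
        print("wrote", gunzip_vcf(path))
```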

If you cannot find a database that suits your data, please build one yourself.

Comparing mutations between the control and case groups for filtering.

This tool accepts only annotated VCF files, so run the Annotate VCF tool first.
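A quick way to check whether a VCF file has already been annotated is to look for the `ANN` INFO declaration that SnpEff writes into the header. This is a minimal sketch assuming a recent SnpEff version that uses the `ANN` field (older versions wrote `EFF` instead); the function name is hypothetical and Subio Platform may check this differently.

```python
def is_snpeff_annotated(vcf_lines):
    """Return True if the VCF meta-information header declares the ANN INFO field."""
    for line in vcf_lines:
        if not line.startswith("##"):
            return False  # reached the #CHROM line or data: the header is over
        if line.startswith("##INFO=<ID=ANN,"):
            return True
    return False


if __name__ == "__main__":
    import sys
    # Usage: python check_ann.py sample.vcf
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, "annotated" if is_snpeff_annotated(f) else "NOT annotated")
```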

The input VCF files belong to two groups, case and control. This tool aggregates variants so that you can easily extract target candidates by filtering on mutation type or on frequency in the control or case group.

There are two execution modes. The “Count by Variant” mode aggregates each distinct variant. The “Count by Location” mode, on the other hand, ignores what kind of mutation occurs and considers only its location.
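The difference between the two modes comes down to the key used for counting. In this sketch, which is not Subio Platform's implementation, "variant" mode keys on (CHROM, POS, REF, ALT) and "location" mode keys on (CHROM, POS) only. It assumes one sample per VCF file and treats every listed record as present in that sample, ignoring the genotype (GT) column. The function names and the thresholds in `enriched_in_case` are hypothetical.

```python
from collections import defaultdict


def variant_keys(vcf_lines, mode="variant"):
    """Return the set of keys found in one sample's VCF lines.

    mode="variant":  (CHROM, POS, REF, ALT), one key per ALT allele
    mode="location": (CHROM, POS), ignoring what the change is
    """
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if mode == "location":
            keys.add((chrom, int(pos)))
        else:
            for allele in alt.split(","):  # multi-allelic sites list ALTs comma-separated
                keys.add((chrom, int(pos), ref, allele))
    return keys


def aggregate(case_samples, control_samples, mode="variant"):
    """Map each key to [number of case samples, number of control samples] carrying it."""
    counts = defaultdict(lambda: [0, 0])
    for group, samples in enumerate((case_samples, control_samples)):
        for vcf_lines in samples:
            for key in variant_keys(vcf_lines, mode):
                counts[key][group] += 1
    return dict(counts)


def enriched_in_case(counts, n_case, n_control, min_case_frac=0.5, max_control_frac=0.0):
    """Keep keys that are frequent in cases and rare (or absent) in controls."""
    return {
        key: (n_ca, n_co)
        for key, (n_ca, n_co) in counts.items()
        if n_ca / n_case >= min_case_frac and n_co / n_control <= max_control_frac
    }
```

For example, if one case sample has A>G and another has A>T at the same position, "variant" mode reports two keys seen in one case each, while "location" mode reports a single position seen in both cases.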

Comparing mutations per gene between the control and case groups for filtering.

This tool accepts only annotated VCF files, so run the Annotate VCF tool first.

The input VCF files belong to two groups, case and control. This tool aggregates variants per gene so that you can easily extract candidate target genes by filtering on mutation type or on frequency in the control or case group.
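Per-gene aggregation can be sketched by reading the gene name from SnpEff's `ANN` INFO field, whose pipe-separated fields start with Allele, Annotation, Annotation_Impact, and Gene_Name. This is a minimal sketch, not the tool's actual code: it assumes one sample per VCF, counts a sample once per gene no matter how many variants hit that gene, and uses the optional `allowed_effects` set (a hypothetical parameter) as the mutation-type filter.

```python
from collections import defaultdict


def parse_ann(info):
    """Yield each SnpEff ANN annotation in a VCF INFO column as a list of fields."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            for annotation in entry[len("ANN="):].split(","):
                yield annotation.split("|")


def genes_with_variants(vcf_lines, allowed_effects=None):
    """Return the Gene_Name values hit by at least one (optionally effect-filtered) annotation."""
    allowed = set(allowed_effects) if allowed_effects is not None else None
    genes = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue
        for fields in parse_ann(columns[7]):
            if len(fields) < 4 or not fields[3]:
                continue
            effects = set(fields[1].split("&"))  # SnpEff joins combined effects with '&'
            if allowed is None or effects & allowed:
                genes.add(fields[3])
    return genes


def aggregate_per_gene(case_samples, control_samples, allowed_effects=None):
    """Map each gene to [number of case samples, number of control samples] with a hit."""
    counts = defaultdict(lambda: [0, 0])
    for group, samples in enumerate((case_samples, control_samples)):
        for vcf_lines in samples:
            for gene in genes_with_variants(vcf_lines, allowed_effects):
                counts[gene][group] += 1
    return dict(counts)
```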

Summarizing mutations of a target gene per exon. Extracting variants on specified exons.

This tool is useful after analysis with the Aggregate Variants or Aggregate Variants per Gene tool.
Once you have a list of candidate target genes, this tool lets you summarize variants per exon for each candidate.
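A per-exon summary can be sketched from the same `ANN` field: field 7 (Feature_ID) holds the transcript and field 9 (Rank) holds values such as "2/5", meaning exon 2 of 5. Because exon numbering is transcript-specific, this sketch counts annotations per (transcript, exon) pair. It assumes that for `intron_variant` annotations the Rank refers to an intron and skips them; the function name is hypothetical and the grouping is not necessarily what the Subio tool does.

```python
from collections import Counter


def parse_ann(info):
    """Yield each SnpEff ANN annotation in a VCF INFO column as a list of fields."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            for annotation in entry[len("ANN="):].split(","):
                yield annotation.split("|")


def exon_summary(vcf_lines, gene):
    """Count annotations of `gene` per (Feature_ID, exon rank)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue
        for fields in parse_ann(columns[7]):
            if len(fields) < 9 or fields[3] != gene or not fields[8]:
                continue
            if "intron_variant" in fields[1].split("&"):
                continue  # for intronic hits, Rank counts introns, not exons
            exon_rank = int(fields[8].split("/")[0])  # "2/5" -> 2
            counts[(fields[6], exon_rank)] += 1
    return counts
```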

The Filter by Exon tool extracts genomic elements located on specified exons of a transcript.
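For VCF input, exon filtering can be sketched as keeping the header plus every record that has at least one `ANN` annotation on the requested transcript and exon ranks. This sketch handles annotated VCF lines only, matches the transcript ID exactly as it appears in the Feature_ID field (including any version suffix), and skips `intron_variant` annotations as in a per-exon count. The function name and parameters are hypothetical.

```python
def parse_ann(info):
    """Yield each SnpEff ANN annotation in a VCF INFO column as a list of fields."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            for annotation in entry[len("ANN="):].split(","):
                yield annotation.split("|")


def filter_by_exon(vcf_lines, transcript_id, exons):
    """Yield header lines and the records annotated on the given exons of transcript_id."""
    wanted = set(exons)
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the header so the output is still a valid VCF
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue
        for fields in parse_ann(columns[7]):
            if (len(fields) > 8 and fields[6] == transcript_id and fields[8]
                    and "intron_variant" not in fields[1].split("&")
                    and int(fields[8].split("/")[0]) in wanted):
                yield line
                break  # one matching annotation is enough for this record
```

Writing the result back out is a matter of `out.writelines(filter_by_exon(open_vcf, "TX_ID", [2, 3]))`, where the transcript ID and exon list are placeholders.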