We've previously written about the OpenVax neoantigen prediction pipeline, which is the computational basis of three clinical trials at Mount Sinai. This post is a quick-start tutorial for running our pipeline on your own data to predict cancer neoantigens and to select the contents of a peptide vaccine intended to elicit a T cell response against those neoantigens.
In a nutshell, the OpenVax pipeline is a Dockerized end-to-end Snakemake workflow that starts with raw tumor/normal sequencing data and performs all the processing needed to generate neoantigen predictions. It is easy to set up and run, bundles all of its dependencies, and does not require a cluster. All you need is a Docker installation and a reasonably powerful machine: we run it on a 24-core server, but 16 cores should also be enough.
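Before pulling the image, a quick preflight check can confirm that the host roughly matches the sizing guidance above. This is a minimal sketch, not part of OpenVax: `has_enough_cores` is a hypothetical helper name, and the 16-core threshold is the rule of thumb from this post rather than a limit the pipeline enforces.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of OpenVax): checks whether this
# host reports at least MIN_CORES online processors.

# has_enough_cores MIN_CORES
# Prints the detected core count and returns 0 if it is >= MIN_CORES.
has_enough_cores() {
  local min_cores="$1"
  local available
  # nproc (GNU coreutils) reports usable processing units; fall back to
  # getconf on systems where nproc is not installed.
  available="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
  echo "Detected ${available} cores (recommended: at least ${min_cores})"
  [ "$available" -ge "$min_cores" ]
}
```

For example, `has_enough_cores 16 || echo "warning: fewer cores than recommended" >&2` prints a warning instead of failing, since fewer cores should only make the run slower.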
These are the steps performed by the OpenVax pipeline:
The OpenVax pipeline assumes you have the following datasets:
You will need to provide tumor and normal whole exome sequencing data, as well as tumor RNA-seq data. These files need to be in gzip-compressed FASTQ format. The pipeline also expects as input a list of MHC class I alleles for the individual.
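Since the inputs must be gzip-compressed FASTQ, it can save a failed run to sanity-check each file first. The sketch below is a hypothetical helper (`is_gzipped_fastq` is our own name) that tests gzip integrity and then inspects only the first FASTQ record; note that `gzip -t` reads the entire file, so it can take a while on large inputs.

```bash
#!/usr/bin/env bash
# Hypothetical input check (not part of OpenVax).

# is_gzipped_fastq FILE
# Returns 0 if FILE is an intact gzip stream whose first record looks like
# FASTQ: a header line starting with '@', a sequence line, a '+' separator,
# and a quality line of the same length as the sequence.
is_gzipped_fastq() {
  local f="$1"
  # gzip -t verifies the compressed data without writing any output.
  gzip -t "$f" 2>/dev/null || return 1
  local header seq plus qual
  # Read just the first four decompressed lines (one FASTQ record).
  { read -r header; read -r seq; read -r plus; read -r qual; } \
    < <(gzip -dc "$f" 2>/dev/null | head -n 4)
  [[ "$header" == @* && "$plus" == +* && ${#seq} -eq ${#qual} ]]
}
```

For example, `for f in /your/path/to/fastq/inputs/*.fastq.gz; do is_gzipped_fastq "$f" || echo "bad input: $f"; done` lists any file that fails the check (the `.fastq.gz` suffix is an assumption about your file names).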
You will need to provide a reference genome and associated files:
Optionally, you can also include:
To install the latest version of the OpenVax pipeline from Dockerhub, run this one-liner:
docker pull openvax/neoantigen-vaccine-pipeline:latest
Verify that everything installed correctly and see all available pipeline options (e.g., executing a dry run that lists all commands, specifying memory/CPU resources for the pipeline, and others):
docker run openvax/neoantigen-vaccine-pipeline:latest -h
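You run the pipeline by invoking a Docker entrypoint in the image, giving it three directories as mounted Docker volumes:
/inputs: the directory containing your input FASTQ files and the pipeline YAML config file
/outputs: the directory where the pipeline writes its results
/reference-genome: the directory containing the reference genome and associated files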
Make sure that all three directories are world-writable: the pipeline runs inside the Docker container as an unprivileged user and will need to write data to one or more of these directories.
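Setting up the three host directories can be scripted. This is a minimal sketch with hypothetical helper names (`prepare_mount_dirs`, `is_world_writable`); it creates each directory if needed and adds write permission for everyone, recursively, so files already inside (FASTQs, config, reference data) are covered too.

```bash
#!/usr/bin/env bash
# Hypothetical setup helpers (not part of OpenVax).

# prepare_mount_dirs DIR...
# Creates each DIR if missing and makes it and its contents world-writable.
prepare_mount_dirs() {
  local d
  for d in "$@"; do
    mkdir -p "$d" || return 1
    # a+w adds write permission for user, group and others.
    chmod -R a+w "$d" || return 1
  done
}

# is_world_writable DIR
# Returns 0 if DIR itself has the "others may write" permission bit set.
is_world_writable() {
  # -maxdepth 0 examines only DIR; -perm -0002 requires the o+w bit.
  [ -n "$(find "$1" -maxdepth 0 -perm -0002)" ]
}
```

Usage, with the placeholder paths from this post: `prepare_mount_dirs /your/path/to/fastq/inputs /your/path/to/pipeline/outputs /your/path/to/reference/genome`.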
Let’s say you want to run the pipeline using the GRCh38 reference genome. If you’re using our provided processed files, download and uncompress them (and make the reference genome directory world-writable):
gsutil -m cp gs://reference-genomes/grch38.tar.gz /your/path/to/reference/genome/
cd /your/path/to/reference/genome/ && tar -zxvf grch38.tar.gz
chmod -R a+w grch38
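Once the archive is extracted, you may want to confirm that `chmod -R` actually reached every file; for example, it cannot change files owned by another user. A minimal sketch using `find` (`first_non_world_writable` is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# Hypothetical verification helper (not part of OpenVax).

# first_non_world_writable DIR
# Prints the first path under DIR (including DIR itself) that lacks the
# "others may write" bit, or prints nothing if every entry has it.
first_non_world_writable() {
  # ! -perm -0002 matches entries missing o+w; -print -quit stops after
  # the first match (supported by GNU and BSD find).
  find "$1" ! -perm -0002 -print -quit
}
```

Running `first_non_world_writable /your/path/to/reference/genome/grch38` should print nothing if the `chmod` step succeeded.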
An OpenVax pipeline YAML config file contains sample-specific settings as well as tool configurations, which may be common across samples and shared between multiple pipeline runs. This config file needs to live in /your/path/to/fastq/inputs, the same directory as your input FASTQ files. Some notes about this:
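Because the config must sit next to the FASTQ files in the directory mounted as /inputs, a quick listing check can catch a misplaced config before launching. This sketch uses a hypothetical helper name (`check_inputs_dir`), and the `*.yaml`/`*.yml` and `*.fastq.gz`/`*.fq.gz` name patterns are assumptions about how your files are named, not requirements stated by the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical layout check (not part of OpenVax).

# check_inputs_dir DIR
# Returns 0 if DIR directly contains at least one YAML config file and at
# least one gzip-compressed FASTQ file.
check_inputs_dir() {
  local dir="$1"
  local configs fastqs
  # Arithmetic expansion strips any padding that wc -l may emit.
  configs=$(( $(find "$dir" -maxdepth 1 -type f \( -name '*.yaml' -o -name '*.yml' \) | wc -l) ))
  fastqs=$(( $(find "$dir" -maxdepth 1 -type f \( -name '*.fastq.gz' -o -name '*.fq.gz' \) | wc -l) ))
  echo "Found ${configs} config file(s) and ${fastqs} FASTQ file(s) in ${dir}"
  [ "$configs" -ge 1 ] && [ "$fastqs" -ge 1 ]
}
```

For example, `check_inputs_dir /your/path/to/fastq/inputs` returns non-zero if either the config or the FASTQ files are missing from that directory.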
For an example run, start with the test data from our GitHub repo: a YAML config file and two small FASTQ files of reads overlapping a single somatic mutation, set up to run against the GRCh38 reference genome. First, download this reference genome as described in the Setup section above. For this simple test, we reuse the tumor DNA sequencing reads as our RNA reads. Download the test data files to the directory you will mount as your /inputs volume:
After creating your outputs directory (/your/path/to/pipeline/outputs), run the pipeline:
docker run -it \
-v /your/path/to/fastq/inputs:/inputs \
-v /your/path/to/pipeline/outputs:/outputs \
-v /your/path/to/reference/genome:/reference-genome \
openvax/neoantigen-vaccine-pipeline:latest
The first time you run this, it may take several minutes while the necessary processing files are downloaded and cached. The output is a set of ranked variants and proposed vaccine peptides in several file formats, including plain text (ASCII) and PDF. If everything works correctly, you should see a single IDH1 R132H variant in the final output.
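To confirm the test run found the expected variant without opening the PDF, you can search the plain-text outputs. Output file names are not described in this post, so this hypothetical helper (`find_gene_in_outputs`) simply searches every text file under the outputs directory, assuming the ASCII report spells out the gene name verbatim.

```bash
#!/usr/bin/env bash
# Hypothetical result check (not part of OpenVax).

# find_gene_in_outputs OUTPUT_DIR GENE
# Prints the paths of text files under OUTPUT_DIR that mention GENE as a
# whole word.
find_gene_in_outputs() {
  # -r recurse, -l list matching files only, -I skip binary files such as
  # PDFs, -w match whole words so IDH1 does not match e.g. IDH10.
  grep -rlIw -- "$2" "$1"
}
```

After the test run, `find_gene_in_outputs /your/path/to/pipeline/outputs IDH1` should list at least one file.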
What if I want to use OpenVax to just call somatic variants?
You can also run the OpenVax pipeline just to call somatic variants, if you don't have tumor expression data or MHC alleles for your sample. Simply use the same pipeline configuration, but omit the tumor RNA part of the sample as well as the HLA alleles; the pipeline will then call variants with MuTect and Strelka and write each caller's results to its own VCF. See an example of a variant-calling-only pipeline config here.
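To get a quick sense of how many calls each caller made, you can count the non-header lines in each VCF (VCF header lines begin with `#`). This is a generic sketch, not part of OpenVax: `count_vcf_records` is a hypothetical name, and it relies on GNU gzip's `-f` flag passing uncompressed input through unchanged, so it accepts both `.vcf` and `.vcf.gz` files.

```bash
#!/usr/bin/env bash
# Hypothetical VCF summary helper (not part of OpenVax).

# count_vcf_records VCF
# Prints the number of variant records (non-header lines) in VCF.
# Note: grep -c exits with status 1 when the count is 0.
count_vcf_records() {
  # -d decompress, -c write to stdout, -f pass non-gzip input through as-is.
  gzip -dcf "$1" | grep -vc '^#'
}
```

Where the VCFs land under the outputs directory is not specified here, so one option is to run the helper on each file found by `find /your/path/to/pipeline/outputs -name '*.vcf'`.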
Post by Julia Kodysh