blast
Versions and Availability
About the Software
Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
- Homepage: http://blast.ncbi.nlm.nih.gov/
Usage
BLAST+ provides a suite of tools, such as blastx, blastn, and blastp. The command line below, provided by NCBI, is a blastn search against a database. This command:
blastn -db nt -query nt.fsa -out results.out
runs a blastn search of nt.fsa (a nucleotide sequence in FASTA format) against the nt database and writes the results to the file results.out.
BLAST+ uses the environment variable $BLASTDB to locate the directory that contains the database files, so set this variable before running any BLAST command. See the example below.
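A small pre-flight check can catch a mistyped database path before a long job starts. The sketch below assumes a hypothetical helper named set_blastdb (it is not part of BLAST+) that exports BLASTDB only if the directory exists:

```shell
#!/bin/bash
# set_blastdb is a hypothetical helper, not part of BLAST+.
# It exports BLASTDB only when the given directory exists.
set_blastdb() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "BLAST database directory not found: $dir" >&2
        return 1
    fi
    export BLASTDB="$dir"
}

# Example call; /work/ychen64/nr is the directory used in the example below.
set_blastdb /work/ychen64/nr || echo "Fix the path before submitting the job." >&2
```

If the directory is missing, the helper leaves BLASTDB unchanged and prints a message instead of letting the search fail later.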
Example: run blastx search in PBS batch job
This example shows a blastx search of trinity.fasta against the nr database, printing results to the file blastx.out. The nr database files are in the directory /work/ychen64/nr.
#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name
export BLASTDB=/work/ychen64/nr
blastx -query trinity.fasta -db nr -out blastx.out -num_threads 20
The -num_threads option sets the number of threads (CPUs) used by the BLAST search. To make full use of the allocated cores, this number should match the ppn value in the "#PBS -l nodes=1:ppn=" directive.
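To avoid keeping the two numbers in sync by hand, the thread count can be derived inside the job. This is only a sketch: it assumes a Torque-style PBS in which $PBS_NODEFILE contains one line per allocated core, and count_job_cores is a hypothetical helper name. Check the behaviour on your cluster before relying on it.

```shell
#!/bin/bash
# count_job_cores: hypothetical helper (not part of BLAST+ or PBS).
# Assumes $PBS_NODEFILE has one line per allocated core, so on a
# single-node job the line count equals the ppn value.
count_job_cores() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 1   # outside a PBS job, fall back to a single thread
    fi
}

threads=$(count_job_cores)
echo "blastx would run with -num_threads $threads"
```

Inside the batch script, the value can then be passed as -num_threads "$threads" in place of the hard-coded 20.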
Example: create local BLAST database in PBS batch job
A local BLAST database can be set up with the Perl script update_blastdb.pl, which is included in BLAST+ and downloads pre-formatted databases from NCBI. Perl is required to run this script. In this example, the nr database is placed under /work/ychen64, in the subdirectory nr.
#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name
cd /work/ychen64
# Specify the database name here:
export DATABASE=nr
# -p avoids an error when the directory already exists (e.g. when updating)
mkdir -p $DATABASE
cd $DATABASE
update_blastdb.pl --decompress --verbose $DATABASE
Creating a large local BLAST database takes a while, so update_blastdb.pl should be run in an interactive or batch job.
Once the local BLAST database is created, it needs to be updated regularly (every few days, weeks, or months, depending on the database). Run the script above again to update the database. Documentation for update_blastdb.pl is shown when the script is run without any arguments.
Note:
- Please don't search against the databases on the NCBI BLAST server (i.e. with the "-remote" option), especially for large searches. Search against a local BLAST database only.
- Please use only one compute node to run a BLAST job. The parallel search provided by BLAST+ is based on thread parallelism, so it cannot span multiple nodes.
- On clusters that use the SLURM job scheduler (rather than PBS), use #SBATCH -c to specify the number of threads per process in the SLURM directives, and omit #SBATCH -n.
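As a rough SLURM counterpart to the PBS blastx example above, the sketch below requests one node and 20 CPUs with -c and reads the thread count from SLURM_CPUS_PER_TASK, which SLURM sets when -c is given. The allocation name and paths are the placeholders from the PBS example. Any partition setting is site-specific and is left out here, and the check that blastx is on PATH is an added safeguard, not something BLAST+ requires.

```shell
#!/bin/bash
# One node only: BLAST+ threads cannot span nodes.
#SBATCH -N 1
# CPUs (threads) for the single BLAST process; no -n directive is used.
#SBATCH -c 20
#SBATCH -t 72:00:00
#SBATCH -A your_allocation_name

# SLURM sets SLURM_CPUS_PER_TASK when -c is given; default to 1 otherwise.
threads="${SLURM_CPUS_PER_TASK:-1}"

export BLASTDB=/work/ychen64/nr

# Run the search only if blastx is available in this environment.
if command -v blastx >/dev/null 2>&1; then
    blastx -query trinity.fasta -db nr -out blastx.out -num_threads "$threads"
else
    echo "blastx not found in PATH; set up BLAST+ first" >&2
fi
```

Reading the value from SLURM_CPUS_PER_TASK keeps -num_threads in step with the -c directive when the request is changed.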
Resources
- The BLAST Home Page provides links to protein and genomic data sets, as well as information on specific tools.
Last modified: September 10 2020 17:18:38.