The COE maintains specialized hardware and software for the database
searches commonly used in bioinformatics. This document is a quick guide
to the structure, functionality, and basic use of these resources for
end users.
If you have any questions about the use of these resources,
please don't hesitate to contact
us.
The specialized resources at the COE consist of three systems:
Paracel BlastMachine: This machine is a 24-CPU cluster
of 2 GHz Pentium Xeon processors running Linux. Paracel has optimized the
hardware configuration, and tuned the BLAST source code itself,
to maximize the throughput of NCBI BLAST. This is the accelerator used to
run methods that have not been ported to the hardware systems described below,
such as PSI-BLAST (iterative alignment) and MEGABLAST (genome-to-genome style
whole DNA comparisons).
TimeLogic Decypher: The system at the U of C consists of 4
on-the-fly reprogrammable hardware systems. This makes it the most versatile
machine: it runs both heuristic (shortcut) and canonical methods simply by
loading a different hardware image. TimeLogic has created a version of
BLAST that runs approximately 30 times faster than the original for large
data sets. TBLASTX runs 1.5 times faster than on the 24-CPU BlastMachine.
Gene-BLAST has also been developed to allow rapid alignment of ESTs to genomic
sequence (intron spanning), something BLAST cannot do. Hidden Markov Models of
protein families run more than 100 times faster than software methods running
on a cluster at Washington University in St. Louis.
GeneMatcher2: This machine consists of 26748 ASIC (application-specific
integrated circuit, i.e. purpose-built hardware) processors that work in
parallel to perform database searches.
The GeneMatcher2 is especially fast at the very sensitive
Smith-Waterman pairwise alignment method, including compensation for frameshifts
(whether programmatic or sequencing errors). It is also the only hardware
accelerator to implement the GeneWise method from the EBI for comparing an
HMM to genomic sequence (i.e. intron spanning). It is approximately 250 times
faster than the original software genewisedb.
The following table summarizes the systems available, and the
strengths and weaknesses of the various algorithms they implement
based on our experience. For more information on the nature of the
algorithms themselves, please read the CBR sequence alignment primer.
All of the commands included below can be
launched from the command line on coe01. Locally installed clients
are also available from coe01.
For database listings and upload procedures, please see the documentation ("Docs")
for the program of interest. A list of commonly used Decypher keywords for controlling the behaviour of Decypher jobs is also available.
Please note that the hardware-accelerated systems perform most efficiently when each query contains many sequences.
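Because the accelerators favour multi-sequence queries, it can help to merge many single-sequence FASTA files into one query file before submitting a job. The following is a minimal Python sketch of that idea, not part of the COE tooling; the function names and the example file names are hypothetical.

```python
from pathlib import Path


def merge_fasta_texts(texts):
    """Join FASTA-formatted strings into one multi-sequence FASTA string."""
    parts = []
    for text in texts:
        stripped = text.strip()
        if stripped:  # skip empty inputs
            parts.append(stripped)
    return "\n".join(parts) + "\n" if parts else ""


def count_sequences(fasta_text):
    """Count records by counting header lines (those starting with '>')."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def merge_fasta_files(paths, out_path):
    """Write all sequences from `paths` into `out_path`; return the record count."""
    merged = merge_fasta_texts(Path(p).read_text() for p in paths)
    Path(out_path).write_text(merged)
    return count_sequences(merged)
```

For example, `merge_fasta_files(["a.fasta", "b.fasta"], "query.fasta")` would write a combined query.fasta that can then be passed to any of the commands in the table.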
| Algorithm | Name | Hardware accelerated? | Recommended? | Basic usage | Output format | Docs |
| --- | --- | --- | --- | --- | --- | --- |
| BLAST | NCBI BLAST | No, but optimally recompiled for coe01 | No, unless Paracel BLAST is unacceptable | blastall -p blastp -d nr -i query.fasta -a 4 -o output.txt | All standard BLAST output formats (text, XML...) | Docs |
| BLAST | Paracel BLAST | No, but rewritten to run on a Linux cluster | Yes, if Tera-BLAST is unacceptable, or the query is under 10000 symbols | pb blastall -p blastp -d nr -i query.fasta -o output.txt | BLAST output formats (text, XML...) with slight modifications | Docs |
| BLAST | Tera-BLAST | Yes, using TimeLogic Decypher | Yes for queries up to 10000 symbols, may yield slightly more results too. Gene-BLAST spans introns! | dc_template_rt -template tera-blastp -targ nr -query query.fasta > output.txt; or, intron spanning: dc_template_rt -template Gene-BLASTP -targ nr -query query.fasta > output.txt. Do not use software dc_blast and dc_blastall. | Configurable Decypher-specific format (including tab delimited) | Docs |
| HMM | Decypher HMM | Yes, also frameshift compensating | Yes, up to 100 times faster than software | dc_template_rt -template hmm_aa_vs_hmm -targ pfam -query query.fasta > output.txt | Configurable Decypher-specific format (including tab delimited) | Docs |
| HMM | HMMer | No | No | hmmpfam pfam query.fasta > output.txt | HMMer format | Docs |
| HMM | Paracel HMM | Yes | No | btk search hmm query=query.fasta database=pfam invert=1 | HMMer, BLAST or various BTK formats | Docs |
| PSI-BLAST | Paracel PSI-BLAST | No, but rewritten to run on a Linux cluster | Yes | pb blastpgp -i MAGPIE-infile -d nr -o outfile.txt -j 10 --quiet | Standard BLAST formats, with slight modifications | Docs |
| PSI-BLAST | NCBI PSI-BLAST | No, but optimally recompiled for coe01 | No, unless Paracel PSI-BLAST is unacceptable | blastpgp -i MAGPIE-infile -d nr -j 10 -o outfile.txt | Standard BLAST formats | Docs |
| Smith-Waterman | Decypher SW | Yes | No, unless Paracel SW is unacceptable | dc_template_rt -template sw_aa_vs_aa -targ nr -query query.fasta > outfile.txt | Configurable Decypher-specific format (including tab delimited) | Docs |
| Smith-Waterman | FastA SW | No | No | fasta3_t -Q -H -L -T 4 query.fasta %nr > outfile.txt | Various FastA formats | Docs |
| Smith-Waterman | Paracel SW | Yes, frameshift compensating, intron-spanning | Yes, global and local alignments supported | btk search swp format=blast2 verbose=q dbset=NR output=outfile.txt query=infile.fasta | BLAST or various BTK formats | Docs |
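
Several recommendations in the table depend on whether a query is under 10000 symbols. A minimal Python sketch for checking a query against that figure is shown here. It assumes that "symbols" means the total number of residues or bases across all sequences in the FASTA query, which the table does not state explicitly; the function names are hypothetical.

```python
# Figure quoted in the table's "Recommended?" column.
SYMBOL_THRESHOLD = 10000


def query_symbol_count(fasta_text):
    """Total residues/bases across all records, ignoring headers and blank lines."""
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def is_under_threshold(fasta_text, threshold=SYMBOL_THRESHOLD):
    """True when the query holds fewer symbols than the threshold."""
    return query_symbol_count(fasta_text) < threshold
```

The text of a query file can be read with `open("query.fasta").read()` and passed to `is_under_threshold` before choosing which command to run.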