Using sequence search resources at the U of C Sun COE

The COE possesses specialized hardware and software solutions for common database searches used in bioinformatics. This document is a quick guide to the structure, functionality and basic use of these resources for end users.
If you have any questions about the use of these resources, please don't hesitate to contact us.

About the Search Systems

The specialized resources at the COE consist of three systems:
Paracel BlastMachine: This machine consists of a 24 CPU cluster of 2GHz Pentium Xeon processors running Linux. Paracel has optimized the hardware configuration, as well as tweaking the BLAST source code itself, to maximize throughput of NCBI BLAST. This is the accelerator used to run methods not ported to the hardware systems described below, such as PSI-BLAST (iterative alignment) and MEGABLAST (genome-to-genome style whole DNA comparisons).
TimeLogic Decypher: The Decypher installation at the U of C consists of 4 on-the-fly reprogrammable hardware systems. This makes it the most versatile machine: it runs both heuristic (shortcut) and canonical methods simply by loading a different hardware image. TimeLogic has created a version of BLAST that runs approximately 30 times faster than the original for large data sets, and TBLASTX runs 1.5 times faster on the Decypher than on the 24-CPU BlastMachine. Gene-BLAST has also been developed to allow rapid alignment of ESTs to genomic sequence (intron spanning), something BLAST cannot do. Hidden Markov Models of protein families run more than 100 times faster than software methods running on a cluster at Washington University in St. Louis.
GeneMatcher2: This machine consists of 26,748 ASIC (application-specific integrated circuit) processors that work in parallel to perform database searches. The GeneMatcher2 is especially fast at the very sensitive Smith-Waterman pairwise alignment method, including compensation for frameshifts (whether they arise from programmed frameshifting or from sequencing errors). It is also the only hardware accelerator to implement the GeneWise method from the EBI for comparing an HMM to genomic sequence (i.e. intron spanning), and it runs approximately 250 times faster than the original software genewisedb.

Strengths, Weaknesses & the CLI

The following table summarizes the systems available, and the strengths and weaknesses of the various algorithms they implement based on our experience. For more information on the nature of the algorithms themselves, please read the CBR sequence alignment primer.
All of the commands included below can be launched from the command line on coe01. Locally installed clients are also available from coe01. For database listings and upload procedures please see the documentation ("Docs") for the program of interest. A list of useful common Decypher Keywords for controlling the behaviour of those jobs is available.

Please note that the hardware accelerated systems perform most efficiently when each query contains many sequences.
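Because the accelerators work best on multi-sequence queries, it usually pays to pool several small FASTA files into one query file before submitting a job. The following is a minimal sketch in POSIX shell; the function name make_batch_query and the file names used with it are hypothetical and are not tools installed on coe01.

```shell
# Hypothetical helper (not an installed coe01 tool): pool several FASTA files
# into one multi-sequence query file and report how many sequences it holds.
# Usage: make_batch_query OUTPUT INPUT...
# Note: OUTPUT must not also be one of the INPUT files.
make_batch_query() {
    out="$1"
    shift
    # "awk 1" copies every line and guarantees each ends with a newline,
    # so a file lacking a final newline cannot glue its last residues
    # onto the next file's header line.
    awk 1 "$@" > "$out" || return 1
    # Count FASTA header lines, i.e. the number of sequences in the batch.
    grep -c '^>' "$out"
}
```

For example, make_batch_query batch.fasta est_*.fasta writes a single batch file and prints its sequence count; batch.fasta can then be given to a command such as dc_template_rt -template tera-blastp -targ nr -query batch.fasta > output.txt as one job.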

BLAST
  NCBI BLAST (Docs)
    Hardware accelerated? No, but optimally recompiled for coe01
    Recommended? No, unless Paracel BLAST is unacceptable
    Basic usage: blastall -p blastp -d nr -i query.fasta -a 4 -o output.txt
    Output format: all standard BLAST output formats (text, XML...)
  Paracel BLAST (Docs)
    Hardware accelerated? No, but rewritten to run on a Linux cluster
    Recommended? Yes, if Tera-BLAST is unacceptable or the query is longer than 10000 symbols
    Basic usage: pb blastall -p blastp -d nr -i query.fasta -o output.txt
    Output format: BLAST output formats (text, XML...) with slight modifications
  Tera-BLAST (Docs)
    Hardware accelerated? Yes, using the TimeLogic Decypher
    Recommended? Yes, for queries of up to 10000 symbols; it may also yield slightly more results. Gene-BLAST spans introns!
    Basic usage: dc_template_rt -template tera-blastp -targ nr -query query.fasta > output.txt
      or, intron spanning: dc_template_rt -template Gene-BLASTP -targ nr -query query.fasta > output.txt
      Do not use the software dc_blast and dc_blastall commands.
    Output format: configurable Decypher-specific format (including tab delimited)

HMM
  Decypher HMM (Docs)
    Hardware accelerated? Yes, also frameshift compensating
    Recommended? Yes, up to 100 times faster than software
    Basic usage: dc_template_rt -template hmm_aa_vs_hmm -targ pfam -query query.fasta > output.txt
    Output format: configurable Decypher-specific format (including tab delimited)
  HMMer (Docs)
    Hardware accelerated? No
    Recommended? No
    Basic usage: hmmpfam pfam query.fasta > output.txt
    Output format: HMMer format
  Paracel HMM (Docs)
    Hardware accelerated? Yes
    Recommended? No
    Basic usage: btk search hmm query=query.fasta database=pfam invert=1
    Output format: HMMer, BLAST or various BTK formats

PSI-BLAST
  Paracel PSI-BLAST (Docs)
    Hardware accelerated? No, but rewritten to run on a Linux cluster
    Recommended? Yes
    Basic usage: pb blastpgp -i MAGPIE-infile -d nr -o outfile.txt -j 10 --quiet
    Output format: standard BLAST formats, with slight modifications
  NCBI PSI-BLAST (Docs)
    Hardware accelerated? No, but optimally recompiled for coe01
    Recommended? No, unless Paracel PSI-BLAST is unacceptable
    Basic usage: blastpgp -i MAGPIE-infile -d nr -j 10 -o outfile.txt
    Output format: standard BLAST formats

Smith-Waterman
  Decypher SW (Docs)
    Hardware accelerated? Yes
    Recommended? No, unless Paracel SW is unacceptable
    Basic usage: dc_template_rt -template sw_aa_vs_aa -targ nr -query query.fasta > outfile.txt
    Output format: configurable Decypher-specific format (including tab delimited)
  FastA SW (Docs)
    Hardware accelerated? No
    Recommended? No
    Basic usage: fasta3_t -Q -H -L -T 4 query.fasta %nr > outfile.txt
    Output format: various FastA formats
  Paracel SW (Docs)
    Hardware accelerated? Yes, frameshift compensating, intron-spanning
    Recommended? Yes, global and local alignments supported
    Basic usage: btk search swp format=blast2 verbose=q dbset=NR output=outfile.txt query=infile.fasta
    Output format: BLAST or various BTK formats
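Tera-BLAST is recommended for queries of up to 10000 symbols, so it can help to check sequence lengths before picking a method. The sketch below, in POSIX shell with awk, lists each sequence in a FASTA file that is longer than a limit (10000 by default). The function name fasta_long_seqs is hypothetical, and treating the 10000-symbol figure as a per-sequence limit is an assumption; please confirm it in the Tera-BLAST documentation.

```shell
# Hypothetical helper: print "<id><TAB><length>" for each sequence in a FASTA
# file that is longer than LIMIT symbols (default 10000, the query size
# quoted for Tera-BLAST; applying it per sequence is an assumption).
# Usage: fasta_long_seqs FILE [LIMIT]
fasta_long_seqs() {
    awk -v limit="${2:-10000}" '
        # Report the record just finished, if it was over the limit.
        function report() {
            if (name != "" && len > limit + 0) print name "\t" len
        }
        /^>/ {
            report()
            name = substr($1, 2)   # sequence ID: first word after the ">"
            len = 0
            next
        }
        {
            gsub(/[ \t\r]/, "")    # ignore whitespace and DOS line endings
            len += length($0)
        }
        END { report() }
    ' "$1"
}
```

For example, fasta_long_seqs query.fasta prints nothing when every sequence is within the default limit, and fasta_long_seqs query.fasta 5000 applies a stricter threshold.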


Paul Gordon