HPCwire
The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing / April 21, 2006
Vendor Spotlight:
Cartesian Gridspeed Turbocharges Genomic Searching

New Zealand start-up Cartesian Gridspeed has developed a bioinformatics tool that reportedly searches gene databases 10,000 times faster than existing genomic search engines. According to the company's CEO, Leonard Bloksberg, this allows industrial-strength genomic searching to be performed on a standard PC instead of a multimillion-dollar supercomputer cluster. HPCwire recently had an opportunity to ask Mr. Bloksberg about his company and the new technology that he helped develop.

HPCwire: What's the origin of your company and its name -- Cartesian Gridspeed?

Bloksberg: As a Senior Scientist at New Zealand's leading biotech company, Genesis Research, I was conducting research in systems biology when we encountered a need for significantly increased compute capacity. I got together with a group of programmers from Reel Two and we brainstormed a new way to get things done. Later I left Genesis and started Cartesian Gridspeed to work on bioinformatics technologies, and acquired the SLIM technology.

Genesis' core business is immunology. Systems biology and bioinformatics were a stretch for them. They have restructured and focused on their core immunology science.

The SLIM technology is based on very fast, novel ways of working with data in arrays or grids. Rene Descartes created the concept of the grid, and the blue in our logo symbolizes the blue of the fleur-de-lis banner, which was the French flag at the time of Rene Descartes.

Cartesian Gridspeed was founded in April of 2005, and our alpha product was released exclusively in New Zealand in November of 2005. We are beginning a preliminary beta launch in Australia in April 2006, and the official beta launch is scheduled for May 2006 in the USA. The beta will be limited to no more than 24 companies worldwide, and we are targeting the formal product launch for August 2006.

HPCwire: What does SLIM Search do and, without revealing any proprietary information, how does it do it?

Bloksberg: SLIM stands for Sequence Location Indexer and Matcher. SLIM Search is a super fast genomics search engine based on the SLIM technology. It is ten thousand times faster than a fully optimized implementation of Blast, the industry standard. SLIM Search offers more speed, more flexibility, and more sensitivity than anything else.

Ten thousand times faster is a ridiculous number, so here are some practical examples of what that means for a real biologist. SLIM Search can save you $100,000 per year in electricity costs if you run a Blast farm of 100 PCs where SLIM Search can do the job on a single PC. SLIM Search can assemble a human genome in half a day on a desktop workstation instead of months on a multimillion-dollar cluster. SLIM Search can deliver daily updates, on a single PC, of a comparative genomics database built from an orthogonal comparison of every known gene.

Modern genomics search tools break a search into two steps: a word search followed by an alignment. The word search phase is very simple, but it must confront the entire dataset, so the need for speed is extreme. The fundamental concept of the word search is that you look for a short exact match -- the default in Blast is 11 nucleotides (nt). If you find an exact match, you pass the hit on to an alignment phase as a high-scoring segment pair (HSP). SLIM Search is a significant improvement in how the computer carries out the word search, but the end result is the same. An exact match of 11 nt does not vary no matter how the computer found it. SLIM Search then passes the pair on to the alignment program of Blast, so the biologist gets exactly the same results and statistics they are used to.
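Editor's illustration: the word-search step described above can be sketched in a few lines of Python. This is not the SLIM method, which is proprietary and not described in the interview. It is a minimal sketch of a generic approach, assuming a plain hash table that maps every 11-letter word in the database sequence to its positions. The function names and the toy sequences are hypothetical.

```python
from collections import defaultdict

WORD_SIZE = 11  # the default nucleotide word length cited in the interview


def build_word_index(subject, k=WORD_SIZE):
    """Map every k-letter word in the subject to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(subject) - k + 1):
        index[subject[pos:pos + k]].append(pos)
    return index


def find_word_hits(query, index, k=WORD_SIZE):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get() avoids creating empty entries for words that are absent
        for spos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, spos))
    return hits


# Toy data (hypothetical): the query occurs inside the subject at offset 4.
subject = "TTGACCGTAGGCTAACGTTAGCCA"
query = "CCGTAGGCTAACG"
hits = find_word_hits(query, build_word_index(subject))
print(hits)  # [(0, 4), (1, 5), (2, 6)]
```

Each hit is an exact 11-letter match. In a real pipeline, hits like these would be handed to the alignment phase. However the lookup is implemented, the set of exact matches is the same, which is the point Bloksberg makes above.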

Every programming language has tools to build and scan an array, but these tend to fall apart with the magnitude of the problems we deal with in genomics. We got together a group of PhD-level biologists who understand programming, and PhD-level programmers and mathematicians who understand biology, to develop new ways of building and scanning arrays. The result is the novel, patent-pending SLIM technology that drives SLIM Search to deliver results never before possible.

HPCwire: What kind of computer systems can be used to run SLIM Search?

Bloksberg: SLIM Search is supported on Windows and Linux/Unix, and can run on your laptop. We are looking to support Mac and other systems soon as well. SLIM Search is RAM dependent, and requires at least 512 Megs of RAM. We recommend at least 2 Gigs of RAM for larger problems, and more may be required for extremely large datasets. We are currently working on distributed computing models to facilitate even extreme searches.

HPCwire: Have you measured the performance of your product against other sequence searchers -- for example SSearch, BLAST, FASTA, etc.?  If so, can you share the results?

Bloksberg: Yes, we have done detailed, controlled benchmarking of SLIM against the other relevant technologies, and that is where we get the 10,000 times faster number from. However, publishing comparisons against named products is problematic as most licenses specifically prohibit use of the product for benchmarking against other products. We have no such clause in our license, as we welcome comparisons of SLIM Search performance.

The only thing SLIM Search does differently is how we build and scan an array for exact matches of a defined length. There are very few ways to do this. In fact, we can find only two methods in the literature that have been used before SLIM, and these are represented by Blast and SSAHA. Our benchmarking compares SLIM Search against both of these.

The Blast method is used in Blast, MegaBlast, BlastZ, and others. The SSAHA method is used in SSAHA, Blat, Pash, MUMmer, SALSA, PatternHunter, and others. The only other technology used for sequence comparison is not a word search, but an alignment method called the Dynamic Programming Matrix (DPM), which is very precise, but very slow. The DPM is used in Smith-Waterman, Needle, FASTA, and the alignment module of Blast (Bl2Seq).
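Editor's illustration: to show why a dynamic programming matrix is precise but slow, here is a minimal sketch of Smith-Waterman local alignment scoring. The scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for illustration, not parameters taken from any of the tools named above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Every cell of a (len(a)+1) x (len(b)+1) matrix is filled in, so the work
    grows with len(a) * len(b). Only two rows are kept in memory at a time.
    """
    cols = len(b) + 1
    prev = [0] * cols  # row i-1 of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols  # row i of the matrix
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy data (hypothetical): the two sequences share the four letters "ACGT".
score = smith_waterman_score("GGACGTCC", "TTACGTAA")
```

Because every pair of positions is examined, comparing a query against an entire database this way costs time proportional to the product of their lengths. That cost is why search tools run a fast word search first and reserve the matrix for the regions that produced hits.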

HPCwire: Do you know of other desktop-based sequence searching tools?  If so, how is SLIM Search different?

Bloksberg: Most sequence analysis tools can be used on a desktop computer. The difference with SLIM Search is the magnitude of what you can do on your desktop computer. SLIM Search allows a desktop to perform searches that would require hundred-million-dollar supercomputers with other methods.

HPCwire: Do you have any early end-user experiences with your product that you can share with us?

Bloksberg: Yes. Most of our feedback falls into two categories:

   (1) "Wow, this is really fast!"
   (2) "Can you please give me this new feature?"

HPCwire: Have you officially launched your product?

Bloksberg: The alpha launch took place in November 2005, and we now have seven customers in New Zealand. The beta launch is scheduled for May 2006, with a preliminary launch in Australia only in April 2006. The full commercial launch is scheduled for August 2006.

HPCwire: How do you intend to market SLIM Search?

Bloksberg: SLIM Search will be offered at $399.95 per copy, but will only be available in bulk licenses of 100 or more for the first year. A 50 percent discount will be offered to beta customers. Interested parties are invited to contact Cartesian Gridspeed at info@gridspeed.co.nz or +649 266 2002. We are currently upgrading our web site so that SLIM Search can be purchased directly from our web site, www.gridspeed.co.nz. We also have regional resellers in various parts of the world.