The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing / May 18, 2007
The advancement of genomics and proteomics via high performance computing is drawing new companies into the drug discovery business. One such company, Gencom Inc., claims its goal is to revolutionize in-silico drug discovery by dramatically speeding up the identification of novel therapeutics. Without external funding, the company has developed a drug discovery platform based on Intel Itanium 2 hardware. At April's Intel Developer Forum, Gencom received an Honorable Mention in the Itanium Solutions Alliance's Humanitarian Impact awards for its use of Itanium to advance functional genomics.
Recently, we got the opportunity to ask Gencom's founder and chief technology architect, Michael J. Colonna, about the company and the nature of the technology it has developed.
HPCwire: Could you give us a brief overview of Gencom's history, your vision as the founder, and where the company stands today as far as products and services offered?
Colonna: Gencom was founded in 2002 with an initial vision focused on improving computational performance in molecular modeling, with the ultimate goal of improving drug-to-market times. At the time, many of the simplest models were taking up to 30 days to process on some of the most powerful computers available. It was obvious that bigger and faster servers with ever more processors were not getting the job done.
We decided to take an entirely different approach to analyzing the problem, which started with a decomposition of the entire process, beginning with the models themselves. The decomposition of the models revealed significant deficiencies in how the models were constructed. This served to establish our first goal, which was to develop platform-independent "pre-processing" optimization software that would restructure the models for optimal computational performance. We put together some code and tested a simple model consisting of a single gene-protein pair, and the results were very promising.
During the course of our analysis it became evident that even if we improved computational performance of the models, there were other deficiencies, both process and technical, that threatened to trivialize the results of our efforts. This led to a significant expansion of our scope which included a rather lofty goal to optimize the entire drug discovery process. We started the entire analysis over, beginning with a business modeling exercise that revealed numerous process and technical candidates for optimization and established the high-level requirements for our technical architecture.
Today, Gencom offers its clientele a full suite of complementary services that are designed exclusively to improve their ability to identify new drug candidates more rapidly, provide early identification of high-risk candidates and expedite the discovery-to-market process in order to bring more drugs to market faster, more efficiently and more safely. The services include fully validated on-demand high performance computing, in-silico forced degradation modeling and discovery-focused electronic data capture.
HPCwire: Can you discuss the GENeSYS technology: its unique attributes, the nature of the hardware and software, its performance attributes, and what kinds of users and applications it's targeted for?
Colonna: GENeSYS [pronounced Genesis] is our second-generation computing architecture that began its life as Black Widow. Like Black Widow, GENeSYS consists of seven software components that work together to address each of the optimization requirements, plus the Unifying Platform, which, among other things, brings together and normalizes various genomic and proteomic data from the public domain. GENeSYS includes all of the optimization components of Black Widow but is significantly improved in its ability to narrow the focus of drug candidates through constraint-based simulations of metabolic networks, and it contains improvements in the ability to predict chemical and biological degradation.
The first challenge required us to rethink the computational requirements that would be necessary to achieve our goals. This led to the development of a technical architecture to run the optimization software components, which included the need to dynamically allocate system resources based on the complexity of the models. We called the approach "hyper-mesh" or G2 -- grid computing on steroids -- due to its ability to scale dynamically, create multiple virtual processors and encapsulate and isolate multiple processes to eliminate the risk of process "bleed-over."
The architecture at this point was technology independent in that we had only defined the requirements without consideration for deployment. It was essentially an exercise that permitted us to dream big and scale back if necessary, based upon limitations of available hardware. Given our initial load projections, we knew that it would be a tall order for any processor to fill and we even considered various contingencies that would get us close to where we wanted to be. One of the many tracks of the analysis and design phase included an assessment of available hardware architectures that could conceivably deliver the type of performance we were hoping for.
Enter EPIC, Itanium's Explicitly Parallel Instruction Computing architecture. As a purely intellectual exercise, we overlaid EPIC and competing architectures with models of biological systems in an attempt to identify analogs that would ultimately support the software architecture, which was modeled on functional biology. Much to our surprise, this turned out to be a fairly good indicator of the likelihood that the processor architecture would be suitable for our purposes.
The second iteration of analysis and design was focused on a proof of concept for the Itanium and EPIC. Based upon our rough calculations, we could conceivably reduce turnaround time to somewhere in the millisecond range as opposed to weeks. Being absolutely cynical and convinced that our calculations were flawed, we solicited objective validation from various independent subject matter experts and they all confirmed that "on paper", our assumptions and calculations appeared to be accurate.
This was sufficient for us to commit to the development of an initial prototype. Black Widow Lite was cobbled together using baling wire and duct tape, figuratively speaking, and contained limited-utility versions of the seven software components written in C++, each strung loosely together with Perl. It was so fragile that we often joked that everyone had to hold their breath during testing for fear that it would completely unravel with the slightest movement. It was not very pretty but it served to validate our initial assumptions and calculations.
The current iteration of GENeSYS is written in C#, and Perl has been replaced with F# because of its scripting capabilities, which are far more efficient and elegant in execution. Using our own measure of computational speed, we have been able to derive the equivalent of 700 teraflops in terms of overall throughput. While our measure of speed would not meet the requirements to make the Top 500 list, we are more concerned with measuring overall performance based upon computation of multi-dimensional biological organisms, which is a bit different from measuring purely clock speed, I/O, etc. Ultimately, we are not interested in making the list; rather, we are interested in identifying drug candidates.
Early in the inception stage, we leaned towards the use of Microsoft-based technologies because our collective experiences with other technologies and vendors in the past were less than positive. As a startup, we were confident that Microsoft would provide us with a level of support that, with limited resources, other vendors simply would not. That added a challenge, as most of the development on Itanium was on Linux-based machines, so we knew that there would be very little history to draw upon in terms of lessons learned. In hindsight, we are confident that we made the right decision for numerous reasons.
In terms of applicability, the utility of GENeSYS is extremely broad and could be applied in many areas where large-scale computational modeling is required. While we had numerous discussions in terms of the breadth of applicability, we made a strategic decision to focus exclusively on drug discovery optimization and the unique requirements associated with FDA compliance to eliminate any further complexity.
HPCwire: What is the rationale behind the use of an Itanium-based platform, as opposed to, say, an x86-based cluster or a capability-type supercomputer, such as an IBM Blue Gene or a Cray system?
Colonna: We had always envisioned a multi-threaded approach to software optimization, possibly on HT-equipped x86-based servers, but found that EPIC provided far more capability in terms of overall throughput than we could achieve with 10 times the number of x86 machines, and this was prior to the release of the dual-core Montecito. Blue Gene or any of the other supercomputers of that type would most likely have limited our creativity in terms of development and also deployment.
The days of supercomputers, I believe, are long gone. We are now in the age of "superclusters," and the Itanium is the "big gun" on the block in terms of large-scale computational modeling capabilities. Now, with the dual-core Itanium, which significantly reduces power consumption while improving overall performance, the field of in-silico drug discovery could be changed forever. Our hope is that Big Pharma and biotech organizations will apply some of their brainpower and brawn towards the development of applications that could significantly improve the quality of life for all of us. That should be the goal of any technology.
HPCwire: Are there users today, beta or otherwise, that you can talk about?
Colonna: We partnered early on with five biotech companies (NDAs prevent disclosure of their names) that would act as pilots during the development process and help with the definition of requirements. That number has now increased to seven, and as part of our initial agreement and in return for their invaluable guidance, they have an exclusive right to use the technology for a prescribed period of time. GENeSYS is currently going through validation. The inclusion of the metabolic signaling pathway functionality has proven to be far more complex than with Black Widow. So far, we are absolutely thrilled with what has been accomplished and are awaiting approval of the first drug discovered on GENeSYS, with hopefully many more to come.