October 14, 2005
SDSC Taps Plants for Fuel
Imagine mowing your lawn and then dumping the grass clippings into the gas tank of your car. Inside your tank, the grasses are digested and converted into ethanol -- a high-performance, clean-burning, renewable fuel. You avoid the astronomical cost of filling up with old-fashioned petroleum, and the U.S. avoids the costly environmental, climate, and security issues of depending on nonrenewable fossil fuel. While tapping yard clippings as a source of gas might still be something found only in movies, the use of plant material as a major energy source has attracted nationwide attention, with ethanol blends already being offered at the pump.
But the process of producing ethanol remains slow and expensive, and researchers are trying to formulate more efficient, economical methods -- a challenge that hinges on speeding up a key molecular reaction being investigated in a Strategic Applications Collaboration between researchers at the San Diego Supercomputer Center at UC San Diego, the Department of Energy's National Renewable Energy Laboratory (NREL), Cornell University, The Scripps Research Institute and the Colorado School of Mines.
The interest in ethanol, commonly known as grain alcohol, is being driven by the combination of rising petroleum prices and government subsidies for biofuels such as E85, a blend of 85 percent ethanol and 15 percent gasoline by volume, which sells for an average of 45 cents less per gallon than gasoline. Efforts to mitigate climate change are also spurring the growth of such renewable fuels, which add far less net greenhouse gas to the atmosphere than fossil fuels because the plants remove carbon dioxide from the air as they grow. In August 2005, President George W. Bush signed a comprehensive energy bill that requires the production of biofuels, including ethanol and biodiesel, to increase from 4 billion to 7.5 billion gallons within the next 10 years.
While most people are familiar with the process of fermenting plant material into ethanol for beverages like beer, that process is slow and expensive, and its end product is too impure for energy use. To produce ethanol for energy on a massive scale, researchers are instead trying to perfect the conversion of biomass -- plant matter such as trees, grasses, byproducts from agricultural crops, and other biological material -- via industrial processing in "biorefineries."
"Cellulose is the most abundant plant material on earth and a largely untapped source of renewable energy," said project manager Mike Cleary, who is coordinating SDSC's role in the project. "So this collaboration is addressing not just a significant problem in enzymology but a problem of huge potential benefit to society."
The central bottleneck in making the biomass-to-fuel conversion process more efficient is the slow rate at which cellulose -- the tough structural material of plants -- is broken down by the enzyme cellulase, which also happens to be expensive to produce. The cellulase enzyme complex, made up of proteins, acts as a catalyst that speeds this chemical reaction, turning cellulose into sugars.
Scientists want to understand this process at the molecular level so that they can learn how to enhance the reaction. Using molecular dynamics simulations, which model the movement of the enzyme at the atomic scale, the researchers want to determine whether the kinetics of the enzyme agree with models based on biochemical and genetic studies. By probing in minute detail how the enzyme makes contact with cellulose, the researchers hope to discover ways the enzyme can be altered through genetic engineering to speed up the process and make it more cost-effective.
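A molecular dynamics simulation of this kind advances every atom's position through millions of tiny time steps under Newton's equations of motion. As a minimal illustration of the idea only -- a toy one-dimensional spring, not CHARMM's actual force field -- a velocity-Verlet integrator, the kind of update loop at the heart of most MD codes, might look like:

```python
# Toy velocity-Verlet integrator for one 1-D harmonic "bond".
# Illustrative sketch: real MD codes such as CHARMM apply this kind of
# update to hundreds of thousands of atoms with full force fields.

def force(x, k=1.0):
    """Harmonic restoring force F = -k * x (assumed spring constant k)."""
    return -k * x

def velocity_verlet(x, v, dt, n_steps, m=1.0):
    """Advance position x and velocity v through n_steps steps of size dt."""
    a = force(x) / m
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / m              # force at the new position
        v += 0.5 * (a + a_new) * dt       # velocity update
        a = a_new
    return x, v

# A good integrator nearly conserves total energy over many steps.
x0, v0 = 1.0, 0.0
x, v = velocity_verlet(x0, v0, dt=0.01, n_steps=10_000)
e0 = 0.5 * v0**2 + 0.5 * x0**2   # initial energy
e1 = 0.5 * v**2 + 0.5 * x**2     # energy after 10,000 steps
```

In a production code, the force routine evaluates an empirical force field over all atom pairs rather than a single spring, which is where nearly all the computational cost lies.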
The cellulase enzyme complex is actually a collection of enzymes, each of which plays a specific role in breaking down cellulose into smaller molecules of sugar called beta-glucose. The smaller sugar molecules are then fermented with microbes, typically yeast, to make the fuel, ethanol. One of the parts of the enzyme complex, cellobiohydrolase (CBH I), acts as a "molecular machine" that attaches to bundles of cellulose, pulls up a single strand of the sugar, and puts it onto a molecular conveyor belt where it is chopped into the smaller pieces. In order to make this process more efficient through bioengineering, researchers will need a detailed molecular-level understanding of how the cellulase enzyme functions. But the system has been difficult to study because it is too small to be observed directly under a microscope yet too large for traditional molecular mechanics modeling.
To explore the intricate molecular dynamics of cellulase, researchers at NREL have turned to CHARMM (Chemistry at HARvard Macromolecular Mechanics), a suite of modeling software for macromolecular simulations, including energy minimization, molecular dynamics, and Monte Carlo simulations. The widely used community code, originally developed in 1983 in the laboratory of Martin Karplus at Harvard University, models how atoms interact.
In the cellulase modeling, CHARMM is used to explore the ensemble configurations and protein structure, the interactions of the protein with the cellulose substrate, and the interactions of water with both. Not only are the NREL simulations the first to simultaneously model the cellulase enzyme, cellulose substrate, and surrounding water, they are among the largest molecular systems ever modeled. In particular, the researchers are interested in how cellulase aligns and attaches itself to cellulose, how the separate parts of cellulase -- called protein domains -- work with one another, and the effect of water on the overall system. And they are also investigating which of the over 500 amino acids that make up the cellulase protein are central to the overall workings of the "machine" as it chews up cellulose.
To the biochemists in the collaboration, the simulation is like a stop-motion film of a baseball pitcher throwing a curveball. In real life the process occurs too quickly to evaluate visually, but by breaking down the throw into a step-by-step process, observers can find out the precise role of velocity, trajectory, movement, and arm angle. In simulations on SDSC's DataStar supercomputer, the researchers have modeled a portion of the enzyme, the type 1 cellulose binding domain, on a surface of crystalline cellulose in a box of water. The modeling revealed how the amino acids of the domain orient themselves when they interact with crystalline cellulose as well as how the interaction disrupts the layer of water molecules that lie on top of the cellulose, providing a detailed glimpse of this intricate molecular dance.
The NREL cellulose model includes more than 800,000 atoms, including the surrounding water, the cellulose, and the enzyme -- an enormous structure to model computationally. According to the researchers, an accurate understanding of what is happening will require the capability to scale up their simulation to cover 50 nanoseconds of the reaction -- an extremely long time in molecular terms and highly demanding in computational terms (there are one billion nanoseconds in one second). To reach 50 nanoseconds, the researchers must calculate 25 million time steps at two femtoseconds per step (one femtosecond is one quadrillionth of a second).
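The step count above follows directly from the target simulation length and the size of each time step:

```python
# Number of 2-femtosecond time steps needed to reach 50 nanoseconds.
target_ns = 50
fs_per_ns = 1_000_000      # 1 nanosecond = 10^6 femtoseconds
dt_fs = 2                  # 2 femtoseconds per time step

n_steps = target_ns * fs_per_ns // dt_fs
print(n_steps)             # -> 25000000, i.e. 25 million steps
```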
However, the sheer size of the model is beyond the limit of the current capabilities of the CHARMM simulation code, which has been difficult to scale as the number of computer processors grows larger, since the code was originally written to model thousands, not hundreds of thousands, of atoms. The SAC partners have worked to enhance CHARMM to scale to larger numbers of atoms and to run on some of the largest resources available to academic scientists in the U.S., including DataStar (recently expanded to 15.6 teraflops), TeraGrid (4.4 teraflops), and BlueGene (5.7 teraflops).
To determine how much time the large-scale CHARMM simulations require, the researchers ran a series of 500-step benchmark simulations of a 711,887-atom system on DataStar -- covering one picosecond, or one thousandth of a nanosecond -- which required 12 minutes on 64 processors and 9 minutes on 128 processors. Since each nanosecond of simulated time requires 1,000 times the computation of these benchmark runs, the full-scale 50-nanosecond simulation is expected to require nearly one million CPU hours.
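The million-CPU-hour figure can be checked with a back-of-the-envelope extrapolation, assuming the quoted 128-processor timing and that cost scales linearly with simulated time:

```python
# Extrapolate total CPU hours from the 1-picosecond benchmark.
bench_minutes = 9      # wall-clock minutes for 1 ps on 128 processors
processors = 128
ps_per_ns = 1000       # 1 nanosecond = 1000 picoseconds
target_ns = 50         # simulation length the researchers aim for

cpu_hours_per_ps = bench_minutes * processors / 60   # 19.2 CPU hours
cpu_hours_total = cpu_hours_per_ps * ps_per_ns * target_ns
print(round(cpu_hours_total))   # -> 960000, "nearly one million"
```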
To extend the capabilities of the CHARMM simulation code to this unprecedented scale, SDSC computational scientist Giri Chukkapalli and Michael Crowley, a software developer in Charles Brooks' lab at Scripps, have reengineered parts of CHARMM to run more efficiently as a parallel, rather than serial, application. In particular, the SAC collaborators have targeted a number of subroutines in the code, altering them to speed its performance on 256 and 512 processors.
Outreach on the part of SDSC resulted in this large cross-agency collaboration based on a team approach, with interdisciplinary participation by biochemists from NREL, enzymologists and carbohydrate chemists from Cornell, software developers from TSRI, and computational scientists at SDSC. To validate and gauge the accuracy of the CHARMM simulations, James Matthews and John Brady of Cornell and Linghao Zhong at Penn State compare the simulated action of the cellulase complex with experimental results. Similarly, chemists Mark Nimlos and Mike Himmel at NREL, along with Xianghong Qian at the Colorado School of Mines, interpret the biochemical findings. In addition to assisting with the software development and scaling needed to run larger simulations, SDSC is also the key site for computation, since the center houses compute resources such as DataStar with capabilities far beyond those available to the other collaborators.
"We were looking for opportunities for collaboration with other agencies," said Cleary. "SDSC has unique expertise to offer in improving community codes like CHARMM and other molecular dynamics tools like AMBER." It turned out that Cleary, along with other SDSC staff, knew some of the researchers who had been working on the cellulase problem at NREL and the other sites. Their work was an ideal fit for an SDSC SAC collaboration, with each group lending its expertise to the project.
The collaboration, funded by the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy and its Office of the Biomass Program, fits the mission of SDSC's SAC program -- to enhance the effectiveness of computational science and engineering research conducted by nationwide academic users. The goal of these collaborations is to develop a synergy between the academic researchers and SDSC staff that accelerates the researchers' efforts by using SDSC resources most effectively and enabling new science on relatively short timescales of three to 12 months. And beyond the project results, the hope is to discover and develop general solutions that will benefit not only the selected researchers but also their entire academic communities and other high-performance computing users. In this case, beyond being able to model cellulase digesting cellulose to improve the production of ethanol, the improvements to CHARMM are opening the door so that the software, running on cutting-edge hardware systems, can simulate many other large-scale biological systems. In turn, that will allow scientists to pose entirely new questions, opening novel avenues for research, said Cleary.
According to Chukkapalli, "We're excited about the advent of new architectures that provide massive amounts of computing power. The questions from biophysics, structural biology, and biochemistry that have been only dreams in the minds of computational chemists are now on the verge of being studied in realistic simulations."