College of Engineering News • Iowa State University

Advanced Computation

If, as Julie Dickerson asserts, the appreciation of “fuzzy logic”—at first blush, a contradiction in terms—is critical for understanding the metabolic pathways of plants, what role, then, does advanced computation play in the engineering of biosystems?

Computation, after all, is inalterably anchored in the hard, immutable binary logic of 1s and 0s. And while in theory these can be configured infinitely and on as vast a scale as money, materials, and imagination can afford, there is no “fuzzy” scheme in which two plus two can possibly equal anything other than four.

The objective of advanced computation, however, is not so much to model natural systems precisely, if that could even be done, but instead to compile, collate, and analyze our collective experimental knowledge of those systems on a scale not otherwise possible.

Accelerating Speed

To date, the tools available to Iowa State scientists and engineers seeking to unlock nature’s secrets have been formidable, from the BlueGene/L supercomputer acquired in 2006 to last year’s $5 million upgrade of the C6 virtual reality chamber, an improvement in imaging Dickerson calls “amazing.” Yet as impressive as these have been, Iowa State engineers working in the biosciences will soon have at their disposal even greater resources, both on and off campus.

When Srinivas Aluru brought an IBM BlueGene/L to Iowa State in order to complete his work in sequencing the corn genome, “CyBlue,” as it’s called, was among the 100 fastest supercomputers in the world. Yet although hardly ready for the cyberboneyard, today CyBlue no longer ranks even in the top 500, so rapidly does the power and availability of computation increase.

That’s about to change. In 2008 Aluru and his colleagues took possession of a new Sun Microsystems supercomputer. Tentatively named “CyStorm,” the new unit will operate at 28.16 teraflops—that’s 28.16 trillion floating-point operations per second, well over five times the speed of CyBlue. Yet while CyStorm ranked among the top 100 supercomputers upon acquisition, by the time it is fully operational later this year Aluru expects it to rank no higher than 200.

However, even more impressive—and potentially more valuable—will be the Blue Waters supercomputer housed on the campus of the University of Illinois. A project of the Great Lakes Consortium for Petascale Computation, of which Iowa State is a charter member, the $200 million Blue Waters machine will, when fully operational, run at a few petaflops, each petaflop being one quadrillion calculations per second, or nearly 36 times the speed of CyStorm.
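The unit conversions behind those figures are easy to check. The short Python sketch below uses only the numbers quoted above (28.16 teraflops for CyStorm, one petaflop as one quadrillion operations per second) and is offered purely as a back-of-the-envelope check.

```python
# Back-of-the-envelope check of the performance figures quoted above.
# "flops" = floating-point operations per second.

CYSTORM_FLOPS = 28.16e12   # CyStorm: 28.16 teraflops
PETAFLOP = 1e15            # one petaflop: a quadrillion operations per second

ratio = PETAFLOP / CYSTORM_FLOPS
print(f"one petaflop is about {ratio:.1f} times the speed of CyStorm")  # ~35.5
```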

While tools such as Blue Waters will be applied to a host of “grand” challenges in science and engineering, including the simulation of complex engineered systems, much of its prodigious computing capacity will be directed toward modeling and predicting natural phenomena, including, in the words of its designers, “the behavior of complex biological systems” and “changes in the earth’s climate and ecosystems.”

Two challenges

In order to take advantage of tools such as Blue Waters or even CyStorm, however, engineers working in bioinformatics and systems biology face two core challenges: first, they must immerse themselves in a particular knowledge base in the natural sciences in order to understand the problem to be addressed; and, second, they must go back to the drawing board to develop applications that will actually work on systems the likes of which they’ve never before encountered.

“Blue Waters represents a radically new architecture for supercomputing,” says Jim Oliver, director of Iowa State’s CyberInnovation Institute and a participant in the Great Lakes Consortium. “You can’t just take the code you’ve been running on today’s hardware and simply move it to Blue Waters, which is orders of magnitude more powerful—it may not even be possible. You have to go back to first principles and look at your algorithms.”

For that reason, a key element of the Blue Waters project will be for partners, including those at Iowa State, to work with collaborating scientists on revising their applications to optimize performance and scalability, so they can take full advantage of the project's capabilities. The ultimate value of supercomputing, however, lies not in incremental or evolutionary progress on existing work but in revolutionary leaps in the application of computers to both natural and engineered systems.
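To see why scalability dominates that conversation, consider Amdahl's law, a textbook rule of thumb that is not specific to Blue Waters or to any Iowa State code: whatever fraction of a program remains serial caps its speedup no matter how many processors are thrown at it. The serial fractions in the sketch below are made-up values chosen only for illustration.

```python
# Amdahl's law: a generic illustration of why algorithms must be rethought
# for massively parallel machines. Nothing here is specific to Blue Waters;
# the serial fractions are invented examples.

def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Maximum speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

for serial in (0.10, 0.01, 0.001):
    for procs in (100, 10_000, 1_000_000):
        print(f"serial={serial:>6}  procs={procs:>9,}  "
              f"speedup={amdahl_speedup(serial, procs):10.1f}")

# Even with only 1% serial work, a million processors deliver a speedup of
# roughly 100x, which is why scaling up often means going back to first principles.
```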

“Ultimately, Blue Waters doesn’t want to encourage science that is already being done,” Oliver says. “The project wants to radically open doors that haven’t been opened before, and that goes back to how you rethink the problem and how you map it onto this new architecture.”

The other challenge—understanding the problem in the first place—is, according to Aluru, at least as formidable as revising or scaling up previous work to a new platform.

“We really need to understand what it is that’s being solved and how people would solve it if they were solving it, say, by hand,” says Aluru. “That requires us to go into that domain and learn something about what they are doing. It’s fortunate in that we learn a lot, but unfortunate because for every different problem we are working on we need to develop expertise, and that takes time.

“You can always buy speed,” he adds. “You can always buy capacity. What you cannot buy is the intellectual ability to utilize it.”

Dealing with data in depth

The truth of this observation is reflected in Aluru’s own experience: he has spent the past dozen or so years immersing himself in the world of plant genomics in addition to his core expertise in computer engineering. Yet beyond the knowledge a computer engineer needs simply to comprehend the scope of key questions in the biosciences, well enough to be of use to, say, an agronomist or plant geneticist, there is the further expertise required to manage the sheer volume of data generated by a myriad of experimental devices.

In both of his fields, bioinformatics and systems biology, says Aluru, new experimental devices capable of generating once-unimaginable volumes of data are coming on board continually. Cutting-edge instruments such as Iowa State’s atom probe microscope or the Solexa sequencing machine recently acquired by the university’s Plant Sciences Institute, which produce massive data sets over very short time scales, represent a particular challenge for researchers.

“The Solexa can read tens of millions of sequences in the same experiment,” Aluru says, “but they will be very short. So if you look at a genome as a very long string, you’d need multimillions of those. This instrument can generate nearly a billion base pairs a day. You would need vast computation to process such large quantities of data.”
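A rough calculation puts those numbers in perspective. The figures in the sketch below, a 36-base read length, a genome of about 2.3 billion bases (roughly maize-sized), and 30-fold coverage, are illustrative assumptions rather than figures from the article.

```python
# Illustrative short-read arithmetic. The read length, genome size, and
# coverage target are assumed values chosen only to show the scale involved.

READ_LENGTH = 36                 # bases per short read (assumed)
GENOME_SIZE = 2_300_000_000      # bases, roughly maize-sized (assumed)
COVERAGE = 30                    # each base sequenced ~30 times over (assumed)

reads_needed = COVERAGE * GENOME_SIZE // READ_LENGTH
print(f"{reads_needed:,} short reads")            # roughly 1.9 billion reads

# At about a billion base pairs of output per day (the rate quoted above):
days = COVERAGE * GENOME_SIZE / 1_000_000_000
print(f"~{days:.0f} machine-days of raw output for one genome")
```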

Not only must these huge data sets be analyzed individually; progress in the biosciences will also hinge on researchers’ ability to integrate them with other data from similar devices, with data produced by other means, and even with data gathered for other purposes altogether.

“This is high throughput data generation through experimental devices,” Aluru notes. “You take a large number of such experiments, and you look at them collectively and ask, ‘what kind of system would produce this kind of response?’ And you try to infer the system from these responses.”
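One very simple way to picture that kind of inference is to connect genes whose measured responses rise and fall together across many experiments. The sketch below does exactly that with random placeholder data and a plain correlation threshold; it is a toy stand-in for the far more sophisticated methods such work actually requires, not a description of Aluru’s approach.

```python
# Toy illustration of inferring a network from high-throughput responses:
# genes whose profiles are strongly correlated across many experiments are
# linked. Random data stands in for real measurements.
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_experiments = 50, 200
expression = rng.normal(size=(n_genes, n_experiments))   # placeholder data

corr = np.corrcoef(expression)           # gene-by-gene correlation matrix
threshold = 0.2                          # arbitrary cutoff for this toy example
edges = [(i, j) for i in range(n_genes) for j in range(i + 1, n_genes)
         if abs(corr[i, j]) >= threshold]

print(f"inferred {len(edges)} putative interactions among {n_genes} genes")
```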

‘Dramatic change’ at hand

From that computer-enabled “inference” (call it the “fuzzy logic” of the digerati) drawn from multiple combinatorial data sets will come the knowledge base that turns information into innovation, whether it’s the next generation of cancer drugs, drought- and disease-resistant crops, or predictive modeling of the global climate that enables policy makers to reliably see the consequences of today’s industrial decisions 10, 20, or even 50 years out.

Making that happen, though, will require not just a revolution in computer hardware but one in the human mind as well, as engineers seek to assimilate and adapt to the exponentially more powerful tools at their disposal.

“The bottom line,” says Oliver, “is that the machines are advancing at a tremendous rate. And the challenge with Blue Waters or any other supercomputer is that all of the associated tools—be they compilers or run times or drivers and all this stuff—have to be redone in addition to the physics and the algorithms and the modeling we use to simulate the drug or the atmosphere or whatever.

“There’s a whole bunch of work that will be done over the next five years,” Oliver concludes. “If it goes well, we could see some dramatic changes.”
