Exploiting Complexity in Drug Research

September 1, 2011

The more that researchers explore the basic and applied sides of drug research, the more complicated it gets. At the same time, advancing technology generates ever-expanding data sets, and on the computation and simulation side, high-performance computers bring hundreds of thousands of processors working in parallel. With so much data and so many tools, the challenge can seem overwhelming, but the opportunity is equally large. By exploiting the complexity, researchers open up remarkably powerful possibilities.

At the University of California, San Diego, Igor Tsigelny and his colleagues combine advanced algorithms and supercomputing power from Argonne National Laboratory, Argonne, Ill., to take on neurodegenerative diseases. "The main challenge with modeling such diseases is that the responsible proteins are unstructured," says Tsigelny. In other words, these proteins can change shape—like a wiggling worm—until they aggregate into specific oligomers. To study which proteins play a role in a neurodegenerative process, all of the possible conformations of the candidate proteins must be considered. That means thousands of protein conformations might contribute to the disease being studied.

To cull the collection of proteins down to about 100 high-potential candidates, Tsigelny and his colleagues developed a Web-based tool called Membrane-Associated Proteins Assessment, or MAPAS. This tool reveals which conformations of a protein would bind tightly to a membrane. Using Argonne's IBM Blue Gene/P supercomputer, the researchers found specific proteins that bind membranes in Alzheimer's and Parkinson's diseases and then searched for compounds that block that binding. "We've developed some lead candidates, tested them in mice, and shown results," Tsigelny says. Eliezer Masliah and Wolfgang Wrasidlo participated in the experimental part of this project.
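
MAPAS itself is a Web-based scoring tool, but the culling step can be pictured as a simple ranking problem: score every conformation for membrane affinity and keep the best. The Python sketch below is a hypothetical illustration of that idea, not the MAPAS algorithm; the scoring function and the conformation fields are invented placeholders.

```python
# Hypothetical sketch of culling protein conformations by a membrane-binding
# score, in the spirit of MAPAS (not its actual scoring method).
import random

def membrane_binding_score(conformation):
    """Placeholder score: a real tool would weigh surface hydrophobicity,
    charge distribution, and geometry of the conformation."""
    return conformation["hydrophobic_moment"] * conformation["exposed_area"]

def top_candidates(conformations, keep=100):
    """Rank thousands of candidate conformations and keep the top scorers."""
    ranked = sorted(conformations, key=membrane_binding_score, reverse=True)
    return ranked[:keep]

# Mock input: thousands of conformations reduced to ~100 candidates.
conformations = [{"id": i,
                  "hydrophobic_moment": random.random(),
                  "exposed_area": 1000 + 500 * random.random()}
                 for i in range(5000)]
print(len(top_candidates(conformations)), "high-potential conformations kept")
```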

This work provides a central example of the power of combining high-performance computing and parallelized algorithms to approach healthcare challenges. Still, advances must be made to bring these tools to more researchers. As Jose L. Alvarez, business development manager at Dell (Austin, Texas), says: "Biotechnology companies, pharmaceutical companies, and academia lack high performance research computing resources that can address research questions in a timely manner. There are very few systems that can deliver an answer in a short period of time, and their schedule is often full for lengthy periods of time."

Hybrid architectures

For three decades or longer, Moore's Law—the number of transistors on a chip doubles every two years—pushed computers ahead at a blistering pace. "We're now hitting the limits of what standard computers can do, but problems have not hit their limits," says George Vacek, PhD, head of the life sciences group at Convey Computer in Richardson, Texas. So some developers are moving to CPUs coupled with other processors, such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). These so-called heterogeneous or hybrid approaches push much of the parallel processing to the GPUs or FPGAs, which are designed for this kind of computation.

As an example, Convey builds its hybrid-core computers, the HC-1 and HC-1ex, around an Intel x86 processor and an FPGA-based coprocessor, which can encode application-specific instructions right in the hardware. "This leads to greater performance to do bigger research problems in a smaller footprint," Vacek explains. "That smaller footprint reduces the power and cooling needed." In high-performance computing, such savings are crucial.

If a user brings an application that works on a standard Linux x86 server, it will run on a Convey computer right out of the box. Moreover, a Convey hybrid-core computer fits in a standard rack. To get really high performance, though, an application must be optimized with the right instructions to take full advantage of the hybrid hardware. In a benchmark study running a Smith-Waterman algorithm to search for sequence alignments, Vacek says, the HC-1ex ran it 400 times faster than a single core on a standard platform.
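
The Smith-Waterman algorithm in that benchmark is a classic dynamic-programming method for local sequence alignment, and its regular inner loop is exactly the kind of work that FPGAs and GPUs parallelize well. A minimal, unoptimized Python version of the scoring recurrence is sketched below (the scoring parameters are illustrative defaults, not those used in the benchmark):

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score between sequences a and b.

    Classic O(len(a) * len(b)) dynamic program; hardware implementations
    accelerate this same recurrence by updating many cells in parallel.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # scoring matrix, zero-initialized
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GATTACA", "GCATGCA"))
```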

For the moment, Convey focuses on genomics and proteomics, and the company offers many tools to accelerate the key applications in those fields. If a user wants to create a new set of instructions to accelerate an application, Convey offers what is called a "Personality Development Kit."

Although GPUs evolved from the gaming industry, these devices now play a key role in high-performance computing. Dell, for example, combines Intel, AMD, and GPU cores in the same cluster. Alvarez says that such a system can reach 16 teraflops (trillion floating-point operations per second) with 32 GPUs, 96 CPU cores, and 1 terabyte of RAM.
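
As a rough sanity check on those figures, attributing the aggregate peak entirely to the GPUs implies about half a teraflop per card, before counting any contribution from the CPU cores; the arithmetic below simply restates the numbers quoted above.

```python
# Back-of-envelope arithmetic using the cluster figures quoted above.
total_tflops = 16.0   # aggregate peak cited for the whole system
n_gpus = 32

# An upper bound per card, ignoring the 96 CPU cores entirely.
per_gpu_tflops = total_tflops / n_gpus
print(f"~{per_gpu_tflops:.2f} teraflops per GPU")   # ~0.50 teraflops
```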

Alvarez points out, "GPU computing has helped to accelerate several computational chemistry, molecular modeling, and bioinformatics codes, including AMBER11, NAMD, GROMACS, Schrödinger, OpenEye, CUDA-BLASTP, HMMER, etc." He adds, "This acceleration allows researchers to obtain results in hours and days compared to weeks and months if compared with CPU core-only computer architecture."

Making such a system usable, however, also depends on software. As Alvarez says, "By utilizing an open standards architecture model coupled with our functional software stack, we can provide a very appealing ecosystem to drug discovery researchers."

Go with the flow

The range of problems amenable to simulation with high-performance computers spans virtually all of healthcare. For example, George Karniadakis, a professor of applied mathematics at Brown University, Providence, R.I., and Leopold Grinberg, a senior research associate at Brown, simulate blood cell movement through vessels. Working on such a project demands broad knowledge of both the biophysics of the blood system and high-performance computing. For the latter, Karniadakis and Grinberg work with Joseph Insley, a senior software developer at Argonne, and Michael Papka, deputy associate laboratory director for the computing, environment, and life sciences directorate at Argonne.

"The partnership targets our in-house expertise of the IBM Blue Gene/P 'Intrepid' hardware, computational analysis, and visualization, and Brown's expertise in fluid flow simulation," says Papka. "We've built a dynamic team that is making real headway toward understanding the effects of clot formations on aneurisms."

Even with the 132,000 processors available on Intrepid, not every aspect of the blood-flow simulation can be modeled in detail. So Karniadakis and Grinberg run two codes simultaneously. The NekTar code, which is actually a suite of simulation codes, handles the broad-brush model for the fluid in the system. And LAMMPS, a molecular dynamics simulator code, handles the detailed simulation of the particles in the model. "By coupling these codes, scientists can gain insight into the effects of the large scale flow patterns on the detailed particle behavior," says Insley.

In this way, researchers can model a larger system while simulating crucial elements at a very high level of detail. "Machines powerful enough to allow us to simulate the human circulatory system down to the desired level of detail don't exist yet," Papka explains. "So to address this, Brown is developing a multi-scale application that can approximate this level of detail."
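
Conceptually, the coupling amounts to two solvers advancing in lockstep and exchanging data every step: the coarse flow model supplies the local flow conditions that drive the particle model. The sketch below is only a schematic of that idea; the class and method names are hypothetical placeholders, not the NekTar or LAMMPS interfaces.

```python
# Schematic sketch of continuum/particle co-simulation. The classes and
# method names are hypothetical, not the actual NekTar or LAMMPS APIs.

class ContinuumSolver:
    """Coarse model: advances the bulk fluid on a mesh."""
    def step(self, dt):
        pass  # solve the flow equations for one step (omitted)

    def velocity_at(self, region):
        return (1.0, 0.0, 0.0)  # placeholder bulk velocity in that region

class ParticleSolver:
    """Fine model: advances individual particles such as blood cells."""
    def apply_boundary_flow(self, velocity):
        self.boundary_velocity = velocity  # drive particles with the bulk flow

    def step(self, dt):
        pass  # integrate particle dynamics for one step (omitted)

def couple(flow, particles, dt=1e-3, n_steps=1000, region="vessel segment"):
    """Advance both models together, exchanging data every step."""
    for _ in range(n_steps):
        flow.step(dt)
        particles.apply_boundary_flow(flow.velocity_at(region))
        particles.step(dt)
        # A two-way coupling would also feed particle stresses back
        # into the continuum model here.

couple(ContinuumSolver(), ParticleSolver())
```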

Accelerating and scaling these simulations requires a great deal of work. As Papka says, "It's a huge jump from running on a campus computing center to using a large national resource, and that move introduces lots of unexpected behaviors." Code might work one way on 1,000 processors and require a completely different approach on 32,000.
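
One reason for that is captured by Amdahl's law: a serial fraction of the code that is invisible on 1,000 processors can dominate the runtime on 32,000. The short calculation below illustrates the effect; the 5 percent serial fraction is an arbitrary example, not a figure from the Brown or Argonne codes.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's law: ideal speedup when part of the code stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# Example: a code that is 5% serial (an arbitrary illustrative figure).
for n in (1000, 32000, 132000):
    print(f"{n:>7} processors -> {amdahl_speedup(0.05, n):5.1f}x speedup")
# The speedup saturates near 1/0.05 = 20x, which is why codes often need
# restructuring rather than simply more processors.
```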

To create advanced displays of information, researchers use some off-the-shelf tools and develop others. For example, the Argonne team working on the blood simulations develops custom plug-ins that are used with ParaView—open source software developed in collaboration with Kitware (Clifton Park, N.Y.). For other simulation efforts, the team uses the VisIt visualization tool, which was developed by the U.S. Department of Energy's Advanced Simulation and Computing Initiative.
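
ParaView can also be driven in batch mode through its Python interface, paraview.simple, which is how many visualization pipelines are scripted. The snippet below is a generic example of that interface, not the Argonne team's custom plug-ins; the data file name is a hypothetical placeholder.

```python
# Generic ParaView batch-scripting example (run with ParaView's pvpython).
# The input file name is a placeholder; this is not the custom plug-in
# code described above.
from paraview.simple import OpenDataFile, Show, Render, SaveScreenshot

reader = OpenDataFile("flow_field.vtk")  # hypothetical simulation output
Show(reader)                             # add the dataset to the active view
Render()                                 # draw the scene
SaveScreenshot("flow_field.png")         # write the rendered image to disk
```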

Building a system

At GNS Healthcare in Cambridge, Mass., combinations of supercomputers, algorithms, basic research, and patient data reveal the pathways behind diseases. Much of the horsepower behind this approach comes from REFS, short for reverse engineering and forward simulation, a set of machine-learning and simulation algorithms developed by GNS.

Recently, the company showcased the power of this approach in collaboration with Biogen Idec, Weston, Mass., by searching for novel drug targets for rheumatoid arthritis. The companies created a computerized model of the disease to run virtual clinical trials, in which simulated compounds can be tested against various targets.
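
Conceptually, a virtual trial of this kind perturbs, or "knocks down," a candidate target in a model learned from patient data and then simulates the downstream response across a virtual cohort. The Python sketch below is a toy illustration of that idea under a made-up two-node model; it is not the REFS platform or the Biogen Idec model.

```python
# Toy illustration of "knock down a target, simulate the response" in a
# learned network model. This is NOT the REFS platform; the two-node
# pathway and its weights are invented for illustration.
import random

def disease_score(patient, knockdown=None):
    """Predict a disease-activity score from a tiny linear pathway model."""
    activity = dict(patient)
    if knockdown is not None:
        activity[knockdown] *= 0.1  # a drug suppresses the target node
    return 0.7 * activity["target_A"] + 0.3 * activity["target_B"]

# A virtual cohort of simulated non-responders (random toy data).
cohort = [{"target_A": random.uniform(0.5, 1.0),
           "target_B": random.uniform(0.5, 1.0)} for _ in range(1000)]

for target in ("target_A", "target_B"):
    baseline = sum(disease_score(p) for p in cohort) / len(cohort)
    treated = sum(disease_score(p, target) for p in cohort) / len(cohort)
    print(f"knock down {target}: score {baseline:.2f} -> {treated:.2f}")
```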

In discussing this collaboration, Colin Hill, CEO at GNS Healthcare, says, "We detected drug targets and pathways that can be manipulated or knocked down with a drug to get a strong response in patients who do not respond to the standard anti-TNF therapies, such as Remicade, Enbrel, and Humira."

As an example of the power of this approach, Hill points out that in addition to identifying novel drug targets for rheumatoid arthritis, the REFS platform predicted CD86, the clinically validated target of the Bristol-Myers Squibb, New York, rheumatoid arthritis drug Orencia, as a key target for non-responders to anti-TNF therapy. "The identification of CD86 as a key target validated the REFS platform's ability to make high-value, unbiased, hypothesis-free discoveries directly from patient data," Hill says.

The growing combination of high-performance hardware and software promises a supercharged industry in the near future. As Papka says, "We're trying to add that third branch to science: basic theory, experimentation, and now computation."