Virginia Bioinformatics Institute installs hybrid-core computing platform
Harold "Skip" Garner, executive director of Virginia Bioinformatics Institute and professor in the Department of Biological Sciences at Virginia Tech, says he pays close attention to the DNA of high-performance computing (HPC).
Some 15 years ago, a Texas computer company, Convex, supplied the supercomputer that aided Garner’s work in mapping chromosome 11 and half of chromosome 15. Today, another Texas computer company, Convey, is helping his Virginia Bioinformatics Institute team research everything from the genes that cause cancers to how a virus spreads during pandemics.
Garner says he has “a lot of faith” in the Convey team, its Convex-Convey technology pedigree, and the company’s engineering focus on hybrid-core computing. “There is a real promise of scalability that does not come from just stacking up servers,” he says.
Convey’s hybrid-core computing architecture tightly integrates advanced computer architecture and compiler technology with commercial, off-the-shelf hardware, namely an Intel® Xeon® processor and Xilinx® Field Programmable Gate Arrays (FPGAs). The systems help customers reduce the energy costs associated with high-performance computing while dramatically increasing performance over industry-standard servers. Convey systems are also easy for programmers to use because they fully support an ANSI-standard C, C++ and FORTRAN development environment.
By using bioinformatics, which combines transdisciplinary approaches to information technology, medicine and biology, researchers at the institute generate, interpret and apply vast amounts of data from basic research to some of today’s key challenges in the biomedical, environmental and agricultural sciences.
For the Virginia Bioinformatics Institute research teams, Convey offers a novel computing approach with great promise. “We are intrigued by the concept of putting together traditional processors and FPGAs such that it would reduce the ‘threshold of pain’ for using these very high-throughput, very efficient processors. Also, there is the promise of having certain codes that will run on this architecture unlike anything else and anywhere else.”
Convey recently announced that its implementation of the Smith-Waterman algorithm, widely used in life sciences applications for aligning DNA and protein sequences, is 172 times faster than conventional methods. With performance increases of that size, bioinformatics and computational biology researchers can discover more information about genes and, in turn, find new ways to cure and manage diseases.
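For readers unfamiliar with Smith-Waterman, the following is a minimal Python sketch of the textbook algorithm, which finds the best-scoring local alignment between two sequences. It is not Convey's FPGA implementation; the function name `smith_waterman`, the scoring values (match +2, mismatch -1) and the simple linear gap penalty are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of any local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor of 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, aligned_a, aligned_b = smith_waterman("GGACGTCC", "TTACGTAA")
```

The nested loops fill one table cell per pair of positions, so the work grows with the product of the two sequence lengths. That cost is why the algorithm is a common target for hardware acceleration.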
Garner explains that the institute will use Convey’s hybrid-core systems for its data analysis work for the 1000 Genomes Project, an international effort to sequence the genomes of approximately 2,500 people from about 20 populations around the world. The team has looked at 340 terabytes of data so far, and much more is anticipated. The Convey systems will also be used to support text data mining as well as decision and policy informatics work at the institute.
The Virginia Bioinformatics Institute is concentrating, says Garner, on “the often under-appreciated and difficult computational area of repetitive DNA or microsatellite analysis. Microsatellites play very important, major roles in cancers as well as neurological diseases such as schizophrenia and autism. We are trying to cure cancer by developing diagnostics, identifying therapeutic targets, and working out how best to combine these approaches with drug treatments.”
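Microsatellites are short DNA motifs repeated back to back, such as CACACACA. As a rough illustration of what detecting them involves, the Python sketch below scans a sequence for exact tandem repeats using a regular expression. It is not the institute's analysis pipeline; the function name `find_microsatellites` and the thresholds (motifs of 1 to 6 bases, at least 3 copies) are assumptions chosen for the example, and real tools also handle imperfect repeats.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, motif, copies) for each exact tandem repeat in seq.

    Thresholds are illustrative assumptions, not published criteria.
    """
    # Group 1 captures the shortest motif that works (lazy quantifier);
    # the backreference \1 then requires that motif to repeat immediately.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        hits.append((m.start(), motif, copies))
    return hits


hits = find_microsatellites("TTCACACACAGG")
```

In this example, `hits` contains one entry describing the motif `CA` repeated four times starting at position 2 (counting from zero).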
Researchers at the institute, a leader in the new field of “decision and policy informatics,” use mathematical models and computer simulations to investigate, for example, how infectious diseases such as influenza emerge and spread through populations of millions of people. These simulations allow experts to test the impact of different public health interventions on the spread of infectious agents like viruses through large populations.
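To give a sense of how such a simulation can compare interventions, here is a minimal Python sketch of a classic SIR (susceptible, infected, recovered) compartmental model stepped one day at a time. It is a deliberately simple stand-in, not the institute's simulation software, and the function names and all parameter values (`beta`, `gamma`, population size) are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a discrete-time SIR model and return a list of daily (S, I, R) tuples.

    beta: average infectious contacts per infected person per day (assumed).
    gamma: fraction of infected people who recover each day (assumed).
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contacts between susceptible and infected people.
        new_infections = min(s, beta * s * i / population)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def peak_infected(history):
    """Largest number of simultaneously infected people over the run."""
    return max(i for _, i, _ in history)


# Hypothetical scenario: an intervention that halves the contact rate.
baseline = simulate_sir(1_000_000, 10, beta=0.4, gamma=0.1, days=365)
intervention = simulate_sir(1_000_000, 10, beta=0.2, gamma=0.1, days=365)
```

Comparing `peak_infected(baseline)` with `peak_infected(intervention)` shows how lowering the contact rate flattens the outbreak in this toy model. Simulations of real populations track far more detail, which is where the demand for computing power comes from.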
The institute is also focused on inventing technologies that help scientists be more productive in their research. For example, Garner’s group has developed eTBLAST, a biomedical data-mining engine that accepts a text query and compares it to a collection of other texts, especially Medline. Computers analyze up to 20 terabytes of data per day for reference and duplication searches. Duplication searches are used to detect plagiarism in academic and medical research.
Already a computational powerhouse, the Virginia Bioinformatics Institute is working with Virginia Tech to develop a new high-performance computing hub for the university. Virginia Tech has a strong history of developing experimental computers that span engineering, computer science, biology, and other life science applications.
Says Garner, “We are developing supercomputing on demand services that will help to cater for the expansion in clinical applications that will be coming on-line shortly with the new Virginia Tech Carilion School of Medicine and Research Institute in Roanoke. We anticipate a host of exciting new applications in computationally intensive areas such as electronic records and data mining, clinical, image, and radiology data storage and analysis, as well as magnetic resonance imaging.”
“Bioinformatics is making huge advances but with the advent of modern technologies, such as the ultra high-throughput next-generation sequencers, the volume of generated data is out-stripping our ability to reduce that data to knowledge,” adds Garner. “As a consequence, we will need more and more processing power and data storage power. The new computer capabilities that we will be bringing on-line will help to push the envelope of scientific discovery across a tremendous range of scientific disciplines.”