University partners with Apple and Mellanox for energy-efficient 22.8 TFlops supercomputer
Five years ago, Virginia Tech burst onto the high-performance computing scene using Apple Power Mac G5 computers to build System X, one of the fastest supercomputers of its time. Today, Srinidhi Varadarajan and Kirk W. Cameron, professors of computer science in Virginia Tech's College of Engineering and members of the university's Center for High-End Computing Systems (CHECS), have designed a new supercomputer.
This time, although the new System G supercomputer is twice as fast as its predecessor, their primary goal was to demonstrate that supercomputers can be both fast and environmentally green.
System G clocks in at an incredible 22.8 TFlops (trillion floating-point operations per second). In keeping with tradition, though bid under a competitive contract, the machine consists of 325 Mac Pro computers, each with two 4-core 2.8 gigahertz (GHz) Intel Xeon processors and eight gigabytes (GB) of random access memory (RAM). “However, the novelty of this machine does not end there,” Varadarajan said.
They will discuss System G this week at the SuperComputing08 conference at the Austin Convention Center.
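As a rough check on these figures, the following Python sketch works out the core count and a theoretical peak rate from the node specification above. The node count, sockets, cores, and clock speed come from the article. The figure of four floating-point operations per core per clock cycle is an assumption introduced here for illustration, and the article does not say whether 22.8 TFlops is a peak or a measured number, so the resulting ratio is only indicative.

```python
# Figures taken from the article.
NODES = 325
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4
CLOCK_HZ = 2.8e9
REPORTED_TFLOPS = 22.8

# Assumption (not from the article): each core completes four
# double-precision floating-point operations per clock cycle.
ASSUMED_FLOPS_PER_CYCLE = 4


def total_cores():
    """Number of processor cores across all nodes."""
    return NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET


def peak_tflops(flops_per_cycle=ASSUMED_FLOPS_PER_CYCLE):
    """Theoretical peak in TFlops under the assumed per-cycle rate."""
    return total_cores() * CLOCK_HZ * flops_per_cycle / 1e12


if __name__ == "__main__":
    peak = peak_tflops()
    print(f"cores: {total_cores()}")
    print(f"assumed theoretical peak: {peak:.2f} TFlops")
    print(f"reported / assumed peak: {REPORTED_TFLOPS / peak:.1%}")
```

Changing `ASSUMED_FLOPS_PER_CYCLE` shows how strongly the peak estimate depends on that one assumption; the core count does not depend on it.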
Most high-performance computing systems research is conducted at small scales of 32, 64, or at most 128 nodes. Larger machines are typically used in production mode where experimental software is anathema to the end user focused on solving fundamental problems in computational science and engineering. System G was sponsored in part by the National Science Foundation and CHECS to address the gap in scale between research and production machines. The purpose of System G is to provide a research platform for the development of high-performance software tools and applications with extreme efficiency at scale.
“Given our research strengths at the Center for High-End Computing Systems, we were able to partner with Mellanox to create the first supercomputer running over quad data rate (QDR) InfiniBand (40Gb/s) interconnect technology. The low latency and high bandwidth characteristics of QDR InfiniBand enable new research in transparent distributed shared memory systems that focus on usability of cluster supercomputers,” said Varadarajan, director of CHECS. In preliminary tests, System G was able to obtain transfer rates of over three gigabytes per second with small message latencies close to one microsecond.
Given these state-of-the-art communication rates (e.g., data sets consisting of nearly one billion numbers traveling between any two compute nodes in one second, with the first value arriving in one-millionth of a second), supercomputer systems and applications requiring unprecedented levels of data movement can be considered.
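The arithmetic behind that example can be sketched in a few lines of Python. The bandwidth and latency values are the preliminary figures quoted above, rounded down to exactly three gigabytes per second and one microsecond. The size of a "number" (four bytes here), the decimal definition of a gigabyte, and the simple latency-plus-size-over-bandwidth model are assumptions made for illustration, not details from the article.

```python
# Preliminary figures quoted in the article, rounded.
BANDWIDTH_BYTES_PER_S = 3e9   # "over three gigabytes per second"
LATENCY_S = 1e-6              # "close to one microsecond"


def numbers_per_second(bytes_per_number):
    """How many fixed-size values the link can move per second."""
    return BANDWIDTH_BYTES_PER_S / bytes_per_number


def transfer_time(message_bytes):
    """Assumed model: fixed start-up latency plus streaming time."""
    return LATENCY_S + message_bytes / BANDWIDTH_BYTES_PER_S


if __name__ == "__main__":
    # Assumption: a "number" is a 4-byte (single-precision) value.
    print(f"4-byte values per second: {numbers_per_second(4):.3g}")
    print(f"8-byte values per second: {numbers_per_second(8):.3g}")
    print(f"time to send 1 MB: {transfer_time(1e6) * 1e6:.1f} microseconds")
```

With four-byte values, exactly three gigabytes per second corresponds to 750 million numbers per second; since the measured rate was above three gigabytes per second, "nearly one billion" is plausible under that assumption.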
But what makes System G so green? “We set out to design the fastest supercomputer with advanced power management capabilities such as power-aware CPUs, disks, and memory. Our partnership with Apple ensured the most advanced network of power and thermal sensors ever assembled in this type of machine,” commented Cameron, an expert on green computing. According to Cameron, System G has thousands of power and thermal sensors. As the world’s largest power-aware cluster, System G will allow CHECS researchers to design and develop algorithms and systems software that achieve high performance with modest power requirements, and to test such systems at unprecedented scale.
“We are pleased to have Mellanox 40Gb/s end-to-end InfiniBand adapters and switches be the foundation for Virginia Tech’s research initiatives on power-aware and green computing, advanced scientific research systems, and future high productivity solutions,” said Sash Sunkara, vice president of marketing at Mellanox Technologies. “Our advanced interconnect technology is designed to provide world-leading productivity for high-performance computing and enterprise data center clustering solutions, providing faster and more efficient research and engineering simulations.”
The mission of CHECS is world-class computer systems research in the service of high-end computing. CHECS faculty work on a broad array of problems and design a wide range of technologies, all with the goal of developing the next generation of powerful and usable high-end computing resources. Their focus is primarily on computer science systems research.
Center members recognize that high-end resources must be powerful in a broad sense (e.g., high-performance, high-capacity, high-throughput, and high-reliability), and at the same time they must be more usable and more energy efficient than current high-performance computing (HPC) systems. Toward that end, the center is pursuing a broad research agenda in areas such as processor and memory architectures, operating systems, run-time systems, communication subsystems, fault-tolerance, scheduling and load-balancing, power-aware systems and algorithms, numerical algorithms, and programming models.
The center’s goal is to build computing systems and environments that can efficiently and usably span the scales from department-sized machines to national-scale resources. CHECS was established in September 2005 and is supported by Virginia Tech's College of Engineering. It currently has 12 tenured or tenure-track computer science faculty and 65 master's and Ph.D. students.