Project to improve data compression, from your smart glasses to the stars
A Department of Energy-funded partnership between Virginia Tech, Argonne National Laboratory, and the Ohio State University is working to develop a flexible, customizable data compression technique for the future.
Our world is awash with data these days. While that's true in many aspects of our lives, it's an especially pressing issue for applications ranging from augmented and virtual reality systems, such as smart glasses, to deep space communications, all of which must transmit and store large amounts of data continuously or across immense distances in order to perform.
These vastly different use cases — from immersive, dynamic, real-time video to long-range information transmission — present a challenge and an opportunity to create a data compression method that can be customized to the user’s needs.
That's exactly what electrical and computer engineering Professor Lingjia Liu, part of Virginia Tech's Institute for Advanced Computing, is working on in conjunction with Argonne National Laboratory and the Ohio State University as part of a $3 million Department of Energy grant-funded project. The three-way partnership lets each party bring a specialty to the table, advancing the science and math that make these technological leaps possible.
“This collaboration is a testimony that state-of-the-art knowledge in wireless can be transferred to the design of advanced computing philosophy,” said Liu. “This opens up the door for a whole lot of collaborative work in the domain of computing and communication.”
Just as the computing power arms race continues unabated, so does the race to improve compression ratios: the factor by which data can be shrunk for transmission or storage, with higher numbers better. In 2016, Argonne hit an upper bound of a compression ratio around 10. A year later, it had basically doubled that. Several years after that, a new data decorrelation method pushed it to near 50. And now? It can reach 1,000 or more.
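As a concrete illustration of the metric (using made-up file sizes, not figures from the project), the ratio is simply the original size divided by the compressed size:

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Ratio of original size to compressed size; higher is better."""
    return original_bytes / compressed_bytes

# Hypothetical example: a 4 GB simulation snapshot squeezed down to 4 MB
print(compression_ratio(4 * 1024**3, 4 * 1024**2))  # 1024.0 -- in the "1,000 or more" range
```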
According to Sheng Di, a computer scientist at Argonne National Laboratory, we don't know the upper limit of compression ratios because existing information theory predates the kind of lossy compression this project is studying. In that process, some information is intentionally discarded, or lost, while keeping enough data to deliver the correct result on the other end of the transmission.
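These compressors are error-bounded: the user sets a tolerance, and every reconstructed value is guaranteed to land within it. Here is a minimal sketch of that idea using uniform quantization in Python; it illustrates the guarantee itself, not Argonne's actual algorithms:

```python
import numpy as np

def quantize(data: np.ndarray, error_bound: float) -> np.ndarray:
    """Map each value to an integer bin of width 2 * error_bound.
    Integer codes compress far better than raw floats, and every
    reconstructed value stays within +/- error_bound of the original."""
    return np.round(data / (2 * error_bound)).astype(np.int64)

def dequantize(codes: np.ndarray, error_bound: float) -> np.ndarray:
    """Recover an approximation of the original values."""
    return codes * (2 * error_bound)

rng = np.random.default_rng(0)
data = rng.normal(size=1_000_000)
bound = 1e-3
recon = dequantize(quantize(data, bound), bound)
assert np.abs(recon - data).max() <= bound + 1e-12  # the lossy guarantee
```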
“In order to answer these questions, we need some mathematicians, like Professor Liu, to help us,” said Di. “We need to work together because we are the compression experts and Lingjia is the theory expert.”
Liu first started working with Argonne in October 2023, consulting on information theory focused on the fundamental limits of data compression, after being introduced by computer science Professor Ali Butt, who had a longstanding relationship with the lab. After almost a year of frequent discussion, it was clear he was the right partner for this project.
“It’s not so much cross-institutional as it’s cross-disciplinary,” said Kirk Cameron, faculty lead and interim director of the Institute for Advanced Computing. “There’s a tremendous benefit, particularly here in the D.C. area, of being known as thought leaders in a particular space. And that’s why you get the phone call.”
To that end, Liu was enlisted to develop new information theory to derive the upper bound on compression ratio, or lossy compressibility. His counterparts at Ohio State are working to preserve user requirements across different use cases and types of data sets. Argonne then uses those guidelines, grounded in the new theory, to further improve compression ratios.
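The project's new bounds aren't spelled out in this article, but classical rate-distortion theory gives a flavor of what such a limit looks like. For a Gaussian data source with variance σ², reconstructing each sample to within mean-squared error D requires a minimum number of bits per sample, which in turn caps the achievable ratio for data originally stored at b bits per sample:

```latex
R(D) = \tfrac{1}{2}\log_2\frac{\sigma^2}{D}
\quad \text{bits per sample (for } D \le \sigma^2\text{)},
\qquad
\text{compression ratio} \le \frac{b}{R(D)}
```

Looser error tolerances (larger D) shrink R(D) and push the achievable ratio higher, which is why error bounds and compression ratios trade off against each other.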
What's novel about this particular project is the compressor's adaptability to the user and use case. If speed is the priority, it can run at up to 500 GB per second, close to the memory bandwidth of a GPU. If the compression ratio itself is paramount, it can reach or even exceed the 1,000 mark. Proving that it meets those user requirements within their error bounds (in other words, without losing any information that fundamentally changes the data) could create a customizable model for a wide range of applications.
“We need to understand the user’s requirements and what kinds of metrics or analysis they want to preserve during the lossy compression process,” said Di. “Our compressor is also kind of a framework, which allows users to compose or customize a compression pipeline according to their requirements.”
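In the spirit of that framework idea, a composable pipeline might chain a decorrelation stage, an error-bounded quantizer, and a lossless back end. The sketch below shows the shape of such a pipeline with illustrative stage choices; it is not the project's actual code:

```python
import zlib
import numpy as np

# Hypothetical three-stage pipeline: decorrelate -> quantize -> lossless encode.
def decorrelate(codes: np.ndarray) -> np.ndarray:
    """Store differences between neighboring values (invertible via cumsum)."""
    return np.diff(codes, prepend=0)

def compress(data: np.ndarray, error_bound: float) -> bytes:
    codes = np.round(data / (2 * error_bound)).astype(np.int64)  # error-bounded quantizer
    return zlib.compress(decorrelate(codes).tobytes())           # lossless back end

# Smooth, correlated data (like many simulation outputs) compresses well here.
data = np.cumsum(np.random.default_rng(1).normal(size=1_000_000))
blob = compress(data, error_bound=1e-2)
print(f"achieved ratio: {data.nbytes / len(blob):.0f}x")
```

Swapping any stage, say a different decorrelator for climate grids versus combustion meshes, changes the speed-versus-ratio tradeoff, which is the kind of customization the framework is meant to expose.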
Now wrapping up the first year of the three-year project, the team is focused on a trio of scientific applications: climate simulation, combustion, and materials science. But Liu sees a broad range of potential applications beyond that, thanks to the novelty of the approach.
“The data set that Argonne is interested in is very complicated with many practical constraints,” he said. “Therefore, the project is really trying to characterize some fundamental limits under realistic constraints that will have significant impacts and guidance on the actual computing algorithm design.”
From wireless carriers and content providers to satellite operators and space programs, the results of such a highly customizable framework could fundamentally shift what's possible. And it highlights why these research partnerships are so crucial for the kind of continual progress that touches our lives in ways seen and unseen.
“Whether it’s industry, or national laboratories, or just scientists in general, they take notice of, and they build upon our work,” said Cameron. “Everybody goes back to the source eventually to see what the source is thinking next.”