Professor Binoy Ravindran and his team of researchers haven’t just cracked the code – they’ve corrected it.

Using mathematical principles to understand code behavior, Ravindran’s group has found that the code behind popular reverse engineering tools such as Ghidra may not be as ironclad as its creators intended. With the team’s mathematical proofs, software programmers can now be confident that their code is free of unintended behaviors attractive to hackers.

Securing the zeros and ones

Security practitioners often assess a program by testing its machine code. Machine code, a long sequence of zeros and ones, is what a computer actually executes. However, it is hard for humans to understand, especially when the original source code used to create it is not available.

Reverse engineering tools translate those long, difficult-to-decipher strings of digits back into source code, which is much closer to human language, allowing security practitioners to see what is really going on in the binary. Unfortunately, these tools may miss important behaviors of the machine code, especially behaviors never intended by the programmers who wrote the original source code, and those missed behaviors are exactly what hackers exploit to find vulnerabilities.
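To make that gap concrete, here is a purely illustrative sketch, not code from the study: a tiny C function and, in comments, the kind of x86-64 machine code a compiler might produce for it. The exact bytes and instructions depend on the compiler, its settings, and the target hardware; a decompiler’s job is to work backward from the bytes to something like the C shown here.

    /* Illustrative only: what a compiler turns C into, and what a
       decompiler must reconstruct. The bytes shown are typical gcc -O2
       output on x86-64 Linux; other compilers will differ. */
    #include <stdio.h>

    int add_one(int x)          /* a decompiler starts from bytes like:  */
    {                           /*   8d 47 01   lea eax, [rdi+1]         */
        return x + 1;           /*   c3         ret                      */
    }

    int main(void)
    {
        printf("%d\n", add_one(41));   /* prints 42 */
        return 0;
    }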

Ravindran and his team of computer engineers at Virginia Tech know that many industries and companies rely on online platforms to serve their customers. Many of these platforms house sensitive data – think credit card numbers, bank accounts, or personal information shared with a health care provider. That’s why verifying those reverse engineering tools is crucial both for companies and for the consumers who trust them to keep their information safe.

“When a programmer creates computer code, they may unintentionally introduce behaviors, such as revealing sensitive data stored in memory or allowing a normal user who interacts with the code to have superuser privileges for the computer running that code,” said Ravindran, who is a Bradley Senior Faculty Fellow. “As a programmer, you certainly are writing code with the idea that the code behaves exactly the way you intend. But sometimes, you might inadvertently introduce unintended behaviors.”

Those unintended behaviors only manifest when the machine code is exposed to uncommon combinations of machine state and input data. According to Ravindran, security researchers call these situations “weird behaviors.”
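As a hypothetical illustration, not drawn from Xen, Ghidra, or the team’s work, the short C program below contains exactly that kind of weird behavior: an off-by-one bounds check lets a single unusual input read past the end of a table, returning whatever value happens to sit next to it in memory.

    /* Hypothetical example: the intended behavior is to look up a record,
       but one rare input slips past the bounds check and reads adjacent
       memory -- the kind of unintended behavior a hacker hunts for. */
    #include <stdio.h>

    static const int table[4] = { 10, 20, 30, 40 };
    static const int secret   = 1234;   /* stands in for sensitive data */

    int lookup(unsigned index)
    {
        if (index <= 4)            /* bug: "<=" should be "<"          */
            return table[index];   /* index == 4 reads out of bounds   */
        return -1;
    }

    int main(void)
    {
        printf("%d\n", lookup(2));   /* intended behavior: prints 30        */
        printf("%d\n", lookup(4));   /* weird behavior: undefined read that */
                                     /* may expose nearby data like secret  */
        return 0;
    }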

The Bradley Department of Electrical and Computer Engineering team has created a technique that is mathematically proven to expose all possible behaviors of a piece of machine code, both programmer-intended and unintended.

“Unintended behaviors are what we are really interested in because hackers exploit those behaviors to create malicious activity and take advantage of those vulnerabilities,” said Ravindran.

The novel mathematical proof-based tool created by Ravindran and the team as part of a Defense Advanced Research Projects Agency project ensures that no vulnerabilities have been missed, which is imperative for security practitioners.

Putting the tool to the test

The research team applied the mathematical technique to a real-world hypervisor software program called Xen. Hypervisor software is typically used in cloud computer systems to manage computing resources, and Xen is used by Amazon for its cloud systems. Consumers who use Amazon cloud services or even save their credit card information online to simplify the checkout process can benefit from knowing that their information is secure based on these mathematical principles.

In just 18 hours, the team analyzed nearly 400,000 of Xen's machine code instructions and determined that the analyzed portion of the code contained only intended behaviors. Before this method, companies most commonly relied on a technique called “fuzzing,” which probes code for vulnerabilities by running it against large numbers of randomly chosen test cases.

Electrical and computer engineering research faculty member Freek Verbeek applies the team's newly developed, mathematically proven technique for analyzing machine code. Photo courtesy of Freek Verbeek.

“The idea with fuzzing is that if parts of code are randomly chosen, then if there is any vulnerable behavior in the code, it will likely manifest in one of these randomly chosen cases,” said Freek Verbeek, a research faculty member working on the project. “But of course, there are cases where fuzzing might miss something, especially those weird behaviors. This is where a more mathematical technique is preferred. As a consumer, I know that I want my data protected with 100 percent certainty.”
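A minimal sketch of that idea appears below, with a hypothetical function under test; none of the names come from the project. The loop throws random inputs at the function and flags any output that violates a simple sanity check. The single input that triggers the unintended behavior is so rare that random testing will almost certainly never stumble on it, which is precisely the gap the team's proof-based approach is meant to close.

    /* A toy fuzzing loop (hypothetical function and names). Real fuzzers
       such as AFL or libFuzzer mutate inputs far more cleverly, but the
       limitation is the same: rare "weird" cases can be missed. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical function under test: intended to return 0..99, but one
       rare input drives it far out of that range. */
    static int parse_request(unsigned input)
    {
        if (input == 0xDEADBEEFu)      /* the rare, unintended case */
            return 100000;
        return (int)(input % 100u);
    }

    int main(void)
    {
        srand((unsigned)time(NULL));

        for (long i = 0; i < 1000000; i++) {
            unsigned input = ((unsigned)rand() << 16) ^ (unsigned)rand();
            int out = parse_request(input);
            if (out < 0 || out > 99) {   /* sanity check violated */
                printf("possible vulnerability at input 0x%x\n", input);
                return 1;
            }
        }
        printf("no violations found -- but random testing proves nothing\n");
        return 0;
    }

Random testing can reveal such bugs when it gets lucky, but it can never show their absence; the mathematical proof behind the Virginia Tech technique is what guarantees that every behavior, however rare, has been accounted for.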

The team also tested the National Security Agency’s popular reverse engineering tool, Ghidra. Using the mathematical proof-based code analysis, the team found unintended behaviors that attackers could exploit to craft offensive code.

Nico Naus, a postdoctoral researcher in electrical and computer engineering who also worked on the project, emphasized that these findings are ultimately meant to improve the lives of others. 

“Academic security research is all about finding flaws in software so we can fix it before hackers find vulnerabilities and do malicious things,” said Naus. “We’ve shared these weaknesses with the National Security Agency and plan to ensure any future unintended code behaviors are reported to the software community.”
