Extracting patterns and meaning from the vast and expanding trove of scientific data available to researchers has become a stumbling block for innovation. Xuan Wang, assistant professor of computer science, wants to use artificial intelligence (AI) to change that. 

Her five-year, $400,000 National Science Foundation Faculty Early Career Development Program (CAREER) award aims to solve this growing problem and speed up meaningful discoveries in any scientific field.

“I’m so pleased for Xuan,” said Christine Julien, head of the Department of Computer Science. “Getting a CAREER award with your first proposal speaks not only to the rigor of her research but also to the importance of using AI solutions for the good of everyone.”

This and other technological research relies on a range of funding sources, from university support to federal government funding.

“I’m grateful to NSF for believing in our researchers,” Julien said. “Support like this allows faculty like Xuan to launch their careers and put their ideas into action. It’s because we have a robust federally funded research program at Virginia Tech that we can make real progress on the global challenges we face.”

Xuan Wang. Photo by Tonia Moxley for Virginia Tech.

From tedium to transformation

The need for an accurate, automated system was what brought Wang into the field of computer science in the first place.

She was pursuing a master’s degree in biochemistry when she found herself having to manually comb through all the genes across the whole human genome and compare them to experimental data. 

“It was so tedious. I thought, ‘It shouldn't be like this, and there must be some way to automate the process,’” Wang said.

So she explored computing and statistical analysis to solve her own problem and ended up completing a master’s degree in statistics, followed by a doctorate in computer science, both from the University of Illinois at Urbana-Champaign. 

“We want to build a very accurate and trustworthy system that can automatically read through these tons of published science papers and organize the information, so later when people want to use it, they can quickly search, find, and digest it,” Wang said.

Extracting meaning from data

Wang’s CAREER award will enable her team to continue recent work with Children’s National Hospital, a longtime Virginia Tech partner, to create an automatic extraction system for electronic health records.

“They have a huge amount of written, unstructured clinical notes. If we can automatically organize those so doctors can search them and quickly find the information they need to treat patients, it could improve health outcomes,” Wang said.

The new grant will also include a collaboration with PubMed, a freely accessible database provided by the U.S. National Institutes of Health. Wang’s team will be able to beta-test the systems on a national database that includes more than 38 million citations for biomedical literature.

The project isn’t limited to health care and medicine. It will target systems that can work across an array of science and engineering fields.

“Over decades and even centuries, researchers have made scientific discoveries in different domains — biology, physics, math, and other disciplines,” Wang said. “All that documentation goes into databases. But it's actually very hard for even the experts who want to do follow-up work to find and synthesize the information they need.”

Big questions, small models

Wang, core faculty with the Sanghani Center for Artificial Intelligence and Data Analytics, said the project will compare the effectiveness of both large and small language models — two forms of AI — to create systems that can mine databases for information, then structure it to deliver accurate, streamlined overviews of existing knowledge.

Large language models are powerful, complex AI systems, like ChatGPT, that can understand, analyze, and generate human language by processing vast amounts of text data. “They are powerful and fast, but they are expensive to use and can make mistakes,” Wang said. “They can hallucinate, meaning they can provide information not included in the database they’re searching. They require a lot of structuring and regulation.”

In contrast, small language models are simpler, cheaper to use, and customizable.

“We can download, host, and tune a small model on our own computer,” Wang said. “For people in many of the research domains, we're not 100 percent sure which model is best, especially for structuring information for efficient search and analysis.”

With her CAREER award, Wang will build on her original passion for biochemistry, using artificial intelligence to push forward innovation in health care.

“I see many urgent problems in the medical domain, and it’s an area where computer science researchers can make a meaningful difference,” Wang said.

Xuan Wang (third from left) received a U.S. National Science Foundation CAREER award to develop artificial intelligence systems to help scientists make sense of big data. Photo by Tonia Moxley for Virginia Tech.