Anuj Karpatne’s project “Lake-GPT: Building a Foundation Model for Aquatic Sciences” is one of the first 35 to be supported with computational time through the National Artificial Intelligence Research Resource (NAIRR) Pilot program, marking a significant milestone in connecting U.S. researchers and educators to computational, data, and training resources needed to advance artificial intelligence (AI). 

The NAIRR Pilot awards – a joint effort led by the National Science Foundation in collaboration with other U.S. federal agencies – are a result of President Joe Biden's landmark Executive Order on the Safe, Secure and Trustworthy Development and Use of AI and provide researchers and students access to key AI resources and data. 

Karpatne, associate professor in the Department of Computer Science and core faculty at the Sanghani Center for Artificial Intelligence and Data Analytics, was among 10 award recipients invited to speak at "Opportunities at the AI Research Frontier," a White House event hosted by the Office of Science and Technology Policy on May 6 to announce the launch of the NAIRR Pilot program. Karpatne was also one of two recipients invited to give longer talks on their NAIRR projects at the AI Expo for National Competitiveness in Washington, D.C., hosted by the Special Competitive Studies Project. 

A second group of NAIRR Pilot award recipients, announced in late May, includes Debswapna Bhattacharya, associate professor of computer science, and Xuan Wang, assistant professor of computer science and core faculty at the Sanghani Center. 

All NAIRR Pilot awardees are supported for six months and have access to advanced computing systems funded by the National Science Foundation or supported by the Department of Energy for their AI research. Karpatne is using the AI capability of the Summit supercomputer at Oak Ridge National Laboratory to model the quality of water in lakes and reservoirs across the United States. Bhattacharya is using the Texas Advanced Computing Center's graphics processing units (GPUs) to supercharge foundational AI models for pandemic prediction, and Wang has been allocated the Pittsburgh Supercomputing Center's Neocortex system.

Building a new class of foundation models in aquatic sciences

Karpatne’s research explores how advances in large foundation models such as ChatGPT can inspire similar breakthroughs on the AI research frontier, enabling a scientific revolution in domains where data can differ substantially from the internet-scale text and image data used by their industry counterparts.

In the NAIRR project, Karpatne and his team are creating a new class of foundation models in aquatic sciences, termed Lake-GPT, to model a variety of processes related to the quality of water in lakes and reservoirs, drawing on novel advances in the emerging field of knowledge-guided machine learning for ecology.

This approach, he said, is a distinct departure from the mainstream practice of building “black-box” AI models that rely solely on the supervision contained in data. Instead, it also leverages the wealth of scientific knowledge available in many domains in diverse formats, such as the conservation laws of mass and energy in aquatic sciences.

“As climate change and growing demands for water continue to put increasing pressures on our already vulnerable water resources, we are witnessing growing crises of water availability and quality in many parts of the world,” Karpatne said. “Understanding how best to respond to the current crises requires exploring scenarios of change, both in use of water and trajectories of climate so that we can direct further action to protect our precious water resources while providing for society's needs.”

Karpatne is collaborating with lake scientists and ecologists from Virginia Tech and other universities: 

  • Cayelan Carey, professor of freshwater ecosystem science and Roger Moore and Mojdeh Khatam-Moore Faculty Fellow, Department of Biological Sciences
  • Mary Lofton, postdoctoral associate, Department of Biological Sciences 
  • Paul Hanson, distinguished research professor in the Center for Limnology at University of Wisconsin-Madison and lead principal investigator of a National Science Foundation Macrosystems Biology grant supporting this research
  • Benett McAfee, Ph.D. student advised by Hanson
  • Abhilash Neog, Ph.D. student advised by Karpatne 
  • Sepideh Fatemi Khorasgan, Ph.D. student advised by Karpatne 
  • Arka Daw, a Ph.D. graduate of Karpatne’s group who is currently a distinguished staff fellow at Oak Ridge National Laboratory

“We hope that our research will not only enable a new kind of model in aquatic science that forecasts the quality of water in lakes and reservoirs, but will also enable the discovery of emergent properties of lake processes and their interactions at macro-system scales that leads to advancing scientific knowledge,” Karpatne said.

Debswapna Bhattacharya (at left) and Xuan Wang. Virginia Tech photos

Predicting pandemics using GPU-accelerated high-performance computing

Bhattacharya’s project, “GPU-accelerated High-performance Computing to Supercharge Foundational AI Models for Pandemic Prediction,” was spurred by the COVID-19 pandemic when, he said, “we saw first-hand an example of the medical, economic, and societal devastation that a pandemic can cause.”

Because most emerging human viral diseases are zoonotic, meaning that they originate from animals, there is a critical need to develop a comprehensive predictive framework to identify novel viral sequences with zoonotic potential and determine what genetic changes in existing viral strains could contribute to human spillover, he said.

“However, experimental approaches for pandemic prediction are expensive, low throughput, potentially dangerous, and incommensurate with the scale of the problem,” said Bhattacharya.

The research supported through the NAIRR program aims to computationally model viral evolution using advances in AI, harnessing the GPU-accelerated high-performance computing capability of the Texas Advanced Computing Center to develop a predictive intelligence framework for pandemic science.

“Our ultimate goal is to not only advance AI for health care but to also inform drug/vaccine development workflows,” he said.

Bhattacharya will collaborate with other Virginia Tech faculty from the Department of Computer Science.

Addressing growing concern about accuracy and truthfulness

Wang’s NAIRR Pilot award supports “Trustworthy QA: Enhancing Complex Reasoning in Large Language Models,” a project that addresses growing concern about the accuracy and truthfulness of information provided by state-of-the-art question-answering systems. Built on top of pre-trained large language models (LLMs), these systems, especially the open-source ones, are used in everyday applications such as search engines, chatbots, and virtual assistants, and are widely adopted by companies concerned about user data privacy.  

“Incorporating guidance from advanced models like GPT-4 further enhances the reasoning capabilities of LLMs,” said Wang. “And this approach promises to minimize misinformation and improve the overall trustworthiness of AI-driven information seeking for the general public.” 

Ph.D. student Zhenyu Bi is working with Wang on this project.
