University and industry scientists are determining how to forecast significant societal events, ranging from violent protests to nationwide credit-rate crashes, by analyzing the billions of pieces of information in the ocean of public communications, such as tweets, website queries, oil prices, and daily stock market activity.

"We are automating the generation of alerts, so that intelligence analysts can focus on interpreting the discoveries rather than on the mechanics of integrating information," said Naren Ramakrishnan, the Thomas L. Phillips Professor of Engineering in Virginia Tech’s Department of Computer Science. He is leading the team of computer scientists and subject-matter experts from Virginia Tech, the University of Maryland, Cornell University, Children's Hospital of Boston, San Diego State University, University of California at San Diego, and Indiana University, and from the companies CACI International Inc. and Basis Technology.

Within Virginia Tech, the team spans the departments of computer science, mechanical engineering, statistics, and agricultural and applied economics, the Virginia Bioinformatics Institute, and the Institute for Critical Technology and Applied Science.

The project is supported by a potential $13.36 million three-year contract from the Open Source Indicators (OSI) Program of the Intelligence Advanced Research Projects Activity (IARPA), a research arm of the Office of the Director of National Intelligence. Three teams were awarded contracts, with continuation after the first year contingent upon satisfactory progress.

“Research shows that many significant societal events are preceded by population-level changes in communication, consumption, and movement. Some of these changes may be indirectly observable from diverse, publicly available data, but few methods have been developed for anticipating or detecting unexpected events by fusing such data,” said Jason Matheny, OSI program manager at IARPA. “OSI’s methods, if proven successful, could provide early warnings of emerging events around the world.”

Each OSI research team will be required to issue a number of warnings or alerts, which will be judged on three criteria: lead time, or how early the alert was made; accuracy, such as the where, when, and what of the alert; and the probability associated with the alert, that is, high versus very high.

The Virginia Tech-led team calls its project EMBERS, for early model-based event recognition using surrogates. Surrogates are accessible pieces of information that mirror or precede events of interest. The team intends to organize a huge database of surrogates predictive of real events and to apply these surrogates to public data sources.

The focus of the IARPA program is on Latin American countries. A key theme in the EMBERS project is the use of models to capture population-level behavioral changes in these countries. Tracking or identifying individuals is strictly excluded from the research.

"The models must be expressive enough to capture many important behaviors. For instance, how many people and what other factors result in a protest becoming violent? When do a few reported cases of dengue fever become an epidemic? But we do not want a model that is so complex that it becomes intractable. So finding the right balance is important," said Madhav Marathe, professor of computer science and deputy director of the Network Dynamics and Simulation Science Laboratory at the Virginia Bioinformatics Institute, and EMBERS co-investigator.

Other EMBERS co-investigators at Virginia Tech include Achla Marathe, Anil Vullikanti, Stephen Eubank, Chris Barrett, Bryan Lewis, and Jiangzhuo Chen, of the Network Dynamics and Simulation Science Laboratory at the Virginia Bioinformatics Institute; Chang-Tien Lu, of computer science; Scotland Leman, of statistics; and Michael Roan, of mechanical engineering.

EMBERS co-investigators at other institutions are Dipak Gupta, of San Diego State University; David Mares, of the University of California, San Diego; John Brownstein, of the Children’s Hospital of Boston; Johan Bollen and Luis Rocha, of Indiana University; Aravind Srinivasan, Lise Getoor, and Jennifer Golbeck, of the University of Maryland, College Park; Tanzeem Choudhury, of Cornell University; Kristen Summers, of CACI International Inc.; and Jeff Godbold, of Basis Technology.

“Extracting valuable information from massive data sets is the new frontier of computing. This project demonstrates the power of well-led interdisciplinary teams in developing new knowledge discovery and data analytics algorithms and systems to address important problems,” said Barbara Ryder, J. Byron Maupin Professor of Engineering and head of Virginia Tech’s Department of Computer Science.

“Large-scale analytics is considered to be one of the emerging technologies that will have a transformative impact on lives,” said Roop Mahajan, director of the Institute for Critical Technology and Applied Science at Virginia Tech.

The team's response to the IARPA solicitation was led by Jon Greene, director of National Security Research and Program Management at the Institute for Critical Technology and Applied Science. Christine Tysor at the institute will lead project management for EMBERS. “These individuals are invested in ensuring superior performance in all aspects of the formulation and execution of this project,” said Mahajan.

"Naren Ramakrishnan has established a powerhouse team of leading experts from academia and industry. This team will use its expertise to deliver rapid ways to arrive at solid analytical decisions and quantitative predictions to our nation's intelligence analysts," said Richard C. Benson, the Paul and Dorothea Torgersen Chair and dean of Virginia Tech's College of Engineering. "Virginia Tech is honored to be leading such an accomplished group of investigators."
