Virginia Tech researchers win Digging into Data Challenge
A team of faculty and graduate student researchers from Virginia Tech and the University of Toronto submitted a winning proposal for the Digging into Data Challenge, an international funding competition designed to promote innovative humanities and social science research using techniques of large-scale data analysis.
“An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic” is one of 14 projects approved for funding from the National Endowment for the Humanities and the Social Sciences and Humanities Research Council of Canada.
“An Epidemiology of Information” seeks to harness the power of data mining techniques with the interpretive analytics of the humanities and social sciences to understand how newspapers shaped public opinion and represented authoritative knowledge during the deadly pandemic that struck the United States in 1918. The research methods developed through this project promise new insights into understanding the spread of information and the flow of disease in other societies facing the threat of pandemics.
Principal investigators from Virginia Tech are Tom Ewing, professor of history and associate dean in the College of Liberal Arts and Human Sciences; Bernice L. Hausman, professor of English in the College of Liberal Arts and Human Sciences with a secondary appointment as professor at the Virginia Tech Carilion School of Medicine; Bruce Pencek, associate professor and college librarian for the social sciences in the University Libraries; and Naren Ramakrishnan, professor and associate head for graduate studies in computer science in the College of Engineering, and director of Virginia Tech’s Discovery Analytics Center. Gunther Eysenbach, senior scientist at the Toronto Centre for Global eHealth Innovation and professor of health policy, management, and evaluation at the University of Toronto, is also a principal investigator on this interdisciplinary, international team.
Faculty and graduate assistants associated with three centers at the partner universities will conduct the research. The Center for the Study of Rhetoric in Society, in the Virginia Tech Department of English, supports a medical rhetoric research group that is exploring contemporary and historical vaccine controversies using interdisciplinary historical, rhetorical, and linguistic methods. The Discovery Analytics Center at Virginia Tech brings together researchers from computer science, statistics, mathematics, and electrical and computer engineering to tackle data analytics problems in domains such as sustainability, health information technology, computational neuroscience, and intelligence analysis. The Centre for Global eHealth Innovation at the University of Toronto develops techniques for monitoring news reports and public conversations about influenza-related issues in contemporary times.
“Winning the Digging into Data Challenge demonstrates Virginia Tech’s commitment to advance research at the intersection of humanities, social sciences, and applied technology,” said Ewing. “This project uses new analytical techniques to explore questions about how American society was transformed by the spread of deadly disease, with particular attention to the ways that the communication of news shaped the experience of the pandemic and the response of health authorities in the United States and Canada.”
“Just as the sheer number of news reports on the 1918 influenza make it impossible for individual researchers to access and interpret all the data, the Internet and the explosion of social media today generate too much data on a variety of socially significant topics for traditional textual analysts to handle,” noted Hausman. “Data mining allows rhetoricians to broaden the scope of individual inquiry and identify meaningful linguistic connections that might elude the lone researcher; the collaboration relies on both innovative and traditional methods, however, thereby retaining the significant contribution of in-depth textual analysis in the context of large-scale data mining.”
For Ramakrishnan, this project is an example of how text-mining methods can support not just the discovery of patterns but also the investigation of hypotheses of interest to humanists and social scientists. “We will extract linguistic features that will be used to model the topical content of newspaper articles. Such topic modeling will help us understand, for instance, how newspaper coverage was shaped by the war-time contexts, or what we might now call national security threats,” he said.
This project makes use of digitized newspapers, including the more than 100 titles for 1918 available from the Chronicling America website at the United States Library of Congress and the Peel’s Prairie Provinces collection at the University of Alberta. Pencek will provide guidance on accessing these materials as part of the library’s mission of supporting advanced research by faculty and graduate students.
According to Tyler Walters, dean of the University Libraries at Virginia Tech: “Our collaboration is representative of the important, new approaches to humanities and social science research coming about today, which are due largely to new technology and available stores of data.”
Virginia Tech and the University of Toronto will split $250,000 in external funding. The two-year project will begin immediately.
At Virginia Tech, this project is supported by the Institute for Society, Culture, and Environment and the Institute for Critical Technology and Applied Science. The support of these programs results from a commitment to advance interdisciplinary research that connects technological applications to the insights derived from the humanities and social sciences.