Lifu Huang receives NSF CAREER award to lay new ground for information extraction without relying on humans
Huang's extraction techniques could help analyze millions of research papers, reports, and emerging events around the world and preserve thousands of languages in danger of becoming extinct.
With millions of research papers and reports published in open domains such as biomedicine, agriculture, and manufacturing, it is humanly impossible to keep up with all the findings.
Constantly emerging world events present a similar challenge because they are difficult to track and even harder to analyze without looking into thousands of articles.
To reduce the dependence on human effort in situations such as these, Lifu Huang, an assistant professor in the Department of Computer Science and core faculty at the Sanghani Center for Artificial Intelligence and Data Analytics, is researching how machine learning can extract information without relying on humans.
In his research, supported by a National Science Foundation Faculty Early Career Development (CAREER) Award, Huang is developing new paradigms to extract event knowledge from the text of any domain or scenario by transferring existing resources and ontologies from old domains to new ones with no human effort required. This is in contrast to existing approaches to event extraction, which depend heavily on human-labeled data for a limited set of event types customized for a particular domain or scenario.
“Our proposed techniques could be used to analyze millions of research papers and reports and turn them into structured knowledge, enabling users to answer any questions they may have without looking into the millions of articles, speed up their discoveries of new patterns or knowledge in their research, and facilitate integration of existing literature into their research,” said Huang.
In the case of emerging events, government agencies would be able to track them more efficiently, analyze them based on structured knowledge without looking into thousands of articles, and inform critical personnel who can then make faster and better decisions.
Another significant contribution of Huang's research is its ability to help preserve local histories.
“About half of the world’s 7,000 languages are endangered and at the risk of becoming extinct. The cultural knowledge and historical events recorded in the data are valuable to local communities. With the proposed techniques in this project, we can identify all the meaningful cultural/historical events in these communities and store or document them into community-specific history books,” Huang said.
These are just a few examples of the benefits, he said. The ultimate goal is to extend applicability to almost any domain.
Two of Huang’s Ph.D. students, Minqian Liu and Sijia Wang, are working with him on the project.
Established in 1995, the CAREER award is the National Science Foundation’s most prestigious award supporting early-career faculty with the potential to serve as academic role models in research and education and to lead advances in their organization’s mission.
As part of the award, Huang is planning to expand his educational outreach by partnering with the Center for the Enhancement of Engineering Diversity. He will design hands-on learning activities for pre-college students from diverse underrepresented groups who participate in an annual series of well-established summer camps, including a C-Tech2 program for 50 junior and senior high school girls, two-thirds of whom are from Virginia. Specifically, he is developing a two-to-three-hour interactive Natural Language Processing session.
Huang will also participate in the Black Engineering Excellence at VT program for 50 rising junior and senior high school students from Martinsville, Richmond, Prince William County, and Franklin City.
Since 2021, Huang has recruited undergraduate students from diverse backgrounds across the country as summer interns. Advised directly by him and mentored by his graduate students, the interns take part in ongoing research projects in natural language processing and open world event extraction and are trained in the basic skills of applied machine learning research. They also have an opportunity to publish and present their research outcomes at research conferences and symposiums.
“Working with undergraduate students in these various capacities, I can leverage the techniques and tools developed in my CAREER project to explain challenges in automatic text understanding in the context of real world applications,” said Huang. “I believe such programs contribute to attracting more and more diverse high school students into STEM and computer science early on and, in the long-term, better prepare them for pursuing higher education or STEM job opportunities.”