Virginia Tech team selected as finalist in Alexa Prize SimBot Challenge to advance next-generation virtual assistants
A Virginia Tech team from the Sanghani Center for Artificial Intelligence and Data Analytics is one of 10 finalists chosen to compete in the Alexa Prize SimBot Challenge. The challenge focuses on advancing the development of next-generation virtual assistants that continuously learn and gain the ability to perform common sense reasoning to help humans complete real-world tasks.
“The SimBot should be able to understand the intention of a task as well as any instructions or feedback it receives from a user and interpret the environment to correctly predict what action is needed to complete the task,” said Lifu Huang, assistant professor of computer science and a faculty member at the Sanghani Center.
“Our team is developing an innovative, multimodal instructional and actionable knowledge graph built from large-scale instructional resources in text, images, and videos covering thousands of everyday human tasks, which the SimBot can leverage to further enhance its ability to interact with users and handle a broad range of human tasks,” said Huang, who will advise a team of four Ph.D. students on how to tackle the research and navigate technical challenges.
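Huang’s description does not spell out how such a graph is organized internally. Purely as an illustration of the concept, the sketch below shows one way actions for a task could be stored as nodes linked to the text, images, and video clips they were extracted from, with edges recording the order in which actions occur. All names here are hypothetical and this is not the team’s implementation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: the article does not describe how the team's
# multimodal instructional knowledge graph is structured. This illustrates
# the general idea of linking actions to text, image, and video evidence,
# with ordering edges between actions.

@dataclass
class ActionNode:
    name: str                                          # e.g. "slice the bread"
    text_source: str = ""                              # instructional sentence
    image_refs: list = field(default_factory=list)     # grounded images
    video_clips: list = field(default_factory=list)    # grounded video segments

@dataclass
class TaskGraph:
    task: str                                          # e.g. "prepare breakfast"
    actions: dict = field(default_factory=dict)        # name -> ActionNode
    order_edges: list = field(default_factory=list)    # (earlier, later) pairs

    def add_action(self, node: ActionNode) -> None:
        self.actions[node.name] = node

    def add_order(self, earlier: str, later: str) -> None:
        """Record that one action should precede another."""
        self.order_edges.append((earlier, later))

graph = TaskGraph(task="prepare breakfast")
graph.add_action(ActionNode("slice the bread", text_source="Slice the bread with a knife."))
graph.add_action(ActionNode("toast the bread"))
graph.add_order("slice the bread", "toast the bread")
```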
Huang’s expertise includes extracting structured knowledge with limited supervision, natural language understanding and reasoning with external knowledge and common sense, natural language generation, representation learning for cross-domain transfer, and multimodal learning.
Huang shared the following scenario of advanced interaction that the team will strive to achieve during the challenge. It shows how SimBot successfully completes a more complex task composed of simpler subtasks.
SimBot: Hello, how can I help you today?
User: Hi, I will teach you how to prepare breakfast for me today.
SimBot: Sure, what should I do?
User: First, you need to make a cup of coffee. Then, you need to slice the bread and cook it with the toaster. Finally, you need to deliver breakfast to my room.
SimBot: Could you give me more detailed instructions on how to slice the bread?
User: You should use the knife and cut it into several pieces of approximately the same size.
SimBot: Got it. Should I prepare breakfast for you today?
User: Yes, go ahead.
SimBot prepares the breakfast as taught by the user.
SimBot: Breakfast is ready.
User: Well done!
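The transcript implies that before acting, the SimBot must break the user’s request into ordered subtasks and attach any clarifications, such as how to slice the bread, to the step they refine. As a purely illustrative, hand-written example of what that decomposition might look like (not the team’s system), the breakfast request could be represented as a small ordered plan:

```python
# Illustrative only: a hand-written decomposition of the breakfast request
# from the dialogue above into ordered subtasks, with the user's follow-up
# clarification attached to the step it refines.

breakfast_plan = [
    {"subtask": "make a cup of coffee", "details": []},
    {"subtask": "slice the bread",
     "details": ["use the knife", "cut into pieces of roughly equal size"]},
    {"subtask": "cook the sliced bread with the toaster", "details": []},
    {"subtask": "deliver breakfast to the user's room", "details": []},
]

# Once the user confirms, the agent would execute the subtasks in order.
for i, step in enumerate(breakfast_plan, start=1):
    print(f"Step {i}: {step['subtask']}")
```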
“One of our overall goals is for users to feel that our SimBot is capable and intelligent, communicative, knowledgeable, and efficient,” Huang said.
Below are the four members of the team, all Ph.D. students in computer science:
Minqian Liu is serving as team leader for the Alexa Prize SimBot Challenge. In that role, he is responsible for setting up long-term and short-term goals, coordinating the team’s work, and communicating with the challenge organizer. As a team member, he will be building the object state tracking, action verification, and dialogue-based interaction components. He also will participate in the design and implementation of other modules, especially the construction of the multimodal instructional knowledge graph.
Sijia Wang will establish the knowledge graph from instructional articles, images, and video demonstrations drawn from online resources such as WikiHow. She also will focus on collectively grounding entities and actions extracted from text in video, associating each entity or action with an image or video clip.
Zhiyang Xu will be responsible for extracting subtasks from human instructions by leveraging an external actionable knowledge graph and enforcing the temporal and logical constraints among those subtasks (a simplified sketch of such constraint checking follows this list).
Ying Shen will focus on the multimodal goal-oriented action prediction network, which requires reasoning across inputs in multiple modalities. She will also take part in the construction of the multimodal instructional knowledge graph.
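To give a concrete sense of what enforcing temporal constraints among subtasks can mean in practice, here is a minimal, hypothetical sketch (not the team’s implementation): given pairs of subtasks that must occur in a certain order, it flags any proposed plan that violates those constraints.

```python
# Hypothetical sketch of enforcing temporal constraints between subtasks:
# given "earlier, later" constraints, check whether a proposed ordering
# respects them. Illustration of the idea only, not the team's code.

def violated_constraints(ordering, constraints):
    """Return the (earlier, later) constraints broken by a proposed ordering."""
    position = {task: i for i, task in enumerate(ordering)}
    return [(a, b) for a, b in constraints
            if a in position and b in position and position[a] > position[b]]

constraints = [("slice the bread", "toast the bread"),
               ("make coffee", "deliver breakfast"),
               ("toast the bread", "deliver breakfast")]

proposed = ["make coffee", "toast the bread", "slice the bread", "deliver breakfast"]
print(violated_constraints(proposed, constraints))
# [('slice the bread', 'toast the bread')] -> the plan must be reordered
```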
“It is an honor to be a finalist team in the Alexa Prize SimBot Challenge and a tribute to the excellence of our students,” said Huang.
“The challenge is a great way of highlighting the students’ work and also an opportunity to contribute to the field of artificial intelligence by helping to bridge the theory-practice divide and adding value from both theoretical and practical perspectives,” he said.
University teams selected for the Alexa Prize SimBot Challenge participate in both the public benchmark and live interaction phases and receive a research grant, Alexa-enabled devices, free Amazon Web Services cloud computing to support their research and development efforts, other resources, and support from the Alexa team. The winning team will receive a $500,000 prize, while the second- and third-place teams will receive prizes of $100,000 and $50,000, respectively. The winner will be announced in 2023.