Researchers work to create the ultimate driver’s test for automated vehicles
Automated vehicles have been steadily rolling out in U.S. cities, but scaled deployment still faces a daunting challenge: proving the technology can safely navigate the complexity of real-world driving.
Virginia Tech researchers estimate that traditional testing methods could take decades – or hundreds of millions of driving miles – to validate the full range of situations an automated vehicle may encounter.
For Feng Guo, a lead data scientist at the Virginia Tech Transportation Institute (VTTI), defining exactly what to test is the foundation for comprehensive safety validation.
“One of the key questions is, 'Exactly what are the scenarios we put into those different tests?'” Guo said. “The challenge becomes how do you select a combination of statistically relevant cases that represent the entire space of scenarios you can possibly encounter?”
In a study published in Nature Communications, researchers demonstrated how a relatively small number of test cases could be used to measure automated vehicle performance for a wide variety of traffic conditions. The framework strategically samples from a large database of human driving behavior to identify scenarios that are representative of real-world driving, reflecting both everyday situations and rare, safety-critical events.
The work was a collaboration between Guo, two doctoral students, and Xin Xing, an associate professor in the Department of Statistics. Together, the team demonstrated a rigorous method to select test cases and benchmark automated driving systems (ADS) against real-world human behavior.
“We all share the same goal,” Guo said. “To prove the ADS is safe.”
The challenge of validating automated vehicles
There are multiple tools to validate roadworthiness, from closed-course testing to advanced simulation models that can run the system through thousands of virtual roadways and curated traffic situations.
Guo said the testing framework can help streamline automated vehicle rollout by providing a systematic way to set up test scenarios for those who develop, test, and evaluate automated vehicles. This is especially important for ensuring the system can handle corner cases – the rare and unpredictable situations that can often lead to safety risks on the road.
“ADS training uses relatively large data sets, but many times those kinds of rare corner cases are not necessarily comprehensively represented in the training data sets,” he said.
“This will ensure some of these uncommon scenarios are being evaluated during the initial screening and certifying process.”
A standard testing “curriculum” for automated vehicles can help make the most of limited test track time. In addition, one-for-one comparisons between automated vehicles and human drivers can fill potential gaps.
Because automated vehicles are currently deployed in limited areas, Guo said early performance data may still be missing important context.
“There's a kind of inherited potential selection bias because you select the safety environment for initial deployments,” he said. “If you want to do a fair comparison, you should put the ADS in the similar kinds of driving scenarios that a common human driver is facing. That will give you more convincing evidence of the safety of the ADS.”
The research
To establish a baseline of human driving, research team members leveraged the institute's naturalistic driving database, which holds 33 million travel miles of driving data, originally collected as part of the Federal Highway Administration’s second Strategic Highway Research Program.
From that dataset, they pulled about 300,000 randomly selected driving segments, covering approximately 31,000 miles of driving. These segments captured situations encountered by human drivers on the road, providing detailed information from cameras, sensors, and kinematic data.
To ensure samples were representative of the average driving population, they used a systematic sampling scheme that weighted safety-critical scenarios differently than everyday driving.
Guo said that when weighting the samples, it was important to find the right balance between everyday driving and safety-critical events.
“Most driving is boring,” Guo said. “A crash occurs in a few seconds window – think about once in millions of hours of driving. But those few seconds during that safety critical situation that puts the system into a challenge, that is when performance matters most.”
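The idea of sampling rare events more heavily while still reflecting the overall mix of driving can be illustrated with a minimal sketch. This is not the authors' method: it is a generic stratified sample with inverse-probability weights, and the segment records, stratum labels, and the `stratified_sample` helper are all hypothetical.

```python
import random

def stratified_sample(segments, n_per_stratum, seed=0):
    """Draw up to n_per_stratum segments from each stratum and attach weights.

    Each sampled segment gets weight = stratum size / number sampled, so
    weighted totals still reflect the original mix of driving.
    """
    rng = random.Random(seed)

    # Group segments by their (hypothetical) stratum label.
    strata = {}
    for seg in segments:
        strata.setdefault(seg["stratum"], []).append(seg)

    sample = []
    for members in strata.values():
        k = min(n_per_stratum, len(members))
        for seg in rng.sample(members, k):
            sample.append({**seg, "weight": len(members) / k})
    return sample

# Hypothetical population: 990 everyday segments and 10 safety-critical ones.
population = (
    [{"id": i, "stratum": "everyday"} for i in range(990)]
    + [{"id": 990 + i, "stratum": "safety_critical"} for i in range(10)]
)
sample = stratified_sample(population, n_per_stratum=10)
```

In this toy setup every safety-critical segment is kept with a weight of 1, while each sampled everyday segment stands in for 99 segments, so the rare events are all tested without distorting the overall picture.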
Finally, team members whittled down the selection to the most representative and safety-relevant test cases. They demonstrated a method for benchmarking automated driving system performance against human driver averages. This showed that, with a small but strategically selected number of cases, they could efficiently measure performance on the road without sacrificing reliability.
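As a rough illustration of benchmarking against human averages, the sketch below compares weighted mean scores over a handful of test cases. It assumes each case carries a weight reflecting how much real-world driving it represents; the scores, weights, and the `weighted_mean` helper are invented for the example and do not come from the study.

```python
def weighted_mean(values, weights):
    """Average of values, with each value counted in proportion to its weight."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

# Hypothetical test cases: (weight, human risk score, ADS risk score).
# Lower scores are better. The first two rows stand for everyday driving,
# the last two for rare safety-critical events.
cases = [
    (99.0, 0.10, 0.08),
    (99.0, 0.20, 0.15),
    (1.0, 0.90, 0.70),
    (1.0, 0.80, 0.85),
]

weights = [c[0] for c in cases]
human_avg = weighted_mean([c[1] for c in cases], weights)
ads_avg = weighted_mean([c[2] for c in cases], weights)

# A negative gap means the ADS scored lower (better) than the human benchmark.
gap = ads_avg - human_avg
```

A single average can hide weak spots, such as the last case above where the hypothetical ADS scores worse than the human driver, so a real evaluation would likely also report results per scenario type.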
Guo said the framework can show that automated driving systems have the foundational capabilities to safely navigate everyday driving and safety-critical situations.
“I think our method is not only good for testing, but also will help ADS developers,” he said, “because now they can take a holistic look at their training data set and say, ‘Will my training data sets cover enough real-life driving scenarios?’”
Expanding the methodology
The researchers said future work could focus on making the framework accessible for regulatory agencies and technology developers. One potential next step would be creating a shared library of test cases that could be used by industry to evaluate automated vehicle safety.
Further research could explore additional driving situations, including using advances in deep generative modeling to create new rare or unseen driving scenarios.
Guo said that, ultimately, this framework can help expedite automated vehicle deployment while building public trust.
“The most important thing is to bring structure to this testing and development,” he said, “to ensure your system works properly and safely.”
Authors
- Chen Qian, former research assistant, VTTI
- Jingbin Xu, former research assistant, VTTI
- Xin Xing, associate professor in the Department of Statistics at Virginia Tech
- Feng Guo, lead data scientist, Division of Data and Analytics at VTTI and professor in the Department of Statistics at Virginia Tech
Original study: doi.org/10.1038/s41467-026-69675-8