Danfeng "Daphne" Yao, a computer science professor at Virginia Tech, wants to improve the prediction accuracy of machine-learning models in medical applications. Her research findings were recently published in Communications Medicine, a selective open-access journal from Nature Portfolio.

“Inaccurate prediction may produce life-threatening consequences,” said Yao, who is both an Elizabeth and James E. Turner Jr. '56 Faculty Fellow and a CACI Faculty Fellow in the College of Engineering. Such prediction errors could lead a model to miscalculate the likelihood that a patient will die during an emergency room visit or survive cancer.

Many clinical data sets are intrinsically imbalanced, Yao said, because they are dominated by majority groups. “In the typical one-machine-learning-model-fits-all paradigm, racial and age disparities are likely to exist, but are unreported,” she said.

Yao and her team collaborated with Charles B. Nemeroff, a member of the National Academy of Medicine and a professor in the Department of Psychiatry and Behavioral Sciences at the University of Texas at Austin’s Dell Medical School, to investigate how biases in training data affect prediction outcomes, particularly for underrepresented patients such as young patients or patients of color.

“I was absolutely delighted to collaborate with Daphne Yao, who is a world leader in advanced machine learning,” said Nemeroff. “She discussed with me the notion that new advances in machine learning could be applied to a very significant problem clinical investigators encounter frequently, namely the relatively small number of ethnic minorities that typically enroll in clinical trials.”

He said these low enrollment numbers mean that medical conclusions are drawn largely from data on white patients of European descent and may not apply to ethnic minority groups.

“This new report provides a methodology to improve prediction accuracy for minority groups,” said Nemeroff. “Clearly, such findings have enormously important implications for improving clinical care of patients who are members of ethnic minority groups.”

Yao’s Virginia Tech team consists of Department of Computer Science doctoral students Sharmin Afrose and Wenjia Song, along with Chang Lu, Fred W. Bull Professor in the Department of Chemical Engineering. To conduct their research, the team performed experiments on four prognosis tasks across two data sets using a new Double Prioritized (DP) Bias Correction method, which trains customized models for specific ethnicity or age groups.

“Our work presents a new artificial intelligence fairness technique to correct prediction errors,” said Song, a fourth-year Ph.D. student whose research areas include machine learning in digital health and cybersecurity. “Our DP method improves the minority class performance by up to 38 percent and significantly reduces disparities of predictions across different demographic groups, up to 88 percent better than other sampling methods.”

Song and fellow doctoral student Afrose each worked with a different data set to conduct the experiments.

Song used the Surveillance, Epidemiology, and End Results data set for the breast cancer and lung cancer survivability tasks, while Afrose, a fifth-year Ph.D. student, used a data set from Beth Israel Deaconess Medical Center in Boston for the in-hospital mortality prediction and decompensation prediction tasks.

“We are excited to have found a solution for reducing bias,” said Afrose, whose research focus includes machine learning in health care and software security. “Our DP bias-correction technique will reduce potentially life-threatening prediction mistakes for minority populations.”

With the findings published and openly accessible, the team is eager to work with other investigators who want to apply these methods to their own clinical data.

“Our method is easy to deploy on various machine learning models and could help improve the performance of any prognosis tasks with representational biases,” said Song.

Communications Medicine is dedicated to publishing high-quality research, reviews, and commentary across all clinical, translational, and public health research fields.
