Sanghani Center research takes new approach to analyze depression, anxiety from Reddit posts to provide better care, lower suicide rate
Suicide, the 10th leading cause of death for adults in the United States and the third leading cause of death among kids ages 10 to 14 and young adults ages 15 to 24, is often the result of an underlying mental health condition such as depression, anxiety, or bipolar disorder.
Motivated by a state-by-state suicide mortality map from the Centers for Disease Control and Prevention (CDC) illustrating the increasing severity of the mental health crisis, further exacerbated by the COVID-19 pandemic, three Ph.D. students and their advisor at the Sanghani Center for Artificial Intelligence and Data Analytics are analyzing social media in a way that can help social workers and other professionals better understand and address different aspects of mental health issues to help prevent suicide.
“Since social media platforms like Reddit are accessible, convenient, and anonymous, more users candidly express feelings about their own mental health issues,” said Shailik Sarkar, first author on “Predicting Depression and Anxiety on Reddit: a Multi-task Learning Approach.” Sarkar and his team garnered the Best Paper Award at the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining in November.
Sarkar’s collaborators are fellow students Abdulaziz Alhamadani and Lulwah AlKulaib, and Chang-Tien Lu, professor of computer science and associate director of the Sanghani Center.
Typically, research on the Reddit platform uses the subreddit name, such as r/Anxiety, r/Depression, or r/SuicideWatch, to determine the type of post and the mental health issues associated with it.
But the multi-label classification data set designed by the Virginia Tech researchers does not overly rely on the subreddit topic chosen by the user because someone posting in the r/Anxiety subreddit could also be suffering from other mental health conditions such as depression, sleep disorder, or post-traumatic stress disorder.
Their unique model has a task-specific feature that extracts the words and phrases used in a post and assigns them a score corresponding to each topic that applies. For example, words and phrases such as “meds,” “panic attacks,” “often,” “taking,” and “worrying” could suggest anxiety, while “struggling,” “therapist,” “don’t want,” “need,” and “anymore” could indicate depression.
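As a rough illustration of how scoring words and phrases per topic can put one post in several categories, here is a minimal Python sketch. It is not the researchers' model: the `TOPIC_WEIGHTS` table, its weight values, the `threshold` parameter, and the function names are all hypothetical, and a trained model would learn its scores from data rather than read them from a hand-written table.

```python
import re

# Hypothetical per-topic phrase weights, for illustration only.
# The phrases come from the article; the numeric weights are made up.
TOPIC_WEIGHTS = {
    "anxiety": {"meds": 1.0, "panic attacks": 1.0, "often": 0.5,
                "taking": 0.5, "worrying": 1.0},
    "depression": {"struggling": 1.0, "therapist": 1.0, "don't want": 1.0,
                   "need": 0.5, "anymore": 0.5},
}


def score_post(text):
    """Return one score per topic by summing the weights of matched phrases."""
    # Lowercase and normalize curly apostrophes so "don’t" matches "don't".
    normalized = text.lower().replace("\u2019", "'")
    scores = {}
    for topic, weights in TOPIC_WEIGHTS.items():
        total = 0.0
        for phrase, weight in weights.items():
            # Word boundaries keep "need" from matching inside "needle".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            total += weight * len(re.findall(pattern, normalized))
        scores[topic] = total
    return scores


def predict_labels(text, threshold=1.0):
    """Multi-label prediction: every topic scoring at least `threshold` applies."""
    return sorted(t for t, s in score_post(text).items() if s >= threshold)


post = "I keep worrying and having panic attacks, and I don't want to do this anymore."
print(predict_labels(post))
```

With these made-up weights, the sample post is assigned both labels, anxiety and depression, which is the situation a single subreddit name cannot capture.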
“Reddit posts can be longer than other social media posts, especially when pertaining to mental health discussion, so there is a very good chance that the same post will fall into different categories,” said Sarkar. “Insights into the kind of language used by those suffering from anxiety, depression, and other mental illnesses can be of great benefit to social workers and mental health practitioners. A better understanding of those under their care may help prevent incidents of suicide.”
“Our research with Reddit, which always adheres to privacy and anonymity of users, demonstrates how the lack of [labeled] data or a benchmark data set can be tackled by using active learning and can help inform other data scientists who are interested in human-centered computing or social media data mining,” said Lu.
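Active learning, mentioned in the quote above, has a model ask human annotators to label only the examples it is least sure about, so a small labeling budget goes further. The article does not say which selection strategy the team used. The sketch below assumes uncertainty sampling, one common choice, and its function names, the `batch_size` parameter, and the toy probabilities are hypothetical.

```python
def uncertainty(prob):
    """Uncertainty of a binary prediction: 1.0 at prob=0.5, 0.0 at prob=0 or 1."""
    return 1.0 - abs(prob - 0.5) * 2.0


def select_for_labeling(unlabeled, predict_proba, batch_size=2):
    """Uncertainty sampling: return the posts the current model is least sure about."""
    ranked = sorted(
        unlabeled,
        key=lambda post: uncertainty(predict_proba(post)),
        reverse=True,
    )
    return ranked[:batch_size]


# Toy stand-in for a trained classifier: a fixed probability that each
# post is about depression. A real model would compute these values.
toy_probabilities = {
    "post_a": 0.95,  # confident positive
    "post_b": 0.52,  # nearly a coin flip
    "post_c": 0.10,  # confident negative
    "post_d": 0.40,  # somewhat unsure
}

to_label = select_for_labeling(list(toy_probabilities), toy_probabilities.get)
print(to_label)
```

In a full loop, the selected posts would be labeled by annotators, added to the training set, and the model retrained before the next round of selection.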