The Leonardo DiCaprio thriller “Inception” is the brain-twister movie of 2010. It is a box office smash. But if you haven’t seen it yet, you may want to avoid the Internet. Spoilers giving away the film’s ending abound on scores of movie fan community websites and in the comment sections of critics’ reviews. But what if there were a way to freely surf such websites and avoid spoilers?

For the uninitiated, a spoiler is a sentence within a news story, review or comment on a film, TV show or book that reveals a major plot detail and could ruin a viewer’s or reader’s enjoyment. For instance, a review that mentions Dorothy killing the Wicked Witch of the West by dousing her with water in “The Wizard of Oz” could ruin the film for some viewers. They might skip the film altogether, knowing its ending.

Enter Sheng Guo of Yangzhong, China, a Ph.D. student in the computer science department at Virginia Tech’s College of Engineering, and his advisor, Naren Ramakrishnan, a professor of computer science. The men have developed a data mining algorithm that uses linguistic cues to spot and flag spoilers before you read them, thus saving much frustration for those who enjoy being surprised. Guo recently presented his findings at the 23rd International Conference on Computational Linguistics held in Beijing.

Guo has considered building such a program ever since he stumbled across a comment that revealed the ending of the 1994 prison drama “The Shawshank Redemption” before he had watched it. The spoiler was on the popular website Internet Movie Database (imdb.com), which catalogues tens of thousands of films and features hundreds of thousands of reviews written by users from around the world. The website flags spoilers as it catches them, “but the performance is very bad, and one of our work’s target and evaluation criteria is to beat their method,” said Guo.

“I enjoy reading imdb.com, and it is a nice website for introducing good movies. I read the movie comments there a lot,” he said. “Not all people clearly indicate the spoilers in their comments, and even if they do realize the possible spoilers, they may use various ways to mark a spoiler, which do not always work.” As a result, Guo avoids reading reviews and comments on crime and mystery thrillers.

Still in the early stages of development, the program currently flags an entire article as a spoiler if it reveals the ending of a work. How does it work? From known spoilers, the program learns combinations of vital words in sentences, such as “Dorothy,” “douse,” “water,” “kill” and “Wicked Witch,” and applies them to new comments. This is more effective than relying on individual words, which often reveal nothing. Likewise, generic descriptions such as “Dorothy meets the Tin Man” will not be flagged because they do not reveal the ending, Ramakrishnan said.

“The words have to be used in the right parts of speech and in specific relation to each other,” said Ramakrishnan.
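To make the idea of word combinations concrete, here is a minimal sketch in Python. It is not Guo and Ramakrishnan's algorithm: it only checks whether every word of a known combination appears in the same sentence, and it ignores the parts of speech and word relationships that Ramakrishnan describes. The combination list, the small word-normalization table and all function names are hypothetical, chosen for illustration.

```python
import re

# Hypothetical combination, as if learned from known spoilers.
# A real system would learn many of these from labeled reviews.
SPOILER_COMBINATIONS = [
    frozenset({"dorothy", "douse", "water", "kill", "witch"}),
]

# Tiny hand-written table standing in for a real lemmatizer (an assumption).
LEMMAS = {
    "kills": "kill", "killing": "kill", "killed": "kill",
    "douses": "douse", "dousing": "douse", "doused": "douse",
}


def sentence_words(sentence):
    """Return the set of lowercased, normalized words in a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {LEMMAS.get(word, word) for word in words}


def is_spoiler_sentence(sentence, combinations=SPOILER_COMBINATIONS):
    """True if every word of at least one combination occurs in the sentence."""
    words = sentence_words(sentence)
    return any(combo <= words for combo in combinations)


def flag_comment(comment):
    """Flag the whole comment if any one of its sentences looks like a spoiler."""
    sentences = re.split(r"(?<=[.!?])\s+", comment)
    return any(is_spoiler_sentence(sentence) for sentence in sentences)
```

With this sketch, a comment containing “Dorothy kills the Wicked Witch by dousing her with water.” is flagged, while “Dorothy meets the Tin Man.” is not, because no single word triggers a flag on its own.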

The program won’t delete content that contains spoilers. It only flags that content so readers can decide for themselves whether to continue reading. It can also be deployed to warn writers that they have typed a spoiler into their web post, even if they did so unwittingly. “As a poster is writing a review, the program can help analyze the relationships between the words in the review and tell him or her that the review has high probability of being ranked as a spoiler,” Ramakrishnan said.

Guo and Ramakrishnan hope to fine-tune and improve the program as more funding becomes available. Now if only we could screen out the movie trailers that needlessly give away the entire plot of the film.
