Software improves captioning for those with hearing deficits

October 17, 2017
Written By:
Laurel Thomas

ANN ARBOR—Making sure deaf and hard-of-hearing students get the information presented in class and at academic events requires a lot of advance planning by the students and the offices that serve them.

It’s also costly, at $150 an hour or more, and even the best captionists in the business make errors. Computerized automatic speech recognition programs can convert speech in under five seconds, but their error rates are high.

But software developed by a University of Michigan researcher makes real-time captions possible on demand by engaging multiple non-expert captionists at the same time.

An article in the current issue of Communications of the ACM (Association for Computing Machinery) reports the success of Scribe, a program that takes content from several less-skilled translators and intelligently forms captions in less than four seconds.

“What we did is tried to essentially democratize this process,” said Walter Lasecki, U-M assistant professor of information and of computer science and engineering, who began the work at the University of Rochester. “The trick is to algorithmically combine the efforts of a lot of people.”

Currently, a student requiring help must notify an office dedicated to serving his or her needs well in advance to request assistance in a class or event. The office then hires a translator at an hourly rate plus travel. These captionists typically are hired for hours at a time.

Often, the translator is not someone with subject-matter expertise, which Lasecki said could be problematic in a senior-level mechanical engineering course or a class with similarly advanced content.

Asking several peers or hiring a half-dozen work-study students is less expensive and easier to manage, especially for events with little notice, and the translation ends up being more accurate, Lasecki said.

On average, people can only type about 10 to 20 percent of what is being said. But when you combine the notes of many people, the picture becomes more complete.

“If we’re both typing the same thing, I might miss a word but you might get that word,” Lasecki said.

By having numerous note takers, even an incorrect interpretation of the material can usually be sorted out because it’s likely more than one person has the same take on it.

“By doing turn-taking and then aggregation, we can actually get a much more reliable signal,” he said.
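The idea of combining partial notes can be illustrated with a small sketch. This is not Scribe's actual algorithm, which the article does not describe in detail. It is a simplified illustration that assumes each typed word carries a timestamp, groups words typed at about the same moment, and keeps the version most typists agree on. The function name `merge_captions` and the `slot` window width are hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def merge_captions(partials, slot=0.5):
    """Merge partial transcripts by majority vote within fixed time slots.

    partials: one list per typist of (seconds_from_start, word) pairs.
    slot: width in seconds of the window used to decide that two typed
          words refer to the same spoken word (an illustrative assumption).
    """
    votes = defaultdict(Counter)
    for words in partials:
        for t, word in words:
            # Words typed in the same time slot compete with each other.
            votes[int(t // slot)][word.lower()] += 1
    # For each slot, in time order, keep the most commonly typed word.
    return " ".join(votes[k].most_common(1)[0][0] for k in sorted(votes))


# Three typists each catch only part of "the quick brown fox";
# one of them mistypes the last word.
typists = [
    [(0.0, "the"), (0.6, "quick"), (1.7, "fox")],
    [(0.1, "the"), (1.1, "brown"), (1.6, "fox")],
    [(0.7, "quick"), (1.2, "brown"), (1.8, "box")],
]
print(merge_captions(typists))  # the quick brown fox
```

No single typist captured every word, yet the merged result is complete, and the lone "box" is outvoted by the two typists who typed "fox".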

Lasecki said there is still room for improvement. For one, punctuation is challenging. But he hopes one day the program can be helpful to students and university offices that assist them.

 
