Monday, November 22, 2010

Google rolls out improvements in automatic captions on YouTube

From the Wall Street Journal:
When YouTube launched its automatic closed-captioning service a year ago, there were a few important words the tool couldn’t recognize — like “YouTube.”

The problem highlighted some of the difficulties faced in the push to make new technology more accessible to people with disabilities. Accessibility tools like speech recognition are still evolving. And so much information is now generated by users that it seems impossible to make it all accessible to everyone.

In the past few weeks, Google has rolled out major improvements in the technology behind automatic captions, reducing the overall word error rate by 20%. The tool, which is accessed by clicking on the little “cc” button on most YouTube videos and selecting “Transcribe Audio,” can be used by anyone. But it’s not perfect.
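For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition with a word-level edit distance. The function name and sample sentences are illustrative assumptions and are not taken from Google's system. The article also does not say whether the 20% reduction is relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch of the standard WER definition; not Google's code.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of four reference words.
    print(word_error_rate("the dog ran home", "the dog tan home"))
```

Because the denominator is the reference length, a recognizer that inserts many extra words can score above 1.0 on this measure.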

YouTube doesn’t disclose numbers on the error rates because there is “quite a wide range” depending on the type of video, said research scientist Michiel Bacchiani, who heads up the speech recognition team for YouTube. “Things like one person speaking on a news broadcast will do quite well. If it’s a bunch of people yelling and running around with music in the background, it will do quite poorly,” he said in an interview with Digits.

Unfortunately, a lot of the video on YouTube consists of people running around and yelling, and even news videos will have some glaring errors. Sometimes the captions still just don’t make sense, and some YouTube contributors prevent them from showing up on their videos. But almost anything is helpful for someone who has trouble hearing, said Google’s “caption Jedi,” software engineer Ken Harrenstien, who is deaf. He added that YouTube released the product so it could be helpful in some circumstances, rather than waiting until it was perfect before providing it.

Using hundreds of hours’ worth of data, Google has been improving its system. The tool uses algorithms to account for different speech patterns. If it recognizes the word “dog,” for example, the algorithms will tell it the probability that the next word is “ran” and not “tan,” said Mike Cohen, the head of speech recognition at Google.
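The "dog ran" versus "dog tan" example describes what is generally called a statistical language model. A minimal sketch of the idea, assuming a simple bigram model that counts which word follows which in a tiny made-up corpus, might look like the following. The corpus, function names and model type are illustrative assumptions; the article does not describe how Google's models are actually built.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1
    return counts


def next_word_probability(counts, previous, candidate):
    """Estimate P(candidate | previous) from raw bigram counts."""
    followers = counts.get(previous, Counter())
    total = sum(followers.values())
    if total == 0:
        return 0.0  # the previous word was never seen in training
    return followers[candidate] / total


# Hypothetical toy corpus, invented for this illustration.
CORPUS = [
    "the dog ran home",
    "the dog ran away",
    "the dog barked",
    "a tan coat",
]
COUNTS = train_bigrams(CORPUS)

if __name__ == "__main__":
    for word in ("ran", "tan"):
        print(word, next_word_probability(COUNTS, "dog", word))
```

With this corpus, "ran" follows "dog" in two of three cases and "tan" never does, so a recognizer that hears something ambiguous after "dog" would favor "ran". Real systems smooth these estimates so that unseen word pairs do not get a probability of exactly zero.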

Over the past year, Google has added new vocabulary — things like “YouTube,” “iPad” and “smartphone.” And the tool now passes through the speech file multiple times to adapt to the quirks of speech within the video, like whether the speaker has an accent.

Google says people have watched video with automatic captions more than 23 million times.

But the biggest problem when it comes to accessibility is that it’s just not something that most people think about, said Mr. Harrenstien.

“In general I would say the problem is awareness,” he said in an interview with Digits. “In developing a service, people are not thinking about how helpful captions are to so many people.”

Google also has tried to make it easier for people who upload videos to include captions, since those aren’t prone to the errors of the automatic system. Captions require not only text, but information on timing so that the text can be displayed properly, and Google will take a plain text file and automatically generate that timing information for users.
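To show what "text plus timing" means in practice, the sketch below writes a list of timed lines in the widely used SubRip (.srt) layout: a cue number, a start and end time, and the caption text. The article does not name the caption format YouTube uses, so SubRip is an assumption here, as are the function names and sample cues. The sketch also takes the timings as given; working them out automatically from a plain transcript, as the article says Google does, is a separate and much harder step.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS,mmm, the SubRip timestamp layout."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Turn (start_seconds, end_seconds, text) tuples into SubRip text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical cues, invented for this illustration.
    sample = [
        (0.0, 2.5, "Welcome to the video."),
        (2.5, 5.0, "Captions need both text and timing."),
    ]
    print(to_srt(sample))
```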

The introduction of that feature into Google’s tools for developers prompted a surge in captions. Over the past year, the number of user-generated captions has tripled to more than 500,000 — but that’s still a small number compared with the 35 hours of video uploaded every minute to the site.

Google doesn’t provide statistics, but Mr. Harrenstien says he’s seen that users who provide captions get more hits because it makes the content easier to find via search. “In fact if your search hits the exact snippet, we allow you to jump directly to that place in the video. It saves a lot of time,” he said.