The Hindu
National
Aroon Deep

YouTube launches Hindi auto-captioning feature after 13 years

YouTube has started rolling out automatic captions for Hindi videos, a much-delayed expansion of the speech recognition-aided subtitles it first launched in 2010. The automated subtitles could open up millions of Hindi-language videos to viewers who are hearing impaired.

Hindi subtitles have been available on the platform on videos where creators have specifically chosen to add them, but YouTube has not offered a convenient way to caption Hindi videos automatically. Since creators on YouTube have to pay for professionally created and timed subtitles, many do not commission them.

It is unclear precisely when Hindi captioning started becoming available. Transcription of Hindi has been available on Google Translate and other products by the search giant. But the rollout of auto-captioning as a widely available feature for Hindi videos signals that Google has gathered and processed enough Hindi speech data to feel it can offer sufficient accuracy on most videos in the language. By extension, it suggests that the availability of data on Indian languages is expanding.

Well before the generative Artificial Intelligence boom, firms like YouTube were already using voice recognition for accessibility purposes. But that is easier said than done for languages that are not heavily represented online. “In the speech to text problem, you need a lot of speech in Hindi, and a corresponding correct transcript, which is fed to [AI] models that learn by looking at this data,” Mayuresh Nirhali, a senior executive at Reverie, which works on solving problems related to Indian languages on the Internet, said.

Developing AI-enabled services like speech recognition for Indian languages is particularly difficult due to several foundational challenges, including inconsistent encoding of text online, as well as regional variations in spelling and pronunciation, Mr. Nirhali said. Now that more data appears to be available — at least to big tech firms — the situation is improving. A YouTube spokesperson did not respond to queries on the launch of auto-captioning in Hindi.

Mimicking the style of closed captioning for television viewers in countries like the United States, where it is mandatory for the small screen, YouTube’s captions show up as blocks of words as and when they are spoken, with little punctuation. While captions for news broadcasts are generally created in real time for professional TV channels, AI-enabled speech recognition allows automatic captioning to be timed more precisely, letting viewers pick up pauses and other cues of speech.

But accuracy and quality issues linger. Even in auto-generated English captions, for which YouTube has been perfecting its technology for over a decade, mistakes are common and words are often mistranscribed. Hindi captions are no different, The Hindu found in some videos. Many lines that are not clearly articulated by speakers, even in single-speaker contexts like stand-up comedy videos, are simply omitted, while other words are transcribed as similar-sounding words.

YouTube has by default censored offensive terms and swear words in automatic captioning for several years. When a prohibited word or term is used in English, for instance, YouTube transcribes it as an underscore in square brackets ([_]). This does not seem to be the case in Hindi yet, and expletives show up as similar-sounding words.
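
For readers curious about how this kind of masking could work in principle, here is a minimal sketch in Python. It is only an illustration: the word list, the function name `mask_caption` and the matching rules are assumptions made for this example, and YouTube has not published how its own filter is implemented.

```python
import re

# Hypothetical block list for illustration only; the terms YouTube
# actually filters are not public.
BLOCKED_TERMS = {"darn", "heck"}


def mask_caption(text, blocked=BLOCKED_TERMS):
    """Replace each blocked whole word in a caption line with "[_]"."""
    if not blocked:
        return text
    # Build one alternation of escaped terms, matched as whole words
    # and without regard to letter case.
    alternation = "|".join(re.escape(term) for term in sorted(blocked))
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return pattern.sub("[_]", text)


print(mask_caption("Well, darn it, what the HECK happened"))
# prints: Well, [_] it, what the [_] happened
```

Whole-word matching of this kind is tuned to Latin script. A Hindi version would likely need script-aware handling of word boundaries in Devanagari, along with a block list of its own.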

One major challenge for Hindi speech recognition has been the common use of English in everyday speech. For the moment, YouTube is simply ‘devanagarising’ English words in Hindi sentences, displaying them without switching to English script, while skipping over English-only sentences entirely. “The expectation for anybody building AI models [for speech and text] is that more colloquial and realistic ground root data is included, so that the model learns the nuances of mixing languages,” Mr. Nirhali said. 

“Languages are such, their spread is so wide that you’ll always have different translations or transcriptions than the models have understood,” he added. “There’s never a line you can draw and say, ‘that’s it, I’m done’.” 
