Google Inc. has updated its Cloud Speech Application Programming Interface with more features for developers who want to integrate speech recognition into their applications.
The update increases the number of languages for which speech recognition is available and extends support for “long-form” audio clips. The idea is to give developers more functionality and control, Google product manager Dan Aharon said in a blog post Monday.
The Google Cloud Speech API is a machine learning-powered tool that developers can use to add capabilities such as voice and audio file transcription, voice-enabled commands and call center routing to the applications and services they build. Google said the API relies on deep learning algorithms to keep improving its recognition accuracy over time. Recognition can also be tuned to particular settings or content by providing the API with specific words and phrases used in those situations.
Monday’s update raises the API’s limit for long-form audio from 80 minutes to 180 minutes. In addition, the Cloud Speech API can now support files longer than three hours, but only on a case-by-case basis, Aharon said. Developers who want to take advantage of this need to apply for a quota extension through Google’s Cloud Support.
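As a rough illustration of how those limits might be applied before uploading a clip, the following minimal sketch sorts a clip by duration. The constants come from the figures reported above; the function name and return labels are hypothetical and are not part of Google’s API.

```python
# Limits as reported in the article; treat them as illustrative constants,
# not values read from the API itself.
PREVIOUS_LIMIT_MINUTES = 80
CURRENT_LIMIT_MINUTES = 180


def classify_clip(duration_minutes: float) -> str:
    """Return a hypothetical label describing how a clip relates to the limits."""
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    if duration_minutes <= PREVIOUS_LIMIT_MINUTES:
        return "within previous limit"
    if duration_minutes <= CURRENT_LIMIT_MINUTES:
        return "within new limit"
    # Anything longer than three hours is handled case by case,
    # so the developer would need to request a quota extension.
    return "needs quota extension"
```

Under these assumptions, a two-hour recording that the old limit would have excluded now falls inside the new limit, while a four-hour recording would still require a request to Cloud Support.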
Google has also added word-level timestamps at the request of developers, Aharon said. The timestamps let developers jump to the exact point in a recording where a given piece of text was spoken. They can also be used to display the relevant text while an audio clip is playing, which can dramatically reduce the time it takes to proofread transcripts.
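To make the two uses concrete, here is a minimal sketch of what an application might do with word-level timestamps once it has them. It assumes the transcript has already been converted into a list of words with start and end times in seconds; the `WordStamp` type, the helper names and the sample data are hypothetical and do not reproduce the API’s actual response format.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class WordStamp(NamedTuple):
    """One transcribed word with its start and end time in seconds (assumed shape)."""
    word: str
    start: float
    end: float


def find_phrase_start(words: List[WordStamp], phrase: str) -> Optional[float]:
    """Return the start time of the first occurrence of `phrase`, or None."""
    target = phrase.lower().split()
    if not target:
        return None
    tokens = [w.word.lower() for w in words]
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return words[i].start
    return None


def word_at(words: List[WordStamp], seconds: float) -> Optional[WordStamp]:
    """Return the word being spoken at `seconds`, assuming `words` is sorted by start."""
    starts = [w.start for w in words]
    i = bisect_right(starts, seconds) - 1
    if i >= 0 and words[i].start <= seconds < words[i].end:
        return words[i]
    return None


# Made-up sample data for illustration only.
SAMPLE_WORDS = [
    WordStamp("welcome", 0.0, 0.4),
    WordStamp("to", 0.4, 0.5),
    WordStamp("the", 0.5, 0.6),
    WordStamp("quarterly", 0.6, 1.1),
    WordStamp("earnings", 1.1, 1.6),
    WordStamp("call", 1.6, 1.9),
]
```

With this sample, `find_phrase_start(SAMPLE_WORDS, "earnings call")` gives the offset a player could seek to, and `word_at(SAMPLE_WORDS, 0.75)` gives the word a proofreading view could highlight during playback.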
Finally, Google added support for an additional 30 languages, which means the Cloud Speech API now supports 119 in total. The new languages include Bengali, Latvian and Swahili, covering almost a billion speakers across the world, Google said.
Google updates Cloud Speech API with support for long-form audio clips