Google on Monday is rolling out updates to the Cloud Speech API, introducing a set of features built to meet enterprise customers’ needs. The new features mark a new level of maturity for the product, which was initially built just for Google’s internal use.
“We’ve been working on speech for more than a decade, closer to 20 years… but primarily we’ve always been centered on making Google’s products better and creating a greater experience for Google users,” said Google Cloud product manager Dan Aharon. “Then last year things changed. We kicked the gears up a little in terms of cloud efforts, and we wanted to help third party companies take advantage of machine learning.”
Google has offered that value proposition across its Cloud Platform — customers can take advantage of the same cutting-edge technology powering Google’s own products.
So when the Cloud Speech API was released in beta last year, it represented the first phase in Google’s journey as a cloud vendor, in which it could take its own tools and offer them to other companies. “We’re now looking at what our cloud customers need and doing some R&D to support that and build better products,” Aharon said.
The first new feature is better long-form audio support. The API now handles files up to three hours long, up from 80 minutes. Files longer than three hours can be supported on a case-by-case basis by applying for a quota extension.
Longer files, Aharon said, will open up a range of use cases, such as analyzing calls between customer support agents and customers or powering video transcription services.
Google is also adding word-level timestamps, the most requested feature. With word-level timestamps, users can jump to the exact spot in a file where a given word was spoken. The feature is clearly useful for any kind of transcription service. Aharon said that after Google improved the quality of its Speech API, the lack of timestamps became the most common critique from prospective customers on the fence about the product. “They like the quality,” he said, but they are held back because transcripts don’t come with timestamps.
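To illustrate why per-word timing matters for transcription services, here is a minimal Python sketch of searching a timestamped transcript for a phrase and returning the points in the audio where it starts. It does not call the Cloud Speech API: the `TimedWord` record, the `find_offsets` helper and the sample transcript are all hypothetical, and the sketch simply assumes that each transcribed word arrives with a start and end time in seconds.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """Hypothetical record: one transcribed word plus its timing in seconds."""
    word: str
    start: float
    end: float


def find_offsets(words, query):
    """Return the start time of every occurrence of the phrase `query`."""
    target = query.lower().split()
    if not target:
        return []
    # Normalize case and trailing punctuation so "refund." matches "refund".
    tokens = [w.word.lower().strip(".,!?") for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            hits.append(words[i].start)
    return hits


# Made-up sample data standing in for a transcribed support call.
transcript = [
    TimedWord("Thanks", 0.0, 0.4),
    TimedWord("for", 0.4, 0.6),
    TimedWord("calling", 0.6, 1.1),
    TimedWord("about", 1.1, 1.4),
    TimedWord("your", 1.4, 1.6),
    TimedWord("refund.", 1.6, 2.2),
    TimedWord("The", 2.5, 2.7),
    TimedWord("refund", 2.7, 3.2),
    TimedWord("is", 3.2, 3.4),
    TimedWord("processed.", 3.4, 4.1),
]

print(find_offsets(transcript, "refund"))       # [1.6, 2.7]
print(find_offsets(transcript, "your refund"))  # [1.4]
```

A player could then seek straight to any of the returned offsets. Without per-word timing, the same search would only reveal that the phrase occurs somewhere in the recording.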
Google is also adding support for 30 additional language varieties, on top of the 89 it already supported. The newly supported languages, which include Bengali, Latvian and Swahili, are spoken by more than one billion people.
“When you think about all of the large enterprises, they have a global presence and need to be in all of these markets,” Aharon said. “For a lot of these languages, it’s the first time they’re going to have capabilities in this space.”
It should also help Google win more customers in emerging markets. Even government representatives from around the globe have expressed interest to Google in seeing more languages supported, Aharon said, because “they see it as an important part of their economic evolution.”
So far, there are “many thousands” of customers using the Cloud Speech API, Aharon said, adding that it has seen consistently strong growth over the last year. “If it continues growing at this pace, in two to three years it will be very significant,” he said.