

IBM Speech to Text update

Identify the base model you currently use

The previous-generation STT model names listed here contain either the word "Narrowband" or "Broadband" for each language. Narrowband models are used for telephony use cases, while Broadband models are best for multimedia use cases. We slightly changed the model names to illustrate this usage better: if your model name has "Narrowband" in it, the matching next-generation model name has "Telephony" in it, and if it has "Broadband", the matching model name has "Multimedia". So if you are using "en-US_NarrowbandModel", your matching next-generation model would be "en-US_Telephony".
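For picking the new model, the service lets you list the models available to your instance and select one with the model query parameter on the recognize endpoint. A minimal sketch, assuming an IAM API key, with $STT_URL standing in for your instance's service URL and the audio file name as a placeholder:

# List the models available to your instance.
curl -u "apikey:$APIKEY" "$STT_URL/v1/models"

# Recognize against the matching next-generation model.
curl -X POST -u "apikey:$APIKEY" \
  --header "Content-Type: audio/wav" \
  --data-binary @sample.wav \
  "$STT_URL/v1/recognize?model=en-US_Telephony"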
Identify the features and parameters you use

While the next-generation Speech models have all the commonly used features, they are not at complete feature parity with the previous-generation STT models. So it is important that you review the most up-to-date list of supported features for the next-generation STT models here. We are continuously rolling out new features and will update the link as more features become available. Here are the key features to consider for the next-generation STT models:

Language Model customization with corpora text files - You can reuse your existing STT corpora text files as-is with no change.
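To illustrate the corpora point, reusing an existing corpus file against a next-generation base model looks something like the sketch below, using the documented customization endpoints; the custom model name, corpus name and file path are placeholders, not from the original post:

# Create a custom language model on a next-generation base model.
curl -X POST -u "apikey:$APIKEY" \
  --header "Content-Type: application/json" \
  --data '{"name": "my-telephony-custom", "base_model_name": "en-US_Telephony"}' \
  "$STT_URL/v1/customizations"

# Add your existing corpus text file as-is, then train the model.
curl -X POST -u "apikey:$APIKEY" \
  --data-binary @existing-corpus.txt \
  "$STT_URL/v1/customizations/$CUSTOMIZATION_ID/corpora/corpus1"

curl -X POST -u "apikey:$APIKEY" \
  "$STT_URL/v1/customizations/$CUSTOMIZATION_ID/train"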

I've been using IBM Watson's Speech to Text engine for transcribing call audio; some possible use cases are speech-driven IVRs, Voicemail to Email transcription, or making Call Recordings text-searchable. The last time I'd played with speech recognition on voice platforms was in 2012, and it's amazing to see how far the technology has evolved with the help of AI. IBM's offering is a bit more flexible than the Google offering, and allows long transcription (longer than a minute) without uploading the files to external storage.

The first thing you're going to need are credentials. Select "Speech to Text" and you can view / copy your API key from the Credentials header. Once you've grabbed your API key, we can start transcribing.

I've got an Asterisk instance that manages Voicemail, so let's fire the messages at Watson and get it to transcribe the deposited messages (here $STT_URL stands in for your instance's service URL and the WAV file name is a placeholder):

curl -X POST -u "apikey:yourapikey" --header "Content-Type: audio/wav" --data-binary @voicemail.wav "$STT_URL/v1/recognize"

The response includes the transcript and an overall confidence:

"transcript": "hi Nick this is Nick leaving Nick a test voice mail "
"confidence": 0.831,

Common Transcription Options

speaker_labels=true

Speaker labels enable you to identify each speaker in a multi-party call. This makes the transcription read more like a script, with "Speaker 1: Hello other person" / "Speaker 2: Hello there Speaker 1", which makes skimming through much easier.

timestamps

Timestamps timestamp each word based on the start of the audio file. This reads poorly in cURL, but when used with speaker_labels it allows you to see the time and correlate it with a recording. One useful use case is searching through a call recording transcript, and then jumping to that timestamp in the audio (see the jq sketch further below). For example, in a long conference call recording you might be interested in when people talked about "Item X": you can search the call recording for "Item X", find it's at 1:23:45, and then jump to that point in the call recording audio file, saving yourself an hour and a bit of listening to a conference call recording.

max_alternatives

This allows you to specify, either on a per-word basis or as a whole, the maximum number of alternatives Watson has for the conversation.

Per-word confidence

Per-word confidence allows you to see a per-word confidence breakdown, so you can mark unknown words in the final output with question marks or similar to denote that Watson isn't confident it has transcribed them correctly; in the voicemail above, "voice" and "mail" were the words Watson wasn't sure of.
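Combining those options on a single recognize call might look like the following sketch (again with $STT_URL and the file name as placeholders):

curl -X POST -u "apikey:yourapikey" \
  --header "Content-Type: audio/wav" \
  --data-binary @voicemail.wav \
  "$STT_URL/v1/recognize?speaker_labels=true&timestamps=true&max_alternatives=3&word_confidence=true"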

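For the search-and-jump use case, the word timestamps in the JSON response can be filtered directly. A sketch with jq, assuming the response was saved to transcript.json; each timestamps entry has the form ["word", start_seconds, end_seconds]:

# Print the start time of every occurrence of the word "item".
jq '.results[].alternatives[0].timestamps[] | select(.[0] == "item") | .[1]' transcript.json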
Watson has support for US and GB variants of speech recognition, wideband, narrowband and adaptive-rate bitrates. Luckily it has wide-ranging WAV support, something GCP doesn't, as well as FLAC, G.729, mpg, mp3, webm and ogg. Input formats support PCM-coded data, so you can pipe PCMA/PCMU (aka G.711 µ-law & a-law) audio straight to it. Unfortunately Watson, like GCP, only has native support for MULAW (µ-law companding) and not PCMA as used outside the US.

Sadly, Watson doesn't have Australian language models out of the box (+1 point to Google, which does), but you can add Custom Language Models and train it.
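If your recordings are A-law (PCMA), one workaround is transcoding to plain PCM WAV before sending them to the recognize endpoint. A sketch with sox, with the file names assumed:

# Raw 8 kHz mono A-law in, 16-bit signed PCM WAV out.
sox -t raw -r 8000 -c 1 -e a-law -b 8 input.alaw -e signed-integer -b 16 output.wav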
