IBM Cloud Speech-to-Text: Two Implementations

Thomas David Kehoe
10 min read · Apr 2, 2020

Three years ago I integrated IBM Cloud Speech-to-Text into my web app LanguageTwo. It was the worst part of my app: the streaming WebSockets connection was unreliable at best. I've spent the past three weeks fixing it.

My project uses AngularJS and Firebase.

My first plan was to discontinue the WebSockets streaming and instead record each utterance as an audio file, save it to the database, send the file to the IBM Cloud for processing, and then send the response to the browser to display to the user.
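Here is a minimal sketch of the server side of that plan, assuming a Firebase Realtime Database trigger and the ibm-watson Node SDK (v5+); the function name, database path, and config keys are placeholders of mine, not LanguageTwo's actual code.

// A sketch of the record-save-process flow, not LanguageTwo's actual code.
// Assumes the browser uploads the recording to Firebase Storage, then writes
// its storage path into the Realtime Database to trigger this function.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

admin.initializeApp();

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({ apikey: functions.config().watson.apikey }),
  serviceUrl: functions.config().watson.url,
});

// Fires when the browser writes a recording's storage path into the database.
exports.transcribeRecording = functions.database
  .ref('/recordings/{recordingId}')
  .onCreate(async (snapshot) => {
    const { storagePath } = snapshot.val();
    const file = admin.storage().bucket().file(storagePath);

    // Stream the saved audio file to IBM Cloud Speech-to-Text.
    const { result } = await speechToText.recognize({
      audio: file.createReadStream(),
      contentType: 'audio/webm;codecs=opus',
    });

    const transcript = result.results
      .map((r) => r.alternatives[0].transcript)
      .join(' ');

    // Write the transcript back so the browser can display it to the user.
    return snapshot.ref.update({ transcript });
  });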

I spent a week figuring out how to record from the microphone in the browser. It turns out that there's an old way, using Navigator.getUserMedia(), and a new way, using MediaDevices.getUserMedia(). There are tons of libraries that use the old way, badly, and I had to dig through them before I found the new way. The new way is easy. First, in the template I created two Bootstrap buttons to start and stop the recording, plus room for the results (a sketch of the matching controller functions follows the template).

<div class="col-sm-2 col-md-2 col-lg-2" ng-show="nativeLanguage === 'en-US'">
  <button type="button" class="btn btn-block btn-default" ng-click="startWatsonSpeechToText()" uib-tooltip="Wait for 'Listening'">Start pronunciation</button>
</div>
<div class="col-sm-2 col-md-2 col-lg-2" ng-show="nativeLanguage === 'en-US'">
  <button type="button" class="btn btn-block btn-default" ng-click="stopWatsonSpeechToText()">Stop pronunciation</button>
</div>
<div class="col-sm-6 col-md-6 col-lg-6"…
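In the controller, the new way boils down to a promise. Here is a minimal sketch of what startWatsonSpeechToText() and stopWatsonSpeechToText() could look like, assuming the browser supports MediaRecorder; the handleRecording() upload helper is hypothetical.

// A sketch of the controller functions, assuming MediaRecorder support.
let mediaRecorder;

$scope.startWatsonSpeechToText = function () {
  // The new way: navigator.mediaDevices.getUserMedia() returns a promise.
  navigator.mediaDevices.getUserMedia({ audio: true })
    .then(function (stream) {
      mediaRecorder = new MediaRecorder(stream);
      const chunks = [];
      mediaRecorder.ondataavailable = function (event) {
        chunks.push(event.data);
      };
      mediaRecorder.onstop = function () {
        // One Blob per recording, ready to upload to Firebase Storage.
        const blob = new Blob(chunks, { type: 'audio/webm;codecs=opus' });
        handleRecording(blob); // hypothetical upload helper
      };
      mediaRecorder.start();
    })
    .catch(function (error) {
      console.error('Microphone access denied:', error);
    });
};

$scope.stopWatsonSpeechToText = function () {
  if (mediaRecorder && mediaRecorder.state !== 'inactive') {
    mediaRecorder.stop();
    // Release the microphone so the browser's recording indicator turns off.
    mediaRecorder.stream.getTracks().forEach(function (track) {
      track.stop();
    });
  }
};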


Thomas David Kehoe

I make technology for speech clinics to treat stuttering and other disorders. I like backpacking with my dog, competitive running, and Russian jokes.