IBM Cloud Speech-to-Text: Two Implementations
Three years ago I integrated IBM Cloud Speech-to-Text into my web app, LanguageTwo. It was the worst part of my app: the streaming WebSockets connection was unreliable at best. I’ve spent the past three weeks fixing it.
My project uses AngularJS and Firebase.
My first plan was to discontinue the WebSockets streaming and instead record each audio file, save it to the database, send the file to IBM Cloud for processing, and then send the response back to the browser to display to the user.
I spent a week figuring out how to record from the microphone in the browser. It turns out there’s an old way, using Navigator.getUserMedia(), and a new way, using MediaDevices.getUserMedia(). There are tons of libraries that use the old way, badly, and I had to dig through them before I found the new way, which is easy. First, in the template I created two Bootstrap buttons to start and stop the recording, plus room for the results.
<div class="col-sm-2 col-md-2 col-lg-2" ng-show="nativeLanguage === 'en-US'">
  <button type="button" class="btn btn-block btn-default" ng-click="startWatsonSpeechToText()" uib-tooltip="Wait for 'Listening'">Start pronunciation</button>
</div>
<div class="col-sm-2 col-md-2 col-lg-2" ng-show="nativeLanguage === 'en-US'">
  <button type="button" class="btn btn-block btn-default" ng-click="stopWatsonSpeechToText()">Stop pronunciation</button>
</div>
<div class="col-sm-6 col-md-6 col-lg-6"…
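The controller functions those buttons call can be sketched along these lines, using the modern MediaDevices.getUserMedia() together with MediaRecorder. This is a minimal sketch, not the app's actual code; the upload step is left as a comment because the database wiring is an assumption.

```javascript
// Sketch of browser microphone recording with the modern API.
// The function names match the ng-click handlers in the template;
// everything else here is an illustrative assumption.
let mediaRecorder;
let audioChunks = [];

async function startWatsonSpeechToText() {
  // Ask for microphone access; this replaces the deprecated
  // callback-style Navigator.getUserMedia().
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  mediaRecorder = new MediaRecorder(stream);
  audioChunks = [];
  // Collect audio data as it becomes available.
  mediaRecorder.ondataavailable = (event) => audioChunks.push(event.data);
  mediaRecorder.start();
}

function stopWatsonSpeechToText() {
  mediaRecorder.onstop = () => {
    // Assemble the recorded chunks into a single Blob.
    const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
    // ...save audioBlob to the database and send it to IBM Cloud here...
  };
  mediaRecorder.stop();
}
```

In an AngularJS controller these would hang off `$scope` rather than being bare functions, but the getUserMedia/MediaRecorder flow is the same.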