Real-time, high-quality speech-to-text is now possible with WhisperCPP, a high-performance, open source C/C++ implementation of OpenAI's Whisper automatic speech recognition model.
In his latest YouTube video, Aaron demonstrates how to use the newest release of his DragonOS image to transcribe audio from a radio voice channel received with an RTL-SDR. He uses SDR4space as the command-line receiver, WhisperCPP as the AI transcriber, and Mosquitto for monitoring WhisperCPP's output and displaying the text in the terminal.
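The chain starts with demodulating the received IQ stream into audio that WhisperCPP can read. In Aaron's setup SDR4space handles this step, and its internals are not shown here. What follows is only a generic, minimal Python sketch of the same idea. It assumes narrowband FM, complex IQ samples already loaded into a list, and a sample rate that is an integer multiple of 16 kHz, since whisper.cpp's examples expect 16 kHz mono 16-bit WAV input. The function names are hypothetical. The block-averaging decimator is a deliberately crude stand-in for a proper low-pass filter and de-emphasis.

```python
import cmath
import struct
import wave

WHISPER_RATE = 16_000  # whisper.cpp's examples expect 16 kHz mono 16-bit PCM WAV


def fm_demodulate(iq):
    """Quadrature FM discriminator: phase change between consecutive IQ samples."""
    return [cmath.phase(cur * prev.conjugate()) for prev, cur in zip(iq, iq[1:])]


def decimate(samples, factor):
    """Naive decimation: average each block of `factor` samples (a crude low-pass)."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


def to_pcm16(samples):
    """Normalize to the peak value and scale into signed 16-bit integers."""
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    return [int(32767 * s / peak) for s in samples]


def iq_to_whisper_wav(iq, sample_rate, path):
    """Hypothetical helper: FM-demodulate IQ samples and write a 16 kHz WAV file."""
    if sample_rate % WHISPER_RATE:
        raise ValueError("this sketch assumes sample_rate is a multiple of 16 kHz")
    audio = decimate(fm_demodulate(iq), sample_rate // WHISPER_RATE)
    pcm = to_pcm16(audio)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(WHISPER_RATE)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return pcm
```

A real receiver would also filter the channel before demodulating and use a sharper decimation filter, but the structure (discriminate, decimate, write 16-bit PCM at 16 kHz) is the same.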
Here's a short video showing exactly how to set up and run SDR4space in such a way that real-time IQ captures are demodulated and fed to WhisperCPP (high-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model) for transcribing.
The latest DragonOS FocalX R28 comes with everything needed to do exactly what I show in this video, including a sample tiny model.
You'll notice in the video that jobs are placed in a queue for continued captures, and results are also sent over to Mosquitto MQTT, where a client can see messages as they are created.
I chose to use an RTL-SDR v3 dongle for the capture, but it's possible to configure SDR4space to use a variety of SoapySDR-supported SDRs.
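The queue-and-publish arrangement described in the quote can be illustrated with a small producer/consumer sketch. This is not SDR4space's code. It assumes each finished capture is handed over as a WAV file path, and the `transcribe` and `publish` callables and the topic name `sdr/transcripts` are hypothetical placeholders. In a real deployment `transcribe` might invoke a WhisperCPP binary and `publish` might wrap an MQTT client library that sends to a Mosquitto broker.

```python
import queue
import threading

TOPIC = "sdr/transcripts"  # hypothetical MQTT topic name


def transcription_worker(jobs, transcribe, publish):
    """Pull capture paths off the queue, transcribe each one, publish the text."""
    while True:
        wav_path = jobs.get()
        try:
            if wav_path is None:  # sentinel value: stop the worker
                return
            text = transcribe(wav_path).strip()  # hypothetical transcriber callable
            if text:  # skip captures that produced no speech
                publish(TOPIC, text)  # hypothetical publisher callable
        finally:
            jobs.task_done()


def start_worker(transcribe, publish):
    """Create the job queue and start one background transcription thread."""
    jobs = queue.Queue()
    worker = threading.Thread(target=transcription_worker,
                              args=(jobs, transcribe, publish), daemon=True)
    worker.start()
    return jobs, worker
```

Decoupling capture from transcription through the queue lets the receiver keep recording while a slow transcription finishes. On the monitoring side, a command such as `mosquitto_sub -h localhost -t sdr/transcripts` would print messages on that (hypothetical) topic as they arrive.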
In his first video, Aaron shows how to get set up with the system on DragonOS. Shortly after uploading that tutorial, Aaron noticed that recompiling WhisperCPP on the local system significantly reduced its processing time, making transcription near real time. In the second video, Aaron briefly demonstrates the real-time transcription.
In the past we posted a similar project that was based on the Amazon Transcribe cloud service. WhisperCPP, however, runs on a local machine, is open source, and seems to be at least as good as Amazon Transcribe. This appears to be a significant leap in transcription capability, and we could see it being used to automatically create text logs and alerts from various radio channels.
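As a rough illustration of the text-log and alert idea, the hypothetical helper below formats each transcript as a timestamped log line and marks it as an alert when it contains a watched keyword. The channel label and keyword list are assumptions made for the example. The plain case-insensitive substring match is the simplest possible approach and will also match, for example, "fire" inside "fired".

```python
from datetime import datetime, timezone


def log_line(channel, text, keywords, now=None):
    """Hypothetical helper: format a transcript as a log line, flagging keyword hits."""
    now = now or datetime.now(timezone.utc)
    lowered = text.lower()
    # Simple case-insensitive substring match against each watched keyword.
    hits = sorted({k for k in keywords if k.lower() in lowered})
    level = "ALERT" if hits else "LOG"
    tag = f" [{', '.join(hits)}]" if hits else ""
    return f"{now.isoformat(timespec='seconds')} {level} {channel}{tag}: {text}"
```

Each published transcript could be passed through a function like this and appended to a file, with alert lines forwarded elsewhere.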