Real-time, high-quality speech-to-text is now possible with WhisperCPP, a high-performance open source implementation of OpenAI's Whisper automatic speech recognition model.
In his latest YouTube video, Aaron demonstrates how to use his latest DragonOS image to transcribe audio from a radio voice channel received with an RTL-SDR. He uses SDR4space as the command-line receiver, WhisperCPP as the AI transcriber, and Mosquitto for monitoring WhisperCPP's output and displaying the text in the terminal.
Here's a short video showing exactly how to set up and run SDR4space in such a way that real-time IQ captures are demodulated and fed to WhisperCPP (high-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model) for transcribing.
The latest DragonOS FocalX R28 comes w/ everything needed to do exactly what I show in this video, including a sample tiny model.
You'll notice in the video that jobs are placed in a queue for continued captures, and results are also sent over to Mosquitto MQTT, where a client can see messages as they are created.
I chose to use an RTL-SDR v3 dongle for the capture, but it's possible to configure SDR4space to use a variety of Soapy-supported SDRs.
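The queue-and-publish flow Aaron describes can be sketched in Python. This is a simplified illustration, not his actual setup: the `transcribe` stub stands in for a call to the whisper.cpp binary, `publish` stands in for an MQTT publish (e.g. via a client like paho-mqtt), and the topic name `radio/transcripts` is made up for the example.

```python
import queue
import threading

def transcribe(wav_path):
    # Placeholder: the real pipeline shells out to the WhisperCPP
    # binary here to transcribe the demodulated capture.
    return f"transcript of {wav_path}"

def publish(topic, message, sink):
    # Placeholder for an MQTT publish to the Mosquitto broker;
    # here we collect messages in a list so the sketch is runnable.
    sink.append((topic, message))

def worker(jobs, sink):
    # Drain the job queue: each job is one demodulated audio capture.
    while True:
        wav = jobs.get()
        if wav is None:  # sentinel: no more captures
            break
        publish("radio/transcripts", transcribe(wav), sink)

jobs = queue.Queue()
messages = []
t = threading.Thread(target=worker, args=(jobs, messages))
t.start()

# The receiver would enqueue a new file after each timed IQ capture.
for capture in ["capture_001.wav", "capture_002.wav"]:
    jobs.put(capture)
jobs.put(None)
t.join()

for topic, text in messages:
    print(topic, text)
```

The queue decouples capture from transcription, so a slow transcription job never stalls the receiver; a monitoring client simply subscribes to the broker to watch transcripts arrive.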
In his first video Aaron shows how to get set up with the system on DragonOS. Shortly after uploading the first tutorial, Aaron noticed that recompiling WhisperCPP on the local system yielded a significant decrease in processing time. After recompiling locally, transcription became near real time. In the second video Aaron briefly demonstrates the real-time transcription.
DragonOS FocalX Capture and Transcribe IQ w/ SDR4space/WhisperCPP/Mosquitto (RTLSDR, OpenAI)
DragonOS FocalX Captured IQ to Text Faster w/ SDR4space/WhisperCPP/Mosquitto (RTLSDR)
In the past we posted about a similar project that was based on the Amazon Transcribe cloud service. However, WhisperCPP runs on a local machine, is open source, and seems to be at least as good as Amazon Transcribe. So this appears to be a significant leap in transcribing ability, and we could see it being used to automatically create text logs and alerts from various radio channels.
DARPA (Defense Advanced Research Projects Agency) has recently released video from its Spectrum Collaboration Challenge Championship Event, where team GatorWings took home the two million dollar first prize. In the original DARPA Grand Challenge, teams competed to produce an autonomous car that could navigate an obstacle course. In this spectrum challenge, DARPA poses the questions: what if there were no FCC to control the band plan, and how do we make more efficient use of scarce spectrum?
Given those questions, the goal is for software defined radios, driven by artificial intelligences created by each team, to autonomously find ways to manage and share the spectrum by themselves. The AIs are required to listen to and learn the patterns of other AI SDRs using differing wireless standards, all of which are competing for the same slice of spectrum at the same time. The competition asks the AIs to provide simulated wireless services (phone calls, data links, video, images) during a simulation run with all the AIs running at once. Whichever AI provides the most stable services while sharing the spectrum fairly with the other AIs wins.
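A toy sketch of the sense-and-adapt loop at the heart of the challenge (not any team's actual approach — real SC2 solutions used far more sophisticated machine learning): each simulated radio senses occupancy on its channel and, if it's contested, hops to the emptiest channel.

```python
from collections import Counter

N_CHANNELS = 4
radios = {"A": 0, "B": 0, "C": 0}  # everyone starts on the same channel

def share_spectrum(radios, rounds=5):
    """Each round, radios that sense contention on their channel hop to
    the least-occupied channel (listen-before-talk, greatly simplified).
    Radios sense one at a time here; real radios act asynchronously."""
    for _ in range(rounds):
        occupancy = Counter(radios.values())
        for name in sorted(radios):
            ch = radios[name]
            if occupancy[ch] > 1:  # channel is contested
                occupancy[ch] -= 1
                new_ch = min(range(N_CHANNELS),
                             key=lambda c: (occupancy[c], c))
                radios[name] = new_ch
                occupancy[new_ch] += 1
    return radios

result = share_spectrum(dict(radios))
print(sorted(result.items()))
```

After a round or two the radios settle onto distinct channels with no central coordinator — the same outcome SC2 asked the AIs to reach for hundreds of competing radios with heterogeneous wireless standards.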
On October 23, 2019, ten teams of finalists gathered to compete one last time in the Championship Event of DARPA's Spectrum Collaboration Challenge (SC2), a three-year competition designed to unlock the true potential of the radio frequency (RF) spectrum with artificial intelligence. DARPA held the Championship Event at Mobile World Congress 2019 Los Angeles in front of a live audience.
Team GatorWings from University of Florida took home the $2 million first prize, followed by MarmotE from Vanderbilt University in second with $1 million, and Zylinium, a start-up, in third with $750,000.
Throughout the competition, SC2 demonstrated how AI can help to meet spiking demand for spectrum. As program manager Paul Tilghman noted in his closing remarks from the SC2 stage: "Our competitors packed 3.5 times more wireless signals into the spectrum than we're capable of today. Our teams outperformed static allocations and demonstrated greater performance than current wireless standards like LTE. The paradigm of collaborative AI and wireless is here to stay and will propel us from spectrum scarcity to spectrum abundance."
The highlights video is shown below, and the full two hour competition stream can be viewed here.
Highlights from the Spectrum Collaboration Challenge Championship Event
The competition was run on the DARPA Colosseum, the world's largest testbed for performing repeatable radio experiments. Capable of running up to 128 two-channel software defined radios with 3 peta-ops of computing power, it allows experimenters to accurately simulate real-world RF environments. It works by connecting special "channel emulator" RF computing hardware to each physical SDR, which can emulate any RF environment.
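In miniature, a channel emulator applies a propagation model — multipath taps plus noise — to each transmitted sample stream. A hypothetical pure-Python sketch of the core operation (the Colosseum does this in dedicated FPGA hardware, in real time, across all 128 radios at once):

```python
import cmath
import random

random.seed(0)

def emulate_channel(samples, taps, noise_std=0.01):
    """Convolve complex baseband samples with multipath taps and add
    complex Gaussian noise -- the core of any RF channel emulator."""
    out = []
    for n in range(len(samples) + len(taps) - 1):
        acc = 0j
        for k, h in enumerate(taps):
            if 0 <= n - k < len(samples):
                acc += h * samples[n - k]
        noise = complex(random.gauss(0, noise_std),
                        random.gauss(0, noise_std))
        out.append(acc + noise)
    return out

# A direct path plus one delayed, attenuated echo.
taps = [1.0 + 0j, 0.3 + 0.1j]
tx = [cmath.exp(2j * cmath.pi * 0.05 * n) for n in range(8)]  # test tone
rx = emulate_channel(tx, taps)
print(len(rx), abs(rx[0]))
```

By swapping the tap vector per radio pair, the same convolution models anything from a clean line-of-sight link to a dense urban multipath environment, which is what makes the experiments repeatable.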
As expected, the AIR-T is not cheap, coming in at US$5,699, and that is with a 10% discount off the MSRP. However, the AIR-T is likely to be of more interest to high-end industry and university researchers who have research money to spend. Also, compared to the Ettus E310/N310 and LimeNET Mini SDRs, which have built-in non-GPU computing platforms and similar SDR performance, the AIR-T could be seen as reasonably priced, assuming that its software and drivers are decent. In the future we expect the price of similar SDR-AI development boards to eventually come down to hobbyist-level prices.
The basic idea behind the AIR-T is to combine a 2x2 MIMO SDR transceiver with an NVIDIA Jetson TX2 GPU that can run artificial intelligence (AI) software quickly. The makers will include software that allows GNU Radio and Python code to be easily ported to the GPU architecture.
Why build tomorrow’s tech with yesterday’s signal processing tools? The Artificial Intelligence Radio - Transceiver (AIR-T) is a fully integrated, single-board, artificial intelligence equipped, software defined radio platform with continuous frequency coverage from 300 MHz to 6 GHz. Designed for new engineers with little wireless experience to advanced engineers and researchers who develop low-cost AI, deep learning, and high-performance wireless systems, AIR-T combines the AD9371 RFIC transceiver providing up to 2 x 2 MIMO of 100 MHz of receiving bandwidth, 100 MHz of transmitting bandwidth in an open and reprogrammable Xilinx 7 FPGA, with fast USB 3.0 connectivity.
The AIR-T has custom and open Ubuntu software and custom FPGA blocks interfacing with GNU Radio, allowing you to immediately begin developing without having to make changes to existing code. With 256 NVIDIA cores, you can develop and deploy your AI application on hardware without having to code CUDA or VHDL. Freed from the limited compute power of a single CPU, with AIR-T, you can get right to work pushing your telecom, defense, or wireless systems to the limit of what’s possible.
The SDR transceiver chip used is an Analog Devices AD9371. This is a high-end chip that can be found on high-end SDR hardware like USRPs. If you're interested, we recently had a post about decapping the AD9361, which is a similar chip. It provides 2x2 MIMO channels, with up to 100 MHz RX bandwidth and 250 MHz TX bandwidth. The NVIDIA Jetson TX2 is a GPU 'supercomputer' module specifically designed for AI processing. Many AI/machine learning algorithms, such as neural networks and deep learning, run significantly faster on GPU-type processors compared to more general CPUs.
These are not cheap chips, with the AD9371 coming in at over US$250 each and the Jetson TX2 at US$467, although we don't know what sort of bulk discounts the AIR-T manufacturers could get. It is certain, though, that the AIR-T will not be for the budget-minded.
The board is still awaiting the launch of its crowdfunding round, and you can sign up to be notified of when the project launches on its Crowd Supply page.
The melding of AI and the RF spectrum is likely to be common in the future, and a development board like this is one of the first steps. Some of the interesting use cases they present are pasted below:
From Wi-Fi to OpenBTS, use deep learning to maximize these applications. By pairing a GPU directly with an RF front-end it eliminates the need of having to purchase an additional computer or server for processing. Just power the AIR-T on and plug in a keyboard, mouse, and monitor and get started. Use GNURadio blocks to quickly develop and deploy your current or new wireless system. For those who need more control, talk directly with the drivers using Python or C++. And for those superusers out there, the AIR-T is an open-platform, so you can program the FPGA and GPU directly.
Communicating past Pluto is hard. With the power of a single-board SDR with an embedded GPU, the AIR-T can certainly prove out concepts before you launch them into space. Reduce development time and costs by adding deep learning to your satellite communication system.
There is an endless number of terrestrial communication systems with more being developed every day. As the spectral density becomes more congested, AI will be needed to maximize these resources. The AIR-T is well-positioned to easily and quickly help you prototype and deploy your wireless system.
The AIR-T allows you to demodulate a signal and apply deep learning to the image, video, or audio data in one integrated platform. For example, directly receiving a signal that contains audio and performing speech recognition previously required multiple devices. The AIR-T integrates this into one easy to use package. Whatever your application is, from speech recognition to digital signal processing, the integrated NVIDIA GPU will jump start your applications.
For many communications and radar applications once the signal is collected it must be sent to an off-board computer for additional processing and storage. This consumes valuable time. The AIR-T eliminates this. From its inception, it was designed to process signals in real-time and eliminate unnecessary latency.