Whisper Extension

Whisper is OpenAI's pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning. See more on Whisper's official page.

The Whisper Extension for Switchboard lets you add speech-to-text capabilities to your applications.

Multiple Models Available: choose from a variety of Whisper models to suit your needs, whether you require a larger model for higher accuracy or a lightweight model optimized for faster performance and lower resource usage.
Wide Language Coverage: transcribe speech in a diverse set of languages.
CUDA Acceleration: run Whisper inference on CUDA to harness the power of NVIDIA GPUs for more efficient speech processing.
OpenVINO Acceleration: run Whisper inference on OpenVINO to harness the power of Intel hardware (CPUs, GPUs, and NPUs) for more efficient speech processing (in beta).
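
To make the accuracy-versus-resources trade-off concrete, here is a minimal sketch of choosing a model size from a memory budget. It is not part of the Switchboard SDK: the `pick_model` helper and the `WHISPER_MODELS` table are hypothetical names introduced for illustration, and the parameter counts and memory figures are the approximate values published in the open-source Whisper README. This page does not state which models the extension ships, so treat the table as an assumption.

```python
from typing import Optional

# (name, parameters in millions, approximate memory required in GB).
# Assumption: figures taken from the open-source Whisper README; the models
# bundled with the Switchboard extension may differ.
WHISPER_MODELS = [
    ("tiny", 39, 1.0),
    ("base", 74, 1.0),
    ("small", 244, 2.0),
    ("medium", 769, 5.0),
    ("large", 1550, 10.0),
]


def pick_model(memory_budget_gb: float) -> Optional[str]:
    """Return the largest model that fits the budget, or None if none fits."""
    best = None
    # The table is ordered from smallest to largest, so the last model
    # that fits is the largest one.
    for name, _params_millions, required_gb in WHISPER_MODELS:
        if required_gb <= memory_budget_gb:
            best = name
    return best
```

Larger models generally transcribe more accurately but run slower, so on a constrained device it can be reasonable to pick a smaller model than the budget strictly allows.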

The Whisper Extension provides the following audio nodes for a Switchboard SDK audio graph:

Node	Description
WhisperSTTNode	A sink node that uses Whisper models to predict text from audio sent to it.

Download SDK Extension

You can download this SDK extension from our Downloads Page.

Visit the page to access the latest version and start integrating it into your project!
