Whisper Extension

Whisper is OpenAI's pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning. See more on Whisper's official page.

The Whisper Extension for Switchboard allows you to effortlessly add speech-to-text capabilities to your applications.

  • Multiple Models Available: choose from a variety of Whisper models to suit your needs, whether you require higher accuracy backed by more extensive training data or a lightweight model optimized for faster performance and lower resource usage.
  • Wide Language Coverage: a diverse set of languages is supported.
  • CUDA Acceleration: run Whisper inference on CUDA to harness the power of NVIDIA GPUs for more efficient speech processing.
  • OpenVINO Acceleration: run Whisper inference on OpenVINO to harness the power of Intel hardware (CPUs, GPUs, and NPUs) for more efficient speech processing (in beta).

The Whisper Extension provides the following audio nodes for a Switchboard SDK audio graph:

Node              Description
WhisperSTTNode    A sink node that uses Whisper models to predict text from audio sent to it.
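
The sketch below illustrates how a WhisperSTTNode might be wired into an audio graph as a sink for microphone audio. Only the WhisperSTTNode name comes from this page; the graph, input node, configuration, and callback APIs (AudioGraph, MicrophoneInputNode, setModel, setBackend, setTranscriptionCallback) are assumptions for illustration and may differ from the actual Switchboard SDK interfaces, so consult the SDK reference for exact signatures.

    // Hypothetical sketch: wiring a WhisperSTTNode into a Switchboard audio graph.
    // Only "WhisperSTTNode" comes from this page; the surrounding graph, input node,
    // configuration, and callback APIs are assumptions and may not match the real SDK.
    #include <iostream>
    #include <string>

    #include "SwitchboardSDK.hpp"      // assumed umbrella header
    #include "WhisperExtension.hpp"    // assumed Whisper extension header

    int main() {
        switchboard::AudioGraph graph;                // assumed graph type
        switchboard::MicrophoneInputNode micNode;     // assumed audio source node
        whisper::WhisperSTTNode sttNode;              // sink node provided by this extension

        // Assumed configuration hooks: pick a model size and acceleration backend.
        sttNode.setModel("base.en");                  // hypothetical parameter
        sttNode.setBackend(whisper::Backend::CUDA);   // hypothetical parameter

        // Route microphone audio into the Whisper sink node.
        graph.addNode(micNode);
        graph.addNode(sttNode);
        graph.connect(micNode, sttNode);

        // Receive predicted text as it becomes available (assumed callback API).
        sttNode.setTranscriptionCallback([](const std::string& text) {
            std::cout << "Transcription: " << text << std::endl;
        });

        graph.start();
        // ... run until the application decides to stop ...
        graph.stop();
        return 0;
    }

Because WhisperSTTNode is a sink node, it consumes audio from the graph and emits text rather than passing audio downstream; in practice you would deliver the transcription to your application through whatever callback or event mechanism the SDK provides.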