Whisper Extension
Whisper is OpenAI's pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning. See more on Whisper's official page.
The Whisper Extension for Switchboard lets you add speech-to-text capabilities to your applications.
- Multiple Models Available: choose from a variety of Whisper models to suit your needs, whether you need higher accuracy from a larger model or a lightweight model optimized for faster performance and lower resource usage.
- Wide Language Coverage: a diverse set of languages is supported.
- CUDA Acceleration: run Whisper inference on CUDA to harness the power of NVIDIA GPUs for more efficient speech processing.
- OpenVINO Acceleration: run Whisper inference on OpenVINO to harness the power of Intel hardware (CPUs, GPUs and NPUs) for more efficient speech processing (in beta).
The Whisper Extension provides the following audio nodes for a Switchboard SDK audio graph:
Node | Description |
---|---|
WhisperSTTNode | A sink node that uses Whisper models to predict text from audio sent to it. |
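To make the idea of a sink node concrete, here is a minimal conceptual sketch in plain Python. It is not the Switchboard SDK API: `ToySTTSinkNode`, `fake_transcriber`, `chunk_size` and `on_text` are hypothetical names invented for illustration, and the transcriber is a stub rather than a Whisper model. It only shows the shape of the behaviour described in the table: audio goes in, text comes out through a callback, and no audio is passed on.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical stand-in for a speech-to-text model:
# takes a chunk of audio samples and returns text.
Transcriber = Callable[[Sequence[float]], str]


@dataclass
class ToySTTSinkNode:
    """Toy sink node: consumes audio buffers and emits text, never audio."""

    transcribe: Transcriber
    chunk_size: int                      # samples to accumulate per transcription
    on_text: Callable[[str], None]       # callback that receives each result
    _pending: List[float] = field(default_factory=list)

    def consume(self, buffer: Sequence[float]) -> None:
        # Accumulate incoming samples; a sink has no audio output.
        self._pending.extend(buffer)
        # Transcribe every complete chunk and keep the remainder for later.
        while len(self._pending) >= self.chunk_size:
            chunk = self._pending[: self.chunk_size]
            del self._pending[: self.chunk_size]
            self.on_text(self.transcribe(chunk))


def fake_transcriber(samples: Sequence[float]) -> str:
    # Stub: a real node would run a speech recognition model here.
    return f"<{len(samples)} samples>"


results: List[str] = []
node = ToySTTSinkNode(transcribe=fake_transcriber, chunk_size=4, on_text=results.append)
node.consume([0.0, 0.1, 0.2])  # fewer than chunk_size samples, nothing emitted yet
node.consume([0.3, 0.4])       # one full chunk is transcribed, one sample remains
```

In a real audio graph the upstream node (for example a microphone input) would drive the equivalent of `consume`, and the application would read the transcribed text from the node rather than from an audio output.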
Download SDK Extension
You can download this SDK extension from our Downloads Page.
Visit the page to access the latest version and start integrating it into your project!