# Speaches Setup

Speaches is a self-hosted, OpenAI-compatible speech-to-text service that Intervu uses for transcription.
## What is Speaches?

Speaches wraps OpenAI's Whisper models with an OpenAI-compatible API, allowing you to run transcription locally without sending audio to external services.

- GitHub: [speaches-ai/speaches](https://github.com/speaches-ai/speaches)
- License: MIT
- Models: Whisper models from HuggingFace (served through faster-whisper)
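Because the API is OpenAI-compatible, responses use the same JSON shapes as OpenAI's audio endpoints. As a sketch (the transcript below is a made-up stand-in), a transcription response is an object whose `text` field holds the transcript:

```bash
# Simulated body of a response from POST /v1/audio/transcriptions
response='{"text": "Tell me about a project you are proud of."}'

# Extract the transcript the way a minimal client would
# (sed keeps this dependency-free; real clients should use a JSON parser)
echo "$response" | sed -E 's/.*"text": "([^"]*)".*/\1/'
# → Tell me about a project you are proud of.
```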
## Installation Methods

### Option 1: Docker Compose (Recommended)
The easiest way to run Speaches locally.
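Both variants below assume Docker with the Compose v2 plugin. A quick prerequisite check (prints a version string, or a notice if it is missing):

```bash
# Compose v2 ships as a `docker` subcommand (not the legacy docker-compose)
docker compose version 2>/dev/null || echo "Docker Compose v2 is not available"
```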
#### For CPU (no GPU)

```bash
# Download compose files
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml

# Set environment
export COMPOSE_FILE=compose.cpu.yaml

# Start Speaches
docker compose up --detach
```

#### For NVIDIA GPU (CUDA)
```bash
# Download compose files
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml

# Set environment
export COMPOSE_FILE=compose.cuda.yaml

# Start Speaches (requires the NVIDIA Container Toolkit)
docker compose up --detach
```

#### Verify Installation
```bash
# Check if running
docker ps

# Should show something like:
# CONTAINER ID   IMAGE                              COMMAND
# abc123def456   ghcr.io/speaches-ai/speaches:...   "..."
```

### Option 2: Docker Run
Alternative if you prefer a single command.
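The `--volume hf-hub-cache:...` mount in the commands below persists downloaded models across container restarts (the volume name matches the one used in these commands). Whether it exists yet can be checked with:

```bash
# Inspect the named volume that caches HuggingFace model downloads
docker volume inspect hf-hub-cache 2>/dev/null \
  || echo "volume hf-hub-cache does not exist yet"
```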
```bash
# CPU only
docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name speaches \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  ghcr.io/speaches-ai/speaches:latest-cpu

# With GPU (stop the CPU container first; both use port 8000 and the name "speaches")
docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name speaches \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  --gpus=all \
  ghcr.io/speaches-ai/speaches:latest-cuda
```

### Option 3: Python (Advanced)
For development or custom configurations.
```bash
# Clone and set up
git clone https://github.com/speaches-ai/speaches.git
cd speaches

# Install the uv package manager if needed
pip install uv

# Set up the environment
uv python install
uv venv
source .venv/bin/activate
uv sync

# Start the server
uvicorn --factory --host 0.0.0.0 speaches.main:create_app
```

## Download a Transcription Model
Speaches needs a Whisper model to transcribe audio.
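Model ids throughout this page are HuggingFace repository ids of the form `owner/name`; the same string is passed to `speaches-cli` and, later, entered in Intervu's STT Model field. A small illustration of the format:

```bash
model="Systran/faster-distil-whisper-small.en"

# The id splits into a HuggingFace owner and a repository name
echo "owner: ${model%%/*}"
echo "model: ${model#*/}"
```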
### Install speaches-cli

```bash
# Using uvx (recommended)
uvx speaches-cli --help

# Or using pip
pip install speaches-cli
```

### List Available Models
```bash
# Set base URL
export SPEACHES_BASE_URL="http://localhost:8000"

# List STT models
uvx speaches-cli registry ls --task automatic-speech-recognition | jq '.data[].id'
```

### Download a Model
```bash
# Recommended for English
uvx speaches-cli model download Systran/faster-distil-whisper-small.en

# Alternative: larger model (better accuracy, slower)
uvx speaches-cli model download Systran/faster-whisper-small.en

# Alternative: multilingual
uvx speaches-cli model download openai/whisper-small
```

### Verify Model Installation
```bash
# List installed models
uvx speaches-cli model ls --task automatic-speech-recognition
# Should show your downloaded model
```

## Configuration in Intervu
Once Speaches is running and a model is downloaded:

- Open Intervu Settings (gear icon)
- STT Endpoint: `http://localhost:8000/v1/audio/transcriptions`
- STT Model: `Systran/faster-distil-whisper-small.en`
- STT API Key: leave empty (not required for local Speaches)
- Click Test STT to verify the connection
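If Test STT fails, the same endpoint can be exercised from a shell. The sketch below assumes a short `sample.wav` in the current directory and a `/health` route on the server (both are assumptions; adjust to your setup):

```bash
base="http://localhost:8000"

if curl --silent --fail --max-time 2 "$base/health" >/dev/null 2>&1; then
  # Same multipart request the OpenAI audio API defines
  curl "$base/v1/audio/transcriptions" \
    --form "file=@sample.wav" \
    --form "model=Systran/faster-distil-whisper-small.en"
else
  echo "Speaches is not reachable on $base"
fi
```

A JSON object with a `text` field indicates the request succeeded.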
## Model Selection

- `faster-distil-whisper-small.en`: fast, English-only, good for real-time
- `faster-whisper-medium.en`: better accuracy, slower
- `whisper-large-v3`: best accuracy, requires more VRAM
## Troubleshooting

### Speaches won't start

```bash
# Check logs
docker logs speaches

# Common issues:
# - Port 8000 already in use → stop the conflicting service
# - GPU not detected → ensure the NVIDIA Container Toolkit is installed
```

### Model download fails
```bash
# Models downloaded manually via HuggingFace are auto-detected,
# provided they land in the cache Speaches reads from.
# HF_HOME is the cache root; the hub/ subdirectory lives inside it.
export HF_HOME=/home/ubuntu/.cache/huggingface
```

### Transcription is slow
- Use a smaller model (`faster-distil-whisper-small.en`)
- Enable GPU acceleration if available
- Check CPU/memory usage
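A useful way to quantify "slow" is the real-time factor (RTF): processing time divided by audio duration, where values under 1.0 mean faster than real time. The numbers below are made up for illustration:

```bash
# Real-time factor = seconds spent transcribing / seconds of audio
audio_seconds=30
processing_seconds=12

awk -v a="$audio_seconds" -v p="$processing_seconds" \
  'BEGIN { printf "RTF = %.2f\n", p / a }'
# → RTF = 0.40
```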