
Speaches Setup

Speaches is a self-hosted, OpenAI-compatible speech-to-text service that Intervu uses for transcription.

What is Speaches?

Speaches exposes Whisper-family speech-recognition models behind an OpenAI-compatible API, so you can run transcription locally without sending audio to external services.


Installation Methods

Option 1: Docker Compose (Recommended)

The easiest way to run Speaches locally.

For CPU (no GPU)

bash
# Download compose files
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml

# Set environment
export COMPOSE_FILE=compose.cpu.yaml

# Start Speaches
docker compose up --detach

For NVIDIA GPU (CUDA)

bash
# Download compose files
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml

# Set environment
export COMPOSE_FILE=compose.cuda.yaml

# Start Speaches (requires nvidia-docker)
docker compose up --detach

Verify Installation

bash
# Check if running
docker ps

# Should show something like:
# CONTAINER ID   IMAGE                              COMMAND
# abc123def456   ghcr.io/speaches-ai/speaches:...   "..."
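
Beyond `docker ps`, you can confirm the API itself responds. A minimal sketch, assuming the server listens on the default port 8000 and exposes the OpenAI-style `/v1/models` listing endpoint; the `speaches_up` helper name is ours:

```shell
# speaches_up: exit 0 if a Speaches server answers at the given base URL.
# /v1/models is the standard OpenAI-compatible model-listing endpoint.
speaches_up() {
  curl --silent --fail --max-time 5 "${1:-http://localhost:8000}/v1/models" > /dev/null
}

# Example: print a status line
if speaches_up "http://localhost:8000"; then
  echo "Speaches API is up"
else
  echo "Speaches API is not reachable"
fi
```
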

Option 2: Docker Run

An alternative if you prefer a single command over Compose.

bash
# CPU only
docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name speaches \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  ghcr.io/speaches-ai/speaches:latest-cpu

# With GPU
docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name speaches \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  --gpus=all \
  ghcr.io/speaches-ai/speaches:latest-cuda

Option 3: Python (Advanced)

For development or custom configurations.

bash
# Clone and setup
git clone https://github.com/speaches-ai/speaches.git
cd speaches

# Install uv package manager if needed
pip install uv

# Setup and run
uv python install
uv venv
source .venv/bin/activate
uv sync

# Start server
uvicorn --factory --host 0.0.0.0 speaches.main:create_app

Download a Transcription Model

Speaches needs a Whisper model to transcribe audio.

Install speaches-cli

bash
# Run via uvx without installing (recommended)
uvx speaches-cli --help

# Or using pip
pip install speaches-cli

List Available Models

bash
# Set base URL
export SPEACHES_BASE_URL="http://localhost:8000"

# List STT models
uvx speaches-cli registry ls --task automatic-speech-recognition | jq '.data[].id'
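
The registry returns an OpenAI-style list object, which is what the jq filter `.data[].id` iterates over. A sketch with an abbreviated, hypothetical response piped through that filter:

```shell
# The registry endpoint returns {"data": [{"id": ...}, ...]} (OpenAI-style list).
cat <<'EOF' | jq --raw-output '.data[].id'
{"data": [{"id": "Systran/faster-distil-whisper-small.en"},
          {"id": "Systran/faster-whisper-small.en"}]}
EOF
# Prints:
# Systran/faster-distil-whisper-small.en
# Systran/faster-whisper-small.en
```
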

Download a Model

bash
# Recommended for English
uvx speaches-cli model download Systran/faster-distil-whisper-small.en

# Alternative: Larger model (better accuracy, slower)
uvx speaches-cli model download Systran/faster-whisper-small.en

# Alternative: Multilingual
uvx speaches-cli model download openai/whisper-small

Verify Model Installation

bash
# List installed models
uvx speaches-cli model ls --task automatic-speech-recognition

# Should show your downloaded model

Configuration in Intervu

Once Speaches is running and a model is downloaded:

  1. Open Intervu Settings (gear icon)
  2. STT Endpoint: http://localhost:8000/v1/audio/transcriptions
  3. STT Model: Systran/faster-distil-whisper-small.en
  4. STT API Key: Leave empty (not required for local Speaches)
  5. Click Test STT to verify connection
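
You can also exercise the same endpoint Intervu calls directly from the command line. A minimal sketch of the OpenAI-style multipart request; the `transcribe` helper name is ours, and the `file`/`model` field names follow the OpenAI audio API that Speaches mirrors:

```shell
# transcribe <audio-file> [model]: POST audio to the local Speaches endpoint
# and print the JSON transcription response.
transcribe() {
  curl --silent --fail --max-time 60 \
    --request POST "${SPEACHES_BASE_URL:-http://localhost:8000}/v1/audio/transcriptions" \
    --form "file=@$1" \
    --form "model=${2:-Systran/faster-distil-whisper-small.en}"
}

# Usage (once the server is up and a model is downloaded):
#   transcribe recording.wav
```
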

Model Selection

  • faster-distil-whisper-small.en — Fast, English-only, good for real-time
  • faster-whisper-medium.en — Better accuracy, slower
  • whisper-large-v3 — Best accuracy, requires more VRAM

Troubleshooting

Speaches won't start

bash
# Check logs
docker logs speaches

# Common issues:
# - Port 8000 already in use → Stop conflicting service
# - GPU not detected → Ensure nvidia-docker is installed
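
For the port-conflict case, this sketch (assuming a Linux host with `ss` available; the `port_in_use` helper name is ours) reports whether something is already listening on 8000:

```shell
# port_in_use <port>: exit 0 if a TCP listener already holds the port.
port_in_use() {
  ss -ltn "sport = :$1" 2>/dev/null | tail -n +2 | grep -q .
}

if port_in_use 8000; then
  echo "port 8000 is taken; stop the conflicting service first"
else
  echo "port 8000 is free"
fi
```
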

Model download fails

bash
# Speaches auto-detects models already in the Hugging Face cache,
# so you can fetch one manually with huggingface-cli
export HF_HOME=/home/ubuntu/.cache/huggingface   # hub cache lives in $HF_HOME/hub
huggingface-cli download Systran/faster-distil-whisper-small.en

Transcription is slow

  • Use a smaller model (faster-distil-whisper-small.en)
  • Enable GPU acceleration if available
  • Check CPU/memory usage

Next Steps

Made with ❤️ by Aldrick Bonaobra