
Speaches Setup

Speaches is a self-hosted, OpenAI-compatible speech-to-text service that Intervu uses for transcription.

What is Speaches?

Speaches exposes Whisper-family speech-recognition models behind an OpenAI-compatible API, so you can run transcription locally without sending audio to external services.


Installation Methods

Option 1: Docker Compose (Recommended)

The easiest way to run Speaches locally.

For CPU (no GPU)

bash
# Download compose files
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml

# Set environment
export COMPOSE_FILE=compose.cpu.yaml

# Start Speaches
docker compose up --detach

For NVIDIA GPU (CUDA)

bash
# Download compose files
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml

# Set environment
export COMPOSE_FILE=compose.cuda.yaml

# Start Speaches (requires nvidia-docker)
docker compose up --detach

Verify Installation

bash
# Check if running
docker ps

# Should show something like:
# CONTAINER ID   IMAGE                              COMMAND
# abc123def456   ghcr.io/speaches-ai/speaches:...   "..."
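
Beyond `docker ps`, you can confirm the API itself responds. A minimal sketch, assuming the server listens on the default port 8000 and exposes the OpenAI-style `/v1/models` listing endpoint; the `speaches_up` helper name is ours:

```shell
# speaches_up: exit 0 if a Speaches server answers at the given base URL.
# /v1/models is the standard OpenAI-compatible model-listing endpoint.
speaches_up() {
  curl --silent --fail --max-time 5 "${1:-http://localhost:8000}/v1/models" > /dev/null
}

# Example: print a status line
if speaches_up "http://localhost:8000"; then
  echo "Speaches API is up"
else
  echo "Speaches API is not reachable"
fi
```
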

Option 2: Docker Run

An alternative if you prefer a single command over Compose.

bash
# CPU only
docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name speaches \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  ghcr.io/speaches-ai/speaches:latest-cpu

# With GPU
docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name speaches \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  --gpus=all \
  ghcr.io/speaches-ai/speaches:latest-cuda

Option 3: Python (Advanced)

For development or custom configurations.

bash
# Clone and setup
git clone https://github.com/speaches-ai/speaches.git
cd speaches

# Install uv package manager if needed
pip install uv

# Setup and run
uv python install
uv venv
source .venv/bin/activate
uv sync

# Start server
uvicorn --factory --host 0.0.0.0 speaches.main:create_app

Download a Transcription Model

Speaches needs a Whisper model to transcribe audio.

Install speaches-cli

bash
# Run via uvx without installing (recommended)
uvx speaches-cli --help

# Or using pip
pip install speaches-cli

List Available Models

bash
# Set base URL
export SPEACHES_BASE_URL="http://localhost:8000"

# List STT models
uvx speaches-cli registry ls --task automatic-speech-recognition | jq '.data[].id'
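
The registry returns an OpenAI-style list object, which is what the jq filter `.data[].id` iterates over. A sketch with an abbreviated, hypothetical response piped through that filter:

```shell
# The registry endpoint returns {"data": [{"id": ...}, ...]} (OpenAI-style list).
cat <<'EOF' | jq --raw-output '.data[].id'
{"data": [{"id": "Systran/faster-distil-whisper-small.en"},
          {"id": "Systran/faster-whisper-small.en"}]}
EOF
# Prints:
# Systran/faster-distil-whisper-small.en
# Systran/faster-whisper-small.en
```
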

Download a Model

bash
# Recommended for English
uvx speaches-cli model download Systran/faster-distil-whisper-small.en

# Alternative: Larger model (better accuracy, slower)
uvx speaches-cli model download Systran/faster-whisper-small.en

# Alternative: Multilingual
uvx speaches-cli model download openai/whisper-small

Verify Model Installation

bash
# List installed models
uvx speaches-cli model ls --task automatic-speech-recognition

# Should show your downloaded model

Configuration in Intervu

Once Speaches is running and a model is downloaded:

  1. Open Intervu Settings (gear icon)
  2. STT Endpoint: http://localhost:8000/v1/audio/transcriptions
  3. STT Model: Systran/faster-distil-whisper-small.en
  4. STT API Key: Leave empty (not required for local Speaches)
  5. Click Test STT to verify connection
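
You can also exercise the same endpoint Intervu calls directly from the command line. A minimal sketch of the OpenAI-style multipart request; the `transcribe` helper name is ours, and the `file`/`model` field names follow the OpenAI audio API that Speaches mirrors:

```shell
# transcribe <audio-file> [model]: POST audio to the local Speaches endpoint
# and print the JSON transcription response.
transcribe() {
  curl --silent --fail --max-time 60 \
    --request POST "${SPEACHES_BASE_URL:-http://localhost:8000}/v1/audio/transcriptions" \
    --form "file=@$1" \
    --form "model=${2:-Systran/faster-distil-whisper-small.en}"
}

# Usage (once the server is up and a model is downloaded):
#   transcribe recording.wav
```
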

Model Selection

  • faster-distil-whisper-small.en — Fast, English-only, good for real-time
  • faster-whisper-medium.en — Better accuracy, slower
  • whisper-large-v3 — Best accuracy, requires more VRAM

Troubleshooting

Speaches won't start

bash
# Check logs
docker logs speaches

# Common issues:
# - Port 8000 already in use → Stop conflicting service
# - GPU not detected → Ensure nvidia-docker is installed
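
For the port-conflict case, this sketch (assuming a Linux host with `ss` available; the `port_in_use` helper name is ours) reports whether something is already listening on 8000:

```shell
# port_in_use <port>: exit 0 if a TCP listener already holds the port.
port_in_use() {
  ss -ltn "sport = :$1" 2>/dev/null | tail -n +2 | grep -q .
}

if port_in_use 8000; then
  echo "port 8000 is taken; stop the conflicting service first"
else
  echo "port 8000 is free"
fi
```
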

Model download fails

bash
# Speaches auto-detects models already in the Hugging Face cache,
# so you can fetch one manually with huggingface-cli
export HF_HOME=/home/ubuntu/.cache/huggingface   # hub cache lives in $HF_HOME/hub
huggingface-cli download Systran/faster-distil-whisper-small.en

Transcription is slow

  • Use a smaller model (faster-distil-whisper-small.en)
  • Enable GPU acceleration if available
  • Check CPU/memory usage

Next Steps

Made with ❤️ by Aldrick Bonaobra