Voicebox

Creative Free

About Voicebox

Voicebox is a local-first, open-source AI voice studio positioned as a free alternative to ElevenLabs and Wispr Flow. Built as a Tauri (Rust) desktop app for macOS, Windows, and Linux, with Docker also supported, it processes everything offline and exposes a local REST API for pipeline integration. Clone a voice from a few seconds of audio, generate speech in 23 languages with 7 TTS engines, dictate into any text field with a global hotkey, or let MCP-aware agents speak back in a cloned voice through an on-screen pill. The project has 31.9k GitHub stars and is MIT licensed.

Key Features

  • Voice cloning from a few seconds of audio — create custom voice profiles locally
  • Text-to-speech generation across 23 languages with 7 TTS engine backends
  • Global hotkey dictation — speak into any text field on your system
  • MCP agent voice integration — Claude Code, Cursor, and other agents speak through an on-screen pill
  • Local REST API for programmatic speech generation in pipelines
  • Fully offline processing — no data leaves your machine
  • Cross-platform Tauri desktop app — macOS, Windows, Linux, Docker
  • MIT licensed, open source

Overview

Voicebox gives developers and agents a local, private voice layer. Where ElevenLabs requires cloud API calls and per-character billing, Voicebox runs entirely on your machine — cloning voices, synthesizing speech, and exposing a REST API that any local process can call. The Tauri (Rust) desktop app keeps the footprint small while supporting macOS, Windows, Linux, and Docker deployments.

The project has grown to 31.9k GitHub stars as developers adopt it for two overlapping use cases: personal productivity (dictation, voice notes) and agent infrastructure (giving AI agents audible output). The MCP integration is what sets it apart in the agent ecosystem — MCP-aware tools like Claude Code and Cursor can speak responses aloud in a cloned voice through Voicebox’s on-screen pill interface.

Key Capabilities

Voice cloning: Record or upload a few seconds of audio, and Voicebox creates a voice profile that can be used for all subsequent speech generation. Clone your own voice for dictation, or create character voices for content production. All processing happens locally.
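
Because a clone starts from a short reference recording, a script that feeds clips to Voicebox might check their length before uploading. The sketch below uses only Python's standard `wave` module. The 3-second minimum is a hypothetical threshold (the project only says "a few seconds"), and the helper names are illustrative, not part of Voicebox.

```python
import io
import wave

# Hypothetical threshold: Voicebox only documents "a few seconds of audio".
MIN_SECONDS = 3.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())


def check_reference_clip(data: bytes, min_seconds: float = MIN_SECONDS) -> float:
    """Raise ValueError if the clip is shorter than min_seconds; else return its duration."""
    duration = wav_duration_seconds(data)
    if duration < min_seconds:
        raise ValueError(
            f"reference clip is {duration:.2f}s; expected at least {min_seconds:.1f}s"
        )
    return duration
```

Reading from bytes rather than a path keeps the check usable for clips that arrive from a microphone buffer or an HTTP upload as well as from disk.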

Multi-engine TTS: Voicebox supports 7 text-to-speech backends, letting you trade off quality, speed, and language coverage. 23 languages are supported out of the box, and individual engines are stronger for some language families than for others.

System-wide dictation: A global hotkey activates voice-to-text input into whatever application has focus. This works across the entire OS — editors, browsers, terminals, chat apps — without per-app plugins.

MCP agent voice: The standout feature for the agent ecosystem. MCP-aware agents can call Voicebox to speak their responses aloud. An on-screen pill shows when the agent is speaking, and the voice is configurable: use a cloned voice or one of the built-in options. This gives otherwise text-only agents a spoken output channel.
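
Under the hood, MCP clients invoke server tools with JSON-RPC 2.0 `tools/call` requests; the agent's client (Claude Code, Cursor, and so on) sends these for you. The sketch below only builds such a message to show its shape. The tool name `speak` and the `text`/`voice` arguments are hypothetical placeholders; the real names are whatever Voicebox's MCP server advertises in its `tools/list` response.

```python
import json
from itertools import count
from typing import Optional

# Monotonically increasing JSON-RPC request ids.
_ids = count(1)


def speak_call(text: str, voice: Optional[str] = None) -> str:
    """Serialize a JSON-RPC 2.0 MCP tools/call request.

    The tool name "speak" and its argument names are hypothetical.
    """
    arguments = {"text": text}
    if voice is not None:
        arguments["voice"] = voice  # e.g. the name of a cloned voice profile
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": "speak", "arguments": arguments},
    }
    return json.dumps(message)
```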

Local REST API: Every capability is exposed through a REST API running on localhost. Integrate speech generation into scripts, CI pipelines, notification systems, or custom agent frameworks without touching the desktop UI.
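
A minimal sketch of calling a local REST endpoint from a script, using only the standard library. The base URL and port, the `/generate` path, the JSON field names, and the assumption that the response body is raw audio are all hypothetical; check the Voicebox API documentation for the real routes and payloads.

```python
import json
import urllib.request

# Hypothetical address: substitute the host/port your Voicebox install listens on.
BASE_URL = "http://127.0.0.1:8000"


def build_generate_request(
    text: str, profile_id: str, language: str = "en"
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the local server to synthesize text."""
    payload = {"text": text, "profile_id": profile_id, "language": language}  # hypothetical fields
    return urllib.request.Request(
        f"{BASE_URL}/generate",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_speech(
    text: str, profile_id: str, language: str = "en", timeout: float = 60.0
) -> bytes:
    """Send the request and return the response body (assumed here to be audio bytes)."""
    req = build_generate_request(text, profile_id, language)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

For example, a CI job could call `generate_speech("Build finished.", profile_id="my-voice")` and write the returned bytes to a file or play them back. Keeping request construction separate from sending makes the payload easy to test without a running server.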

Use Cases

Agent voice output: Give Claude Code, Cursor, or custom MCP agents an audible voice. Useful for accessibility, hands-free workflows, and making agent interactions feel more natural.

Content production: Generate voiceovers for videos, podcasts, or tutorials using cloned voices. Pair with HyperFrames or other video tools for fully automated content pipelines.
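
Long voiceover scripts are often split into shorter segments before synthesis, so each request stays small and a bad take can be regenerated on its own. The sketch below greedily packs whole sentences into chunks. `max_chars=400` is an assumed limit, not a documented Voicebox constraint, and the sentence splitter is a simple regex that will also split after abbreviations such as "e.g.".

```python
import re
from typing import List


def chunk_script(script: str, max_chars: int = 400) -> List[str]:
    """Greedily pack sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace (including newlines).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence  # may exceed max_chars on its own; kept whole
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent to the local REST API in order and the resulting audio segments concatenated into the final voiceover.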

Developer productivity: Dictate code comments, commit messages, documentation, or chat responses instead of typing them. The global hotkey works in any application.

Who It’s For

Developers who want local, private voice synthesis without cloud dependencies. Agent builders adding voice output to MCP-compatible tools. Content creators who need voice generation without per-character API costs. Anyone who wants ElevenLabs-class voice capabilities running entirely on their own hardware.
