Hermes Agent TTS Providers: Which One to Pick

Ten Providers, One Question: Which Is Right for You?

Hermes Agent supports ten text-to-speech providers. That is more than most AI agent frameworks offer, and it lets you choose the voice quality, cost, and privacy profile you need, from completely free and local to premium cloud voices that can be hard to tell apart from a human speaker.

This post breaks down every supported provider, what it takes to set each one up, the quality trade-offs, and a clear recommendation for each type of use case.

The Full Provider List

Provider	Cost	API Key	Notes
Edge TTS	Free	No	Default, 400+ voices, solid quality
NeuTTS	Free	No	Fully local, supports voice cloning
Piper	Free	No	Lightweight offline engine
KittenTTS	Free	No	Lightweight local alternative
ElevenLabs	Paid	Yes	Best quality, voice cloning
OpenAI TTS	Paid	Yes	Fast, consistent, 6 voices
MiniMax	Paid	Yes	Strong Asian language support
Mistral Voxtral	Paid	Yes	Low latency focus
Google Gemini	Paid	Yes	Broad language coverage
xAI	Paid	Yes	Natural fit for Grok users

Four providers need no API key and cost nothing. Six require credentials and charge per character or per request. The right choice depends on whether you want zero friction, maximum quality, or something in between.

Free Providers

Edge TTS

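Edge TTS is the default provider and the best starting point for most users. It uses Microsoft's neural speech synthesis infrastructure and needs no API key, no extra installation, and no payment. Because synthesis happens on Microsoft's servers, it is free but not local: your text does leave your machine.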

Quality is genuinely good - noticeably better than old-school system TTS. It supports 400+ voices across dozens of languages, making it viable for multilingual setups without paying anything.

Configuration in ~/.hermes/config.yaml:

tts:
  provider: edge

That is it. If you have not configured TTS before, Hermes is already using Edge TTS by default.

NeuTTS

NeuTTS is the best free option for users who want fully local processing - nothing leaves your machine. It runs a quantized neural TTS model (GGUF format) on your CPU or GPU using llama.cpp-style inference.

Setup requires a few extra steps compared to Edge TTS:

pip install neutts
sudo apt install espeak-ng   # Linux
brew install espeak-ng        # Mac

Then configure in config.yaml:

tts:
  provider: neutts
  model: neuphonic/neutts-air-q4-gguf
  device: cpu   # or cuda if you have a compatible GPU

Telegram users: NeuTTS outputs WAV files. Telegram requires Opus for voice bubbles. Hermes handles the conversion automatically if ffmpeg is installed:

sudo apt install ffmpeg   # Linux
brew install ffmpeg        # Mac
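
For reference, the WAV-to-Opus step can be reproduced by hand. The sketch below is not Hermes's internal code; it only builds a standard ffmpeg command line that encodes with libopus into an Ogg container, the format Telegram accepts for voice messages. The helper names, the 32k bitrate, and the file names are illustrative assumptions.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_opus_command(wav_path: str, bitrate: str = "32k") -> List[str]:
    """Return an ffmpeg command that converts a WAV file to Ogg/Opus.

    Hypothetical helper for illustration; Hermes's own conversion may differ.
    """
    out_path = Path(wav_path).with_suffix(".ogg")
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it already exists
        "-i", wav_path,      # input WAV, e.g. what NeuTTS produced
        "-c:a", "libopus",   # encode the audio stream with the Opus codec
        "-b:a", bitrate,     # target bitrate; 32k is an assumed value for speech
        str(out_path),
    ]


def convert(wav_path: str) -> Path:
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = build_opus_command(wav_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Running the same command manually on a NeuTTS output file is a quick way to confirm that your ffmpeg build includes libopus.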

NeuTTS also supports voice cloning. Provide a short audio sample and its transcript:

tts:
  provider: neutts
  ref_audio: /path/to/your-voice-sample.wav
  ref_text: "This is the reference transcript for voice matching."
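
Before pointing ref_audio at a file, it can help to confirm what the file actually contains. This post does not state NeuTTS's exact requirements for the sample, so the sketch below only reports basic properties (channels, sample rate, duration) using Python's standard wave module. The function name and the file path in the usage note are hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file using the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate,  # duration = frame count / frames per second
        }
```

Calling describe_wav("/path/to/your-voice-sample.wav") gives a quick sanity check that the file is readable and is as short as you intended.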

[Figure: a terminal showing NeuTTS local inference, with a real-time audio waveform alongside the model stats]

Piper and KittenTTS

Piper is a fast, lightweight offline TTS engine originally developed for Home Assistant. KittenTTS is a newer local option with a similar philosophy. Both work without an internet connection after the initial model download and are good choices for resource-constrained environments or always-offline deployments.

Paid Providers

ElevenLabs

ElevenLabs produces the most natural-sounding voices available and has become the default choice for content creators who need audio that sounds human. If you are using your Hermes agent in customer-facing scenarios or producing audio content, ElevenLabs is the clear leader.

Setup:

pip install "hermes-agent[tts-premium]"

Add to ~/.hermes/.env:

ELEVENLABS_API_KEY=your_key_here

Configure in config.yaml:

tts:
  provider: elevenlabs
  voice_id: pNInz6obpgDQGcFmaJgB   # Adam (default)
  model_id: eleven_multilingual_v2

The voice_id is the main lever. ElevenLabs has hundreds of pre-built voices and supports cloning a custom voice from a short audio sample. Browse the voice library at elevenlabs.io and paste the ID into your config.

ElevenLabs produces Opus audio natively, so there is no conversion step for Telegram voice bubbles and responses arrive faster than with NeuTTS. Pricing is usage-based. For a personal agent with moderate traffic, the free tier (10,000 characters/month) is often enough.
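
Whether the free tier is enough comes down to simple arithmetic. The sketch below assumes the 10,000 characters/month figure quoted above and a 30-day month; the function names and the example traffic numbers are hypothetical.

```python
FREE_TIER_CHARS = 10_000  # monthly free-tier allowance quoted in this post


def monthly_characters(replies_per_day: int, avg_chars_per_reply: int, days: int = 30) -> int:
    """Estimate how many characters are sent to the TTS provider per month."""
    return replies_per_day * avg_chars_per_reply * days


def fits_free_tier(replies_per_day: int, avg_chars_per_reply: int) -> bool:
    """True if the estimated monthly usage stays within the free allowance."""
    return monthly_characters(replies_per_day, avg_chars_per_reply) <= FREE_TIER_CHARS


# Two spoken replies a day at about 150 characters each: 2 * 150 * 30 = 9,000
print(fits_free_tier(2, 150))   # True
# Five such replies a day: 5 * 150 * 30 = 22,500
print(fits_free_tier(5, 150))   # False
```

At roughly 333 characters per day, the allowance suits occasional spoken replies better than voicing every message.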

OpenAI TTS

If you are already paying for OpenAI API access, TTS is a natural addition. OpenAI's six voices (alloy, echo, fable, onyx, nova, shimmer) are high quality, low latency, and consistent across languages.

Add to .env:

OPENAI_API_KEY=your_key_here

Configure:

tts:
  provider: openai
  voice: nova   # or alloy, echo, fable, onyx, shimmer

OpenAI TTS does not support voice cloning, but the base voices are reliable and the latency is excellent for real-time conversation use cases.

MiniMax, Mistral Voxtral, Google Gemini, xAI

These are the newer entries in Hermes's provider list, added as the ecosystem matured. MiniMax is particularly strong for Asian language TTS. Mistral Voxtral is optimized for low latency. Gemini benefits from Google's broad language coverage. xAI is the natural pick for users already in the Grok ecosystem.

Configuration follows the same pattern: set the provider name in config.yaml and add the corresponding API key to .env.

Which Provider Should You Choose?

Zero setup, zero cost - Edge TTS. Already configured, nothing to install.

Zero cost, local processing, privacy-first - NeuTTS with espeak-ng and ffmpeg.

Best voice quality, do not mind paying - ElevenLabs with a custom voice ID from the voice library.

Already on OpenAI's API - OpenAI TTS. Consistent and fast, reuses existing credentials.

Strong multilingual support including Asian languages - MiniMax.

Customer-facing or content creation use cases - ElevenLabs or OpenAI TTS. The quality difference over Edge TTS is clearly audible in these contexts.
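
The recommendations above can be condensed into a small decision function. This is one possible reading, not an official selection rule: the function name, its parameters, and the order in which the questions are asked are assumptions made for illustration.

```python
def recommend_provider(
    *,
    local_only: bool = False,
    willing_to_pay: bool = False,
    needs_asian_languages: bool = False,
    has_openai_key: bool = False,
) -> str:
    """Map a few yes/no answers to one of the providers recommended above."""
    if local_only:
        return "NeuTTS"       # privacy-first: nothing leaves the machine
    if not willing_to_pay:
        return "Edge TTS"     # zero setup, zero cost default
    if needs_asian_languages:
        return "MiniMax"      # strong Asian language support
    if has_openai_key:
        return "OpenAI TTS"   # reuses existing credentials
    return "ElevenLabs"       # best voice quality among the paid options


print(recommend_provider())                                          # Edge TTS
print(recommend_provider(willing_to_pay=True, has_openai_key=True))  # OpenAI TTS
```

Privacy is checked first here because it rules out every cloud provider regardless of budget; reorder the checks if your priorities differ.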

[Figure: side-by-side comparison of audio waveforms from Edge TTS, NeuTTS, and ElevenLabs, showing quality differences]

Switching Providers

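Switching is a single-line change in config.yaml. Update the provider field, add the relevant API key to .env if required, and restart the agent. You do not need to reprovision the agent, although some providers need a one-time package install first (neutts for NeuTTS, hermes-agent[tts-premium] for ElevenLabs).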
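
A missing key is the most likely slip when switching to a paid provider. The sketch below is a standalone check, not part of Hermes: it parses the text of a .env file and reports whether the key for the chosen provider is set. Only the two variable names shown in this post are included; the helper names are hypothetical, and the variable names for the other paid providers are not given here, so they are left out rather than guessed.

```python
from typing import Dict, Optional

# Variable names taken from this post. Other paid providers are omitted
# on purpose: look their names up in the Hermes documentation.
REQUIRED_KEYS = {
    "elevenlabs": "ELEVENLABS_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_key(provider: str, env_text: str) -> Optional[str]:
    """Return the missing variable name for a provider, or None if satisfied."""
    name = REQUIRED_KEYS.get(provider)
    if name is None:
        return None  # free provider, or one this sketch does not cover
    return None if parse_env(env_text).get(name) else name
```

For example, missing_key("elevenlabs", open(path_to_env).read()) returns "ELEVENLABS_API_KEY" when the variable is absent or empty, and None once it has a value.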

Testing Your TTS Setup

From the Hermes CLI:

hermes
> /voice on
> Hello, this is a TTS test.

The agent will respond with spoken audio. If you hear nothing, confirm that your system audio output is routed correctly and, for paid providers, that the API key is present in .env.
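
For local setups, silence can also mean a missing system dependency. The sketch below is a hypothetical preflight check, not a Hermes command: it looks for the two external tools mentioned in this post (ffmpeg and espeak-ng) on your PATH using the standard library.

```python
import shutil
from typing import Callable, List, Optional, Sequence


def missing_tools(
    tools: Sequence[str] = ("ffmpeg", "espeak-ng"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without touching
    the real system; by default it uses shutil.which.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and espeak-ng are both available.")
```

An empty result only shows that the binaries exist; it does not verify that the provider itself is configured correctly.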

Running Hermes Without the Configuration Overhead

Configuring TTS manually is straightforward, but it is still a setup step with platform-specific quirks - especially ffmpeg on Linux and Telegram Opus conversion. If you would rather skip it, Hermify comes with Edge TTS pre-configured and ready to use. You can switch to ElevenLabs through the dashboard settings - no SSH, no config files.