Voice

How Noema handles speech input, turn boundaries, interruption, and playback.

Voice is the primary interaction layer for Noema. It turns microphone input into dialogue turns, decides when the user has finished speaking, and plays assistant output back as speech.

Pipeline

microphone -> VAD -> ASR -> dialogue layer -> TTS -> playback
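
As a rough illustration of how these stages chain together, the sketch below wires them up as plain callables. Every name here (run_pipeline, is_speech, transcribe, respond, synthesize, play) is hypothetical and not taken from Noema. The turn boundary is also simplified: the first non-speech frame after speech closes the utterance.

```python
from typing import Callable, Iterable, List

Frame = bytes  # hypothetical: one chunk of microphone audio


def run_pipeline(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],         # VAD stage
    transcribe: Callable[[List[Frame]], str],   # ASR stage
    respond: Callable[[str], str],              # dialogue layer
    synthesize: Callable[[str], bytes],         # TTS stage
    play: Callable[[bytes], None],              # playback stage
) -> None:
    """Push microphone frames through each stage in pipeline order."""
    buffered: List[Frame] = []

    def flush() -> None:
        # Hand the buffered utterance to ASR, then on to dialogue, TTS, playback.
        text = transcribe(buffered)
        buffered.clear()
        play(synthesize(respond(text)))

    for frame in frames:
        if is_speech(frame):
            buffered.append(frame)
        elif buffered:
            # Simplified assumption: any non-speech frame ends the utterance.
            flush()
    if buffered:
        # The stream ended while the user was still speaking.
        flush()
```

Passing each stage in as a callable keeps the stages swappable, for example to change the ASR provider without touching the rest of the chain.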

Responsibilities

  • Detect active speech with voice activity detection.
  • Convert speech to text with the configured ASR provider.
  • Decide when a turn is complete before sending it to the dialogue layer.
  • Interrupt playback when the user starts speaking again.
  • Synthesize final assistant replies through TTS.
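
Two of these responsibilities, turn completion and interruption, can be sketched with a small state machine driven by per-frame VAD decisions. This is a minimal sketch under assumptions: TurnDetector, Playback, on_vad_decision, and the silence_frames_to_end threshold are all hypothetical names, and Noema's actual turn-boundary logic may use different signals.

```python
from dataclasses import dataclass, field


@dataclass
class TurnDetector:
    """Ends a turn after N consecutive non-speech frames that follow speech."""
    silence_frames_to_end: int = 3  # hypothetical threshold
    _in_speech: bool = field(default=False, init=False)
    _silence: int = field(default=0, init=False)

    def update(self, is_speech: bool) -> bool:
        """Feed one VAD decision; return True only on the frame that ends a turn."""
        if is_speech:
            self._in_speech = True
            self._silence = 0
            return False
        if not self._in_speech:
            return False  # silence before any speech is not a turn
        self._silence += 1
        if self._silence >= self.silence_frames_to_end:
            self._in_speech = False
            self._silence = 0
            return True
        return False


class Playback:
    """Hypothetical stand-in for the TTS playback device."""
    def __init__(self) -> None:
        self.playing = False

    def start(self) -> None:
        self.playing = True

    def stop(self) -> None:
        self.playing = False


def on_vad_decision(is_speech: bool, detector: TurnDetector, playback: Playback) -> bool:
    """Stop playback if the user starts speaking, then update turn state."""
    if is_speech and playback.playing:
        playback.stop()  # the user is speaking over the assistant
    return detector.update(is_speech)
```

A short pause in the middle of a sentence resets the silence counter as soon as speech resumes, so only a sustained silence closes the turn.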

Design Notes

The voice pipeline should stay independent of task execution. A user can keep speaking naturally while a task is running, but task progress is owned by the runtime, not by the voice pipeline.
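
One way to picture this separation is a voice layer that only queues utterances and reads progress, while a separate runtime object is the sole writer of task state. The sketch below assumes hypothetical TaskRuntime and VoiceLayer classes. It is not Noema's API, only an illustration of the ownership boundary.

```python
import queue
import threading
import time


class TaskRuntime:
    """Hypothetical runtime: the only component that writes task progress."""

    def __init__(self) -> None:
        self._progress = 0.0
        self._lock = threading.Lock()

    def run(self, steps: int) -> None:
        for i in range(steps):
            time.sleep(0.01)  # stand-in for real work
            with self._lock:
                self._progress = (i + 1) / steps

    def progress(self) -> float:
        with self._lock:
            return self._progress


class VoiceLayer:
    """Hypothetical voice side: accepts speech at any time, never mutates tasks."""

    def __init__(self, runtime: TaskRuntime) -> None:
        self._runtime = runtime
        self.utterances: "queue.Queue[str]" = queue.Queue()

    def on_utterance(self, text: str) -> None:
        # Speech is queued even while a task is running.
        self.utterances.put(text)

    def describe_progress(self) -> str:
        # Read-only view of state that the runtime owns.
        return f"Task is {self._runtime.progress():.0%} complete"


if __name__ == "__main__":
    runtime = TaskRuntime()
    voice = VoiceLayer(runtime)
    worker = threading.Thread(target=runtime.run, args=(4,))
    worker.start()
    voice.on_utterance("how is it going?")  # user speaks mid-task
    worker.join()
    print(voice.describe_progress())
```

Because the voice layer holds no task state of its own, restarting or swapping the voice pipeline would not disturb a running task in this arrangement.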
