Voice

Voice is the primary interaction layer for Noema. It turns microphone input into text for the dialogue layer, detects when a user has finished speaking, and routes assistant output back through speech.

Pipeline

microphone -> VAD -> ASR -> dialogue layer -> TTS -> playback
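
The flow above can be sketched as a chain of pluggable stages. This is a minimal illustration, not Noema's actual API: `VoicePipeline`, `run_turn`, and the callable signatures are all hypothetical names chosen for the example, and each stage is assumed to be a plain function.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names here are hypothetical stand-ins, not Noema's real interfaces.
Frame = bytes


@dataclass
class VoicePipeline:
    vad: Callable[[Frame], bool]        # True when the frame contains speech
    asr: Callable[[List[Frame]], str]   # speech frames -> transcript
    dialogue: Callable[[str], str]      # transcript -> assistant reply
    tts: Callable[[str], bytes]         # reply text -> synthesized audio
    playback: Callable[[bytes], None]   # audio -> output device

    def run_turn(self, frames: List[Frame]) -> str:
        """Push one captured turn through every stage, in pipeline order."""
        speech = [f for f in frames if self.vad(f)]
        if not speech:
            return ""  # nothing was spoken, so nothing reaches the dialogue layer
        transcript = self.asr(speech)
        reply = self.dialogue(transcript)
        self.playback(self.tts(reply))
        return reply
```

Keeping each stage behind a plain callable means an ASR or TTS provider could be swapped without touching the other stages.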

Responsibilities

Detect active speech with voice activity detection.
Convert speech to text with the configured ASR provider.
Decide when a turn is complete before sending it to the dialogue layer.
Interrupt playback when the user starts speaking again.
Synthesize final assistant replies through TTS.
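
Turn completion and playback interruption can be illustrated with a small state machine over per-frame VAD decisions. This is a sketch under stated assumptions: `TurnDetector`, `Playback`, `handle_frame`, and the `silence_frames_to_end` threshold are hypothetical, and a fixed run of silent frames is only one possible rule for deciding that a turn is complete.

```python
from dataclasses import dataclass


@dataclass
class TurnDetector:
    """Hypothetical endpointing helper: a turn ends after enough trailing silence."""
    silence_frames_to_end: int = 3  # assumed threshold, not a Noema default
    in_speech: bool = False
    silent_run: int = 0

    def feed(self, is_speech: bool) -> str:
        """Consume one VAD decision; return 'speech_start', 'turn_end', or 'none'."""
        if is_speech:
            self.silent_run = 0
            if not self.in_speech:
                self.in_speech = True
                return "speech_start"
            return "none"
        if self.in_speech:
            self.silent_run += 1
            if self.silent_run >= self.silence_frames_to_end:
                self.in_speech = False
                self.silent_run = 0
                return "turn_end"
        return "none"


class Playback:
    """Stand-in for an audio player that can be stopped mid-utterance."""

    def __init__(self) -> None:
        self.playing = False

    def start(self) -> None:
        self.playing = True

    def stop(self) -> None:
        self.playing = False


def handle_frame(detector: TurnDetector, playback: Playback, is_speech: bool) -> str:
    """Route one VAD decision; stop playback if the user starts speaking over it."""
    event = detector.feed(is_speech)
    if event == "speech_start" and playback.playing:
        playback.stop()
    return event
```

On a `turn_end` event the buffered transcript would be handed to the dialogue layer. A short pause inside an utterance resets the silence counter once speech resumes, so it does not end the turn.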

Design Notes

The voice pipeline should stay independent from task execution. A user can speak naturally while a task is running, but task progress still belongs to the runtime.
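
One way to picture this separation is a voice session that only reads task status through a narrow interface while the runtime remains the sole writer. The names `TaskRuntime`, `VoiceSession`, `advance`, `status`, and `on_utterance` are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskRuntime:
    """Hypothetical runtime: the only component that changes task progress."""
    progress: Dict[str, float] = field(default_factory=dict)

    def advance(self, task_id: str, fraction: float) -> None:
        current = self.progress.get(task_id, 0.0)
        self.progress[task_id] = min(1.0, current + fraction)

    def status(self, task_id: str) -> float:
        return self.progress.get(task_id, 0.0)


class VoiceSession:
    """Hypothetical voice side: answers utterances and never mutates task state."""

    def __init__(self, runtime: TaskRuntime) -> None:
        self._runtime = runtime

    def on_utterance(self, text: str, task_id: str) -> str:
        # Read-only query: the user can ask about a task while it keeps running.
        if "progress" in text.lower():
            return f"Task {task_id} is {self._runtime.status(task_id):.0%} done."
        return "Okay."
```

Because the voice side holds no progress state of its own, an interrupted or abandoned utterance cannot leave a task in an inconsistent state.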
