Voice
How Noema handles speech input, turn boundaries, interruption, and playback.
Voice is the primary interaction layer for Noema. It turns microphone input into dialogue turns, decides when the user has finished speaking, and speaks the assistant's replies back to the user.
Pipeline
microphone -> VAD -> ASR -> dialogue layer -> TTS -> playback
Responsibilities
- Detect active speech with voice activity detection.
- Convert speech to text with the configured ASR provider.
- Decide when a turn is complete before sending it to the dialogue layer.
- Interrupt playback when the user starts speaking again.
- Synthesize final assistant replies through TTS.
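Three of these responsibilities (speech detection, end-of-turn decisions, and interrupting playback) can be pictured as one small state machine driven by per-frame VAD results. The sketch below is hypothetical. `TurnDetector`, the event names, the 20 ms frame size and the 600 ms silence timeout are all illustrative assumptions, not Noema's actual API or tuning. It closes a turn after a fixed run of trailing silence and stops playback on the first speech frame, whereas a production endpointer would typically debounce both decisions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnDetector:
    """Hypothetical turn-taking state machine fed by per-frame VAD decisions."""
    frame_ms: int = 20        # assumed duration of one audio frame
    silence_ms: int = 600     # assumed trailing silence that ends a turn
    in_turn: bool = False     # True once speech has started in the current turn
    silent_frames: int = 0    # consecutive non-speech frames inside a turn
    playing: bool = False     # whether TTS playback is currently active
    events: List[str] = field(default_factory=list)

    def start_playback(self) -> None:
        self.playing = True
        self.events.append("playback_started")

    def on_frame(self, is_speech: bool) -> None:
        if is_speech:
            if self.playing:
                # The user spoke over the assistant, so stop playback.
                self.playing = False
                self.events.append("playback_interrupted")
            if not self.in_turn:
                self.in_turn = True
                self.events.append("turn_started")
            self.silent_frames = 0
        elif self.in_turn:
            self.silent_frames += 1
            if self.silent_frames * self.frame_ms >= self.silence_ms:
                # Enough trailing silence: the turn can go to the dialogue layer.
                self.in_turn = False
                self.silent_frames = 0
                self.events.append("turn_ended")


detector = TurnDetector()
detector.start_playback()
# 10 speech frames (200 ms) followed by 30 silent frames (600 ms).
for is_speech in [True] * 10 + [False] * 30:
    detector.on_frame(is_speech)
print(detector.events)
# ['playback_started', 'playback_interrupted', 'turn_started', 'turn_ended']
```

Keeping endpointing and interruption in one place means the same VAD signal that opens a turn also cancels playback, so the two decisions cannot disagree about whether the user is speaking.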
Design Notes
The voice pipeline should stay independent of task execution. A user can keep speaking naturally while a task is running, but task progress is owned by the runtime, not by the voice layer.
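One way to keep that separation is to run the voice loop and the task runtime as independent concurrent tasks that never await each other. The following is a minimal sketch assuming an asyncio-style design; `Runtime`, `run_task`, `voice_loop` and the `progress` dictionary are hypothetical names for illustration, not Noema's actual interfaces. Only the runtime writes progress, and the voice loop keeps consuming turns while the task is still in flight.

```python
import asyncio
from typing import Dict, List, Optional, Tuple


class Runtime:
    """Hypothetical task runtime: the only component that writes task progress."""

    def __init__(self) -> None:
        self.progress: Dict[str, float] = {}

    async def run_task(self, name: str, steps: int) -> None:
        for i in range(1, steps + 1):
            await asyncio.sleep(0)           # stand-in for real work; yields control
            self.progress[name] = i / steps  # progress lives in the runtime


async def voice_loop(turns: "asyncio.Queue[Optional[str]]", transcript: List[str]) -> None:
    """Hypothetical voice loop: handles user turns without awaiting any task."""
    while (text := await turns.get()) is not None:
        transcript.append(text)  # a real system would pass this to the dialogue layer


async def main() -> Tuple[List[str], Dict[str, float]]:
    runtime = Runtime()
    turns: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    transcript: List[str] = []
    task = asyncio.create_task(runtime.run_task("build", steps=5))
    voice = asyncio.create_task(voice_loop(turns, transcript))
    # The user keeps talking while the task is still running.
    await turns.put("how is it going?")
    await turns.put("also, remind me at five")
    await turns.put(None)  # end of the voice session
    await asyncio.gather(task, voice)
    return transcript, runtime.progress


transcript, progress = asyncio.run(main())
print(transcript)  # ['how is it going?', 'also, remind me at five']
print(progress)    # {'build': 1.0}
```

If the voice layer needs to answer a question like "how is it going?", it can read a snapshot from the runtime rather than tracking progress itself, which keeps a single source of truth for task state.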
