Your agent supports voice input and output, so you can talk to it hands-free or while you are doing something else.
Voice transcription
Transcribe spoken audio to text using OpenAI’s Whisper model. Press the mic button in the chat to record audio, and your agent will transcribe it automatically and insert the text into the chat.
You can also drop an audio file (like a meeting recording) into the chat and ask your agent to transcribe it.
Example use cases:
- Talk instead of typing to your agent
- Drop a meeting recording file and ask the agent to transcribe it
- Dictate notes or instructions while on the go
Text to speech
Convert text to natural-sounding speech. Press the speak button under any chat message to hear it read aloud.
Speech synthesis uses either OpenAI or ElevenLabs voices, depending on your configuration. See the ElevenLabs integration for more voice options.
Example use cases:
- Listen to a long response while making coffee or taking a walk
- Have your agent read out a summary or report when you don’t feel like reading
- Accessibility for users who prefer audio
Walk and talk mode
Have a mostly hands-free conversation with your agent using walk and talk mode. Click Walk & Talk mode in the chat input to get started.
How it works:
- Press the rec button and start talking. Take your time: the agent will not interrupt you.
- Press the stop button when you are done talking. Keep walking while the agent transcribes your speech, generates a response, and creates a voice reply.
- Press play to hear the voice response. You can pause and replay as needed.
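For readers who want a picture of what happens between pressing stop and pressing play, the steps above can be sketched as a three-stage pipeline. This is only an illustrative sketch, not the product's actual implementation: `run_turn`, `VoiceReply`, and the `fake_*` functions are hypothetical names, and the stand-in services simply pass text through so the example runs without any API keys.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceReply:
    transcript: str      # what the user said, as text
    response_text: str   # the agent's written answer
    audio: bytes         # synthesized speech for the answer


def run_turn(
    recording: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceReply:
    """Process one finished recording: speech -> text -> answer -> speech."""
    if not recording:
        raise ValueError("recording is empty; press rec and speak first")
    transcript = transcribe(recording)      # step 1: speech to text
    response_text = respond(transcript)     # step 2: generate a response
    audio = synthesize(response_text)       # step 3: create the voice reply
    return VoiceReply(transcript, response_text, audio)


# Hypothetical stand-ins for the real speech and chat services.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply = run_turn(
    b"ideas for the launch", fake_transcribe, fake_respond, fake_synthesize
)
print(reply.response_text)
```

Because each stage finishes before the next begins, the voice reply is only ready to play once all three have completed, which is why there is a short wait after you press stop.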
Walk and talk mode is perfect for:
- Brainstorming while on a walk
- Catching up on tasks during your commute
- Hands-free interaction when you can’t type
Walk and talk mode works best with headphones or earbuds.