Introducing SlothSpeak: an open-source, bring-your-own-API-keys mobile app for voice chat with LLMs that prioritizes response quality over latency. The APK is available on GitHub in the releases. Currently Android only; is anyone interested in porting to iPhone?

Other voice apps use lighter models optimized for low latency. SlothSpeak makes the opposite trade-off: it uses the most capable AI models at maximum reasoning effort, accepts that responses may take minutes, and delivers the answer as natural-sounding speech.
The result is voice-driven access to the highest-quality AI reasoning available, without touching a keyboard. There is no server backend: the app calls AI provider APIs directly from the device.

- Multi-provider AI — Four providers, seven models. Switch between OpenAI GPT-5.2, Gemini 3.1 Pro, Claude Opus 4.6, and Grok 4. GPT-5.2 Pro, Gemini, and Claude support configurable reasoning effort/thinking level.
- Multi-turn conversations — Ask follow-up questions with full context. OpenAI and Grok chain via previous_response_id; Gemini and Claude send explicit conversation history.
- Interactive voice mode — Hands-free continuous conversation using on-device voice activity detection (Silero VAD). After an answer plays back, the app asks "Do you have a follow-up?" and listens for the next question. Say "no", "I'm done", or stay silent to end the conversation.
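The two follow-up strategies above (chaining by response id versus resending history) can be sketched as request builders. This is an illustrative sketch, not the app's actual code: `FollowUpRequest` is a hypothetical class, and only the `previous_response_id` field name comes from the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the two follow-up strategies described above.
// OpenAI/Grok chain by reference; Gemini/Claude resend the transcript.
public class FollowUpRequest {

    // Chained style: only the new question plus the previous response id.
    static Map<String, Object> chained(String question, String previousResponseId) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("input", question);
        body.put("previous_response_id", previousResponseId);
        return body;
    }

    // History style: the full alternating user/assistant transcript is
    // resent, with the new question appended as the final user turn.
    static List<Map<String, String>> withHistory(List<String[]> turns, String question) {
        List<Map<String, String>> messages = new ArrayList<>();
        for (String[] turn : turns) {
            messages.add(Map.of("role", "user", "content", turn[0]));
            messages.add(Map.of("role", "assistant", "content", turn[1]));
        }
        messages.add(Map.of("role", "user", "content", question));
        return messages;
    }
}
```

The chained style keeps requests small but ties the conversation to one provider; the history style is stateless and portable across providers.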
- Deep Research — OpenAI and Gemini deep research models perform extended async research with polling, delivering thorough answers backed by web sources.
- Web search and X search — Toggle web search for grounded, up-to-date answers (provider-dependent). Grok additionally supports X (Twitter) search.
- Citations — Responses with citations display numbered footnotes and a sources list. Clean text (without citation markers) is sent to TTS for natural reading.
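Stripping citation markers before TTS, as the citations item above describes, amounts to a small cleanup pass. A minimal sketch, assuming the markers are bracketed numbers like `[1]` — the app's real marker format may differ, and `CitationStripper` is a hypothetical name:

```java
import java.util.regex.Pattern;

// Minimal sketch of removing citation markers before text is sent to TTS,
// assuming bracketed numeric markers like [1] or [2][3] (an assumption;
// the app's actual marker syntax is not specified in the announcement).
public class CitationStripper {
    private static final Pattern MARKER = Pattern.compile("\\s*\\[\\d+\\]");

    static String clean(String text) {
        // Drop the markers, then collapse any doubled spaces left behind.
        return MARKER.matcher(text).replaceAll("").replaceAll("\\s{2,}", " ").trim();
    }
}
```

The full text with markers still goes to the screen and the sources list; only the spoken copy is cleaned.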
- 13 TTS voices with audio preview — Choose from alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse, marin, or cedar. Preview each voice in Settings before selecting.
- Customizable TTS reading style — Edit the reading-style instructions sent to the TTS model (e.g., tone, pacing, emphasis).
- Editable system prompts — Customize the standard Q&A and deep research system prompts in Settings.
- Playback controls — Pause/resume, rewind 20s, forward 30s, a seek bar spanning chunks, and adjustable speed (0.5x–2.0x).
- Ping and pause — Optional notification sound and 3-second pause before answer playback begins.
- Audio focus management — Coexists with other audio apps. Other audio (e.g., Spotify) pauses during recording and playback, then resumes during the thinking/processing phase.
- Conversation history — Browse past conversations, replay audio, mark favorites, delete conversations, or delete audio only to reclaim storage.
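A seek bar that spans multiple TTS chunks, as mentioned above, needs to map a global position back to a specific chunk and offset. A minimal sketch, assuming per-chunk durations are known in milliseconds (`ChunkSeeker` is a hypothetical name, not the app's code):

```java
// Sketch of mapping a global seek-bar position to a (chunk, offset) pair,
// assuming the duration of each TTS audio chunk is known in milliseconds.
public class ChunkSeeker {
    public record Target(int chunkIndex, long offsetMs) {}

    static Target seek(long[] chunkDurationsMs, long globalPositionMs) {
        long remaining = globalPositionMs;
        for (int i = 0; i < chunkDurationsMs.length; i++) {
            if (remaining < chunkDurationsMs[i]) {
                return new Target(i, remaining);
            }
            remaining -= chunkDurationsMs[i];
        }
        // Past the end: clamp to the final position of the last chunk.
        int last = chunkDurationsMs.length - 1;
        return new Target(last, chunkDurationsMs[last]);
    }
}
```

With this mapping, pause/resume and the 20s/30s skips reduce to adjusting one global position and re-seeking.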
- Retry from failure — If any pipeline step fails, retry resumes from that step without re-running earlier steps.
- Mute mode — Mute the current answer to skip playback. The pipeline completes fully (transcription, LLM, TTS, DB save), but the audio is not played; the conversation is still saved to history.
- Bluetooth audio — Automatically routes microphone input through a connected Bluetooth headset for recording. Supports both classic SCO and BLE devices.
- Media session controls — Headset media buttons and notification transport controls (play, pause, stop) work during answer playback.
- Background processing — The entire pipeline runs inside an Android foreground service with a wake lock, so it continues even if the user navigates away.
- Encrypted API key storage — API keys for all four providers are stored with AES-256-GCM encryption via AndroidX Security-Crypto.
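The retry-from-failure behavior above implies the pipeline tracks which step last succeeded, so a retry skips completed work. A minimal sketch under that assumption — `RetryablePipeline` and the step executor are hypothetical; only the step names (transcription, LLM, TTS, save) come from the announcement:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Minimal sketch of resuming a pipeline from the failed step. The executor
// returns true on success; progress is remembered across run() calls so a
// retry starts at the step that failed, not at the beginning.
public class RetryablePipeline {
    private final List<String> steps = List.of("transcribe", "llm", "tts", "save");
    private int lastCompleted = -1; // index of the last step that succeeded

    List<String> run(Function<String, Boolean> executor) {
        List<String> executed = new ArrayList<>();
        for (int i = lastCompleted + 1; i < steps.size(); i++) {
            executed.add(steps.get(i));
            if (!executor.apply(steps.get(i))) {
                return executed; // stop at the failing step
            }
            lastCompleted = i;
        }
        return executed;
    }
}
```

With long LLM calls, skipping a completed transcription or LLM step on retry saves both time and API cost.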
- No intermediate server — The app does not phone home, collect analytics, or transmit telemetry. All API calls go directly from your device to the provider. Your questions and answers are subject to each provider's own data and privacy policies.

Each provider requires its own API key, configured in Settings. All providers use OpenAI for STT and TTS, so an OpenAI API key is always required.

Interactive voice mode is an optional hands-free conversational mode that enables continuous multi-turn conversations without touching the screen.
When enabled, the app uses Silero VAD (Voice Activity Detection) to automatically detect when the user has finished speaking. After the answer plays back, the app asks "Do you have a follow-up?" and listens for the next question. The conversation ends when the user stays silent, says a dismissal phrase ("no", "I'm done", "that's all", etc.), or manually taps "Stop Conversation".

Implementation details:

- Separate API clients per provider — Each AI provider has a different API shape, so each client has its own Retrofit instance with tailored timeouts (LLM: 1500s for deep thinking, STT: 120s, TTS: 180s).
- Foreground service — The pipeline can take 5+ minutes end-to-end. A foreground service with a persistent notification ensures Android doesn't kill the process, and the service holds a partial wake lock (65-minute timeout) to prevent CPU sleep during long LLM calls.
- Parallel TTS — TTS chunks are independent, so they're sent in parallel (up to 3 concurrent requests via a Semaphore). Chunks are played back in order regardless of which finishes first.
- VAD-based end-of-speech detection — Interactive voice mode runs Silero VAD on-device. The VadAudioRecorder captures raw 16-bit PCM at 16kHz via AudioRecord, feeding 512-sample frames (32ms) to the VAD model in Mode.NORMAL and accumulating WAV output.
- Audio focus lifecycle — Recording acquires exclusive transient focus. Focus is released during processing so music can resume, then playback acquires transient focus again. In interactive mode, focus is held through the follow-up prompt and mic listening.
- Sentence-boundary text chunking — The TTS API has a 4096-character limit. The TextChunker walks backward from the limit to find sentence-ending punctuation, preventing mid-sentence cuts.
- Follow-ups via conversation chaining — OpenAI and Grok pass previous_response_id; Gemini and Claude send explicit conversation history.

Building from source requires Android Studio with SDK 35. Open the project, let Gradle sync, and run on a connected device, or use ./gradlew assembleDebug.
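The sentence-boundary chunking described above can be sketched as follows. The 4096-character limit, the backward walk, and the name TextChunker come from the announcement; the fallback to a hard cut when no sentence boundary exists in range is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of sentence-boundary chunking for the TTS character limit: walk
// backward from the limit to the last sentence-ending punctuation mark,
// so no chunk ends mid-sentence. Hard-cutting when no boundary is found
// is an assumption, not confirmed behavior.
public class TextChunker {
    static List<String> chunk(String text, int maxLen) {
        List<String> chunks = new ArrayList<>();
        while (text.length() > maxLen) {
            int cut = -1;
            for (int i = maxLen - 1; i > 0; i--) {
                char c = text.charAt(i);
                if (c == '.' || c == '!' || c == '?') { cut = i + 1; break; }
            }
            if (cut <= 0) cut = maxLen; // no sentence boundary in range
            chunks.add(text.substring(0, cut).trim());
            text = text.substring(cut).trim();
        }
        if (!text.isEmpty()) chunks.add(text);
        return chunks;
    }
}
```

In the app, `maxLen` would be 4096; the resulting chunks are what the parallel TTS stage fans out over.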
Original Source: Github.com | Author: SteveVeilStream | Published: February 24, 2026, 12:14 pm

