Hearby

ARCHITECTURE · OPEN

Where every byte lives.

Last updated · 21 May 2026 · v1.0 · backend repo public soon

No black box. Here’s the full path of a whisper, end-to-end — from your ear back to your ear. Median round trip: ≤300ms.

Mic (AVAudio) → STT (Whisper) → Edge (Haiku 4.5) → TTS (ElevenLabs) → Ear (AirPods)

Device → Edge → Ear · ≤300ms median round trip

The stack

Speech-to-text: Whisper-base · CoreML · int4 · on-device
Trigger detector: Phi-3-mini Q4 (fallback Llama-3.2-1B Q4) · on-device
Cue model: Anthropic Claude Haiku 4.5 · primary
Cue failover: OpenAI GPT-4o-mini · automatic on 5xx or >500ms
TTS: ElevenLabs Flash v2 · fallback AVSpeechSynthesizer
Backend: Cloudflare Workers (Hono) · Durable Objects per session · Neon Postgres for users/billing only
Auth: Clerk · passkey-first · JWT 15min · refresh silent
Billing: StoreKit 2 (iOS) · Stripe (web)
Enrichment: Brave Search · Apollo.io · request-scrubbed
Audit retention: 90 days · metadata only · no quoted speech
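
The cue failover rule in the list above ("automatic on 5xx or >500ms") could look roughly like the sketch below. This is an illustration, not the published backend: the names CueResponse, CueModel, shouldFailOver, and cueWithFailover are hypothetical, and the sketch assumes ">500ms" means total response time for the primary call.

```typescript
// Hypothetical types: the real edge code is not shown on this page.
type CueResponse = { status: number; text: string };
type CueModel = (prompt: string) => Promise<CueResponse>;

const FAILOVER_LATENCY_MS = 500; // from the stack list: ">500ms"

// Pure decision rule: fail over on any 5xx status or on a slow response.
function shouldFailOver(status: number, elapsedMs: number): boolean {
  return (status >= 500 && status <= 599) || elapsedMs > FAILOVER_LATENCY_MS;
}

// Tries the primary model first; switches to the fallback on a 5xx,
// a thrown error, or when the primary exceeds the latency budget.
async function cueWithFailover(
  prompt: string,
  primary: CueModel,
  fallback: CueModel,
): Promise<CueResponse> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("primary timed out")), FAILOVER_LATENCY_MS);
  });
  try {
    const res = await Promise.race([primary(prompt), deadline]);
    if (!shouldFailOver(res.status, Date.now() - started)) return res;
  } catch {
    // Timeout or network error: fall through to the fallback model.
  } finally {
    clearTimeout(timer);
  }
  return fallback(prompt);
}
```

The sketch does not cancel the in-flight primary request after a timeout; a production version would likely pass an AbortSignal to the underlying fetch.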

1. On the device

The iOS app captures 16kHz PCM via AVAudioEngine, streams it through Whisper-base, and pipes transcript chunks into a local trigger detector. The trigger detector classifies the moment (silence, name, question, memory hit, or tone shift) and emits a context blob with extracted facts. Like Whisper, the trigger detector runs on-device. Transcripts and audio buffers stay in RAM and are dropped after each chunk.

2. The cue request

When a trigger fires, the app sends a small JSON payload to our edge, typically about 400 bytes. The payload contains the trigger type, extracted facts, the calendar event title (if opted in), and a session ID. It does not contain verbatim speech from the other party.
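
To make the payload description concrete, here is a minimal sketch of what such a request might look like. The field names, the CueRequest type, and the helper functions are assumptions made for illustration; the page lists the payload's contents but not its wire format. The allow-list check shows one way an edge handler could reject any payload carrying an unexpected field such as a transcript.

```typescript
// Field names are illustrative, not the documented wire format.
type TriggerType = "silence" | "name" | "question" | "memory_hit" | "tone_shift";

type CueRequest = {
  sessionId: string;
  trigger: TriggerType;
  facts: string[]; // extracted facts, not verbatim speech
  calendarEventTitle?: string; // present only if the user opted in
};

const ALLOWED_KEYS = new Set(["sessionId", "trigger", "facts", "calendarEventTitle"]);

// Builds a request; the calendar title is included only with opt-in.
function buildCueRequest(
  sessionId: string,
  trigger: TriggerType,
  facts: string[],
  calendarTitle: string | null,
  calendarOptIn: boolean,
): CueRequest {
  const req: CueRequest = { sessionId, trigger, facts };
  if (calendarOptIn && calendarTitle) req.calendarEventTitle = calendarTitle;
  return req;
}

// Serializes the request and reports its UTF-8 size in bytes.
function encodeCueRequest(req: CueRequest): { body: string; bytes: number } {
  const body = JSON.stringify(req);
  return { body, bytes: new TextEncoder().encode(body).length };
}

// Returns false if the payload carries any key outside the allow-list.
function hasOnlyAllowedKeys(payload: object): boolean {
  return Object.keys(payload).every((key) => ALLOWED_KEYS.has(key));
}
```

An allow-list is used rather than a deny-list so that a newly added client field is rejected by default until it is reviewed.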

3. The edge

A Cloudflare Worker receives the request, hydrates a Durable Object for the session (which makes 24h of conversation memory available), and routes the request to the primary cue model. The whisper composer enforces the ≤7-word cap with a hard clamp before sending to TTS. We stream the TTS audio back to the device while the next trigger is already being decoded.
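
The hard clamp on whisper length can be sketched as below. This assumes a "word" is a whitespace-delimited token; the actual composer may count words differently, and the names clampWords and MAX_WHISPER_WORDS are hypothetical.

```typescript
const MAX_WHISPER_WORDS = 7; // the ≤7-word cap described above

// Truncates model output to at most `maxWords` whitespace-separated words.
// Leading, trailing, and repeated whitespace is normalized to single spaces.
function clampWords(text: string, maxWords: number = MAX_WHISPER_WORDS): string {
  const words = text.trim().split(/\s+/).filter((word) => word.length > 0);
  return words.slice(0, maxWords).join(" ");
}
```

Clamping after the model call, rather than relying on the prompt alone, means the cap holds even when the model ignores its length instruction.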

4. The privacy boundary

This boundary is the product. It’s defended by code, not by promises.

5. Geofencing

The GeofenceManager checks your coarse location against the canonical two-party-consent state list (CA, FL, IL, MD, MA, MT, NV, NH, PA, WA) and forces a mode selection at session start. Mid-session state changes are handled with a soft re-prompt. The list is hardcoded — it doesn’t change unless we cite a statute.
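
The check itself is simple enough to sketch. On the device this logic lives in GeofenceManager; the TypeScript below only illustrates the decision, and the function names, the SessionAction type, and the rule that a mid-session re-prompt fires only when the consent requirement changes are assumptions, not the shipped behavior.

```typescript
// State list copied from the paragraph above.
const TWO_PARTY_CONSENT_STATES: ReadonlySet<string> = new Set([
  "CA", "FL", "IL", "MD", "MA", "MT", "NV", "NH", "PA", "WA",
]);

type SessionAction = "force_mode_selection" | "soft_reprompt" | "none";

// True if the two-letter state code is on the hardcoded list.
function requiresAllPartyConsent(stateCode: string): boolean {
  return TWO_PARTY_CONSENT_STATES.has(stateCode.trim().toUpperCase());
}

// previousState === null means the session is just starting.
function geofenceAction(previousState: string | null, currentState: string): SessionAction {
  if (previousState === null) return "force_mode_selection";
  // Assumption: re-prompt only when crossing into or out of a listed state.
  const changed =
    requiresAllPartyConsent(previousState) !== requiresAllPartyConsent(currentState);
  return changed ? "soft_reprompt" : "none";
}
```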

6. Failure modes

7. What we publish

The backend code goes open-source the day public TestFlight launches. Audit it. Verify the privacy claims yourself.

Technical questions: engineering@hearby.co
