My Simple Speech To Text
August 24, 2025
The Zero-UI Transcription Stack: Native-feel on iOS/Mac, serverless-fast, nearly free
This project turns S3 into your “native” transcription workspace. Record on iPhone, tap Share → “Upload to S3,” and your text shows up moments later in the same Files view—on iOS and macOS—without opening an app. Under the hood: an S3 event triggers a Chalice-powered Lambda, Whisper transcribes, Gemini tidies to clean Markdown, the result is saved right back to S3, and your phone gets a push notification.
Why this is interesting
- Native UX, no app UI: The iOS Files app serves as the UI. Drag audio into `audio/`, read Markdown from `transcriptions/`. Feels local; works everywhere.
- Beautifully small serverless: Chalice makes Lambda ergonomic, so the whole pipeline stays compact and readable.
- High quality + low cost: Whisper-large-v3 for accuracy; Gemini 2.5 Pro for formatting; end-to-end cost ~ $0.01/min.
- Long-form resilient: Built for long voice notes that turn into essays.
How it works (end-to-end)
- Upload audio to S3 `audio/`.
- S3 event invokes Lambda (Chalice).
- Whisper (Groq) transcribes to text.
- Gemini formats to clean, paragraph-structured Markdown.
- Result saved to S3 `transcriptions/filename.ext.txt`.
- Phone notified via ntfy.sh; open in Files immediately.
The core flow in three steps (from the repo)
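@app.on_s3_event("audio-to-transcribe1", events=["s3:ObjectCreated:*"], prefix="audio/")
def transcribe_audio(event):
    """Transcribe a newly uploaded audio file and save the formatted text to S3."""
    # Excerpt: `app`, `s3` (a boto3 S3 client), `groq_client`, and
    # `generate_formatted_transcription` are defined elsewhere in the repo.
    # The download below is an assumption; the excerpt omits how `file` is opened.
    filename = event.key.rsplit("/", 1)[-1]
    file = s3.get_object(Bucket=event.bucket, Key=event.key)["Body"]

    # Step 1: transcribe with Whisper on Groq.
    transcription = groq_client.audio.transcriptions.create(
        file=(filename, file.read()),
        model="whisper-large-v3",
        response_format="verbose_json",
        language="en",
        temperature=0.0,
        prompt="This is an audio recording. Please transcribe accurately with proper punctuation.",
    )

    # Step 2: have Gemini format the raw transcript as Markdown.
    formatted_text = generate_formatted_transcription(transcription.text)

    # Step 3: save the result under transcriptions/, keeping the original file name.
    # Strip only the leading "audio/" so nested folders with the same name survive.
    input_key_without_prefix = event.key.replace("audio/", "", 1)
    output_key = f"transcriptions/{input_key_without_prefix}.txt"
    s3.put_object(
        Bucket=event.bucket,
        Key=output_key,
        Body=formatted_text,
        ContentType="text/plain",
    )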
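The excerpt stops before the last step in the flow, the push notification. Here is a minimal sketch of how that could look with only the standard library, assuming ntfy's plain HTTP publishing (POST the message body to `https://ntfy.sh/<topic>`, with an optional `Title` header). The function names, the default title, and the topic name in the usage comment are hypothetical and not taken from the repo.

```python
import urllib.request

NTFY_BASE_URL = "https://ntfy.sh"


def build_ntfy_request(topic: str, message: str, title: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that publishes to an ntfy topic."""
    return urllib.request.Request(
        url=f"{NTFY_BASE_URL}/{topic}",
        data=message.encode("utf-8"),  # the request body is the notification text
        headers={"Title": title},      # ntfy reads the notification title from this header
        method="POST",
    )


def notify(topic: str, message: str, title: str = "Transcription ready") -> int:
    """Send the notification and return the HTTP status code."""
    request = build_ntfy_request(topic, message, title)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical usage at the end of the handler:
# notify("my-transcripts-topic", f"Saved {output_key}")
```

Keeping request construction separate from sending makes the notification easy to check without network access. Anyone who knows the topic name can subscribe to it on the public ntfy.sh server, so the topic is best treated like a secret.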
What makes it different
- Files-as-API: Treat S3 like a shared folder synced to iOS/macOS. No custom app, no queue UI, no dashboards.
- Opinionated formatting: Gemini upgrades raw ASR into publishable Markdown—paragraph breaks, grammar, headings—ready to paste.
- Developer ergonomics: `chalice` for Lambda, `uv` for fast Python envs. Minimal code, maximum leverage.
- Observability-first: Clean logs, simple notifications (ntfy.sh), and a clear S3 structure for auditing.
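The formatting step is only named in the handler (`generate_formatted_transcription`), so here is a rough sketch of what such a step might look like. The prompt wording, the function names, and the `call_model` parameter are all assumptions. `call_model` stands in for whatever Gemini client call you use and simply maps a prompt string to the model's text reply.

```python
from typing import Callable

# Assumed instructions; the repo's actual prompt is not shown in this post.
FORMAT_INSTRUCTIONS = (
    "Reformat the transcript below as clean Markdown. "
    "Add paragraph breaks and headings, fix grammar and punctuation, "
    "and do not add or remove content."
)


def build_format_prompt(raw_transcript: str) -> str:
    """Combine the fixed instructions with the raw transcript."""
    return f"{FORMAT_INSTRUCTIONS}\n\nTranscript:\n{raw_transcript.strip()}"


def format_transcript(raw_transcript: str, call_model: Callable[[str], str]) -> str:
    """Run the formatting step; fall back to the raw text if the model returns nothing."""
    if not raw_transcript.strip():
        return ""
    formatted = call_model(build_format_prompt(raw_transcript)).strip()
    return formatted or raw_transcript.strip()
```

Passing the model call in as a function keeps the prompt logic testable with a stub. The fallback to the raw text means a failed or empty formatting pass still leaves a usable transcript in S3.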
Extending it
- Multi-language, diarization, or redaction.
- Per-prefix routing (e.g., different models per folder).
- Signed-link sharebacks or webhooks.
- Cost guardrails (duration caps) and retry/backoff policies.
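As an illustration of per-prefix routing, the sketch below picks a Whisper model from the S3 key by longest matching prefix. The `audio/quick/` folder, the routing table, and the choice of `whisper-large-v3-turbo` for it are assumptions made for the example. Only `whisper-large-v3` and the `audio/` prefix come from the handler above.

```python
# Hypothetical routing table: S3 key prefix -> Whisper model name.
MODEL_BY_PREFIX = {
    "audio/quick/": "whisper-large-v3-turbo",  # assumed faster model for short memos
    "audio/": "whisper-large-v3",              # default used by the handler
}


def pick_model(key: str, default: str = "whisper-large-v3") -> str:
    """Return the model for the longest prefix that matches the S3 key."""
    # Check longer prefixes first so "audio/quick/" wins over "audio/".
    for prefix in sorted(MODEL_BY_PREFIX, key=len, reverse=True):
        if key.startswith(prefix):
            return MODEL_BY_PREFIX[prefix]
    return default
```

In the handler, the result could be passed as `model=pick_model(event.key)`. Because the existing Chalice trigger already filters on `prefix="audio/"`, sub-folders under it would not need a second event subscription.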
Security notes
- Keep keys in environment variables.
- Restrict S3 prefixes and IAM permissions.
- Limit allowed input formats at the bucket level.
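One way to limit input formats at the bucket level is a bucket policy that denies `s3:PutObject` for any key outside an allow-list. The sketch below builds such a policy as a Python dict. The list of suffixes and the statement ID are assumptions, and `transcriptions/*` is included in the allow-list so the Lambda can still write its output.

```python
import json

BUCKET = "audio-to-transcribe1"              # bucket name from the handler
ALLOWED_SUFFIXES = (".m4a", ".mp3", ".wav")  # assumed list; adjust to your recorder


def build_upload_policy(bucket: str, suffixes=ALLOWED_SUFFIXES) -> dict:
    """Deny s3:PutObject for any key that is not allowed audio or a transcription."""
    allowed = [f"arn:aws:s3:::{bucket}/audio/*{suffix}" for suffix in suffixes]
    allowed.append(f"arn:aws:s3:::{bucket}/transcriptions/*")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnexpectedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                # Everything NOT listed here is denied.
                "NotResource": allowed,
            }
        ],
    }


policy_json = json.dumps(build_upload_policy(BUCKET), indent=2)
```

The resulting JSON could be attached with boto3's `put_bucket_policy(Bucket=BUCKET, Policy=policy_json)`. Resource matching in policies is case-sensitive, so an upload named `note.M4A` would be rejected unless that variant is also listed.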
Style inspiration: Create a Django and React app with auto-generated Django types
- Key innovation: using S3 + Files as the UI, with Chalice to turn S3 events into a clean serverless pipeline.
- Interesting elements: native-feel UX, Whisper+Gemini two-stage quality, tiny operational surface area, and low per-minute cost.