Vibe Coding · Productivity · Voice Transcription
Noter
A browser-based speech-to-notecard app that turns live spoken words into editable, exportable notecards in real time — built entirely with browser-native APIs, no backend required.
Overview
Note-taking during a lecture is a split-attention problem.
When you're in a lecture or meeting, trying to type notes while listening means you're doing neither well. I wanted to build something that captured speech automatically so I could focus on understanding rather than transcribing — and that turned each natural pause in speech into a discrete, editable notecard rather than a wall of text.
The resulting app uses the Web Speech API for continuous transcription, organizes output into individual notecards per speech segment, tracks live session metrics, and lets you export the whole session as a formatted PDF. Everything runs client-side with no server, no account, and no data leaving your browser.
The Problem
Transcription tools exist. None of them think in notecards.
Most transcription tools produce a single continuous stream of text — useful as a record, but hard to review, edit, or turn into something shareable. I wanted the structure of notecards (discrete, skimmable, individually editable) with the capture speed of speech recognition.
The structure problem
A wall of auto-transcribed text requires heavy post-processing to be useful. I needed something that automatically chunked speech into logical units as it happened, without requiring manual editing after the fact.
The export problem
If notes can't leave the app, they're not really notes. The export needed to be clean, readable, and include metadata — timestamps, word counts, session duration — without requiring a backend to generate it.
Technical Architecture
Four APIs doing the work of a full backend.
The biggest constraint — and most interesting challenge — was achieving server-level features entirely in the browser. Transcription, PDF generation, canvas capture, and session persistence all needed to work without any external service.
The Web Speech API's `onresult` event fires as speech segments complete, which triggers automatic notecard creation. Managing interim vs. final results was the trickiest part.
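As a rough sketch of that flow (function names like `createNotecardHandler`, `addCard`, and `updatePreview` are illustrative, not the app's actual code): an `onresult` handler walks the results list from `event.resultIndex`, commits final results as new cards, and routes interim results to a live preview only.

```javascript
// Illustrative sketch: turning SpeechRecognition results into notecards.
// Only final results become cards; interim results just update a preview.
function createNotecardHandler(addCard, updatePreview) {
  return function onResult(event) {
    let interim = "";
    // event.results is a list; resultIndex marks the first result that changed
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      const transcript = result[0].transcript.trim();
      if (result.isFinal) {
        if (transcript) addCard(transcript); // committed segment -> new card
      } else {
        interim += transcript + " ";
      }
    }
    updatePreview(interim.trim()); // in-progress speech, no card yet
  };
}
```

In the browser this would be wired up as `recognition.onresult = createNotecardHandler(addCard, updatePreview)` on a `SpeechRecognition` instance with `continuous` and `interimResults` enabled.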
Features
Designed around one core flow: speak, review, export.
- Live transcription to notecards — as you speak, each speech segment becomes its own card. Natural pauses define the boundaries, so cards map roughly to complete thoughts rather than arbitrary time slices.
- Inline editing — every notecard is immediately editable after creation. Mis-transcriptions, filler words, or incomplete sentences can be fixed before export without interrupting the session.
- Live session metrics — total word count, character count, and card count update continuously as you speak. Useful feedback for pacing in lectures and presentations.
- PDF export via jsPDF + html2canvas — exports the full session as a formatted document with timestamps, word counts, and all notecard content. Designed to be shareable without further editing.
- Session management — each session is timestamped from start. The entire session can be cleared and restarted without a page reload.
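The export step can be illustrated with a pure helper that flattens a session into the lines a PDF would render. The shape of the session object and the name `sessionToLines` are assumptions for the sketch, not the app's real data model:

```javascript
// Hypothetical helper: flatten a session (start time, cards with
// timestamps) into printable lines, including the word-count metadata.
function sessionToLines(session) {
  const words = session.cards.reduce(
    (n, card) => n + card.text.split(/\s+/).filter(Boolean).length, 0);
  const lines = [
    `Session started: ${session.startedAt}`,
    `Cards: ${session.cards.length} | Words: ${words}`,
    "",
  ];
  for (const card of session.cards) {
    lines.push(`[${card.timestamp}] ${card.text}`);
  }
  return lines;
}
```

In the browser, jsPDF can render an array of lines directly, e.g. `const doc = new jsPDF(); doc.text(sessionToLines(session), 10, 10); doc.save("notes.pdf");`.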
Challenges
Continuous streaming transcription is messier than it looks.
The Web Speech API fires two types of results — interim (still being processed) and final (committed). Interim results update rapidly and shouldn't create notecards; final results should. Getting that distinction right — and handling the edge cases where speech recognition restarts mid-sentence or returns a final result that's actually still partial — took significantly more work than the initial implementation suggested it would.
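One of those edge cases can be sketched concretely: when the recognizer restarts, it may re-emit a final result identical to the last committed segment. A small guard (hypothetical names; the real app's handling may differ) that remembers the last committed text prevents duplicate cards:

```javascript
// Illustrative guard against the restart edge case: skip a final
// result if it exactly repeats the last committed segment.
function makeCommitGuard(addCard) {
  let lastCommitted = "";
  return function commit(finalTranscript) {
    const text = finalTranscript.trim();
    if (!text || text === lastCommitted) return false; // empty or duplicate
    lastCommitted = text;
    addCard(text);
    return true; // new segment committed as a card
  };
}
```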
The PDF export pipeline had its own challenges. `html2canvas` captures the DOM at a point in time, which means any late-rendering fonts or images can produce blank areas in the export. Preloading fonts and ensuring all content was fully rendered before triggering the capture solved it — but finding that fix took a few hours of debugging.
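The fix amounts to sequencing: don't snapshot until rendering has settled. A minimal sketch, with the dependencies injected so the ordering is explicit — in the browser, `fontsReady` would be `document.fonts.ready` and `capture` would be `html2canvas`:

```javascript
// Illustrative sequencing: wait for fonts to finish loading before
// the DOM snapshot is taken, so the capture never sees blank text.
async function captureWhenReady(fontsReady, capture, node) {
  await fontsReady;     // e.g. document.fonts.ready
  return capture(node); // e.g. html2canvas(node)
}
```

In the app this pattern would look like `await document.fonts.ready; const canvas = await html2canvas(container);` before handing the canvas to jsPDF.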
What I Learned
Designing for real-time input requires thinking differently.
Most UI design assumes the user is in control of input timing — they type, click, or tap when they're ready. With live speech transcription, input arrives continuously and unpredictably. Designing a UI that handles that gracefully — displaying interim results without flickering, creating cards at the right moments, staying visually calm while data is streaming — required a different mental model than standard form-based interfaces.
This project also gave me real experience with multi-library integration at the browser level. Coordinating jsPDF and html2canvas — libraries that each have their own async model — taught me to think carefully about execution order and render state in ways that pure vanilla projects don't surface.