Vibe Coding · Productivity · Voice Transcription
Noter
A browser-based speech-to-notecard app that turns live spoken words into editable, exportable notecards in real time — built entirely with browser-native APIs, no backend required.
Overview
Note-taking during a lecture is a split-attention problem.
When you're in a lecture or meeting, trying to type notes while listening means you're doing neither well. I wanted to build something that captured speech automatically so I could focus on understanding rather than transcribing — and that turned each natural pause in speech into a discrete, editable notecard rather than a wall of text.
The resulting app uses the Web Speech API for continuous transcription, organizes output into individual notecards per speech segment, tracks live session metrics, and lets you export the whole session as a formatted PDF. Everything runs client-side with no server, no account, and no data leaving your browser.
The Problem
Transcription tools exist. None of them think in notecards.
Most transcription tools produce a single continuous stream of text — useful as a record, but hard to review, edit, or turn into something shareable. I wanted the structure of notecards (discrete, skimmable, individually editable) with the capture speed of speech recognition.
The structure problem
A wall of auto-transcribed text requires heavy post-processing to be useful. I needed something that automatically chunked speech into logical units as it happened, without requiring manual editing after the fact.
The export problem
If notes can't leave the app, they're not really notes. The export needed to be clean, readable, and include metadata — timestamps, word counts, session duration — without requiring a backend to generate it.
Technical Architecture
Four APIs doing the work of a full backend.
The biggest constraint — and most interesting challenge — was achieving server-level features entirely in the browser. Transcription, PDF generation, canvas capture, and session persistence all needed to work without any external service.
The Web Speech API's `onresult` event fires as speech segments complete, which triggers automatic notecard creation. Managing interim vs. final results was the trickiest part.
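As a rough sketch of that flow (function names like `createNotecardHandler`, `addCard`, and `updatePreview` are illustrative, not the app's actual code): an `onresult` handler walks the results list from `event.resultIndex`, commits final results as new cards, and routes interim results to a live preview only.

```javascript
// Illustrative sketch: turning SpeechRecognition results into notecards.
// Only final results become cards; interim results just update a preview.
function createNotecardHandler(addCard, updatePreview) {
  return function onResult(event) {
    let interim = "";
    // event.results is a list; resultIndex marks the first result that changed
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      const transcript = result[0].transcript.trim();
      if (result.isFinal) {
        if (transcript) addCard(transcript); // committed segment -> new card
      } else {
        interim += transcript + " ";
      }
    }
    updatePreview(interim.trim()); // in-progress speech, no card yet
  };
}
```

In the browser this would be wired up as `recognition.onresult = createNotecardHandler(addCard, updatePreview)` on a `SpeechRecognition` instance with `continuous` and `interimResults` enabled.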
Features
Designed around one core flow: speak, review, export.
- Live transcription to notecards — as you speak, each speech segment becomes its own card. Natural pauses define the boundaries, so cards map roughly to complete thoughts rather than arbitrary time slices.
- Inline editing — every notecard is immediately editable after creation. Mis-transcriptions, filler words, or incomplete sentences can be fixed before export without interrupting the session.
- Live session metrics — total word count, character count, and card count update continuously as you speak. Useful feedback for pacing in lectures and presentations.
- PDF export via jsPDF + html2canvas — exports the full session as a formatted document with timestamps, word counts, and all notecard content. Designed to be shareable without further editing.
- Session management — each session is timestamped from start. The entire session can be cleared and restarted without a page reload.
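The export step can be illustrated with a pure helper that flattens a session into the lines a PDF would render. The shape of the session object and the name `sessionToLines` are assumptions for the sketch, not the app's real data model:

```javascript
// Hypothetical helper: flatten a session (start time, cards with
// timestamps) into printable lines, including the word-count metadata.
function sessionToLines(session) {
  const words = session.cards.reduce(
    (n, card) => n + card.text.split(/\s+/).filter(Boolean).length, 0);
  const lines = [
    `Session started: ${session.startedAt}`,
    `Cards: ${session.cards.length} | Words: ${words}`,
    "",
  ];
  for (const card of session.cards) {
    lines.push(`[${card.timestamp}] ${card.text}`);
  }
  return lines;
}
```

In the browser, jsPDF can render an array of lines directly, e.g. `const doc = new jsPDF(); doc.text(sessionToLines(session), 10, 10); doc.save("notes.pdf");`.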
Challenges
Continuous streaming transcription is messier than it looks.
The Web Speech API fires two types of results — interim (still being processed) and final (committed). Interim results update rapidly and shouldn't create notecards; final results should. Getting that distinction right — and handling the edge cases where speech recognition restarts mid-sentence or returns a final result that's actually still partial — took significantly more work than the initial implementation suggested it would.
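One of those edge cases can be sketched concretely: when the recognizer restarts, it may re-emit a final result identical to the last committed segment. A small guard (hypothetical names; the real app's handling may differ) that remembers the last committed text prevents duplicate cards:

```javascript
// Illustrative guard against the restart edge case: skip a final
// result if it exactly repeats the last committed segment.
function makeCommitGuard(addCard) {
  let lastCommitted = "";
  return function commit(finalTranscript) {
    const text = finalTranscript.trim();
    if (!text || text === lastCommitted) return false; // empty or duplicate
    lastCommitted = text;
    addCard(text);
    return true; // new segment committed as a card
  };
}
```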
The PDF export pipeline had its own challenges. `html2canvas` captures the DOM at a point in time, which means any late-rendering fonts or images can produce blank areas in the export. Preloading fonts and ensuring all content was fully rendered before triggering the capture solved it — but finding that fix took a few hours of debugging.
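The fix amounts to sequencing: don't snapshot until rendering has settled. A minimal sketch, with the dependencies injected so the ordering is explicit — in the browser, `fontsReady` would be `document.fonts.ready` and `capture` would be `html2canvas`:

```javascript
// Illustrative sequencing: wait for fonts to finish loading before
// the DOM snapshot is taken, so the capture never sees blank text.
async function captureWhenReady(fontsReady, capture, node) {
  await fontsReady;     // e.g. document.fonts.ready
  return capture(node); // e.g. html2canvas(node)
}
```

In the app this pattern would look like `await document.fonts.ready; const canvas = await html2canvas(container);` before handing the canvas to jsPDF.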
What I Learned
Designing for real-time input requires thinking differently.
Most UI design assumes the user is in control of input timing — they type, click, or tap when they're ready. With live speech transcription, input arrives continuously and unpredictably. Designing a UI that handles that gracefully — displaying interim results without flickering, creating cards at the right moments, staying visually calm while data is streaming — required a different mental model than standard form-based interfaces.
This project also gave me real experience with multi-library integration at the browser level. Coordinating jsPDF and html2canvas — libraries that each have their own async model — taught me to think carefully about execution order and render state in ways that pure vanilla projects don't surface.