We present Transkrip, an open-source cross-platform desktop application for local, privacy-first transcription of audio and video files. Transkrip combines a modern Electron-based shell with a React 19 / TypeScript 6 renderer and integrates whisper.cpp as its on-device automatic speech recognition (ASR) engine. By performing every stage of the pipeline — file ingestion, format conversion, speech recognition, and document export — entirely on the user's own machine, Transkrip eliminates the privacy, cost, and connectivity concerns that typically accompany cloud-based transcription services. The application supports common audio and video containers (MP3, WAV, M4A, MP4, MOV), provides a persistent local history backed by SQLite via better-sqlite3, and exports results to .txt, .docx, and .pdf. We describe the motivation, system architecture, technology choices, and practical deployment considerations that enable Transkrip to deliver a native desktop experience on macOS, Windows, and Linux while retaining a strict no-cloud guarantee.
Modern automatic speech recognition systems, driven by large transformer-based models, have achieved near human-level accuracy across dozens of languages. Yet the vast majority of production transcription workflows remain tied to cloud APIs, requiring users to upload potentially sensitive audio to remote servers in exchange for convenience. For journalists protecting sources, researchers handling interview recordings, medical or legal professionals working with confidential material, and enterprises with data-residency obligations, cloud-based transcription is often unacceptable.
Transkrip addresses this gap with a polished, cross-platform desktop application that performs every stage of transcription locally. Built on Electron 41 with a React 19 and Tailwind CSS 4 interface, Transkrip wraps the highly optimized whisper.cpp engine into a friendly user experience that requires no cloud accounts, no API keys, and — after an initial model download — no network connectivity at all.
The primary contributions of this work are:
- A persistent local history (backed by SQLite via better-sqlite3), allowing users to browse and re-export previously transcribed sessions.
- Export to .txt, .docx, and .pdf documents without any server-side rendering.
OpenAI's Whisper (Radford et al., 2022) established a robust multilingual ASR baseline by training on 680,000 hours of weakly supervised audio. Several derivative projects have optimized Whisper for local inference: faster-whisper (Guillaumie, 2023) rebuilds the model on top of CTranslate2 for efficient CPU execution, mlx-whisper (Apple MLX Team, 2024) exploits Apple Silicon GPUs via the MLX framework, and whisper.cpp (Gerganov, 2022) provides a dependency-light C/C++ port that runs efficiently across desktop platforms using integer quantization and SIMD acceleration.
Prior offline transcription tools either target power users through command-line interfaces or ship as heavy Python applications with complex dependency chains. Transkrip differs by providing a first-class native desktop UX on top of whisper.cpp, coupled with a renderer built on modern web technologies. This trades the Python ecosystem's flexibility for a smaller footprint, faster startup, and easier installation for end users.
Transkrip adopts the standard Electron two-process model. The renderer process hosts the React application and handles all user interactions, while the main process manages file system access, spawns the ASR subprocess, and owns the SQLite database. A preload script exposes a narrow, typed IPC surface between the two, following the principle of least privilege.
| Process | Responsibilities | Key Modules |
|---|---|---|
| Renderer (React 19 / TS) | UI, drag-and-drop, settings, history view, progress updates | UploadZone, Settings, HistoryList |
| Preload | Typed IPC bridge, sandboxed API exposure | preload.ts |
| Main (Electron 41) | File I/O, subprocess management, DB, lifecycle | main.ts, whisper.ts, database.ts |
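As an illustration of the narrow, typed IPC surface described above, the following sketch defines a channel map and a small factory that builds the renderer-facing API from an injected `invoke` function. The channel names (`transcribe:start`, `history:list`), the `TranskripApi` shape, and `createApi` are hypothetical and are not taken from the Transkrip source. In an Electron preload script, `invoke` would typically be `ipcRenderer.invoke`, and the resulting object would be exposed to the renderer with `contextBridge.exposeInMainWorld`.

```typescript
// Hypothetical channel map: each channel names its request and response types.
interface IpcChannels {
  "transcribe:start": {
    request: { filePath: string; model: string };
    response: { sessionId: number };
  };
  "history:list": {
    request: { limit: number };
    response: { id: number; filename: string }[];
  };
}

type IpcChannel = keyof IpcChannels;

// Signature compatible with Electron's ipcRenderer.invoke(channel, ...args).
type Invoke = (channel: string, ...args: unknown[]) => Promise<unknown>;

// The only functions the renderer is allowed to call.
interface TranskripApi {
  startTranscription(filePath: string, model: string): Promise<{ sessionId: number }>;
  listHistory(limit: number): Promise<{ id: number; filename: string }[]>;
}

// Build the API from an injected invoke function so that no raw IPC
// primitive ever reaches renderer code.
function createApi(invoke: Invoke): TranskripApi {
  const call = <C extends IpcChannel>(channel: C, request: IpcChannels[C]["request"]) =>
    invoke(channel, request) as Promise<IpcChannels[C]["response"]>;

  return {
    startTranscription: (filePath, model) => call("transcribe:start", { filePath, model }),
    listHistory: (limit) => call("history:list", { limit }),
  };
}
```

Injecting `invoke` keeps the sketch independent of Electron and makes the channel list the single place where the privilege boundary is defined.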
Stage 1 — Ingestion: Users drop audio or video files into the upload zone. Supported containers include MP3, WAV, M4A, MP4, and MOV. The renderer forwards file paths through IPC to the main process.
Stage 2 — Normalization: The main process invokes ffmpeg as a subprocess to decode the input and resample it to 16 kHz mono PCM, the canonical format expected by whisper.cpp.
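The normalization step can be sketched as a function that builds the ffmpeg argument list. The flags shown are standard ffmpeg options; the function name `buildFfmpegArgs` and the choice of a 16-bit WAV output file are assumptions made for illustration, not details confirmed by the Transkrip source.

```typescript
// Build the ffmpeg argument list that converts a supported input file
// into 16 kHz, mono, 16-bit PCM WAV.
function buildFfmpegArgs(inputPath: string, outputWavPath: string): string[] {
  return [
    "-y",                 // overwrite the output file if it already exists
    "-i", inputPath,      // input file (MP3, WAV, M4A, MP4, MOV)
    "-vn",                // ignore any video stream
    "-ar", "16000",       // resample to 16 kHz
    "-ac", "1",           // downmix to mono
    "-c:a", "pcm_s16le",  // signed 16-bit little-endian PCM
    outputWavPath,
  ];
}
```

In the main process, this array could be passed to Node's `child_process.spawn("ffmpeg", args)`. Passing arguments as an array rather than as a single shell string avoids quoting problems with user-supplied file names.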
Stage 3 — Inference: A whisper.cpp child process runs the selected model against the normalized audio, streaming progress updates back over stdout. The renderer receives these events via IPC and updates the progress UI in real time.
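One way to turn the streamed stdout into progress events is sketched below. It assumes that the child process prints timestamped segment lines of the form `[00:00:00.000 --> 00:00:04.000]   text` and that progress is estimated as the end time of the latest segment divided by the total audio duration. Both the progress heuristic and the names `TranscriptionProgress` and `createProgressParser` are assumptions; the text does not specify how Transkrip computes progress.

```typescript
interface TranscriptionProgress {
  fraction: number; // 0..1, share of the audio transcribed so far
  text: string;     // text of the segment that was just emitted
}

// Matches lines such as "[00:00:00.000 --> 00:00:04.000]   Hello there."
const SEGMENT_LINE =
  /^\[(\d+):(\d{2}):(\d{2})\.(\d{3}) --> (\d+):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)$/;

function toSeconds(h: string, m: string, s: string, ms: string): number {
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Returns a function that accepts raw stdout chunks. A chunk may end in the
// middle of a line, so the unfinished tail is buffered until the next chunk.
function createProgressParser(
  totalSeconds: number,
  onProgress: (event: TranscriptionProgress) => void,
): (chunk: string) => void {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the incomplete last line
    for (const line of lines) {
      const match = SEGMENT_LINE.exec(line.trim());
      if (!match) continue; // ignore log lines and blank lines
      const end = toSeconds(match[5], match[6], match[7], match[8]);
      const fraction = totalSeconds > 0 ? Math.min(end / totalSeconds, 1) : 0;
      onProgress({ fraction, text: match[9].trim() });
    }
  };
}
```

The main process would feed each `data` event from the child's stdout into the returned function and forward the resulting events to the renderer over IPC.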
Stage 4 — Segment Assembly: Timestamped segments produced by Whisper are concatenated into a single document. Metadata (language, model size, duration, source filename) is captured alongside the transcript.
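A minimal sketch of segment assembly follows. The type names, the field names, and the decision to derive the duration from the latest segment end time are assumptions for illustration; the text states only that segments are concatenated and that language, model size, duration, and source filename are recorded.

```typescript
interface TranscriptSegment {
  startMs: number;
  endMs: number;
  text: string;
}

interface TranscriptMetadata {
  language: string;
  model: string;
  sourceFilename: string;
}

interface TranscriptDocument extends TranscriptMetadata {
  durationMs: number;
  text: string;
}

// Concatenate segments in chronological order into a single document.
function assembleTranscript(
  segments: TranscriptSegment[],
  meta: TranscriptMetadata,
): TranscriptDocument {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...segments].sort((a, b) => a.startMs - b.startMs);
  const text = ordered
    .map((s) => s.text.trim())
    .filter((t) => t.length > 0) // drop empty segments
    .join(" ");
  // Assumption: duration is taken as the latest segment end time.
  const durationMs = ordered.reduce((max, s) => Math.max(max, s.endMs), 0);
  return { ...meta, durationMs, text };
}
```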
Stage 5 — Persistence and Export: The completed session is persisted to SQLite via better-sqlite3. Users can then export to .txt (plain text), .docx (via the docx library), or .pdf (via jspdf), all generated locally without any server round-trip.
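The persistence step might look like the sketch below. The table schema, the `SessionRecord` shape, and the `saveSession` function are hypothetical, since the actual Transkrip schema is not described here. The `DatabaseLike` interface covers only the better-sqlite3 methods used (`exec`, `prepare`, and `run` on the prepared statement), so that the example can be read and checked without the native module.

```typescript
// Minimal structural types for the better-sqlite3 methods used below.
interface StatementLike {
  run(...params: unknown[]): unknown;
}
interface DatabaseLike {
  exec(sql: string): unknown;
  prepare(sql: string): StatementLike;
}

interface SessionRecord {
  sourceFilename: string;
  language: string;
  model: string;
  durationMs: number;
  text: string;
}

// Hypothetical schema for the local history table.
const CREATE_TABLE_SQL = `
  CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_filename TEXT NOT NULL,
    language TEXT NOT NULL,
    model TEXT NOT NULL,
    duration_ms INTEGER NOT NULL,
    transcript TEXT NOT NULL,
    created_at TEXT NOT NULL
  )`;

function saveSession(db: DatabaseLike, session: SessionRecord, now: Date = new Date()): void {
  db.exec(CREATE_TABLE_SQL);
  // Positional parameters keep user-controlled text out of the SQL string.
  db.prepare(
    `INSERT INTO sessions
       (source_filename, language, model, duration_ms, transcript, created_at)
     VALUES (?, ?, ?, ?, ?, ?)`,
  ).run(
    session.sourceFilename,
    session.language,
    session.model,
    session.durationMs,
    session.text,
    now.toISOString(),
  );
}
```

Because better-sqlite3 is synchronous, a call like this would run in the main process rather than in the renderer, which matches the process split described earlier.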
| Layer | Technology | Role |
|---|---|---|
| Desktop shell | Electron 41 | Cross-platform native window, process model, packaging |
| Renderer framework | React 19, TypeScript 6 | Component model, type safety |
| Build tooling | Vite 8 | Dev server, bundling, HMR |
| Styling | Tailwind CSS 4 | Utility-first responsive UI |
| ASR engine | whisper.cpp | On-device Whisper inference |
| Audio normalization | ffmpeg | Decoding and 16 kHz PCM conversion |
| Persistence | better-sqlite3 | Local history database |
| Document export | docx, jspdf, file-saver | TXT / DOCX / PDF generation |
| Packaging | electron-builder | Installers for macOS, Windows, Linux |
Transkrip is distributed through GitHub Releases as pre-built installers for macOS, Windows, and Linux. System dependencies (ffmpeg and whisper.cpp) are installed via the platform's standard package manager. On macOS, for example:
    brew install ffmpeg
    brew install whisper-cpp
Once the dependencies are present, installing Transkrip is a single-click operation. On first launch, the application verifies the presence of the required binaries and guides the user through any missing steps.
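The first-launch dependency check could be approximated by searching the directories on `PATH` for each required executable. The sketch below is an assumption about how such a check might work, not a description of Transkrip's implementation; `findMissingBinaries` and `PathEnvironment` are hypothetical names, and the executable names passed in (including the name of the whisper.cpp binary, which varies between installations) are left to the caller.

```typescript
interface PathEnvironment {
  pathVariable: string;  // contents of the PATH environment variable
  delimiter: string;     // ":" on macOS and Linux, ";" on Windows
  separator: string;     // "/" on macOS and Linux, "\\" on Windows
  extensions: string[];  // [""] on macOS and Linux, e.g. ["", ".exe"] on Windows
  exists: (candidate: string) => boolean; // file-existence check
}

// Return the names of the executables that cannot be found on PATH.
function findMissingBinaries(names: string[], env: PathEnvironment): string[] {
  const dirs = env.pathVariable.split(env.delimiter).filter((d) => d.length > 0);
  return names.filter(
    (name) =>
      !dirs.some((dir) =>
        env.extensions.some((ext) => env.exists(`${dir}${env.separator}${name}${ext}`)),
      ),
  );
}
```

In Node, the environment values would come from `process.env.PATH`, `path.delimiter`, `path.sep`, and `fs.existsSync`. A non-empty result would then trigger the guided setup described above.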
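Transkrip was designed from the ground up with privacy as a non-negotiable property. Specifically, every stage of the pipeline (file ingestion, format conversion, speech recognition, and document export) runs on the user's own machine, and the application requires no cloud accounts, no API keys, and, after the initial model download, no network connectivity.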
These properties make Transkrip appropriate for legal transcription, medical dictation, confidential interviews, and any scenario where audio content is subject to confidentiality or regulatory constraints.
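- macOS: whisper.cpp leverages Accelerate and ARM NEON on Apple Silicon for fast inference.
- Windows: whisper.cpp running on CPU with AVX acceleration.
- Linux: .deb packages, compatible with most modern distributions.

Transkrip is well suited to a variety of professional workflows, including journalism, research interviews, medical and legal transcription, and enterprise settings with data-residency obligations.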
Transkrip demonstrates that privacy-first, production-quality transcription can be packaged as a polished desktop application accessible to non-technical users. By combining Electron, React, TypeScript, and whisper.cpp, the project delivers a zero-cloud workflow without sacrificing the ergonomics users expect from modern software.
Planned future directions include:
- ... whisper.cpp.

The source code is available at https://github.com/romizone/transkrip, and pre-built installers are published at https://github.com/romizone/transkrip/releases/tag/v1.0.0.