Transkrip: A Privacy-First Cross-Platform Desktop Application for Local AI-Powered Audio and Video Transcription

Romi Nur Ismanto
Independent AI Research Lab, Jakarta, Indonesia
rominur@gmail.com
April 2026

Abstract

We present Transkrip, an open-source cross-platform desktop application for local, privacy-first transcription of audio and video files. Transkrip combines a modern Electron-based shell with a React 19 / TypeScript 6 renderer and integrates whisper.cpp as its on-device automatic speech recognition (ASR) engine. By performing every stage of the pipeline — file ingestion, format conversion, speech recognition, and document export — entirely on the user's own machine, Transkrip eliminates the privacy, cost, and connectivity concerns that typically accompany cloud-based transcription services. The application supports common audio and video containers (MP3, WAV, M4A, MP4, MOV), provides a persistent local history backed by SQLite via better-sqlite3, and exports results to .txt, .docx, and .pdf. We describe the motivation, system architecture, technology choices, and practical deployment considerations that enable Transkrip to deliver a native desktop experience on macOS, Windows, and Linux while retaining a strict no-cloud guarantee.

Keywords: automatic speech recognition, Whisper, whisper.cpp, Electron, React, TypeScript, privacy-preserving AI, offline transcription, desktop application, ffmpeg

1. Introduction

Modern automatic speech recognition systems, driven by large transformer-based models, now approach human-level accuracy on many benchmarks and cover dozens of languages. Yet most production transcription workflows remain tied to cloud APIs, requiring users to upload potentially sensitive audio to remote servers in exchange for convenience. For journalists protecting sources, researchers handling interview recordings, medical or legal professionals working with confidential material, and enterprises with data-residency obligations, cloud-based transcription is often unacceptable.

Transkrip addresses this gap with a polished, cross-platform desktop application that performs every stage of transcription locally. Built on Electron 41 with a React 19 and Tailwind CSS 4 interface, Transkrip wraps the highly optimized whisper.cpp engine into a friendly user experience that requires no cloud accounts, no API keys, and — after an initial model download — no network connectivity at all.

The primary contributions of this work are:

2. Related Work

OpenAI's Whisper (Radford et al., 2022) established a robust multilingual ASR baseline by training on 680,000 hours of weakly supervised audio. Several derivative projects have optimized Whisper for local inference: faster-whisper (Guillaumie, 2023) reimplements the model on top of CTranslate2 for efficient CPU and GPU execution, mlx-whisper (Apple MLX Team, 2024) exploits Apple Silicon GPUs via the MLX framework, and whisper.cpp (Gerganov, 2022) provides a dependency-light C/C++ port that runs efficiently across desktop platforms using integer quantization and SIMD acceleration.

Prior offline transcription tools either target power users through command-line interfaces or ship as heavy Python applications with complex dependency chains. Transkrip differs by providing a first-class native desktop UX on top of whisper.cpp, coupled with a renderer built on modern web technologies. This trades the flexibility of the Python ecosystem for a smaller footprint, faster startup, and easier installation for end users.

3. System Architecture

Transkrip adopts the standard Electron two-process model. The renderer process hosts the React application and handles all user interactions, while the main process manages file system access, spawns the ASR subprocess, and owns the SQLite database. A preload script exposes a narrow, typed IPC surface between the two, following the principle of least privilege.

Upload → ffmpeg Normalization → whisper.cpp Inference → Segment Assembly → Export (TXT / DOCX / PDF)

3.1 Process Separation

Table 1: Responsibilities of the Electron main and renderer processes
Process | Responsibilities | Key Modules
Renderer (React 19 / TS) | UI, drag-and-drop, settings, history view, progress updates | UploadZone, Settings, HistoryList
Preload | Typed IPC bridge, sandboxed API exposure | preload.ts
Main (Electron 41) | File I/O, subprocess management, DB, lifecycle | main.ts, whisper.ts, database.ts
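
The narrow, typed IPC surface described above could be sketched roughly as follows. This is a minimal illustration, not Transkrip's actual preload code: the channel names, payload shapes, and the `createBridge` helper are hypothetical. The `invoke` parameter stands in for Electron's `ipcRenderer.invoke`, and the returned object is the kind of value a preload script would typically hand to `contextBridge.exposeInMainWorld`.

```typescript
// Hypothetical channel map: each channel declares its request and response types.
interface IpcContract {
  "transcribe:start": { req: { filePath: string; model: string }; res: { sessionId: number } };
  "history:list": { req: { limit: number }; res: { id: number; title: string }[] };
}

type Channel = keyof IpcContract;
type Invoke = (channel: string, payload: unknown) => Promise<unknown>;

// Only channels listed here are reachable from the renderer.
const ALLOWED_CHANNELS: readonly Channel[] = ["transcribe:start", "history:list"];

// Builds the object a preload script could expose to the renderer.
// In Electron, `invoke` would be ipcRenderer.invoke (an assumption about the wiring).
function createBridge(invoke: Invoke) {
  function call<C extends Channel>(
    channel: C,
    payload: IpcContract[C]["req"],
  ): Promise<IpcContract[C]["res"]> {
    // Runtime guard for untyped callers; typed callers are already restricted.
    if (ALLOWED_CHANNELS.indexOf(channel) === -1) {
      return Promise.reject(new Error(`Blocked IPC channel: ${String(channel)}`));
    }
    return invoke(channel, payload) as Promise<IpcContract[C]["res"]>;
  }

  return {
    startTranscription: (filePath: string, model: string) =>
      call("transcribe:start", { filePath, model }),
    listHistory: (limit: number) => call("history:list", { limit }),
  };
}
```

Exposing named functions rather than a generic `send(channel, ...)` keeps the renderer from reaching arbitrary main-process handlers, which is one common way to apply least privilege in Electron applications.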

3.2 Processing Pipeline

Stage 1 — Ingestion: Users drop audio or video files into the upload zone. Supported containers include MP3, WAV, M4A, MP4, and MOV. The renderer forwards file paths through IPC to the main process.
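
As an illustration of the ingestion step, a drop handler might filter paths by extension before forwarding them over IPC. The sketch below assumes a simple extension allowlist; the function names are hypothetical and the real application may validate files differently.

```typescript
import * as path from "path";

// Extensions for the containers listed in the text.
const SUPPORTED_EXTENSIONS = [".mp3", ".wav", ".m4a", ".mp4", ".mov"];

// True when the file's extension (case-insensitive) is on the allowlist.
function isSupportedMedia(filePath: string): boolean {
  return SUPPORTED_EXTENSIONS.indexOf(path.extname(filePath).toLowerCase()) !== -1;
}

// Splits dropped paths into files to forward and files to report back to the user.
function partitionDropped(paths: string[]): { accepted: string[]; rejected: string[] } {
  const accepted: string[] = [];
  const rejected: string[] = [];
  for (const p of paths) {
    (isSupportedMedia(p) ? accepted : rejected).push(p);
  }
  return { accepted, rejected };
}
```

An extension check is only a first filter: a renamed or corrupt file would still pass it and would have to be caught when ffmpeg fails to decode it.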

Stage 2 — Normalization: The main process invokes ffmpeg as a subprocess to decode the input and resample it to 16 kHz mono PCM, the canonical format expected by whisper.cpp.
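
A minimal sketch of the normalization call is shown below, assuming `ffmpeg` is on the PATH and that the output is written as a 16-bit WAV file. The helper names are hypothetical; the ffmpeg flags are standard options (`-ac` for channel count, `-ar` for sample rate, `-c:a pcm_s16le` for 16-bit PCM).

```typescript
import { spawn } from "child_process";

// Builds ffmpeg arguments that convert any supported input to 16 kHz mono 16-bit PCM WAV.
function buildNormalizeArgs(input: string, outputWav: string): string[] {
  return [
    "-y",                // overwrite the output file if it already exists
    "-i", input,         // source audio or video file
    "-vn",               // ignore any video stream
    "-ac", "1",          // downmix to mono
    "-ar", "16000",      // resample to 16 kHz
    "-c:a", "pcm_s16le", // encode as signed 16-bit little-endian PCM
    outputWav,
  ];
}

// Runs ffmpeg as a subprocess and resolves once it exits successfully.
function normalizeAudio(input: string, outputWav: string, ffmpegPath = "ffmpeg"): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(ffmpegPath, buildNormalizeArgs(input, outputWav), { stdio: "ignore" });
    child.on("error", reject); // e.g. ffmpeg is not installed
    child.on("close", (code) => {
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`ffmpeg exited with code ${code}`));
      }
    });
  });
}
```

Passing arguments as an array rather than a shell string avoids quoting problems with file names that contain spaces or special characters.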

Stage 3 — Inference: A whisper.cpp child process runs the selected model against the normalized audio, streaming progress updates back over stdout. The renderer receives these events via IPC and updates the progress UI in real time.
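
Progress streaming of this kind usually requires buffering partial lines, because a child process delivers output in arbitrary chunks. The sketch below assumes progress lines of the form `... progress =  42%`; that line shape is an assumption about the engine's output rather than something stated here, and `createProgressParser` is a hypothetical helper.

```typescript
// Assumed progress line shape, e.g. "whisper_print_progress_callback: progress =  42%".
const PROGRESS_PATTERN = /progress\s*=\s*(\d{1,3})%/;

// Returns a function that accepts raw output chunks and reports each
// percentage found on a complete line.
function createProgressParser(onProgress: (percent: number) => void): (chunk: string) => void {
  let buffered = "";
  return (chunk: string) => {
    buffered += chunk;
    const lines = buffered.split(/\r?\n/);
    // The last element is an incomplete line (or ""); keep it for the next chunk.
    buffered = lines.pop() || "";
    for (const line of lines) {
      const match = PROGRESS_PATTERN.exec(line);
      if (match) {
        onProgress(Math.min(100, parseInt(match[1], 10)));
      }
    }
  };
}
```

In the main process, the returned function would be attached to the child's output stream (for example `child.stdout.on("data", (d) => parse(d.toString()))`), and `onProgress` would forward the value to the renderer over IPC.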

Stage 4 — Segment Assembly: Timestamped segments produced by Whisper are concatenated into a single document. Metadata (language, model size, duration, source filename) is captured alongside the transcript.
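
Segment assembly can be illustrated with a small parser. The sketch assumes the bracketed line format that the whisper.cpp command-line tool prints by default, `[hh:mm:ss.mmm --> hh:mm:ss.mmm]  text`; the `Segment` type and function names are hypothetical, and Transkrip may instead consume a structured output format.

```typescript
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// Matches "[00:00:00.000 --> 00:00:04.500]   Some text".
const SEGMENT_PATTERN =
  /^\[(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)$/;

// Converts hour, minute, second, and millisecond fields to milliseconds.
function toMs(h: string, m: string, s: string, ms: string): number {
  return ((Number(h) * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
}

// Extracts timestamped segments, skipping any line that is not a segment.
function parseSegments(output: string): Segment[] {
  const segments: Segment[] = [];
  for (const line of output.split(/\r?\n/)) {
    const m = SEGMENT_PATTERN.exec(line.trim());
    if (!m) continue;
    segments.push({
      startMs: toMs(m[1], m[2], m[3], m[4]),
      endMs: toMs(m[5], m[6], m[7], m[8]),
      text: m[9].trim(),
    });
  }
  return segments;
}

// Concatenates segment texts into a single document body.
function assembleTranscript(segments: Segment[]): string {
  return segments
    .map((s) => s.text)
    .filter((t) => t.length > 0)
    .join(" ");
}
```

Keeping the parsed timestamps alongside the joined text leaves room for timestamped exports later without re-running inference.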

Stage 5 — Persistence and Export: The completed session is persisted to SQLite via better-sqlite3. Users can then export to .txt (plain text), .docx (via the docx library), or .pdf (via jspdf), all generated locally without any server round-trip.
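
A persistence layer along these lines might look like the sketch below. The table name, columns, and `createHistoryStore` helper are hypothetical, chosen to mirror the metadata listed in Stage 4. To keep the example self-contained, the database is typed as a small structural interface; a better-sqlite3 `Database` instance, which offers `exec`, `prepare`, `run`, and `all`, is assumed to satisfy it.

```typescript
// Minimal structural subset of the database API used below (assumed to
// match a better-sqlite3 Database instance).
interface StatementLike {
  run(...params: unknown[]): unknown;
  all(...params: unknown[]): unknown[];
}
interface DatabaseLike {
  exec(sql: string): unknown;
  prepare(sql: string): StatementLike;
}

interface SessionRecord {
  sourceFilename: string;
  language: string;
  model: string;
  durationSec: number;
  transcript: string;
}

// Hypothetical schema for the local history table.
const SCHEMA = `
CREATE TABLE IF NOT EXISTS sessions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  source_filename TEXT NOT NULL,
  language TEXT NOT NULL,
  model TEXT NOT NULL,
  duration_sec REAL NOT NULL,
  transcript TEXT NOT NULL,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
)`;

// Creates the table if needed and returns save/list helpers backed by
// prepared statements with positional parameters.
function createHistoryStore(db: DatabaseLike) {
  db.exec(SCHEMA);
  const insert = db.prepare(
    "INSERT INTO sessions (source_filename, language, model, duration_sec, transcript) VALUES (?, ?, ?, ?, ?)",
  );
  const recent = db.prepare(
    "SELECT id, source_filename, created_at FROM sessions ORDER BY id DESC LIMIT ?",
  );
  return {
    save: (s: SessionRecord): void => {
      insert.run(s.sourceFilename, s.language, s.model, s.durationSec, s.transcript);
    },
    listRecent: (limit: number): unknown[] => recent.all(limit),
  };
}
```

With better-sqlite3 installed, the store would be created as `createHistoryStore(new Database("history.db"))`, where the file name is again only an example.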

4. Technology Stack

Table 2: Core technology stack of Transkrip
Layer | Technology | Role
Desktop shell | Electron 41 | Cross-platform native window, process model, packaging
Renderer framework | React 19, TypeScript 6 | Component model, type safety
Build tooling | Vite 8 | Dev server, bundling, HMR
Styling | Tailwind CSS 4 | Utility-first responsive UI
ASR engine | whisper.cpp | On-device Whisper inference
Audio normalization | ffmpeg | Decoding and 16 kHz PCM conversion
Persistence | better-sqlite3 | Local history database
Document export | docx, jspdf, file-saver | TXT / DOCX / PDF generation
Packaging | electron-builder | Installers for macOS, Windows, Linux

5. Installation and Distribution

Transkrip is distributed through GitHub Releases as pre-built installers for macOS, Windows, and Linux. System dependencies (ffmpeg and whisper.cpp) are installed via the platform's standard package manager. On macOS, for example:

brew install ffmpeg
brew install whisper-cpp

Once the dependencies are present, installing Transkrip is a single-click operation. On first launch, the application verifies the presence of the required binaries and guides the user through any missing steps.
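
The first-launch dependency check could be approximated by probing each binary with a harmless flag. In the sketch below, the executable name `whisper-cli` and the probe arguments are assumptions (the name installed by a given package manager may differ), and `findMissing` is a hypothetical helper.

```typescript
import { spawnSync } from "child_process";

interface RequiredBinary {
  name: string;        // label shown to the user
  command: string;     // executable looked up on the PATH
  probeArgs: string[]; // harmless arguments used only to test that it launches
}

// Assumed executable names and probe flags.
const REQUIRED: RequiredBinary[] = [
  { name: "ffmpeg", command: "ffmpeg", probeArgs: ["-version"] },
  { name: "whisper.cpp", command: "whisper-cli", probeArgs: ["--help"] },
];

// Returns the labels of binaries that could not be launched at all.
function findMissing(binaries: RequiredBinary[]): string[] {
  return binaries
    .filter((b) => {
      const result = spawnSync(b.command, b.probeArgs, { stdio: "ignore", timeout: 5000 });
      // `error` is set when the process could not be spawned (e.g. ENOENT) or timed out.
      return result.error !== undefined;
    })
    .map((b) => b.name);
}
```

A caller would run `findMissing(REQUIRED)` at startup and show installation guidance for each returned name. The check deliberately ignores the exit code, since some tools return a non-zero status for help or version flags.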

6. Privacy and Security Considerations

Transkrip was designed from the ground up with privacy as a non-negotiable property. Specifically, the system guarantees:

These properties make Transkrip appropriate for legal transcription, medical dictation, confidential interviews, and any scenario where audio content is subject to confidentiality or regulatory constraints.

7. Platform Compatibility

8. Use Cases

Transkrip is well suited to a variety of professional workflows:

9. Conclusion and Future Work

Transkrip demonstrates that privacy-first, production-quality transcription can be packaged as a polished desktop application accessible to non-technical users. By combining Electron, React, TypeScript, and whisper.cpp, the project delivers a zero-cloud workflow without sacrificing the ergonomics users expect from modern software.

Planned future directions include:

The source code is available at https://github.com/romizone/transkrip, and pre-built installers are published at https://github.com/romizone/transkrip/releases/tag/v1.0.0.

References

  1. Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. arXiv preprint arXiv:2212.04356.
  2. Gerganov, G. (2022). whisper.cpp: Port of OpenAI's Whisper model in C/C++. GitHub Repository. https://github.com/ggerganov/whisper.cpp
  3. Guillaumie, G. (2023). faster-whisper: Faster Whisper transcription with CTranslate2. GitHub Repository. https://github.com/SYSTRAN/faster-whisper
  4. Apple MLX Team (2024). mlx-whisper: Whisper inference on Apple Silicon using MLX. GitHub Repository. https://github.com/ml-explore/mlx-examples
  5. Electron Maintainers (2024). Electron: Build cross-platform desktop apps with JavaScript, HTML, and CSS. GitHub Repository. https://github.com/electron/electron
  6. FFmpeg Developers (2024). FFmpeg: A complete, cross-platform solution to record, convert and stream audio and video. https://ffmpeg.org
  7. Meta Open Source (2024). React 19. React Documentation. https://react.dev