We present AI News Presenter Simulator, a browser-based virtual news anchor application that combines animated SVG avatars with real-time Text-to-Speech (TTS) synthesis and synchronized subtitles. The system renders a professional female presenter avatar with lip-sync mouth animation, automatic eye blink cycles, subtle breathing motion, and intensified glow effects during broadcast. Audio generation leverages the Web Speech API with bilingual support for Bahasa Indonesia and English, featuring adjustable speech rate (0.5x–2x), pitch control (0.5–2.0), multiple voice selection per language, and automatic voice switching on language change. The TV studio experience includes a running BREAKING news ticker, blinking LIVE badge, real-time clock display, SVG news desk, audio equalizer animation, floating particles, spotlight effects, and glassmorphism UI styling. Real-time subtitles with word-level highlighting provide visual reinforcement of spoken content. The entire application is built as a single HTML file with zero external dependencies using vanilla HTML5, CSS3, and JavaScript, deployed on Vercel with edge CDN distribution. We describe the avatar animation system, TTS integration pipeline, subtitle synchronization mechanism, and responsive design architecture.
The broadcasting industry has increasingly explored virtual presenters as cost-effective alternatives to human anchors for routine news delivery, weather updates, and informational segments. Traditional virtual presenter systems require substantial infrastructure—video generation models, GPU-intensive lip-sync networks such as Wav2Lip or SadTalker, dedicated backend servers, and complex audio-video synchronization pipelines. These requirements place virtual presenter technology beyond the reach of educators, small newsrooms, and individual creators.
AI News Presenter Simulator demonstrates that a compelling virtual news anchor experience can be achieved entirely within the browser using standard web technologies. By combining SVG-based avatar animation with the Web Speech API for text-to-speech synthesis, the system eliminates the need for external servers, GPU hardware, or third-party API subscriptions. The result is a zero-cost, zero-dependency application that runs on any modern browser.
The key contributions of this work are:
- A fully client-side virtual news anchor that combines an animated SVG avatar with browser-native text-to-speech, delivered as a single HTML file with zero external dependencies.
- A lightweight avatar animation system (lip-sync, eye blinks, breathing motion, broadcast glow) that avoids GPU-based lip-sync networks entirely.
- Bilingual TTS support for Bahasa Indonesia and English with adjustable rate, pitch, and per-language voice selection.
- A real-time subtitle mechanism with word-level highlighting driven by SpeechSynthesisUtterance boundary events.

Virtual presenter systems have evolved along two trajectories: deep learning-based approaches and web-based approaches. On the deep learning side, SadTalker (Zhang et al., 2023) generates realistic talking head videos from a single image and audio input using 3D motion coefficients, while Wav2Lip (Prajwal et al., 2020) achieves accurate lip-sync by training on the LRS2 dataset. These systems produce photorealistic results but require GPU inference, introducing latency, cost, and infrastructure dependencies.
The Web Speech API (W3C, 2012) provides browser-native text-to-speech synthesis without server-side processing. While the synthesis quality varies across browsers and operating systems, modern implementations on Chrome (Google TTS), Edge (Microsoft Azure TTS), and Safari (Apple TTS) deliver acceptable quality for informational content. The API exposes SpeechSynthesisUtterance events including boundary events that report the character offset of each spoken word—a capability we leverage for subtitle synchronization.
SVG animation for character representation has been explored in educational contexts and interactive storytelling. Unlike raster-based approaches, SVG avatars scale to any resolution without quality loss, render efficiently on low-powered devices, and can be manipulated programmatically through CSS and JavaScript. The combination of SVG avatars with TTS has not been extensively explored for news presentation scenarios, which is the gap this work addresses.
AI News Presenter Simulator is architected as a single-page application contained entirely within one HTML file. The application comprises four major subsystems: the SVG Avatar Engine, the TTS Pipeline, the Subtitle Synchronizer, and the Studio UI Layer. All subsystems communicate through a shared JavaScript state object and DOM event listeners.
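The shared-state coordination described above can be sketched as a plain object plus a small notification helper. All identifiers here (`appState`, `setState`, `subscribe`) are illustrative assumptions, not names from the actual source:

```javascript
// Minimal sketch of a shared state object with change notification.
// Subsystems register a listener and react whenever state is patched.
const listeners = [];

const appState = {
  language: 'id-ID', // current TTS locale
  speaking: false,   // true while the TTS engine is active
  rate: 1.0,         // speech rate (0.5–2.0)
  pitch: 1.0,        // pitch (0.5–2.0)
};

function subscribe(fn) {
  listeners.push(fn);
}

function setState(patch) {
  Object.assign(appState, patch);
  listeners.forEach((fn) => fn(appState)); // notify all subsystems
}

// Example: the avatar engine could react to the speaking flag here.
subscribe((s) => { /* toggle mouth animation when s.speaking changes */ });
setState({ speaking: true });
```

A single mutation point like `setState` keeps the four subsystems loosely coupled: the TTS pipeline flips `speaking`, and the avatar engine and studio UI react without referencing each other directly.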
The presenter avatar is a hand-crafted inline SVG element with individually addressable components for the face, hair, eyes, mouth, blazer, and body. Each component uses SVG gradient fills for depth and realism. The animation system manages four concurrent animation loops:
| Animation | Trigger | Mechanism | Frequency |
|---|---|---|---|
| Lip-sync (mouth) | TTS speaking state | CSS class toggle on SVG mouth path | ~150ms cycle |
| Eye blink | Interval timer | SVG ellipse ry attribute animation | Every 3–5 seconds (randomized) |
| Breathing | CSS animation (infinite) | Subtle translateY on torso group | 4-second cycle |
| Broadcast glow | TTS speaking state | CSS box-shadow pulse on avatar container | 2-second pulse cycle |
The lip-sync animation operates by toggling between open and closed mouth SVG paths. When the TTS engine is actively speaking, a JavaScript interval alternates the mouth state at approximately 150ms intervals, creating the visual impression of speech. The mouth animation is synchronized with the TTS state rather than individual phonemes—a deliberate design choice that avoids the complexity of phoneme-to-viseme mapping while maintaining visual believability at the application's typical viewing distance.
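This state-driven loop can be sketched as follows; the function names and the `mouth-open` CSS class are illustrative, not taken from the source:

```javascript
// Sketch of the speaking-state lip-sync loop: while TTS is active,
// an interval alternates the mouth between open and closed shapes.
const MOUTH_INTERVAL_MS = 150;

// Pure toggle: alternates the mouth state each tick.
function nextMouthState(current) {
  return current === 'open' ? 'closed' : 'open';
}

let mouthState = 'closed';
let lipSyncTimer = null;

function startLipSync(mouthEl) {
  lipSyncTimer = setInterval(() => {
    mouthState = nextMouthState(mouthState);
    // Swap a CSS class on the SVG mouth path; CSS renders the matching shape.
    mouthEl.classList.toggle('mouth-open', mouthState === 'open');
  }, MOUTH_INTERVAL_MS);
}

function stopLipSync(mouthEl) {
  clearInterval(lipSyncTimer);
  mouthState = 'closed';
  mouthEl.classList.remove('mouth-open'); // rest position when idle
}
```

The same interval-plus-class-toggle pattern extends naturally to the randomized eye-blink loop, with the delay drawn from a 3–5 second range instead of a fixed 150ms.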
The TTS subsystem wraps the Web Speech API SpeechSynthesis interface with language management, voice selection, and parameter control:
| Parameter | Range | Default | Control |
|---|---|---|---|
| Language | id-ID, en-US/en-GB | id-ID | Toggle button |
| Speech Rate | 0.5x – 2.0x | 1.0x | Range slider |
| Pitch | 0.5 – 2.0 | 1.0 | Range slider |
| Voice | Available system voices | First matching voice | Dropdown select |
When the user switches language, the system performs a cascading update: the voice list is re-filtered for the new locale, the UI labels (buttons, headers, placeholders) switch to the corresponding language, the news template presets update, and any active broadcast is stopped. Voice availability depends on the user's operating system and browser—Chrome on macOS typically offers 20+ voices, while Chrome on Linux may offer fewer options.
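The voice re-filtering step of this cascade can be sketched as below; `pickVoicesFor` and its same-language fallback are assumptions about the design, not code from the repository:

```javascript
// Sketch of re-filtering the system voice list when the locale changes.
function pickVoicesFor(voices, locale) {
  // Prefer exact locale matches (e.g. 'id-ID'), then fall back to any
  // voice in the same language ('id-*').
  const exact = voices.filter((v) => v.lang === locale);
  if (exact.length > 0) return exact;
  const lang = locale.split('-')[0];
  return voices.filter((v) => v.lang.startsWith(lang));
}

// Browser wiring (guarded so the sketch also loads outside a browser):
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  // Chrome populates the voice list asynchronously, so re-run on voiceschanged.
  window.speechSynthesis.addEventListener('voiceschanged', () => {
    const voices = window.speechSynthesis.getVoices();
    const matches = pickVoicesFor(voices, 'id-ID');
    // Populate the voice dropdown from `matches`, defaulting to matches[0].
  });
}
```

Listening for `voiceschanged` matters in practice: a naive `getVoices()` call at page load frequently returns an empty array on Chrome because the voice list has not yet loaded.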
Real-time subtitle display with word-level highlighting is achieved through the SpeechSynthesisUtterance.onboundary event. This event fires at word boundaries during speech synthesis, providing the character index and length of the currently spoken word. The subtitle system:
- Splits the news script into individual <span> elements, one per word.
- Listens for boundary events and maps the reported character offset to the corresponding word span.

The word highlight uses a distinct background color and increased font weight, providing a clear visual indication of the current reading position. This feature is particularly valuable for language learners and hearing-impaired users who benefit from simultaneous audio and visual text presentation.
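A minimal sketch of the boundary-driven highlighter follows; the helper names and the `current-word` class are illustrative, not taken from the source:

```javascript
// Map a character offset (as reported by onboundary) to a word index.
function wordIndexAt(text, charIndex) {
  let count = 0;
  const re = /\S+/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    if (charIndex < m.index + m[0].length) return count;
    count++;
  }
  return count - 1; // offset past the end: stay on the last word
}

// Browser wiring: highlight the active word span during synthesis.
function speakWithHighlight(text, spans) {
  const u = new SpeechSynthesisUtterance(text);
  u.onboundary = (e) => {
    if (e.name && e.name !== 'word') return; // some engines also report 'sentence'
    spans.forEach((s) => s.classList.remove('current-word'));
    const i = wordIndexAt(text, e.charIndex);
    if (spans[i]) spans[i].classList.add('current-word');
  };
  window.speechSynthesis.speak(u);
}
```

Keeping the offset-to-index mapping as a pure function makes it easy to verify against whitespace-separated scripts independently of the speech engine.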
The visual design replicates the aesthetics of a professional TV news studio using pure CSS techniques:
| Element | Implementation | Behavior |
|---|---|---|
| LIVE Badge | CSS animation (opacity pulse) | Blinks continuously during broadcast |
| News Ticker | CSS translateX animation | Continuous horizontal scroll with BREAKING prefix |
| Real-time Clock | JavaScript setInterval | Updates every second (HH:MM:SS WIB format) |
| News Desk | SVG with gradient fills | Static foreground element in front of avatar |
| Audio Equalizer | CSS animation (scaleY randomized) | Animated bars during broadcast, static when idle |
| Floating Particles | CSS keyframe animation | Subtle floating dots for ambient depth |
| Spotlight | CSS radial-gradient overlay | Intensifies during broadcast mode |
The control panel and subtitle container use glassmorphism styling—semi-transparent backgrounds with backdrop-filter: blur()—creating a modern, layered visual effect. This design choice allows the studio background and avatar to remain partially visible behind the controls, reinforcing the immersive broadcast environment.
The layout adapts to three viewport categories using CSS media queries:
| Viewport | Width | Layout Adaptation |
|---|---|---|
| Desktop | ≥ 1024px | Full studio layout with side-by-side controls |
| Tablet | 768px – 1023px | Stacked layout, reduced ticker speed |
| Mobile | < 768px | Single column, touch-optimized controls, compact avatar |
The complete technology stack is summarized below:

| Layer | Technology | Purpose |
|---|---|---|
| Markup | HTML5 | Document structure, semantic elements, inline SVG avatar |
| Styling | CSS3 | Animations, glassmorphism, gradients, responsive layout (600+ lines) |
| Logic | Vanilla JavaScript | TTS control, animation orchestration, state management |
| TTS Engine | Web Speech API | Browser-native speech synthesis (zero cost, no API key) |
| Avatar | Inline SVG | Vector-based presenter with gradient fills and CSS animation |
| Typography | Google Fonts | Playfair Display, Source Sans 3, JetBrains Mono |
| Hosting | Vercel | Static deployment with edge CDN and CI/CD |
| Source Control | GitHub | Version control with automatic Vercel deployment on push |
A defining architectural decision is the zero-dependency approach: the entire application—HTML structure, CSS styling (600+ lines), SVG avatar definition, and JavaScript logic—resides in a single index.html file. No npm packages, no build step, no bundler, no framework. This choice makes the application trivially deployable as a single file, easy to audit and maintain, and fully transparent for educational purposes.
The application includes four pre-built news script templates to demonstrate its capabilities without requiring users to write content:
| Template | Language | Topic |
|---|---|---|
| Berita Utama | Indonesian | General news headline |
| Berita Ekonomi | Indonesian | Economic/financial news |
| Tech News | English | Technology news |
| World News | English | International news |
Templates automatically switch when the user changes language, ensuring the displayed content always matches the selected TTS language. Users can also write custom scripts in the text area for any topic.
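The language-keyed preset lookup might be structured as in the following sketch; the object shape and the shortened template strings are illustrative assumptions, not the actual content:

```javascript
// Sketch of language-keyed news template presets.
const NEWS_TEMPLATES = {
  'id-ID': [
    { title: 'Berita Utama', text: 'Selamat malam, pemirsa. Berita utama hari ini...' },
    { title: 'Berita Ekonomi', text: 'Kabar ekonomi terkini hari ini...' },
  ],
  'en-US': [
    { title: 'Tech News', text: 'Good evening. In technology news today...' },
    { title: 'World News', text: 'Turning now to international headlines...' },
  ],
};

function templatesFor(locale) {
  // Fall back to English if the locale has no preset list.
  return NEWS_TEMPLATES[locale] || NEWS_TEMPLATES['en-US'];
}

// On language change, re-render the preset buttons from templatesFor(locale),
// so the displayed templates always match the active TTS language.
```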
The project is deployed on Vercel with the following configuration:
```json
{
  "rewrites": [{ "source": "/(.*)", "destination": "/index.html" }],
  "headers": [{
    "source": "/(.*)",
    "headers": [
      { "key": "Cache-Control", "value": "public, max-age=3600, s-maxage=86400" },
      { "key": "X-Content-Type-Options", "value": "nosniff" },
      { "key": "X-Frame-Options", "value": "DENY" }
    ]
  }]
}
```
The configuration enables SPA routing, sets appropriate cache headers for static content, and applies security headers to prevent content-type sniffing and clickjacking attacks. Vercel's edge CDN ensures low-latency delivery globally.
```shell
git clone https://github.com/romizone/ai-news-presenter.git
cd ai-news-presenter
npx serve .    # Or: python3 -m http.server 3000
```
Due to the zero-dependency architecture, the application can also be opened directly as a local file (e.g. open index.html) without a web server, though some browsers may restrict Web Speech API access under the file:// scheme.
The Web Speech API is the primary compatibility constraint. The following table summarizes TTS support across major browsers:
| Browser | TTS Support | Indonesian Voices | Boundary Events |
|---|---|---|---|
| Chrome (Desktop) | Full | Yes (Google TTS) | Yes |
| Edge (Desktop) | Full | Yes (Azure TTS) | Yes |
| Safari (macOS/iOS) | Full | Limited | Partial |
| Firefox | Partial | OS-dependent | Limited |
| Chrome (Android) | Full | Yes | Yes |
Chrome and Edge provide the best experience due to their comprehensive voice collections and reliable boundary event firing. The word highlighting feature degrades gracefully on browsers with limited boundary event support—subtitles still display but without word-level tracking.
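One plausible way to implement this graceful degradation is a timeout heuristic: if no boundary event arrives shortly after speech begins, word tracking is disabled for the session and plain subtitles remain. This is a sketch of one such approach, not the actual implementation:

```javascript
// Decide whether word-level highlighting should stay enabled.
// If the engine never fires a boundary event shortly after speech starts,
// assume boundary reporting is unsupported and fall back to plain subtitles.
function boundarySupported(speechStartMs, firstBoundaryMs, timeoutMs = 1000) {
  if (firstBoundaryMs === null) return false; // no boundary event observed
  return firstBoundaryMs - speechStartMs <= timeoutMs;
}

// Wiring sketch: record Date.now() when utterance.onstart fires, record it
// again on the first onboundary, then gate the highlighter on the result.
```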
```
ai-news-presenter/
├── index.html        # Complete application (single-file, all-in-one)
│   ├── <style>       # CSS: animations, glassmorphism, responsive (600+ lines)
│   ├── <body>        # HTML: structure + inline SVG avatar
│   └── <script>      # JS: TTS, animation, state management
├── package.json      # Project metadata and npm scripts
├── vercel.json       # Vercel routing and cache configuration
├── LICENSE           # MIT License
└── README.md         # Documentation
```
The current v1.0.0 release establishes the foundation for more advanced virtual presenter capabilities, with further enhancements planned for future releases.
AI News Presenter Simulator demonstrates that a compelling virtual news anchor experience can be built entirely with standard web technologies—HTML5, CSS3, and vanilla JavaScript—without external dependencies, server infrastructure, or API costs. The combination of SVG avatar animation, Web Speech API synthesis, and word-level subtitle synchronization creates an engaging broadcast simulation that runs on any modern browser.
The zero-dependency, single-file architecture makes the application immediately deployable, easily maintainable, and fully transparent for educational purposes. The bilingual support for Bahasa Indonesia and English, combined with four pre-built news templates, provides a ready-to-use tool for educators, content creators, and developers exploring virtual presenter technology.
The complete source code is available at https://github.com/romizone/ai-news-presenter and a live demo is accessible at https://files-navy-three.vercel.app.