AI News Presenter Simulator: Virtual News Anchor with Bilingual Text-to-Speech and Animated SVG Avatar

Romi Nur Ismanto
Independent AI Research Lab, Jakarta, Indonesia
rominur@gmail.com
February 2026

Abstract

We present AI News Presenter Simulator, a browser-based virtual news anchor application that combines animated SVG avatars with real-time Text-to-Speech (TTS) synthesis and synchronized subtitles. The system renders a professional female presenter avatar with lip-sync mouth animation, automatic eye blink cycles, subtle breathing motion, and intensified glow effects during broadcast. Audio generation leverages the Web Speech API with bilingual support for Bahasa Indonesia and English, featuring adjustable speech rate (0.5x–2x), pitch control (0.5–2.0), multiple voice selection per language, and automatic voice switching on language change. The TV studio experience includes a running BREAKING news ticker, blinking LIVE badge, real-time clock display, SVG news desk, audio equalizer animation, floating particles, spotlight effects, and glassmorphism UI styling. Real-time subtitles with word-level highlighting provide visual reinforcement of spoken content. The entire application is built as a single HTML file with zero external dependencies using vanilla HTML5, CSS3, and JavaScript, deployed on Vercel with edge CDN distribution. We describe the avatar animation system, TTS integration pipeline, subtitle synchronization mechanism, and responsive design architecture.

Keywords: text-to-speech, TTS, Web Speech API, SVG animation, virtual presenter, news anchor, lip-sync, bilingual, subtitle synchronization, single-page application, zero dependencies, browser-based

1. Introduction

The broadcasting industry has increasingly explored virtual presenters as cost-effective alternatives to human anchors for routine news delivery, weather updates, and informational segments. Traditional virtual presenter systems require substantial infrastructure—video generation models, GPU-intensive lip-sync networks such as Wav2Lip or SadTalker, dedicated backend servers, and complex audio-video synchronization pipelines. These requirements place virtual presenter technology beyond the reach of educators, small newsrooms, and individual creators.

AI News Presenter Simulator demonstrates that a compelling virtual news anchor experience can be achieved entirely within the browser using standard web technologies. By combining SVG-based avatar animation with the Web Speech API for text-to-speech synthesis, the system eliminates the need for external servers, GPU hardware, or third-party API subscriptions. The result is a zero-cost, zero-dependency application that runs on any modern browser.

The key contributions of this work are:

  1. A fully browser-based virtual news anchor packaged as a single HTML file with zero external dependencies, requiring no servers, GPUs, or API subscriptions.
  2. An SVG avatar animation engine that combines lip-sync, randomized eye blinks, breathing motion, and a broadcast glow effect.
  3. A bilingual TTS pipeline for Bahasa Indonesia and English built on the Web Speech API, with adjustable rate, pitch, and per-language voice selection.
  4. A subtitle synchronization mechanism that maps word boundary events to pre-rendered word spans for karaoke-style highlighting.
  5. A TV studio UI layer (news ticker, LIVE badge, real-time clock, equalizer, glassmorphism styling) with a responsive layout for desktop, tablet, and mobile viewports.

2. Related Work

Virtual presenter systems have evolved along two trajectories: deep learning-based approaches and web-based approaches. On the deep learning side, SadTalker (Zhang et al., 2023) generates realistic talking head videos from a single image and audio input using 3D motion coefficients, while Wav2Lip (Prajwal et al., 2020) achieves accurate lip-sync by training on the LRS2 dataset. These systems produce photorealistic results but require GPU inference, introducing latency, cost, and infrastructure dependencies.

The Web Speech API (W3C, 2012) provides browser-native text-to-speech synthesis without server-side processing. While the synthesis quality varies across browsers and operating systems, modern implementations on Chrome (Google TTS), Edge (Microsoft Azure TTS), and Safari (Apple TTS) deliver acceptable quality for informational content. The API exposes SpeechSynthesisUtterance events including boundary events that report the character offset of each spoken word—a capability we leverage for subtitle synchronization.

SVG animation for character representation has been explored in educational contexts and interactive storytelling. Unlike raster-based approaches, SVG avatars scale to any resolution without quality loss, render efficiently on low-powered devices, and can be manipulated programmatically through CSS and JavaScript. The combination of SVG avatars with TTS has not been extensively explored for news presentation scenarios, which is the gap this work addresses.

3. System Architecture

AI News Presenter Simulator is architected as a single-page application contained entirely within one HTML file. The application comprises four major subsystems: the SVG Avatar Engine, the TTS Pipeline, the Subtitle Synchronizer, and the Studio UI Layer. All subsystems communicate through a shared JavaScript state object and DOM event listeners.

3.1 Application Flow

User Input (Text + Language) → TTS Engine (Web Speech API) → Avatar Animation Controller → Lip-Sync + Glow Effects → Subtitle Synchronizer (Word Boundary Events) → Real-Time Display (Ticker + Clock + Equalizer)
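
As a rough illustration of how the shared JavaScript state object coordinates these stages (a minimal sketch; the property and event names are assumptions, not identifiers from the project source), the state might look like:

// Illustrative shared state object; names are assumptions.
const state = {
  language: 'id-ID',   // active TTS locale ('id-ID' or 'en-US')
  rate: 1.0,           // speech rate, 0.5x–2.0x
  pitch: 1.0,          // pitch, 0.5–2.0
  voice: null,         // selected SpeechSynthesisVoice
  speaking: false      // true while an utterance is in progress
};

// Subsystems react to broadcast lifecycle changes via plain DOM events.
document.addEventListener('broadcast:start', () => { state.speaking = true; });
document.addEventListener('broadcast:end', () => { state.speaking = false; });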

3.2 SVG Avatar Engine

The presenter avatar is a hand-crafted inline SVG element with individually addressable components for the face, hair, eyes, mouth, blazer, and body. Each component uses SVG gradient fills for depth and realism. The animation system manages four concurrent animation loops:

Table 1: Avatar animation subsystems
Animation        | Trigger                  | Mechanism                                 | Frequency
Lip-sync (mouth) | TTS speaking state       | CSS class toggle on SVG mouth path        | ~150ms cycle
Eye blink        | Interval timer           | SVG ellipse ry attribute animation        | Every 3–5 seconds (randomized)
Breathing        | CSS animation (infinite) | Subtle translateY on torso group          | 4-second cycle
Broadcast glow   | TTS speaking state       | CSS box-shadow pulse on avatar container  | 2-second pulse cycle

The lip-sync animation operates by toggling between open and closed mouth SVG paths. When the TTS engine is actively speaking, a JavaScript interval alternates the mouth state at approximately 150ms intervals, creating the visual impression of speech. The mouth animation is synchronized with the TTS state rather than individual phonemes—a deliberate design choice that avoids the complexity of phoneme-to-viseme mapping while maintaining visual believability at the application's typical viewing distance.
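
A minimal sketch of this loop is shown below; the element ID and CSS class name ('avatar-mouth', 'mouth-open') are illustrative assumptions rather than the project's actual identifiers.

// Interval-based lip-sync: alternate the mouth shape while TTS is speaking.
const mouth = document.getElementById('avatar-mouth');
let lipSyncTimer = null;

function startLipSync() {
  // Toggle between the open and closed mouth paths roughly every 150 ms.
  lipSyncTimer = setInterval(() => mouth.classList.toggle('mouth-open'), 150);
}

function stopLipSync() {
  clearInterval(lipSyncTimer);
  mouth.classList.remove('mouth-open'); // return to the closed-mouth rest pose
}

// Wired to the utterance lifecycle, e.g.:
// utterance.onstart = startLipSync;  utterance.onend = stopLipSync;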

3.3 Text-to-Speech Pipeline

The TTS subsystem wraps the Web Speech API SpeechSynthesis interface with language management, voice selection, and parameter control:

Table 2: TTS configuration parameters
Parameter   | Range                   | Default              | Control
Language    | id-ID, en-US/en-GB      | id-ID                | Toggle button
Speech Rate | 0.5x – 2.0x             | 1.0x                 | Range slider
Pitch       | 0.5 – 2.0               | 1.0                  | Range slider
Voice       | Available system voices | First matching voice | Dropdown select

When the user switches language, the system performs a cascading update: the voice list is re-filtered for the new locale, the UI labels (buttons, headers, placeholders) switch to the corresponding language, the news template presets update, and any active broadcast is stopped. Voice availability depends on the user's operating system and browser—Chrome on macOS typically offers 20+ voices, while Chrome on Linux may offer fewer options.
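
A minimal sketch of this configuration follows; it uses only standard Web Speech API calls, but the function and variable names are illustrative assumptions. Note that speechSynthesis.getVoices() may return an empty list until the browser's voiceschanged event has fired.

// Configure and speak an utterance for the selected language, rate, and pitch.
function speak(text, lang, rate, pitch) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;    // 'id-ID', 'en-US', or 'en-GB'
  utterance.rate = rate;    // 0.5–2.0
  utterance.pitch = pitch;  // 0.5–2.0

  // Pick the first system voice whose locale matches the requested language.
  const voices = speechSynthesis.getVoices();
  utterance.voice = voices.find(v => v.lang.startsWith(lang.split('-')[0])) || null;

  speechSynthesis.cancel();  // stop any broadcast that is still in progress
  speechSynthesis.speak(utterance);
  return utterance;
}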

3.4 Subtitle Synchronization

Real-time subtitle display with word-level highlighting is achieved through the SpeechSynthesisUtterance.onboundary event. This event fires at word boundaries during speech synthesis, providing the character index and length of the currently spoken word. The subtitle system:

  1. Pre-renders the full transcript as a sequence of <span> elements, one per word.
  2. Listens for boundary events and maps the reported character offset to the corresponding word span.
  3. Applies a highlight CSS class to the current word, creating a karaoke-style visual effect.
  4. Auto-scrolls the subtitle container to keep the highlighted word visible.

The word highlight uses a distinct background color and increased font weight, providing clear visual indication of the current reading position. This feature is particularly valuable for language learners and hearing-impaired users who benefit from simultaneous audio and visual text presentation.
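
A minimal sketch of this mechanism follows; it assumes an existing SpeechSynthesisUtterance named utterance, and the element ID and class name are illustrative rather than taken from the project source.

// Pre-render one <span> per word and highlight it when its boundary event fires.
const container = document.getElementById('subtitle');
const text = utterance.text;
const words = [];
let pos = 0;

for (const word of text.split(/\s+/).filter(Boolean)) {
  const start = text.indexOf(word, pos); // character offset of this word
  pos = start + word.length;
  const span = document.createElement('span');
  span.textContent = word + ' ';
  container.appendChild(span);
  words.push({ start, span });
}

utterance.onboundary = (event) => {
  if (event.name !== 'word') return;
  // Find the last word whose offset does not exceed the reported charIndex.
  const current = words.filter(w => w.start <= event.charIndex).pop();
  if (!current) return;
  words.forEach(w => w.span.classList.remove('highlight'));
  current.span.classList.add('highlight');
  current.span.scrollIntoView({ block: 'nearest', behavior: 'smooth' });
};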

4. Studio UI Design

The visual design replicates the aesthetics of a professional TV news studio using pure CSS techniques:

4.1 Visual Components

Table 3: Studio UI elements
Element            | Implementation                    | Behavior
LIVE Badge         | CSS animation (opacity pulse)     | Blinks continuously during broadcast
News Ticker        | CSS translateX animation          | Continuous horizontal scroll with BREAKING prefix
Real-time Clock    | JavaScript setInterval            | Updates every second (HH:MM:SS WIB format)
News Desk          | SVG with gradient fills           | Static foreground element in front of avatar
Audio Equalizer    | CSS animation (scaleY randomized) | Animated bars during broadcast, static when idle
Floating Particles | CSS keyframe animation            | Subtle floating dots for ambient depth
Spotlight          | CSS radial-gradient overlay       | Intensifies during broadcast mode
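
The real-time clock from Table 3 can be sketched as follows; the element ID is an assumption, and the en-GB locale is chosen only because it yields colon-separated HH:MM:SS digits.

// Update the studio clock every second in WIB (UTC+7, Asia/Jakarta).
const clockEl = document.getElementById('studio-clock');
const clockFormat = new Intl.DateTimeFormat('en-GB', {
  timeZone: 'Asia/Jakarta',
  hour12: false,
  hour: '2-digit', minute: '2-digit', second: '2-digit'
});
setInterval(() => {
  clockEl.textContent = clockFormat.format(new Date()) + ' WIB';
}, 1000);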

4.2 Glassmorphism Design

The control panel and subtitle container use glassmorphism styling—semi-transparent backgrounds with backdrop-filter: blur()—creating a modern, layered visual effect. This design choice allows the studio background and avatar to remain partially visible behind the controls, reinforcing the immersive broadcast environment.

4.3 Responsive Layout

The layout adapts to three viewport categories using CSS media queries:

Table 4: Responsive breakpoints
Viewport | Width          | Layout Adaptation
Desktop  | ≥ 1024px       | Full studio layout with side-by-side controls
Tablet   | 768px – 1023px | Stacked layout, reduced ticker speed
Mobile   | < 768px        | Single column, touch-optimized controls, compact avatar

5. Implementation Details

5.1 Technology Stack

Table 5: Core technology stack
Layer          | Technology         | Purpose
Markup         | HTML5              | Document structure, semantic elements, inline SVG avatar
Styling        | CSS3               | Animations, glassmorphism, gradients, responsive layout (600+ lines)
Logic          | Vanilla JavaScript | TTS control, animation orchestration, state management
TTS Engine     | Web Speech API     | Browser-native speech synthesis (zero cost, no API key)
Avatar         | Inline SVG         | Vector-based presenter with gradient fills and CSS animation
Typography     | Google Fonts       | Playfair Display, Source Sans 3, JetBrains Mono
Hosting        | Vercel             | Static deployment with edge CDN and CI/CD
Source Control | GitHub             | Version control with automatic Vercel deployment on push

5.2 Zero-Dependency Architecture

A defining architectural decision is the zero-dependency approach: the entire application—HTML structure, CSS styling (600+ lines), SVG avatar definition, and JavaScript logic—resides in a single index.html file. No npm packages, no build step, no bundler, no framework. This choice yields several advantages:

  1. Deployment is trivial: the single file can be served from any static host or opened directly from the filesystem.
  2. There is no build toolchain or dependency tree to install, update, or audit, which keeps maintenance simple.
  3. The complete source is readable in one place, keeping the application fully transparent for educational inspection.
  4. The browser loads one document with no framework bundles or build artifacts.

5.3 News Template System

The application includes four pre-built news script templates to demonstrate its capabilities without requiring users to write content:

Table 6: Pre-built news templates
Template       | Language   | Topic
Berita Utama   | Indonesian | General news headline
Berita Ekonomi | Indonesian | Economic/financial news
Tech News      | English    | Technology news
World News     | English    | International news

Templates automatically switch when the user changes language, ensuring the displayed content always matches the selected TTS language. Users can also write custom scripts in the text area for any topic.
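
A minimal sketch of this template-switching behavior is shown below; the object shape, element ID, and placeholder texts are illustrative rather than the project's actual data.

// Presets keyed by TTS locale; repopulate the selector when the language changes.
const templates = {
  'id-ID': [
    { title: 'Berita Utama', text: '...' },
    { title: 'Berita Ekonomi', text: '...' }
  ],
  'en-US': [
    { title: 'Tech News', text: '...' },
    { title: 'World News', text: '...' }
  ]
};

function onLanguageChange(lang) {
  const select = document.getElementById('template-select');
  select.innerHTML = '';
  for (const t of templates[lang]) {
    select.add(new Option(t.title, t.text)); // title shown to the user, script as value
  }
}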

6. Deployment

6.1 Vercel Deployment (Recommended)

The project is deployed on Vercel with the following configuration:

{
  "rewrites": [{ "source": "/(.*)", "destination": "/index.html" }],
  "headers": [{
    "source": "/(.*)",
    "headers": [
      { "key": "Cache-Control", "value": "public, max-age=3600, s-maxage=86400" },
      { "key": "X-Content-Type-Options", "value": "nosniff" },
      { "key": "X-Frame-Options", "value": "DENY" }
    ]
  }]
}

The configuration enables SPA routing, sets appropriate cache headers for static content, and applies security headers to prevent content-type sniffing and clickjacking attacks. Vercel's edge CDN ensures low-latency delivery globally.

6.2 Local Development

git clone https://github.com/romizone/ai-news-presenter.git
cd ai-news-presenter
npx serve .
# Or: python3 -m http.server 3000

Due to the zero-dependency architecture, the application can also be opened directly as a local file (e.g., open index.html on macOS) without a web server, though some browsers restrict Web Speech API access under the file:// protocol.

7. Browser Compatibility

The Web Speech API is the primary compatibility constraint. The following table summarizes TTS support across major browsers:

Table 7: Browser TTS compatibility
Browser            | TTS Support | Indonesian Voices | Boundary Events
Chrome (Desktop)   | Full        | Yes (Google TTS)  | Yes
Edge (Desktop)     | Full        | Yes (Azure TTS)   | Yes
Safari (macOS/iOS) | Full        | Limited           | Partial
Firefox            | Partial     | OS-dependent      | Limited
Chrome (Android)   | Full        | Yes               | Yes

Chrome and Edge provide the best experience due to their comprehensive voice collections and reliable boundary event firing. The word highlighting feature degrades gracefully on browsers with limited boundary event support—subtitles still display but without word-level tracking.
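
Because the transcript spans are rendered before playback begins, a browser that never fires boundary events still displays the full subtitle, only without word-level tracking. Absence of the API itself can be detected up front, as in this illustrative sketch (element ID assumed):

// Disable the broadcast control entirely when speech synthesis is unavailable.
if (!('speechSynthesis' in window)) {
  document.getElementById('broadcast-btn').disabled = true;
}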

8. Project Structure

ai-news-presenter/
├── index.html              # Complete application (single-file, all-in-one)
│   ├── <style>             # CSS: animations, glassmorphism, responsive (600+ lines)
│   ├── <body>              # HTML: structure + inline SVG avatar
│   └── <script>            # JS: TTS, animation, state management
├── package.json            # Project metadata and npm scripts
├── vercel.json             # Vercel routing and cache configuration
├── LICENSE                 # MIT License
└── README.md               # Documentation

9. Future Work

The current v1.0.0 release establishes the foundation for more advanced virtual presenter capabilities. Planned enhancements include:

10. Conclusion

AI News Presenter Simulator demonstrates that a compelling virtual news anchor experience can be built entirely with standard web technologies—HTML5, CSS3, and vanilla JavaScript—without external dependencies, server infrastructure, or API costs. The combination of SVG avatar animation, Web Speech API synthesis, and word-level subtitle synchronization creates an engaging broadcast simulation that runs on any modern browser.

The zero-dependency, single-file architecture makes the application immediately deployable, easily maintainable, and fully transparent for educational purposes. The bilingual support for Bahasa Indonesia and English, combined with four pre-built news templates, provides a ready-to-use tool for educators, content creators, and developers exploring virtual presenter technology.

The complete source code is available at https://github.com/romizone/ai-news-presenter and a live demo is accessible at https://files-navy-three.vercel.app.

References

  1. Zhang, W., Cun, X., Wang, X., et al. (2023). SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  2. Prajwal, K.R., Mukhopadhyay, R., Namboodiri, V.P., and Jawahar, C.V. (2020). A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild. Proceedings of the 28th ACM International Conference on Multimedia.
  3. W3C. (2012). Web Speech API Specification. World Wide Web Consortium. https://wicg.github.io/speech-api/
  4. MDN Web Docs. (2024). SpeechSynthesisUtterance: boundary event. Mozilla Developer Network. https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance
  5. Vercel Inc. (2024). Vercel Documentation: Edge Network. Vercel. https://vercel.com/docs
  6. Dahlbäck, N., Jönsson, A., and Ahrenberg, L. (1993). Wizard of Oz Studies: Why and How. Knowledge-Based Systems, 6(4), 258–266.
  7. Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. (2000). Embodied Conversational Agents. MIT Press.
  8. W3C. (2011). Scalable Vector Graphics (SVG) 1.1 (Second Edition). World Wide Web Consortium. https://www.w3.org/TR/SVG11/