ViDA-MAN: Visual Dialog with Digital Humans (2110.13384v1)

Published 26 Oct 2021 in cs.CV

Abstract: We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers real-time audio-visual responses to instant speech inquiries. Compared to traditional text- or voice-based systems, ViDA-MAN offers human-like interactions (e.g., vivid voice, natural facial expressions, and body gestures). Given a speech request, the demonstration is able to respond with high-quality video in sub-second latency. To deliver an immersive user experience, ViDA-MAN seamlessly integrates multi-modal techniques including Acoustic Speech Recognition (ASR), multi-turn dialog, Text To Speech (TTS), and talking-head video generation. Backed by a large knowledge base, ViDA-MAN is able to chat with users on a number of topics including chit-chat, weather, device control, news recommendations, and hotel booking, as well as answering questions via structured knowledge.

Citations (4)

Summary

  • The paper introduces a multimodal system that integrates ASR, multi-turn dialog, TTS, and real-time talking-head video generation to create human-like digital interactions.
  • It demonstrates sub-second latency performance, ensuring immediate and immersive responses during natural conversations.
  • The system’s versatility across various topics highlights its potential for applications in virtual assistance and enhanced digital human interaction.

The paper "ViDA-MAN: Visual Dialog with Digital Humans" introduces a system designed to enhance interactions with digital humans through multi-modal communication. ViDA-MAN focuses on creating more human-like and immersive experiences by providing real-time audio-visual responses to spoken inquiries.

Key Features of ViDA-MAN:

  1. Multimodal Interaction:
    • ViDA-MAN integrates several technologies to allow seamless interaction, including Acoustic Speech Recognition (ASR), a multi-turn dialog system, Text To Speech (TTS), and talking-head video generation; a minimal pipeline sketch follows this list.
    • These components work together to interpret speech and generate a spoken reply with corresponding facial expressions and gestures, providing a more engaging user experience than traditional text or voice-based systems.
  2. Real-Time Performance:
    • The system performs with sub-second latency, meaning it can process speech requests and respond with high-quality video almost instantaneously. This rapid response time is crucial for maintaining an immersive user experience.
  3. Human-Like Interaction Capabilities:
    • By generating vivid voices and natural facial expressions, ViDA-MAN can mimic human interactions effectively. This includes not only casual conversations but also expressive body language, which contributes to a more genuine dialog with users.
  4. Wide Range of Topics:
    • Supported by a comprehensive knowledge base, ViDA-MAN can engage with diverse topics such as chit-chat, weather updates, device control, news recommendations, and hotel bookings. It can also answer questions grounded in structured knowledge, enhancing its utility as an informative agent (see the routing sketch after the pipeline example below).
  5. Application Potential:
    • The approach taken by ViDA-MAN suggests it can be used in contexts where natural, human-like interaction with digital agents is desired, expanding possibilities for virtual assistance across different industries.
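
The paper does not include an implementation, so the following is a minimal sketch of how the ASR → dialog → TTS → talking-head stages described above could be chained for one user turn, with end-to-end latency measured against the sub-second budget the authors report. All function names and bodies are placeholder stubs standing in for the real components, not the authors' code.

```python
"""Minimal sketch of a ViDA-MAN-style turn loop (not the authors' code).

Every function below is a placeholder stub illustrating how ASR, multi-turn
dialog, TTS, and talking-head video generation could be chained, and how
end-to-end latency might be measured against a sub-second budget.
"""
import time
from typing import List, Tuple


def transcribe(audio: bytes) -> str:
    """ASR stub: speech audio -> text (a real system would call an ASR model)."""
    return "what's the weather in Beijing"


def respond(text: str, history: List[str]) -> Tuple[str, List[str]]:
    """Multi-turn dialog stub: appends to history and returns a canned reply."""
    reply = "It is sunny in Beijing today."
    return reply, history + [text, reply]


def synthesize(text: str) -> bytes:
    """TTS stub: reply text -> waveform bytes (placeholder encoding)."""
    return text.encode("utf-8")


def render_talking_head(speech: bytes) -> List[bytes]:
    """Talking-head stub: audio -> video frames (placeholder single frame)."""
    return [speech]


def handle_turn(audio: bytes, history: List[str]) -> Tuple[List[bytes], List[str], float]:
    """Run one user turn through ASR -> dialog -> TTS -> video and time it."""
    start = time.perf_counter()
    text = transcribe(audio)
    reply, history = respond(text, history)
    speech = synthesize(reply)
    frames = render_talking_head(speech)
    latency = time.perf_counter() - start  # the paper reports sub-second end-to-end latency
    return frames, history, latency


if __name__ == "__main__":
    frames, history, latency = handle_turn(b"<audio bytes>", [])
    print(f"turn handled in {latency:.3f}s, {len(frames)} frame(s)")
```

In a real deployment the stages would likely overlap, streaming partial ASR and TTS output into the renderer rather than running strictly in sequence; the sketch keeps them sequential only for clarity.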

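Item 4 above describes topic coverage backed by a knowledge base. The paper gives no architectural details, so the sketch below shows one plausible, deliberately simplified way such a dialog module might route a query to a topic-specific skill, with keyword matching standing in for whatever intent classifier the real system uses; all skill handlers are hypothetical.

```python
"""Illustrative topic router over skill handlers (an assumption, not the paper's design).

The paper only states that ViDA-MAN covers chit-chat, weather, device control,
news, hotel booking, and structured-knowledge QA; the keyword routing below is
a simplified stand-in for a real intent classifier.
"""
from typing import Callable, Dict

SKILLS: Dict[str, Callable[[str], str]] = {
    "weather": lambda q: "It will be sunny tomorrow.",     # placeholder weather skill
    "light": lambda q: "Okay, turning on the lights.",     # placeholder device-control skill
    "hotel": lambda q: "I found three hotels near you.",   # placeholder booking skill
    "news": lambda q: "Here are today's top stories.",     # placeholder recommendation skill
}


def route(query: str) -> str:
    """Dispatch a query to the first matching skill, else fall back to chit-chat."""
    lowered = query.lower()
    for keyword, skill in SKILLS.items():
        if keyword in lowered:
            return skill(query)
    return "Happy to chat! What would you like to talk about?"  # chit-chat fallback


if __name__ == "__main__":
    print(route("Can you book a hotel for tomorrow?"))
    print(route("Tell me something interesting."))
```
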
This demonstration offers an integrated platform for multimodal communication with digital humans, combining speech recognition, dialog, speech synthesis, and video generation in a way that closely mimics natural human behavior.
