Mora: Enabling Generalist Video Generation via A Multi-Agent Framework (2403.13248v3)
Abstract: Text-to-video generation has made significant strides, but replicating the capabilities of advanced systems like OpenAI Sora remains challenging due to their closed-source nature. Existing open-source methods struggle to achieve comparable performance, often hindered by ineffective agent collaboration and inadequate training data quality. In this paper, we introduce Mora, a novel multi-agent framework that leverages existing open-source modules to replicate Sora's functionalities. We address these fundamental limitations by proposing three key techniques: (1) multi-agent fine-tuning with a self-modulation factor to enhance inter-agent coordination, (2) a data-free training strategy that uses large models to synthesize training data, and (3) a human-in-the-loop mechanism combined with multimodal LLMs for data filtering to ensure high-quality training datasets. Our comprehensive experiments on six video generation tasks demonstrate that Mora achieves performance comparable to Sora on VBench, outperforming existing open-source methods across various tasks. Specifically, in the text-to-video generation task, Mora achieved a Video Quality score of 0.800, surpassing Sora's 0.797 and outperforming all other baseline models across six key metrics. Additionally, in the image-to-video generation task, Mora achieved a perfect Dynamic Degree score of 1.00, demonstrating exceptional capability in enhancing motion realism, along with higher Imaging Quality than Sora. These results highlight the potential of collaborative multi-agent systems and human-in-the-loop mechanisms in advancing text-to-video generation. Our code is available at \url{https://github.com/lichao-sun/Mora}.
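To make the multi-agent idea concrete, below is a minimal sketch of how a Mora-style pipeline could be wired: a text-to-image agent produces a keyframe, an image-to-video agent extends it into a clip, and the result passes through a multimodal-LLM quality filter plus a human-in-the-loop check before being kept. All class names, function names, and scoring logic here are illustrative assumptions for exposition, not the actual Mora API or training code (see the repository above for the real implementation).

```python
# Hypothetical sketch of a Mora-style multi-agent text-to-video pipeline with
# MLLM-assisted and human-in-the-loop filtering. Names and logic are
# illustrative assumptions, not the actual Mora codebase.
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Clip:
    """A generated video clip together with the prompt that produced it."""
    prompt: str
    frames: List[bytes] = field(default_factory=list)
    quality_score: float = 0.0


class TextToImageAgent:
    """Stand-in for a text-to-image module (e.g., a latent diffusion model)."""
    def generate(self, prompt: str) -> bytes:
        return f"<keyframe for: {prompt}>".encode()


class ImageToVideoAgent:
    """Stand-in for an image-to-video module that animates a keyframe."""
    def generate(self, image: bytes, prompt: str, num_frames: int = 16) -> Clip:
        frames = [image] * num_frames  # placeholder frames
        return Clip(prompt=prompt, frames=frames)


def mllm_quality_filter(clip: Clip, threshold: float = 0.8) -> bool:
    """Placeholder for scoring a clip with a multimodal LLM judge."""
    clip.quality_score = 0.9  # a real system would query an MLLM here
    return clip.quality_score >= threshold


def human_review(clip: Clip) -> bool:
    """Placeholder human-in-the-loop check; auto-approves in this sketch."""
    return True


def generate_video(prompt: str,
                   t2i: TextToImageAgent,
                   i2v: ImageToVideoAgent,
                   keep: Callable[[Clip], bool]) -> Optional[Clip]:
    """Chain agents (text -> keyframe -> video), then filter the result."""
    keyframe = t2i.generate(prompt)
    clip = i2v.generate(keyframe, prompt)
    if keep(clip) and human_review(clip):
        return clip
    return None


if __name__ == "__main__":
    clip = generate_video("A butterfly landing on a flower",
                          TextToImageAgent(), ImageToVideoAgent(),
                          mllm_quality_filter)
    print("kept clip with", len(clip.frames) if clip else 0, "frames")
```

The same filtering pattern can be reused offline to curate synthesized training data: generated samples that fail the MLLM or human check are simply discarded before fine-tuning.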
- Zhengqing Yuan
- Ruoxi Chen
- Zhaoxu Li
- Haolong Jia
- Lifang He
- Chi Wang
- Lichao Sun
- Yixin Liu
- Yihan Cao
- Weixiang Sun
- Bin Lin
- Li Yuan
- Yanfang Ye