Jiahao Mei

MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio

Mar 07, 2025

Xuenan Xu, Jiahao Mei, Chenliang Li, Yuning Wu, Ming Yan, Shaopeng Lai, Ji Zhang, Mengyue Wu

Abstract: The rapid advancement of large language models (LLMs) and artificial intelligence-generated content (AIGC) has accelerated AI-native applications, such as AI-based storybooks that automate engaging story production for children. However, challenges remain in improving story attractiveness, enriching storytelling expressiveness, and developing open-source evaluation benchmarks and frameworks. Therefore, we propose and open-source MM-StoryAgent, which creates immersive narrated video storybooks with refined plots, role-consistent images, and multi-channel audio. MM-StoryAgent designs a multi-agent framework that employs LLMs and diverse expert tools (generative models and APIs) across several modalities to produce expressive storytelling videos. The framework enhances story attractiveness through a multi-stage writing pipeline. In addition, it improves the immersive storytelling experience by integrating sound effects with visual, music and narrative assets. MM-StoryAgent offers a flexible, open-source platform for further development, where generative modules can be substituted. Both objective and subjective evaluations regarding textual story quality and alignment between modalities validate the effectiveness of our proposed MM-StoryAgent system. The demo and source code are available.


WritingBench: A Comprehensive Benchmark for Generative Writing

Mar 07, 2025

Yuning Wu, Jiahao Mei, Ming Yan, Chenliang Li, Shaopeng Lai, Yuran Ren, Zijia Wang, Ji Zhang, Mengyue Wu, Qin Jin (+1 more)

Abstract: Recent advancements in large language models (LLMs) have significantly enhanced text generation capabilities, yet evaluating their performance in generative writing remains a challenge. Existing benchmarks primarily focus on generic text generation or are limited in their writing tasks, failing to capture the diverse requirements of high-quality written content across various domains. To bridge this gap, we present WritingBench, a comprehensive benchmark designed to evaluate LLMs across 6 core writing domains and 100 subdomains, encompassing creative, persuasive, informative, and technical writing. We further propose a query-dependent evaluation framework that empowers LLMs to dynamically generate instance-specific assessment criteria. This framework is complemented by a fine-tuned critic model for criteria-aware scoring, enabling evaluations in style, format and length. The framework's validity is further demonstrated by its data curation capability, which enables 7B-parameter models to approach state-of-the-art (SOTA) performance. We open-source the benchmark, along with evaluation tools and modular framework components, to advance the development of LLMs in writing.


Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation

Oct 10, 2024

Kaiyuan Liu, Jiahao Mei, Hengyu Zhang, Yihuai Zhang, Xingjiao Wu, Daoguo Dong, Liang He


Abstract: Although Chinese calligraphy generation has achieved style transfer, generating calligraphy by specifying the calligrapher, font, and character style remains challenging. To address this, we propose a new Chinese calligraphy generation model, 'Moyun', which replaces the U-Net in the diffusion model with Vision Mamba and introduces the TripleLabel control mechanism to achieve controllable calligraphy generation. The model was tested on our large-scale dataset 'Mobao' of over 1.9 million images, and the results demonstrate that 'Moyun' can effectively control the generation process and produce calligraphy in the specified style. Even for calligraphy the calligrapher has not written, 'Moyun' can generate calligraphy that matches the style of the calligrapher.


Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning

Sep 25, 2024

Xian Wang, Jin Zhou, Yuanli Feng, Jiahao Mei, Jiming Chen, Shuo Li


Abstract: Recent innovations in autonomous drones have facilitated time-optimal flight in single-drone configurations and enhanced maneuverability in multi-drone systems through the application of optimal control and learning-based methods. However, few studies have achieved time-optimal motion planning for multi-drone systems, particularly during highly agile maneuvers or in dynamic scenarios. This paper presents a decentralized policy network for time-optimal multi-drone flight using multi-agent reinforcement learning. To strike a balance between flight efficiency and collision avoidance, we introduce a soft collision penalty inspired by optimization-based methods. By customizing PPO in a centralized training, decentralized execution (CTDE) fashion, we unlock higher efficiency and stability in training, while ensuring lightweight implementation. Extensive simulations show that, despite slight performance trade-offs compared to single-drone systems, our multi-drone approach maintains near-time-optimal performance with low collision rates. Real-world experiments validate our method, with two quadrotors using the same network as simulation achieving a maximum speed of 13.65 m/s and a maximum body rate of 13.4 rad/s in a 5.5 m * 5.5 m * 2.0 m space across various tracks, relying entirely on onboard computation.

* 7 pages, 6 figures
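
The abstract above mentions a soft collision penalty that trades off flight efficiency against collision avoidance, but does not give its form. The following is only an illustrative sketch of one common way to write such a term: a quadratic hinge on the distance between two drones. The function names, the `safe_dist` and `weight` parameters, and the hinge form itself are assumptions made for illustration, not the formulation used in the paper.

```python
import math


def soft_collision_penalty(p_i, p_j, safe_dist=0.5, weight=1.0):
    """Hypothetical soft penalty between two drones at positions p_i and p_j.

    Returns 0.0 when the drones are at least safe_dist apart, and a
    negative value that grows quadratically as they get closer.
    The hinge form and default values are illustrative assumptions.
    """
    distance = math.dist(p_i, p_j)
    violation = max(0.0, safe_dist - distance)
    if violation == 0.0:
        return 0.0
    return -weight * violation ** 2


def shaped_reward(progress, p_i, p_j, safe_dist=0.5, weight=1.0):
    """Hypothetical per-step reward: track progress plus the soft penalty."""
    return progress + soft_collision_penalty(p_i, p_j, safe_dist, weight)
```

Because the penalty is zero outside the safety radius and grows smoothly inside it, a policy trained with a term of this kind is discouraged from close passes without being forbidden from flying near another drone when that saves time.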


TEAdapter: Supply abundant guidance for controllable text-to-music generation

Aug 09, 2024

Jialing Zou, Jiahao Mei, Xudong Nan, Jinghua Li, Daoguo Dong, Liang He


Abstract: Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In addition, we explore the controllable generation of extended music by leveraging TEAdapter control groups trained on data of distinct structural functionalities. In general, we consider controls over global, elemental, and structural levels. Experimental results demonstrate that the proposed TEAdapter enables multiple precise controls and ensures high-quality music generation. Our module is also lightweight and transferable to any diffusion model architecture. Code and demos will be made available at https://github.com/Ashley1101/TEAdapter.

* 2024 IEEE International Conference on Multimedia and Expo (ICME 2024)


Online Time-Optimal Trajectory Generation for Two Quadrotors with Multi-Waypoints Constraints

Feb 28, 2024

Fangguo Zhao, Jiahao Mei, Jin Zhou, Jiming Chen, Shuo Li


Abstract: The flying speed of autonomous quadrotors has kept increasing over the past 5 years, especially in the field of autonomous drone racing. However, the majority of research focuses on the aggressive flight of a single quadrotor. In this letter, we propose a novel method called Pairwise Model Predictive Control (PMPC) that can guide two quadrotors online to fly through the waypoints in minimum time without collisions. The flight task is first modeled as a nonlinear optimization problem. An efficient two-step mass point velocity search method is then used to provide initial values and references that improve solving efficiency, so that the method can run online at a frequency of 50 Hz and can handle dynamic waypoints. Simulation and real-world experiments validate the feasibility of the proposed method. In the real-world experiments, the two quadrotors achieve a top speed of 8.1 m/s on a 6-waypoint racing track in a compact flying arena of 6 m * 4 m * 2 m.


Imitation Learning-Based Online Time-Optimal Control with Multiple-Waypoint Constraints for Quadrotors

Feb 18, 2024

Jin Zhou, Jiahao Mei, Fangguo Zhao, Jiming Chen, Shuo Li

Abstract: Over the past decade, there has been a remarkable surge in the use of quadrotors for purposes such as search and rescue, delivery, and autonomous drone racing, owing to their simple structure and aggressive maneuverability. One of the key challenges preventing quadrotors from being widely used in these scenarios is online waypoint-constrained time-optimal trajectory generation and control. This letter proposes an imitation learning-based online solution to efficiently navigate the quadrotor through multiple waypoints with time-optimal performance. The neural networks (WN&CNets) are trained to learn the control law from the dataset generated by the time-consuming CPC algorithm and then deployed to generate the optimal control commands online to guide the quadrotors. To address the challenge of limited training data and the hover maneuver at the final waypoint, we propose a transition phase strategy that utilizes polynomials to help the quadrotor 'jump over' the stop-and-go maneuver when switching waypoints. Our method is demonstrated in both simulation and real-world experiments, achieving a maximum speed of 7 m/s while navigating through 7 waypoints in a confined space of 6.0 m * 4.0 m * 2.0 m. The results show that with a slight loss in optimality, the WN&CNets significantly reduce the processing time and enable online optimal control for multiple-waypoint-constrained flight tasks.
