Mengjie Zhao

VinaBench: Benchmark for Faithful and Consistent Visual Narratives

Mar 26, 2025
Via arXiv

Cross-Modal Learning for Music-to-Music-Video Description Generation

Mar 14, 2025
Via arXiv

DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning

Feb 18, 2025
Via arXiv

OpenMU: Your Swiss Army Knife for Music Understanding

Oct 21, 2024
Via arXiv

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

Oct 08, 2024
Via arXiv

Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models

Oct 02, 2024
Via arXiv

Graph Neural Networks for Virtual Sensing in Complex Systems: Addressing Heterogeneous Temporal Dynamics

Jul 26, 2024
Via arXiv

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Jun 26, 2024
Via arXiv

ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark

Jun 17, 2024
Via arXiv

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

May 23, 2024
Via arXiv