Ruoxi Chen

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment

Nov 26, 2024
Via arXiv

Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination

Nov 15, 2024
Via arXiv

Investigating and Defending Shortcut Learning in Personalized Diffusion Models

Jun 27, 2024
Via arXiv

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Mar 22, 2024
Via arXiv

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Feb 28, 2024
Via arXiv

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Feb 07, 2024
Via arXiv

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

Feb 05, 2024
Via arXiv

AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking

Mar 25, 2023
Via arXiv

Is Multi-Modal Necessarily Better? Robustness Evaluation of Multi-modal Fake News Detection

Jun 17, 2022
Via arXiv

Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization

May 01, 2022
Via arXiv