Ruoxi Chen

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment

Nov 26, 2024
Via arXiv

Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination

Nov 15, 2024
Via arXiv

Investigating and Defending Shortcut Learning in Personalized Diffusion Models

Jun 27, 2024
Via arXiv

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Mar 22, 2024
Via arXiv

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Feb 28, 2024
Via arXiv

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Feb 07, 2024
Via arXiv

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

Feb 05, 2024
Via arXiv

AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking

Mar 25, 2023
Via arXiv

Is Multi-Modal Necessarily Better? Robustness Evaluation of Multi-modal Fake News Detection

Jun 17, 2022
Via arXiv

Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization

May 01, 2022
Via arXiv