Picture for Yue Cao

Yue Cao

Learning Novel Skills from Language-Generated Demonstrations

Add code
Dec 12, 2024
Viaarxiv icon

MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents

Add code
Dec 11, 2024
Viaarxiv icon

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Add code
Dec 06, 2024
Viaarxiv icon

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Add code
Nov 28, 2024
Figure 1 for SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Figure 2 for SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Figure 3 for SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Figure 4 for SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Viaarxiv icon

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Add code
Nov 15, 2024
Figure 1 for Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Figure 2 for Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Figure 3 for Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Figure 4 for Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Viaarxiv icon

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

Add code
Oct 15, 2024
Figure 1 for MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Figure 2 for MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Figure 3 for MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Figure 4 for MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Viaarxiv icon

BA-Net: Bridge Attention in Deep Neural Networks

Add code
Oct 10, 2024
Viaarxiv icon

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

Add code
Jul 22, 2024
Figure 1 for MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Figure 2 for MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Figure 3 for MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Figure 4 for MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Viaarxiv icon

Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection

Add code
Mar 20, 2024
Figure 1 for Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
Figure 2 for Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
Figure 3 for Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
Figure 4 for Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
Viaarxiv icon

Correlation-Embedded Transformer Tracking: A Single-Branch Framework

Add code
Jan 23, 2024
Viaarxiv icon