Cho-Jui Hsieh

Text is All You Need for Vision-Language Model Jailbreaking

Jan 31, 2026
Via arXiv

LoL: Longer than Longer, Scaling Video Generation to Hour

Jan 23, 2026
Via arXiv

FlexAct: Why Learn when you can Pick?

Jan 10, 2026
Via arXiv

Towards Building efficient Routed systems for Retrieval

Jan 10, 2026
Via arXiv

Understanding Reward Hacking in Text-to-Image Reinforcement Learning

Jan 06, 2026
Via arXiv

Uncertainty-Guided Selective Adaptation Enables Cross-Platform Predictive Fluorescence Microscopy

Nov 15, 2025
Via arXiv

Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models

Nov 15, 2025
Via arXiv

DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

Oct 16, 2025
Via arXiv

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Oct 02, 2025
Via arXiv

Concepts or Skills? Rethinking Instruction Selection for Multi-modal Models

Aug 14, 2025
Via arXiv