Jinyi Hu

Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

Dec 11, 2024

ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer

Dec 10, 2024

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024

AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

Aug 31, 2024

GUICourse: From General Vision Language Models to Versatile GUI Agents

Jun 17, 2024

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

Jun 08, 2024

LEGENT: Open Platform for Embodied Agents

Apr 28, 2024

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

Feb 21, 2024

Exploring Perceptual Limitation of Multimodal Large Language Models

Feb 12, 2024

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Dec 01, 2023