Picture for Wenhao Huang

Wenhao Huang

SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents

Add code
Dec 17, 2024
Viaarxiv icon

FullStack Bench: Evaluating LLMs as Full Stack Coders

Add code
Dec 03, 2024
Figure 1 for FullStack Bench: Evaluating LLMs as Full Stack Coders
Figure 2 for FullStack Bench: Evaluating LLMs as Full Stack Coders
Figure 3 for FullStack Bench: Evaluating LLMs as Full Stack Coders
Figure 4 for FullStack Bench: Evaluating LLMs as Full Stack Coders
Viaarxiv icon

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation

Add code
Dec 02, 2024
Figure 1 for Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Figure 2 for Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Figure 3 for Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Figure 4 for Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Viaarxiv icon

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

Add code
Oct 29, 2024
Figure 1 for AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Figure 2 for AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Figure 3 for AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Figure 4 for AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Viaarxiv icon

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Add code
Oct 17, 2024
Figure 1 for A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Figure 2 for A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Figure 3 for A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Figure 4 for A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Viaarxiv icon

Can MLLMs Understand the Deep Implication Behind Chinese Images?

Add code
Oct 17, 2024
Figure 1 for Can MLLMs Understand the Deep Implication Behind Chinese Images?
Figure 2 for Can MLLMs Understand the Deep Implication Behind Chinese Images?
Figure 3 for Can MLLMs Understand the Deep Implication Behind Chinese Images?
Figure 4 for Can MLLMs Understand the Deep Implication Behind Chinese Images?
Viaarxiv icon

PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment

Add code
Oct 17, 2024
Figure 1 for PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
Figure 2 for PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
Figure 3 for PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
Figure 4 for PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
Viaarxiv icon

ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

Add code
Oct 09, 2024
Figure 1 for ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Figure 2 for ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Figure 3 for ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Figure 4 for ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Viaarxiv icon

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

Add code
Oct 09, 2024
Viaarxiv icon

MIO: A Foundation Model on Multimodal Tokens

Add code
Sep 26, 2024
Figure 1 for MIO: A Foundation Model on Multimodal Tokens
Figure 2 for MIO: A Foundation Model on Multimodal Tokens
Figure 3 for MIO: A Foundation Model on Multimodal Tokens
Figure 4 for MIO: A Foundation Model on Multimodal Tokens
Viaarxiv icon