Picture for Juan Carlos Niebles

Juan Carlos Niebles

ViUniT: Visual Unit Tests for More Robust Visual Programming

Add code
Dec 12, 2024
Viaarxiv icon

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

Add code
Dec 10, 2024
Viaarxiv icon

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

Add code
Dec 09, 2024
Viaarxiv icon

Streaming Detection of Queried Event Start

Add code
Dec 04, 2024
Viaarxiv icon

SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs

Add code
Nov 20, 2024
Viaarxiv icon

IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos

Add code
Nov 18, 2024
Viaarxiv icon

PRACT: Optimizing Principled Reasoning and Acting of LLM Agent

Add code
Oct 24, 2024
Figure 1 for PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
Figure 2 for PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
Figure 3 for PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
Figure 4 for PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
Viaarxiv icon

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Add code
Oct 21, 2024
Figure 1 for xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Figure 2 for xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Figure 3 for xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Figure 4 for xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Viaarxiv icon

xLAM: A Family of Large Action Models to Empower AI Agent Systems

Add code
Sep 05, 2024
Viaarxiv icon

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Add code
Aug 22, 2024
Figure 1 for xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Figure 2 for xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Figure 3 for xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Figure 4 for xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Viaarxiv icon