Silvio Savarese

ViUniT: Visual Unit Tests for More Robust Visual Programming

Dec 12, 2024

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

Dec 10, 2024

CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval

Nov 19, 2024

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Nov 12, 2024

CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models

Nov 07, 2024

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Nov 06, 2024

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

Nov 04, 2024

Asynchronous Tool Usage for Real-Time Agents

Oct 28, 2024

PRACT: Optimizing Principled Reasoning and Acting of LLM Agent

Oct 24, 2024

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Oct 21, 2024