Gabriel Sarch

Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames
May 30, 2025

Grounded Reinforcement Learning for Visual Reasoning
May 29, 2025

Grounding Task Assistance with Multimodal Cues from a Single Demonstration
May 02, 2025

ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights
Jun 20, 2024

Neural Representations of Dynamic Visual Stimuli
Jun 04, 2024

HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models
Apr 29, 2024

ODIN: A Single Model for 2D and 3D Perception
Jan 04, 2024

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
Oct 23, 2023

3D View Prediction Models of the Dorsal Visual Stream
Sep 04, 2023

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors
Jul 21, 2022