Sai Rajeswar

Grammar Search for Multi-Agent Systems

Dec 16, 2025

Grounding Computer Use Agents on Human Demonstrations

Nov 10, 2025

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs

Sep 11, 2025

LALM-Eval: An Open-Source Toolkit for Holistic Evaluation of Large Audio Language Models

Sep 09, 2025

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

May 27, 2025

Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA

May 22, 2025

StarFlow: Generating Structured Workflow Outputs From Sketch Images

Mar 27, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Mar 19, 2025

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs

Feb 21, 2025

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Feb 03, 2025