Deepanway Ghosal

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning

Dec 17, 2024
Via arXiv

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Oct 17, 2024
Via arXiv

Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning

Oct 16, 2024
Via arXiv

Improving Text-To-Audio Models with Synthetic Captions

Jun 18, 2024
Via arXiv

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Apr 16, 2024
Via arXiv

PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

Mar 20, 2024
Via arXiv

Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

Mar 13, 2024
Via arXiv

Stuck in the Quicksand of Numeracy, Far from AGI Summit: Evaluating LLMs' Mathematical Competency through Ontology-guided Perturbations

Jan 17, 2024
Via arXiv

Mustango: Toward Controllable Text-to-Music Generation

Nov 14, 2023
Via arXiv

Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts

Oct 31, 2023
Via arXiv