Deepanway Ghosal

The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles

Feb 03, 2025
Via arXiv

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning

Dec 17, 2024
Via arXiv

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Oct 17, 2024
Via arXiv

Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning

Oct 16, 2024
Via arXiv

Improving Text-To-Audio Models with Synthetic Captions

Jun 18, 2024
Via arXiv

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Apr 16, 2024
Via arXiv

PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

Mar 20, 2024
Via arXiv

Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

Mar 13, 2024
Via arXiv

Stuck in the Quicksand of Numeracy, Far from AGI Summit: Evaluating LLMs' Mathematical Competency through Ontology-guided Perturbations

Jan 17, 2024
Via arXiv

Mustango: Toward Controllable Text-to-Music Generation

Nov 14, 2023
Via arXiv