Navonil Majumder

NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

Nov 18, 2025
Via arXiv

10 Open Challenges Steering the Future of Vision-Language-Action Models

Nov 08, 2025
Via arXiv

JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

Jul 28, 2025
Via arXiv

Lessons from Training Grounded LLMs with Verifiable Rewards

Jun 18, 2025
Via arXiv

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

Apr 28, 2025
Via arXiv

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Dec 30, 2024
Via arXiv

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Sep 17, 2024
Via arXiv

Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Jun 25, 2024
Via arXiv

Improving Text-To-Audio Models with Synthetic Captions

Jun 18, 2024
Via arXiv

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Apr 16, 2024
Via arXiv