Mohit Bansal

Shammie

TimeRefine: Temporal Grounding with Time Refining Video LLM

Dec 12, 2024

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Dec 11, 2024

QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization

Dec 10, 2024

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Dec 07, 2024

Reverse Thinking Makes LLMs Stronger Reasoners

Nov 29, 2024

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Nov 25, 2024

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement

Nov 22, 2024

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

Nov 15, 2024

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Nov 07, 2024

Self-Consistency Preference Optimization

Nov 06, 2024