Kwonjoon Lee

GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks

Mar 09, 2025

Can Hallucination Correction Improve Video-Language Alignment?

Feb 20, 2025

Generalized Mission Planning for Heterogeneous Multi-Robot Teams via LLM-constructed Hierarchical Trees

Jan 27, 2025

Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data

Nov 05, 2024

Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge

Nov 05, 2024

Symbolic Graph Inference for Compound Scene Understanding

Oct 30, 2024

M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

Jul 19, 2024

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

Jul 14, 2024

Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models

May 30, 2024

Vamos: Versatile Action Models for Video Understanding

Nov 22, 2023