Dohwan Ko

ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models

Mar 26, 2025
Via arXiv

LLaMo: Large Language Model-based Molecular Graph Assistant

Oct 31, 2024
Via arXiv

Large Language Models are Temporal and Causal Reasoners for Video Question Answering

Nov 06, 2023
Via arXiv

Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models

Aug 18, 2023
Via arXiv

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

Mar 23, 2023
Via arXiv

Video-Text Representation Learning via Differentiable Weak Temporal Alignment

Mar 31, 2022
Via arXiv