
Yu Kong

H-MoRe: Learning Human-centric Motion Representation for Action Analysis

Apr 14, 2025

Are We Merely Justifying Results ex Post Facto? Quantifying Explanatory Inversion in Post-Hoc Model Explanations

Apr 11, 2025

Window Token Concatenation for Efficient Visual Large Language Models

Apr 05, 2025

Visual Large Language Models for Generalized and Specialized Applications

Jan 06, 2025

LiDAR-based End-to-end Temporal Perception for Vehicle-Infrastructure Cooperation

Nov 22, 2024

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

Nov 17, 2024

A Survey of Multimodal Sarcasm Detection

Oct 24, 2024

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Sep 22, 2024

SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding

Jul 06, 2024

Facial Affective Behavior Analysis with Instruction Tuning

Apr 07, 2024