Yu Kong

Visual Large Language Models for Generalized and Specialized Applications

Jan 06, 2025
Via arXiv

LiDAR-based End-to-end Temporal Perception for Vehicle-Infrastructure Cooperation

Nov 22, 2024
Via arXiv

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

Nov 17, 2024
Via arXiv

A Survey of Multimodal Sarcasm Detection

Oct 24, 2024
Via arXiv

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Sep 22, 2024
Via arXiv

SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding

Jul 06, 2024
Via arXiv

Facial Affective Behavior Analysis with Instruction Tuning

Apr 07, 2024
Via arXiv

The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative

Feb 20, 2024
Via arXiv

CSGNN: Conquering Noisy Node labels via Dynamic Class-wise Selection

Nov 20, 2023
Via arXiv

Latent Space Energy-based Model for Fine-grained Open Set Recognition

Sep 19, 2023
Via arXiv