Sheng Jin

KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Nov 04, 2024
Via arXiv

Frame-Voyager: Learning to Query Frames for Video Large Language Models

Oct 07, 2024
Via arXiv

FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation

Sep 05, 2024
Via arXiv

Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

Jul 23, 2024
Via arXiv

ESOD: Efficient Small Object Detection on High-Resolution Images

Jul 23, 2024
Via arXiv

Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions

Jul 23, 2024
Via arXiv

TCFormer: Visual Recognition via Token Clustering Transformer

Jul 16, 2024
Via arXiv

When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Jul 14, 2024
Via arXiv

F-LMM: Grounding Frozen Large Multimodal Models

Jun 09, 2024
Via arXiv

MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

May 13, 2024
Via arXiv