Weifeng Ge

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Oct 04, 2024
Via arXiv

TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning

Aug 28, 2024
Via arXiv

Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection

Aug 28, 2024
Via arXiv

Reading Relevant Feature from Global Representation Memory for Visual Object Tracking

Feb 26, 2024
Via arXiv

Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering

Jan 28, 2024
Via arXiv

Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge

Jan 19, 2024
Via arXiv

ColoristaNet for Photorealistic Video Style Transfer

Dec 21, 2022
Via arXiv

RankDNN: Learning to Rank for Few-shot Learning

Nov 29, 2022
Via arXiv

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Mar 20, 2022
Via arXiv

Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning

Mar 17, 2022
Via arXiv