Zhenzhen Hu

Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval

Oct 09, 2024
Via arXiv

UniLearn: Enhancing Dynamic Facial Expression Recognition through Unified Pre-Training and Fine-Tuning on Images and Videos

Sep 10, 2024
Via arXiv

Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations

Sep 09, 2024
Via arXiv

Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering

Oct 27, 2023
Via arXiv

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning

Jul 19, 2023
Via arXiv

Compact Bidirectional Transformer for Image Captioning

Jan 06, 2022
Via arXiv

Semi-Autoregressive Transformer for Image Captioning

Jun 17, 2021
Via arXiv

More Grounded Image Captioning by Distilling Image-Text Matching Model

Apr 01, 2020
Via arXiv

Quality-aware Unpaired Image-to-Image Translation

Mar 15, 2019
Via arXiv