Yidi Li

STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking

Oct 08, 2024

PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

Aug 26, 2024

Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

Aug 26, 2024

Joint Adversarial and Collaborative Learning for Self-Supervised Action Recognition

Jul 15, 2023

Feature Completion Transformer for Occluded Person Re-identification

Mar 03, 2023

DepthGAN: GAN-based Depth Generation of Indoor Scenes from Semantic Layouts

Mar 22, 2022

Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking

Dec 14, 2021