Yazhou Yao

Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

Apr 14, 2025

Efficient Token Compression for Vision Transformer with Spatial Information Preserved

Mar 30, 2025

Twofold Debiasing Enhances Fine-Grained Learning with Coarse Labels

Feb 27, 2025

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Dec 25, 2024

The Key of Understanding Vision Tasks: Explanatory Instructions

Dec 24, 2024

FTMoMamba: Motion Generation with Frequency and Text State Space Models

Nov 26, 2024

UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation

Nov 25, 2024

COMOGen: A Controllable Text-to-3D Multi-object Generation Framework

Sep 01, 2024

Relating CNN-Transformer Fusion Network for Change Detection

Jul 03, 2024

Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation

Jul 03, 2024