Dan Guo

A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli

Mar 20, 2025
Via arXiv

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Feb 11, 2025
Via arXiv

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring

Jan 16, 2025
Via arXiv

Linguistics-Vision Monotonic Consistent Network for Sign Language Production

Dec 22, 2024
Via arXiv

MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights

Dec 21, 2024
Via arXiv

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

Dec 19, 2024
Via arXiv

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

Dec 19, 2024
Via arXiv

Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing

Dec 17, 2024
Via arXiv

Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration

Dec 17, 2024
Via arXiv

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Dec 17, 2024
Via arXiv