Yaoxian Song

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

Oct 14, 2024

3D Question Answering for City Scene Understanding

Jul 24, 2024

Multi-Task Domain Adaptation for Language Grounding with 3D Objects

Jul 03, 2024

Flickr30K-CFQ: A Compact and Fragmented Query Dataset for Text-image Retrieval

Apr 01, 2024

Learning 6-DoF Fine-grained Grasp Detection Based on Part Affordance Grounding

Jan 27, 2023

Human-in-the-loop Robotic Grasping using BERT Scene Representation

Sep 28, 2022

Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning

Sep 01, 2020

Deep Robotic Prediction with hierarchical RGB-D Fusion

Sep 17, 2019