Komei Sugiura

Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations

Oct 01, 2024
Via arXiv

DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning

Sep 28, 2024
Via arXiv

DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions

Aug 15, 2024
Via arXiv

Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks

Jul 18, 2024
Via arXiv

Layer-Wise Relevance Propagation with Conservation Property for ResNet

Jul 12, 2024
Via arXiv

Co-Scale Cross-Attentional Transformer for Rearrangement Target Detection

Jul 06, 2024
Via arXiv

Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models

Jul 01, 2024
Via arXiv

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Feb 28, 2024
Via arXiv

Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine

Dec 26, 2023
Via arXiv

DialMAT: Dialogue-Enabled Transformer with Moment-Based Adversarial Training

Nov 12, 2023
Via arXiv