Chaoyi Ai

An analysis of HOI: using a training-free method with multimodal visual foundation models when only the test set is available, without the training set

Aug 11, 2024

Chaoyi Ai

Abstract: Human-Object Interaction (HOI) aims to identify the pairs of humans and objects in images and to recognize their relationships, ultimately forming $\langle human, object, verb \rangle$ triplets. Under default settings, HOI performance is nearly saturated, and many studies now focus on long-tail distributions and zero-shot/few-shot scenarios. Let us consider an intriguing problem: "What if there is only a test dataset and no training dataset, and a multimodal visual foundation model is used in a training-free manner?" This study uses two experimental settings: ground truth and random arbitrary combinations. We reach some interesting conclusions and find that the open-vocabulary capabilities of the multimodal visual foundation model are not yet fully realized. Additionally, replacing the feature extraction with Grounding DINO further confirms these findings.
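The abstract does not spell out how triplets are scored, so the following is only an illustrative sketch of one common training-free approach with CLIP-style models: compare an embedding of a human-object region against embeddings of text prompts, one per candidate verb, and keep the best match. The function names, the prompt template, and the toy two-dimensional embeddings are all assumptions made for illustration; in practice the vectors would come from a multimodal foundation model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def make_prompt(obj, verb):
    """Hypothetical prompt template for one candidate (object, verb) pair."""
    return f"a photo of a person {verb} a {obj}"


def best_triplet(region_emb, obj, verb_embs):
    """Pick the verb whose prompt embedding is closest to the region embedding.

    region_emb: embedding of the human-object region (assumed given).
    verb_embs:  dict mapping each candidate verb to the embedding of its prompt.
    No parameters are trained; this is pure nearest-prompt matching.
    """
    scores = {verb: cosine(region_emb, emb) for verb, emb in verb_embs.items()}
    verb = max(scores, key=scores.get)
    return ("human", obj, verb), scores[verb]


# Toy embeddings standing in for real model outputs (assumption).
region = [1.0, 0.0]
verbs = {"riding": [0.9, 0.1], "holding": [0.1, 0.9]}
triplet, score = best_triplet(region, "bicycle", verbs)
print(make_prompt("bicycle", triplet[2]), triplet)
```

Because nothing is fitted to training data, the quality of the result depends entirely on how well the foundation model's text and image embeddings align for the candidate verbs.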

Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation

Jul 26, 2024

Chaoyi Ai, Yong Jiang, Shen Huang, Pengjun Xie, Kewei Tu

Abstract: Named entity recognition (NER) models often struggle with noisy inputs, such as those with spelling mistakes or errors generated by Optical Character Recognition processes, and learning a robust NER model is challenging. Existing robust NER models utilize both noisy text and its corresponding gold text for training, which is infeasible in many real-world applications in which gold text is not available. In this paper, we consider a more realistic setting in which only noisy text and its NER labels are available. We propose to retrieve text relevant to the noisy text from a knowledge corpus and use it to enhance the representation of the original noisy input. We design three retrieval methods: sparse retrieval based on lexicon similarity, dense retrieval based on semantic similarity, and self-retrieval based on task-specific text. After retrieving relevant text, we concatenate the retrieved text with the original noisy text and encode them with a transformer network, utilizing self-attention to enhance the contextual token representations of the noisy text using the retrieved text. We further employ a multi-view training framework that improves robust NER without retrieving text during inference. Experiments show that our retrieval-augmented model achieves significant improvements in various noisy NER settings.
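The retrieve-then-concatenate step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: Jaccard token overlap stands in for the sparse (lexical) retriever, the `[SEP]` separator string and all function names are assumptions, and the transformer encoder that would consume the concatenated input is omitted.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


def jaccard(tokens_a, tokens_b):
    """Token-set overlap, used here as a stand-in for lexical similarity."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def retrieve(noisy_text, corpus, k=1):
    """Return the k corpus sentences most lexically similar to the noisy text."""
    query = tokenize(noisy_text)
    ranked = sorted(corpus, key=lambda doc: jaccard(query, tokenize(doc)), reverse=True)
    return ranked[:k]


def build_input(noisy_text, retrieved, sep=" [SEP] "):
    """Concatenate the noisy text with the retrieved text for the encoder.

    Only the tokens of the noisy text would receive NER labels; the retrieved
    text serves purely as extra context for self-attention.
    """
    return noisy_text + sep + sep.join(retrieved)


corpus = [
    "Barack Obama was born in Hawaii",
    "The Eiffel Tower is in Paris",
]
noisy = "barack obama vistied hawaii"  # contains a deliberate spelling error
print(build_input(noisy, retrieve(noisy, corpus)))
```

Swapping `jaccard` for an embedding-based similarity would give a dense-retrieval variant under the same interface.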
