Title: Robust Domain Generalization for Multi-modal Object Recognition

Aug 11, 2024

Yuxin Qiao, Keqin Li, Junhong Lin, Rong Wei, Chufeng Jiang, Yang Luo, Haoyu Yang

Abstract: In multi-label classification, machine learning encounters the challenge of domain generalization when handling tasks with distributions differing from the training data. Existing approaches primarily focus on vision object recognition and neglect the integration of natural language. Recent advancements in vision-language pre-training leverage supervision from extensive visual-language pairs, enabling learning across diverse domains and enhancing recognition in multi-modal scenarios. However, these approaches face limitations in loss function utilization, generality across backbones, and class-aware visual fusion. This paper proposes solutions to these limitations by inferring the actual loss, broadening evaluations to larger vision-language backbones, and introducing Mixup-CLIPood, which incorporates a novel mix-up loss for enhanced class-aware visual fusion. Our method demonstrates superior performance in domain generalization across multiple datasets.

* 6 pages, 2 figures. This is a preprint version of the article. The final version will be published in the proceedings of the IEEE conference.
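
The abstract names a mix-up loss but does not define it. As background only, the sketch below shows the generic mixup idea (convexly combining two examples and their labels, then scoring predictions against the mixed label). It is not the Mixup-CLIPood loss from the paper: the function names, the softmax cross-entropy, and the Beta(alpha, alpha) sampling with alpha = 0.2 are all assumptions chosen for illustration.

```python
import math
import random


def softmax_cross_entropy(logits, target_probs):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for v, t in zip(logits, target_probs))


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two feature vectors and their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha) (hypothetical default)."""
    return rng.betavariate(alpha, alpha)


if __name__ == "__main__":
    lam = sample_lambda()
    # Toy 2-d features with one-hot labels for two classes.
    x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam)
    # Stand-in logits; a real model would compute these from x_mix.
    logits = [0.3, -0.1]
    print(lam, x_mix, y_mix, softmax_cross_entropy(logits, y_mix))
```

Because cross-entropy is linear in the target, the loss against the mixed label equals the same weighted combination of the two per-label losses, so either form can be used in practice.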
