
Title: Large Language Models as Visual Cross-Domain Learners

Jan 06, 2024

Shuhao Chen, Yulong Zhang, Weisen Jiang, Jiangang Lu, Yu Zhang

Abstract: Recent advances in deep learning rely on the assumption that training and testing data are independent and identically distributed, which hinders the application of these models in real-world scenarios with domain shifts. To address this issue, cross-domain learning aims to extract domain-invariant knowledge that reduces the domain shift between training and testing data. However, in visual cross-domain learning, traditional methods concentrate solely on the image modality and neglect the text modality as a means of alleviating the domain shift. In this work, we propose Large Language models as Visual cross-dOmain learners (LLaVO). LLaVO uses vision-language models to convert images into detailed textual descriptions. A large language model is then finetuned on textual descriptions of the source/target domain, generated using a designed instruction template. Extensive experiments on various cross-domain tasks under the domain generalization and unsupervised domain adaptation settings demonstrate the effectiveness of the proposed method.
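
The text-side step described in the abstract can be illustrated with a minimal sketch, assuming the image descriptions have already been produced by a vision-language model. Everything below is hypothetical: the abstract does not give the template wording, the class names, or the record format, so `TEMPLATE`, `Example`, `build_prompt`, and `build_finetuning_records` are illustrative names only and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical instruction template; the wording actually used in the paper
# is not given in the abstract.
TEMPLATE = (
    "Below is a description of an image.\n"
    "Description: {description}\n"
    "Question: Which category does the image belong to? "
    "Choose one of: {classes}.\n"
    "Answer:"
)


@dataclass
class Example:
    """One image, represented only by its textual description."""
    description: str
    label: Optional[str] = None  # None for unlabeled (e.g. target-domain) data


def build_prompt(description: str, classes: List[str]) -> str:
    """Fill the instruction template with a description and the class list."""
    return TEMPLATE.format(
        description=description.strip(),
        classes=", ".join(classes),
    )


def build_finetuning_records(
    examples: List[Example], classes: List[str]
) -> List[Dict[str, str]]:
    """Turn labeled examples into prompt/completion pairs.

    Unlabeled examples are skipped here; how the paper uses unlabeled
    target-domain descriptions is not specified in the abstract.
    """
    records = []
    for ex in examples:
        if ex.label is None:
            continue
        if ex.label not in classes:
            raise ValueError(f"unknown label: {ex.label!r}")
        records.append(
            {
                "prompt": build_prompt(ex.description, classes),
                "completion": " " + ex.label,
            }
        )
    return records


if __name__ == "__main__":
    classes = ["dog", "elephant", "guitar"]  # assumed label set
    data = [
        Example("A pencil sketch of a dog sitting on grass.", "dog"),
        Example("A cartoon of a large grey animal with a trunk."),
    ]
    for record in build_finetuning_records(data, classes):
        print(record["prompt"] + record["completion"])
```

The resulting prompt/completion pairs could be passed to any standard language-model finetuning routine. Because the model only ever sees text, the same template applies unchanged to descriptions from any visual domain.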
