
Title: Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM

Aug 14, 2024

Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Weiyun Wang, Zhe Chen(+7 more)

Figures 1 to 4 for Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM


Abstract: In this technical report, we propose ChemVLM, the first open-source multimodal large language model dedicated to chemistry, designed to address the incompatibility between chemical image understanding and text analysis. Built on the ViT-MLP-LLM architecture, ChemVLM uses ChemLLM-20B as its foundation language model, which gives it robust capabilities in understanding and applying chemical text knowledge, and InternViT-6B as its image encoder. We curated high-quality chemistry data, including molecules, reaction formulas, and chemistry examination questions, and compiled it into a bilingual multimodal question-answering dataset. We evaluate the model on multiple open-source benchmarks and three custom evaluation sets. Experimental results show that it performs strongly, achieving state-of-the-art results on five of the six tasks evaluated. The model is available at https://huggingface.co/AI4Chem/ChemVLM-26B.
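The abstract names a ViT-MLP-LLM pipeline: an image encoder (InternViT-6B) produces patch features, an MLP projects them into the embedding space of the language model (ChemLLM-20B), and the projected visual tokens are fed to the LLM together with the text. The toy, pure-Python sketch below illustrates only that data flow. It assumes a two-layer GELU projector and simply prepends the visual tokens to the text tokens; the dimensions, random weights, and names (`MLPProjector`, `build_llm_input`) are hypothetical and are not taken from the ChemVLM code, which may differ in projector design and in where image tokens are placed.

```python
import math
import random

# Hypothetical toy dimensions; the real encoder and LLM widths are far larger.
VIT_DIM = 4   # width of each vision-encoder patch feature (assumed)
LLM_DIM = 6   # width of the LLM token embeddings (assumed)
HIDDEN = 8    # hidden width of the MLP projector (assumed)


def gelu(x: float) -> float:
    """Exact GELU activation via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    """Compute y = W x + b for one vector x; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_layer(out_dim, in_dim, rng):
    """Random weights and zero bias for a dense layer (toy initialization)."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


class MLPProjector:
    """Two-layer MLP mapping vision features into the LLM embedding space (assumed design)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(HIDDEN, VIT_DIM, rng)
        self.w2, self.b2 = init_layer(LLM_DIM, HIDDEN, rng)

    def __call__(self, feature):
        hidden = [gelu(h) for h in linear(feature, self.w1, self.b1)]
        return linear(hidden, self.w2, self.b2)


def build_llm_input(patch_features, text_embeddings, projector):
    """Project every image patch, then prepend the visual tokens to the text tokens."""
    visual_tokens = [projector(f) for f in patch_features]
    return visual_tokens + text_embeddings


# Usage: 3 stand-in patch features from the encoder, 2 stand-in text-token embeddings.
rng = random.Random(42)
patches = [[rng.gauss(0, 1) for _ in range(VIT_DIM)] for _ in range(3)]
text = [[0.0] * LLM_DIM for _ in range(2)]
sequence = build_llm_input(patches, text, MLPProjector())
print(len(sequence), len(sequence[0]))  # 5 6
```

The projector's only job in this sketch is to change each feature's width from `VIT_DIM` to `LLM_DIM`, so that visual and text tokens form one sequence the language model can attend over.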

* Technical report
