Title: An Improved Baseline for Reasoning Segmentation with Large Language Model

Jan 03, 2024

Senqiao Yang, Tianyuan Qu, Xin Lai, Zhuotao Tian, Bohao Peng, Shu Liu, Jiaya Jia

Abstract: While LISA effectively bridges the gap between segmentation and large language models to enable reasoning segmentation, it has certain limitations: it cannot distinguish different instances of the target region, and it is constrained by pre-defined textual response formats. In this work, we introduce LISA++, an update to the existing LISA model that improves core functionalities while keeping the base architecture intact. The main enhancements in LISA++ include: 1) Enhanced Segmentation: instance segmentation ability has been added, providing more detailed scene analysis alongside the existing multi-region semantic segmentation. 2) More Natural Conversation: improved multi-turn dialogue capability, with the ability to incorporate segmentation results directly into text responses, i.e., Segmentation in Dialogue (SiD). These improvements are achieved by curating existing samples from generic segmentation datasets, aimed specifically at enhancing segmentation and conversational skills without structural changes or additional data sources. Comparative analysis with the original LISA model shows significant advancements in these areas, positioning LISA++ as a notable upgrade in visual understanding and interaction. LISA++'s adaptability and improved features highlight the versatility of the mask-as-embedding paradigm proposed by LISA, and its potential as a foundational model for diverse applications.

* Tech report. The LaTeX compilation crash was fixed.
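The mask-as-embedding paradigm referred to in the abstract can be illustrated with a toy sketch. In LISA, the language model's vocabulary is extended with a special segmentation token, and the hidden state at that token is projected and handed to a mask decoder. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: the token spelling `[SEG]`, the helper names, the identity `projection`, and the per-pixel dot product (standing in for the real learned mask decoder) are all hypothetical. It also assumes, for illustration only, that each instance gets its own segmentation token, so several tokens in one response yield several instance masks.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical spelling of the special segmentation token (assumption).
SEG_TOKEN = "[SEG]"


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def masks_from_seg_tokens(
    tokens: List[str],
    hidden_states: List[Vector],
    projection: Matrix,
    pixel_features: List[Vector],
    threshold: float = 0.0,
) -> List[List[bool]]:
    """Return one binary mask (flattened over pixels) per segmentation token.

    Hypothetical helper: each occurrence of SEG_TOKEN yields its own mask,
    so a response with several tokens can describe several instances.
    """
    masks = []
    for token, hidden in zip(tokens, hidden_states):
        if token != SEG_TOKEN:
            continue  # ordinary text tokens produce no mask
        # Project the LLM hidden state into the pixel-feature space
        # (stand-in for the learned projection layer).
        query = matvec(projection, hidden)
        # Score every pixel against the query; a plain dot product
        # replaces the real mask decoder in this toy sketch.
        masks.append([dot(query, feat) > threshold for feat in pixel_features])
    return masks


# Toy usage: two segmentation tokens over a 4-pixel "image".
tokens = ["two", "dogs:", "[SEG]", "and", "[SEG]"]
hidden_states = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
projection = [[1.0, 0.0], [0.0, 1.0]]  # identity, for illustration only
pixel_features = [[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, 1.0]]

masks = masks_from_seg_tokens(tokens, hidden_states, projection, pixel_features)
print(masks)  # [[True, True, False, False], [False, False, True, True]]
```

Returning one mask per token, rather than a single merged mask, is what lets one response separate instances in this sketch; in an actual model the masks would come from a decoder operating on image features rather than from a raw dot product.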