Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qiangong Zhou

Det-SAM2:Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2

Dec 02, 2024

Zhiting Wang, Qiangong Zhou, Zongyang Liu

Figure 1 for Det-SAM2:Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2

Figure 2 for Det-SAM2:Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2

Figure 3 for Det-SAM2:Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2

Figure 4 for Det-SAM2:Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2

Abstract:Segment Anything Model 2 (SAM2) demonstrates exceptional performance in video segmentation and refinement of segmentation results. We anticipate that it can further evolve to achieve higher levels of automation for practical applications. Building upon SAM2, we conducted a series of practices that ultimately led to the development of a fully automated pipeline, termed Det-SAM2, in which object prompts are automatically generated by a detection model to facilitate inference and refinement by SAM2. This pipeline enables inference on infinitely long video streams with constant VRAM and RAM usage, all while preserving the same efficiency and accuracy as the original SAM2. This technical report focuses on the construction of the overall Det-SAM2 framework and the subsequent engineering optimization applied to SAM2. We present a case demonstrating an application built on the Det-SAM2 framework: AI refereeing in a billiards scenario, derived from our business context. The project at \url{https://github.com/motern88/Det-SAM2}.

Via

Access Paper or Ask Questions

HiLight: Technical Report on the Motern AI Video Language Model

Jul 11, 2024

Zhiting Wang, Qiangong Zhou, Kangjie Yang, Zongyang Liu, Xin Mao

Figure 1 for HiLight: Technical Report on the Motern AI Video Language Model

Figure 2 for HiLight: Technical Report on the Motern AI Video Language Model

Figure 3 for HiLight: Technical Report on the Motern AI Video Language Model

Abstract:This technical report presents the implementation of a state-of-the-art video encoder for video-text modal alignment and a video conversation framework called HiLight, which features dual visual towers. The work is divided into two main parts: 1.alignment of video and text modalities; 2.convenient and efficient way to interact with users. Our goal is to address the task of video comprehension in the context of billiards. The report includes a discussion of the concepts and the final solution developed during the task's implementation.

Via

Access Paper or Ask Questions