Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xipeng Cao

Text-Video Multi-Grained Integration for Video Moment Montage

Dec 12, 2024

Zhihui Yin, Ye Ma, Xipeng Cao, Bo Wang, Quan Chen, Peng Jiang

Abstract:The proliferation of online short video platforms has driven a surge in user demand for short video editing. However, manually selecting, cropping, and assembling raw footage into a coherent, high-quality video remains laborious and time-consuming. To accelerate this process, we focus on a user-friendly new task called Video Moment Montage (VMM), which aims to accurately locate the corresponding video segments based on a pre-provided narration text and then arrange these video clips to create a complete video that aligns with the corresponding descriptions. The challenge lies in extracting precise temporal segments while ensuring intra-sentence and inter-sentence context consistency, as a single script sentence may require trimming and assembling multiple video clips. To address this problem, we present a novel \textit{Text-Video Multi-Grained Integration} method (TV-MGI) that efficiently fuses text features from the script with both shot-level and frame-level video features, which enables the global and fine-grained alignment between the video content and the corresponding textual descriptions in the script. To facilitate further research in this area, we introduce the Multiple Sentences with Shots Dataset (MSSD), a large-scale dataset designed explicitly for the VMM task. We conduct extensive experiments on the MSSD dataset to demonstrate the effectiveness of our framework compared to baseline methods.

Via

Access Paper or Ask Questions

Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation

May 11, 2024

Shengyuan Liu, Bo Wang, Ye Ma, Te Yang, Xipeng Cao, Quan Chen, Han Li, Di Dong, Peng Jiang

Abstract:Existing subject-driven text-to-image generation models suffer from tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. For generating compositional subjects, it often encounters problems such as object missing and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these limitations, we propose a subject-driven generation framework and introduce training-free guidance to intervene in the generative process during inference time. This approach strengthens the attention map, allowing for precise attribute binding and feature injection for each subject. Notably, our method exhibits exceptional zero-shot generation ability, especially in the challenging task of compositional generation. Furthermore, we propose a novel metric GroundingScore to evaluate subject alignment thoroughly. The obtained quantitative results serve as compelling evidence showcasing the effectiveness of our proposed method. The code will be released soon.

* 26 pages, 13 figures

Via

Access Paper or Ask Questions

Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails

Oct 15, 2021

Kun Niu, Xipeng Cao, Yicong Yu

Figure 1 for Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails

Figure 2 for Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails

Figure 3 for Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails

Figure 4 for Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails

Abstract:As student failure rates continue to increase in higher education, predicting student performance in the following semester has become a significant demand. Personalized student performance prediction helps educators gain a comprehensive view of student status and effectively intervene in advance. However, existing works scarcely consider the explainability of student performance prediction, which educators are most concerned about. In this paper, we propose a novel Explainable Student performance prediction method with Personalized Attention (ESPA) by utilizing relationships in student profiles and prior knowledge of related courses. The designed Bidirectional Long Short-Term Memory (BiLSTM) architecture extracts the semantic information in the paths with specific patterns. As for leveraging similar paths' internal relations, a local and global-level attention mechanism is proposed to distinguish the influence of different students or courses for making predictions. Hence, valid reasoning on paths can be applied to predict the performance of students. The ESPA consistently outperforms the other state-of-the-art models for student performance prediction, and the results are intuitively explainable. This work can help educators better understand the different impacts of behavior on students' studies.

* AAAI 2021 Workshop on AI Education/TIPCE 2021

Via

Access Paper or Ask Questions