Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lan Yang

VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention

Mar 14, 2025

Jiangning Wei, Lixiong Qin, Bo Yu, Tianjian Zou, Chuhan Yan, Dandan Xiao, Yang Yu, Lan Yang, Ke Li, Jun Liu

Abstract:Action recognition is a crucial task in artificial intelligence, with significant implications across various domains. We initially perform a comprehensive analysis of seven prominent action recognition methods across five widely-used datasets. This analysis reveals a critical, yet previously overlooked, observation: as the velocity of actions increases, the performance of these methods variably declines, undermining their robustness. This decline in performance poses significant challenges for their application in real-world scenarios. Building on these findings, we introduce the Velocity-Aware Action Recognition (VA-AR) framework to obtain robust action representations across different velocities. Our principal insight is that rapid actions (e.g., the giant circle backward in uneven bars or a smash in badminton) occur within short time intervals, necessitating smaller temporal attention windows to accurately capture intricate changes. Conversely, slower actions (e.g., drinking water or wiping face) require larger windows to effectively encompass the broader context. VA-AR employs a Mixture of Window Attention (MoWA) strategy, dynamically adjusting its attention window size based on the action's velocity. This adjustment enables VA-AR to obtain a velocity-aware representation, thereby enhancing the accuracy of action recognition. Extensive experiments confirm that VA-AR achieves state-of-the-art performance on the same five datasets, demonstrating VA-AR's effectiveness across a broad spectrum of action recognition scenarios.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Dec 17, 2024

Zhipeng Chen, Lan Yang, Yonggang Qi, Honggang Zhang, Kaiyue Pang, Ke Li, Yi-Zhe Song

Figure 1 for VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Figure 2 for VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Figure 3 for VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Figure 4 for VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Abstract:Despite the rapid advancements in text-to-image (T2I) synthesis, enabling precise visual control remains a significant challenge. Existing works attempted to incorporate multi-facet controls (text and sketch), aiming to enhance the creative control over generated images. However, our pilot study reveals that the expressive power of humans far surpasses the capabilities of current methods. Users desire a more versatile approach that can accommodate their diverse creative intents, ranging from controlling individual subjects to manipulating the entire scene composition. We present VersaGen, a generative AI agent that enables versatile visual control in T2I synthesis. VersaGen admits four types of visual controls: i) single visual subject; ii) multiple visual subjects; iii) scene background; iv) any combination of the three above or merely no control at all. We train an adaptor upon a frozen T2I model to accommodate the visual information into the text-dominated diffusion process. We introduce three optimization strategies during the inference phase of VersaGen to improve generation results and enhance user experience. Comprehensive experiments on COCO and Sketchy validate the effectiveness and flexibility of VersaGen, as evidenced by both qualitative and quantitative results.

* The paper has been accepted by AAAI 2025. Paper code: https://github.com/FelixChan9527/VersaGen_official

Via

Access Paper or Ask Questions

Wired Perspectives: Multi-View Wire Art Embraces Generative AI

Nov 26, 2023

Zhiyu Qu, Lan Yang, Honggang Zhang, Tao Xiang, Kaiyue Pang, Yi-Zhe Song

Abstract:Creating multi-view wire art (MVWA), a static 3D sculpture with diverse interpretations from different viewpoints, is a complex task even for skilled artists. In response, we present DreamWire, an AI system enabling everyone to craft MVWA easily. Users express their vision through text prompts or scribbles, freeing them from intricate 3D wire organisation. Our approach synergises 3D B\'ezier curves, Prim's algorithm, and knowledge distillation from diffusion models or their variants (e.g., ControlNet). This blend enables the system to represent 3D wire art, ensuring spatial continuity and overcoming data scarcity. Extensive evaluation and analysis are conducted to shed insight on the inner workings of the proposed system, including the trade-off between connectivity and visual aesthetics.

* Project page: https://dreamwireart.github.io

Via

Access Paper or Ask Questions

Vector Quantization by Minimizing Kullback-Leibler Divergence

Jan 30, 2015

Lan Yang, Jingbin Wang, Yujin Tu, Prarthana Mahapatra, Nelson Cardoso

Figure 1 for Vector Quantization by Minimizing Kullback-Leibler Divergence

Figure 2 for Vector Quantization by Minimizing Kullback-Leibler Divergence

Figure 3 for Vector Quantization by Minimizing Kullback-Leibler Divergence

Figure 4 for Vector Quantization by Minimizing Kullback-Leibler Divergence

Abstract:This paper proposes a new method for vector quantization by minimizing the Kullback-Leibler Divergence between the class label distributions over the quantization inputs, which are original vectors, and the output, which is the quantization subsets of the vector set. In this way, the vector quantization output can keep as much information of the class label as possible. An objective function is constructed and we also developed an iterative algorithm to minimize it. The new method is evaluated on bag-of-features based image classification problem.

Via

Access Paper or Ask Questions