Xianglin Huang

ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On

Jun 06, 2025

Jinjuan Wang, Wenzhang Sun, Ming Li, Yun Zheng, Fanyao Li, Zhulin Tao, Donglin Di, Hao Li, Wei Chen, Xianglin Huang

Abstract:Video virtual try-on aims to seamlessly replace the clothing of a person in a source video with a target garment. Despite significant progress in this field, existing approaches still struggle to maintain continuity and reproduce garment details. In this paper, we introduce ChronoTailor, a diffusion-based framework that generates temporally consistent videos while preserving fine-grained garment details. By employing a precise spatio-temporal attention mechanism to guide the integration of fine-grained garment features, ChronoTailor achieves robust try-on performance. First, ChronoTailor leverages region-aware spatial guidance to steer the evolution of spatial attention and employs an attention-driven temporal feature fusion mechanism to generate more continuous temporal features. This dual approach not only enables fine-grained local editing but also effectively mitigates artifacts arising from video dynamics. Second, ChronoTailor integrates multi-scale garment features to preserve low-level visual details and incorporates a garment-pose feature alignment to ensure temporal continuity during dynamic motion. Additionally, we collect StyleDress, a new dataset featuring intricate garments, varied environments, and diverse poses, which offers advantages over existing public datasets and will be made publicly available for research. Extensive experiments show that ChronoTailor maintains spatio-temporal continuity and preserves garment details during motion, significantly outperforming previous methods.

Integrate Temporal Graph Learning into LLM-based Temporal Knowledge Graph Model

Jan 21, 2025

He Chang, Jie Wu, Zhulin Tao, Yunshan Ma, Xianglin Huang, Tat-Seng Chua

Abstract:Temporal Knowledge Graph Forecasting (TKGF) aims to predict future events based on the observed events in history. Recently, Large Language Models (LLMs) have exhibited remarkable capabilities, generating significant research interest in their application for reasoning over temporal knowledge graphs (TKGs). Existing LLM-based methods have integrated retrieved historical facts or static graph representations into LLMs. Despite the notable performance of LLM-based methods, they are limited by the insufficient modeling of temporal patterns and ineffective cross-modal alignment between graph and language, hindering the ability of LLMs to fully grasp the temporal and structural information in TKGs. To tackle these issues, we propose a novel framework TGL-LLM to integrate temporal graph learning into LLM-based temporal knowledge graph model. Specifically, we introduce temporal graph learning to capture the temporal and relational patterns and obtain the historical graph embedding. Furthermore, we design a hybrid graph tokenization to sufficiently model the temporal patterns within LLMs. To achieve better alignment between graph and language, we employ a two-stage training paradigm to finetune LLMs on high-quality and diverse data, thereby resulting in better performance. Extensive experiments on three real-world datasets show that our approach outperforms a range of state-of-the-art (SOTA) methods.

Progressive Feedback-Enhanced Transformer for Image Forgery Localization

Nov 15, 2023

Haochen Zhu, Gang Cao, Xianglin Huang

Abstract:Blind detection of the forged regions in digital images is an effective authentication means to counter the malicious use of local image editing techniques. Existing encoder-decoder forensic networks overlook the fact that detecting complex and subtle tampered regions typically requires more feedback information. In this paper, we propose a Progressive FeedbACk-enhanced Transformer (ProFact) network to achieve coarse-to-fine image forgery localization. Specifically, the coarse localization map generated by an initial branch network is adaptively fed back to the early transformer encoder layers to enhance the representation of positive features while suppressing interference factors. The cascaded transformer network, combined with a contextual spatial pyramid module, is designed to refine discriminative forensic features and thereby improve the accuracy and reliability of forgery localization. Furthermore, we present an effective strategy to automatically generate large-scale forged image samples close to real-world forensic scenarios, especially in realistic and coherent processing. Leveraging such samples, a progressive and cost-effective two-stage training protocol is applied to the ProFact network. Extensive experimental results on nine public forensic datasets show that our proposed localizer greatly outperforms the state of the art in the generalization ability and robustness of image forgery localization. Code will be publicly available at https://github.com/multimediaFor/ProFact.

Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments

May 11, 2021

Xiaolong Wei, LiFang Yang, Xianglin Huang, Gang Cao, Tao Zhulin, Zhengyang Du, Jing An

Abstract:Attention mechanisms are now widely used in deep learning models. Attention-based architectures can not only record the positional relationships between features but also measure the importance of different features through their weights. By learning dynamic weights that separate relevant from irrelevant features, key information is strengthened and irrelevant information is weakened, which can significantly improve the efficiency of deep learning algorithms. Although transformers have performed very well in many fields, including reinforcement learning, many problems and applications in this area remain to be addressed with them. Multi-Agent Reinforcement Learning (MARL) can be viewed as a set of independent agents that each adapt and learn in order to reach a goal. To emphasize the relationship between MDP decisions within a given time period, we apply a hierarchical coding method and validate its effectiveness. This paper proposes a hierarchical transformer MADDPG based on RNNs, which we call Hierarchical RNNs-Based Transformers MADDPG (HRTMADDPG). It consists of a lower-level encoder based on RNNs that encodes multiple step sizes in each time sequence, and an upper sequence-level encoder based on a transformer that learns the correlations between multiple sequences, so that we can capture the causal relationships between sub-time sequences and make HRTMADDPG more efficient.

* 10 pages
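
The two-level encoding described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: features are scalars, the weights `w_in` and `w_rec`, the chunk length `chunk`, and all function names are hypothetical, the attention layer has no learned projections, and the MADDPG actor-critic training is omitted entirely.

```python
import math

def rnn_encode(steps, w_in=0.5, w_rec=0.5):
    """Lower level: fold one sub-sequence into a single hidden state.
    The fixed scalar weights are illustrative, not learned."""
    h = 0.0
    for x in steps:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def self_attention(values):
    """Upper level: unprojected dot-product self-attention over the
    sub-sequence summaries (a stand-in for a transformer encoder layer)."""
    out = []
    for q in values:
        scores = [q * k for k in values]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * v for w, v in zip(weights, values)))
    return out

def hierarchical_encode(trajectory, chunk=3):
    """Split a trajectory into sub-sequences, summarize each with the RNN,
    then relate the summaries to each other with attention."""
    chunks = [trajectory[i:i + chunk] for i in range(0, len(trajectory), chunk)]
    summaries = [rnn_encode(c) for c in chunks]
    return self_attention(summaries)

encoded = hierarchical_encode([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], chunk=3)
```

Each output element is a softmax-weighted mix of all sub-sequence summaries, which is the sense in which the upper level relates sub-time sequences to one another.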

Resampling detection of recompressed images via dual-stream convolutional neural network

Jan 15, 2019

Antao Zhou, Gang Cao, Xianglin Huang, Gege Song, Lifang Yang

Abstract:Resampling detection plays an important role in identifying image tampering, such as image splicing. Currently, resampling detection is still difficult in recompressed images, which are produced by applying resampling and post-JPEG compression to primary JPEG images. Although low-quality primary compression benefits the detection, it remains rather challenging due to the widespread use of middle/high-quality compression in imaging devices. In this paper, we propose a novel deep learning approach to learn resampling features directly from the recompressed images. To this end, a noise extraction layer based on low-order high-pass filters is deployed to yield the image noise residual domain, which is better suited to extracting manipulation trace features. A dual-stream convolutional neural network (CNN) is presented to capture the resampling traces along different directions, where the horizontal and vertical streams are interleaved and concatenated. Lastly, the learned features are fed into a Sigmoid/Softmax layer, which is used as a binary/multi-class classifier for achieving the blind detection or parameter estimation of resampling operations, respectively. Extensive experimental results demonstrate that our proposed method can detect resampling effectively in recompressed images and outperforms the state-of-the-art detectors.
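
As a rough illustration of the noise extraction step, the sketch below computes horizontal and vertical noise residuals with first-order difference filters. The choice of the `[-1, 1]` kernel and the function names are assumptions made for this example; the abstract only says the filters are low-order and high-pass, and the CNN streams that consume the residuals are not shown.

```python
def horizontal_residual(image):
    """First-order high-pass filter along each row: pixel[x+1] - pixel[x]."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in image]

def vertical_residual(image):
    """First-order high-pass filter along each column: pixel[y+1] - pixel[y]."""
    width = len(image[0])
    return [[image[y + 1][x] - image[y][x] for x in range(width)]
            for y in range(len(image) - 1)]

# Tiny grayscale image as a list of rows (hypothetical values).
image = [[1, 2, 4],
         [3, 5, 9]]
h_res = horizontal_residual(image)  # input to a horizontal stream
v_res = vertical_residual(image)    # input to a vertical stream
```

The residuals suppress smooth image content and keep local pixel differences, which is where interpolation artifacts from resampling are expected to show up.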

Acceleration of Histogram-Based Contrast Enhancement via Selective Downsampling

Nov 20, 2017

Gang Cao, Huawei Tian, Lifang Yu, Xianglin Huang, Yongbin Wang

Abstract:In this paper, we propose a general framework to accelerate universal histogram-based image contrast enhancement (CE) algorithms. Both spatial and gray-level selective downsampling of digital images are adopted to decrease computational cost, while the visual quality of enhanced images is preserved without apparent degradation. A novel mapping function calibration is proposed to reconstruct the pixel mapping on the gray levels missed by downsampling. As two case studies, accelerations of histogram equalization (HE) and the state-of-the-art global CE algorithm, i.e., spatial mutual information and PageRank (SMIRANK), are presented in detail. Both quantitative and qualitative assessment results verify the effectiveness of our proposed CE acceleration framework. In typical tests, HE and SMIRANK are sped up by factors of about 3.9 and 13.5, respectively.

* accepted by IET Image Processing
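
The spatial half of this idea can be sketched for plain histogram equalization: estimate the mapping from a strided subset of pixels, then apply it to every pixel. This is a simplified illustration under stated assumptions, not the paper's framework: the uniform `stride` sampling and the function names are hypothetical, and the gray-level downsampling and mapping function calibration are not reproduced.

```python
def equalization_map(samples, levels=256):
    """Build the HE lookup table (scaled cumulative histogram) from samples."""
    hist = [0] * levels
    for v in samples:
        hist[v] += 1
    mapping, running = [], 0
    for count in hist:
        running += count
        mapping.append(round((levels - 1) * running / len(samples)))
    return mapping

def fast_equalize(image, stride=2, levels=256):
    """Estimate the mapping on every `stride`-th pixel in both directions,
    then apply it to the full-resolution image."""
    samples = [v for row in image[::stride] for v in row[::stride]]
    mapping = equalization_map(samples, levels)
    return [[mapping[v] for v in row] for row in image]

# Hypothetical 4x4 image with two gray levels.
image = [[10, 10, 200, 200] for _ in range(4)]
enhanced = fast_equalize(image, stride=2)
```

With `stride=2` the histogram is built from a quarter of the pixels, while the final lookup still touches every pixel once.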

Contrast Enhancement of Brightness-Distorted Images by Improved Adaptive Gamma Correction

Sep 13, 2017

Gang Cao, Lihui Huang, Huawei Tian, Xianglin Huang, Yongbin Wang, Ruicong Zhi

Abstract:As an efficient image contrast enhancement (CE) tool, adaptive gamma correction (AGC) was previously proposed by relating the gamma parameter to the cumulative distribution function (CDF) of the pixel gray levels within an image. AGC deals well with most dimmed images, but fails for globally bright images and for dimmed images with local bright regions. These two categories of brightness-distorted images are common in real scenarios, arising for example from improper exposure and white object regions. To address these deficiencies, we propose an improved AGC algorithm. A novel negative-image strategy is used to realize CE of the bright images, and gamma correction modulated by a truncated CDF is employed to enhance the dimmed ones. As such, local over-enhancement and structure distortion can be alleviated. Both qualitative and quantitative experimental results show that our proposed method yields consistently good CE results.
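
A minimal sketch of the two branches follows. It assumes the commonly used form gamma = 1 - CDF(level) and truncates the CDF with a simple cap; the threshold `bright_mean`, the cap `cdf_cap` (applied in both branches here to keep gamma positive), and all function names are hypothetical, and the paper's exact formulation may differ.

```python
def cumulative_distribution(pixels, levels=256):
    """Normalized cumulative histogram of a flat list of gray levels."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(pixels))
    return cdf

def gamma_correct(pixels, levels=256, cdf_cap=0.8):
    """Per-level gamma correction with gamma = 1 - min(CDF, cdf_cap).
    Capping the CDF keeps gamma >= 1 - cdf_cap (assumed truncation rule)."""
    top = levels - 1
    cdf = cumulative_distribution(pixels, levels)
    out = []
    for v in pixels:
        gamma = 1.0 - min(cdf[v], cdf_cap)
        out.append(round(top * (v / top) ** gamma))
    return out

def improved_agc(pixels, levels=256, bright_mean=0.5, cdf_cap=0.8):
    """Bright images: enhance the negative image, then invert back.
    Dim images: apply truncated-CDF gamma correction directly."""
    top = levels - 1
    mean = sum(pixels) / (len(pixels) * top)
    if mean > bright_mean:
        negative = [top - v for v in pixels]
        return [top - v for v in gamma_correct(negative, levels, cdf_cap)]
    return gamma_correct(pixels, levels, cdf_cap)

dim_result = improved_agc([10, 20, 30, 40])          # brightened
bright_result = improved_agc([220, 230, 240, 250])   # darkened via negative
```

Because gamma never exceeds 1, the direct branch can only raise gray levels, while the negative-image branch can only lower them, which matches the roles the abstract assigns to the two strategies.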
