Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhiqiang Yuan

Control-CLIP: Decoupling Category and Style Guidance in CLIP for Specific-Domain Generation

Feb 17, 2025

Zexi Jia, Chuanwei Huang, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Jinchao Zhang, Jie Zhou

Abstract:Text-to-image diffusion models have shown remarkable capabilities of generating high-quality images closely aligned with textual inputs. However, the effectiveness of text guidance heavily relies on the CLIP text encoder, which is trained to pay more attention to general content but struggles to capture semantics in specific domains like styles. As a result, generation models tend to fail on prompts like "a photo of a cat in Pokemon style" in terms of simply producing images depicting "a photo of a cat". To fill this gap, we propose Control-CLIP, a novel decoupled CLIP fine-tuning framework that enables the CLIP model to learn the meaning of category and style in a complement manner. With specially designed fine-tuning tasks on minimal data and a modified cross-attention mechanism, Control-CLIP can precisely guide the diffusion model to a specific domain. Moreover, the parameters of the diffusion model remain unchanged at all, preserving the original generation performance and diversity. Experiments across multiple domains confirm the effectiveness of our approach, particularly highlighting its robust plug-and-play capability in generating content with various specific styles.

Via

Access Paper or Ask Questions

Semantic to Structure: Learning Structural Representations for Infringement Detection

Feb 11, 2025

Chuanwei Huang, Zexi Jia, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Jinchao Zhang, Jie Zhou

Abstract:Structural information in images is crucial for aesthetic assessment, and it is widely recognized in the artistic field that imitating the structure of other works significantly infringes on creators' rights. The advancement of diffusion models has led to AI-generated content imitating artists' structural creations, yet effective detection methods are still lacking. In this paper, we define this phenomenon as "structural infringement" and propose a corresponding detection method. Additionally, we develop quantitative metrics and create manually annotated datasets for evaluation: the SIA dataset of synthesized data, and the SIR dataset of real data. Due to the current lack of datasets for structural infringement detection, we propose a new data synthesis strategy based on diffusion models and LLM, successfully training a structural infringement detection model. Experimental results show that our method can successfully detect structural infringements and achieve notable improvements on annotated test sets.

Via

Access Paper or Ask Questions

Channel Sounding Using Multiplicative Arrays Based on Successive Interference Cancellation Principle

Jan 19, 2025

Zhangzhang Jiang, Zhiqiang Yuan, Chunhui Li, Le Yu, Wei Fan

Figure 1 for Channel Sounding Using Multiplicative Arrays Based on Successive Interference Cancellation Principle

Figure 2 for Channel Sounding Using Multiplicative Arrays Based on Successive Interference Cancellation Principle

Figure 3 for Channel Sounding Using Multiplicative Arrays Based on Successive Interference Cancellation Principle

Figure 4 for Channel Sounding Using Multiplicative Arrays Based on Successive Interference Cancellation Principle

Abstract:Ultra-massive multiple-input and multiple-output (MIMO) systems have been seen as the key radio technology for the advancement of wireless communication systems, due to its capability to better utilize the spatial dimension of the propagation channels. Channel sounding is essential for developing accurate and realistic channel models for the massive MIMO systems. However, channel sounding with large-scale antenna systems has faced significant challenges in practice. The real antenna array based (RAA) sounder suffers from high complexity and cost, while virtual antenna array (VAA) solutions are known for its long measurement time. Notably, these issues will become more pronounced as the antenna array configuration gets larger for future radio systems. In this paper, we propose the concept of multiplicative array (MA) for channel sounding applications to achieve large antenna aperture size with reduced number of required antenna elements. The unique characteristics of the MA are exploited for wideband spatial channel sounding purposes, supported by both one-path and multi-path numerical simulations. To address the fake paths and distortion in the angle delay profile issues inherent for MA in multipath channel sounding, a novel channel parameter estimation algorithm for MA based on successive interference cancellation (SIC) principle is proposed. Both numerical simulations and experimental validation results are provided to demonstrate the effectiveness and robustness of the proposed SIC algorithm for the MA. This research contributes significantly to the channel sounding and characterization of massive MIMO systems for future applications.

Via

Access Paper or Ask Questions

WalkVLM:Aid Visually Impaired People Walking by Vision Language Model

Dec 30, 2024

Zhiqiang Yuan, Ting Zhang, Jiapei Zhang, Jie Zhou, Jinchao Zhang

Abstract:Approximately 200 million individuals around the world suffer from varying degrees of visual impairment, making it crucial to leverage AI technology to offer walking assistance for these people. With the recent progress of vision-language models (VLMs), employing VLMs to improve this field has emerged as a popular research topic. However, most existing methods are studied on self-built question-answering datasets, lacking a unified training and testing benchmark for walk guidance. Moreover, in blind walking task, it is necessary to perform real-time streaming video parsing and generate concise yet informative reminders, which poses a great challenge for VLMs that suffer from redundant responses and low inference efficiency. In this paper, we firstly release a diverse, extensive, and unbiased walking awareness dataset, containing 12k video-manual annotation pairs from Europe and Asia to provide a fair training and testing benchmark for blind walking task. Furthermore, a WalkVLM model is proposed, which employs chain of thought for hierarchical planning to generate concise but informative reminders and utilizes temporal-aware adaptive prediction to reduce the temporal redundancy of reminders. Finally, we have established a solid benchmark for blind walking task and verified the advantages of WalkVLM in stream video processing for this task compared to other VLMs. Our dataset and code will be released at anonymous link https://walkvlm2024.github.io.

Via

Access Paper or Ask Questions

ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation

Dec 30, 2024

Ting Zhang, Zhiqiang Yuan, Yeshuang Zhu, Jinchao Zhang

Figure 1 for ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation

Figure 2 for ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation

Figure 3 for ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation

Figure 4 for ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation

Abstract:High-quality animated stickers usually contain transparent channels, which are often ignored by current video generation models. To generate fine-grained animated transparency channels, existing methods can be roughly divided into video matting algorithms and diffusion-based algorithms. The methods based on video matting have poor performance in dealing with semi-open areas in stickers, while diffusion-based methods are often used to model a single image, which will lead to local flicker when modeling animated stickers. In this paper, we firstly propose an ILDiff method to generate animated transparent channels through implicit layout distillation, which solves the problems of semi-open area collapse and no consideration of temporal information in existing methods. Secondly, we create the Transparent Animated Sticker Dataset (TASD), which contains 0.32M high-quality samples with transparent channel, to provide data support for related fields. Extensive experiments demonstrate that ILDiff can produce finer and smoother transparent channels compared to other methods such as Matting Anything and Layer Diffusion. Our code and dataset will be released at link https://xiaoyuan1996.github.io.

Via

Access Paper or Ask Questions

TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation

Sep 30, 2024

Zhiqiang Yuan, Weitong Chen, Hanlin Wang, Kai Yu, Xin Peng, Yiling Lou

Abstract:Code translation converts code from one programming language to another while maintaining its original functionality, which is crucial for software migration, system refactoring, and cross-platform development. Traditional rule-based methods rely on manually-written rules, which can be time-consuming and often result in less readable code. To overcome this, learning-based methods have been developed, leveraging parallel data to train models for automated code translation. More recently, the advance of Large Language Models (LLMs) further boosts learning-based code translation. Although promising, LLM-translated program still suffers from diverse quality issues (e.g., syntax errors and semantic errors). In particular, it can be challenging for LLMs to self-debug these errors when simply provided with the corresponding error messages. In this work, we propose a novel LLM-based multi-agent system TRANSAGENT, which enhances LLM-based code translation by fixing the syntax errors and semantic errors with the synergy between four LLM-based agents, including Initial Code Translator, Syntax Error Fixer, Code Aligner, and Semantic Error Fixer. The main insight of TRANSAGENT is to first localize the error code block in the target program based on the execution alignment between the target and source program, which can narrow down the fixing space and thus lower down the fixing difficulties. To evaluate TRANSAGENT, we first construct a new benchmark from recent programming tasks to mitigate the potential data leakage issue. On our benchmark, TRANSAGENT outperforms the latest LLM-based code translation technique UniTrans in both translation effectiveness and efficiency; additionally, our evaluation on different LLMs show the generalization of TRANSAGENT and our ablation study shows the contribution of each agent.

Via

Access Paper or Ask Questions

Near-Field Channel Characterization for Mid-band ELAA Systems: Sounding, Parameter Estimation, and Modeling

May 10, 2024

Wei Fan, Zhiqiang Yuan, Yejian Lyu, Jianhua Zhang, Gert Pedersen, Jonathan Borrill, Fengchun Zhang

Figure 1 for Near-Field Channel Characterization for Mid-band ELAA Systems: Sounding, Parameter Estimation, and Modeling

Figure 2 for Near-Field Channel Characterization for Mid-band ELAA Systems: Sounding, Parameter Estimation, and Modeling

Figure 3 for Near-Field Channel Characterization for Mid-band ELAA Systems: Sounding, Parameter Estimation, and Modeling

Figure 4 for Near-Field Channel Characterization for Mid-band ELAA Systems: Sounding, Parameter Estimation, and Modeling

Abstract:6G communication will greatly benefit from using extremely large-scale antenna arrays (ELAAs) and new mid-band spectrums (7-24 GHz). These techniques require a thorough exploration of the challenges and potentials of the associated near-field (NF) phenomena. It is crucial to develop accurate NF channel models that include spherical wave propagation and spatial non-stationarity (SnS). However, channel measurement campaigns for mid-band ELAA systems have rarely been reported in the state-of-the-art. To this end, this work develops a channel sounder dedicated to mid-band ELAA systems based on a distributed modular vector network analyzer incorporating radio-over-fiber (RoF), phase compensation, and virtual antenna array schemes. This novel channel-sounding testbed based on off-the-shelf VNA has the potential to enable large-scale experimentation due to its generic and easy-accessible nature. The main challenges and solutions for developing NF channel models for mid-band ELAA systems are discussed, including channel sounders, multipath parameter estimation algorithms, and channel modeling frameworks. Besides, the study reports a measurement campaign in an indoor scenario using a 720-element virtual uniform circular array ELAA operating at {16-20} GHz, highlighting the presence of spherical wavefronts and spatial non-stationary effects. The effectiveness of the proposed near-field channel parameter estimator and channel modeling framework is also demonstrated using the measurement data.

* Submitted to IEEE Communication Magazine

Via

Access Paper or Ask Questions

Digital Twin Channel for 6G: Concepts, Architectures and Potential Applications

Mar 31, 2024

Heng Wang, Jianhua Zhang, Gaofeng Nie, Li Yu, Zhiqiang Yuan, Tongjie Li, Jialin Wang, Guangyi Liu

Figure 1 for Digital Twin Channel for 6G: Concepts, Architectures and Potential Applications

Figure 2 for Digital Twin Channel for 6G: Concepts, Architectures and Potential Applications

Figure 3 for Digital Twin Channel for 6G: Concepts, Architectures and Potential Applications

Figure 4 for Digital Twin Channel for 6G: Concepts, Architectures and Potential Applications

Abstract:Digital twin channel (DTC) is the real-time mapping of a wireless channel from the physical world to the digital world, which is expected to provide significant performance enhancements for the sixth-generation (6G) air-interface design. In this work, we first define five evolution levels of channel twins with the progression of wireless communication. The fifth level, autonomous DTC, is elaborated with multi-dimensional factors such as methodology, characterization precision, and data category. Then, we provide detailed insights into the requirements and architecture of a complete DTC for 6G. Subsequently, a sensing-enhanced real-time channel prediction platform and experimental validations are exhibited. Finally, drawing from the vision of the 6G network, we explore the potential applications and the open issues in future DTC research.

* 7 pages, 5 figures, 15 references. It is submitted to IEEE journal

Via

Access Paper or Ask Questions

Frequency Compensated Diffusion Model for Real-scene Dehazing

Aug 21, 2023

Jing Wang, Songtao Wu, Kuanhong Xu, Zhiqiang Yuan

Figure 1 for Frequency Compensated Diffusion Model for Real-scene Dehazing

Figure 2 for Frequency Compensated Diffusion Model for Real-scene Dehazing

Figure 3 for Frequency Compensated Diffusion Model for Real-scene Dehazing

Figure 4 for Frequency Compensated Diffusion Model for Real-scene Dehazing

Abstract:Due to distribution shift, deep learning based methods for image dehazing suffer from performance degradation when applied to real-world hazy images. In this paper, we consider a dehazing framework based on conditional diffusion models for improved generalization to real haze. First, we find that optimizing the training objective of diffusion models, i.e., Gaussian noise vectors, is non-trivial. The spectral bias of deep networks hinders the higher frequency modes in Gaussian vectors from being learned and hence impairs the reconstruction of image details. To tackle this issue, we design a network unit, named Frequency Compensation block (FCB), with a bank of filters that jointly emphasize the mid-to-high frequencies of an input signal. We demonstrate that diffusion models with FCB achieve significant gains in both perceptual and distortion metrics. Second, to further boost the generalization performance, we propose a novel data synthesis pipeline, HazeAug, to augment haze in terms of degree and diversity. Within the framework, a solid baseline for blind dehazing is set up where models are trained on synthetic hazy-clean pairs, and directly generalize to real data. Extensive evaluations show that the proposed dehazing diffusion model significantly outperforms state-of-the-art methods on real-world images.

* 16 pages

Via

Access Paper or Ask Questions

Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

Aug 02, 2023

Zhiqiang Yuan, Junwei Liu, Qiancheng Zi, Mingwei Liu, Xin Peng, Yiling Lou

Figure 1 for Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

Figure 2 for Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

Figure 3 for Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

Figure 4 for Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

Abstract:In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks. We have the following main findings. First, for the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and sometimes even better than small SOTA models specifically fine-tuned on each downstream task. We also find that larger instructed LLMs are not always better on code-related tasks. Second, for the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better on most code comprehension and generation tasks; however, the examples would sometimes induce unstable or even worse performance. Furthermore, we find widely-used BM25-based shot selection strategy significantly outperforms the basic random selection or fixed selection only on generation problems. Third, for the fine-tuning setting, we find that fine-tuning could further improve the model performance on downstream code comprehension and generation tasks compared to the zero-shot/one-shot performance. In addition, after being fine-tuned on the same downstream task dataset, instructed LLMs outperform both the small SOTA models and similar-scaled LLMs without instruction tuning. Based on our findings, we further present practical implications on model and usage recommendation, performance and cost trade-offs, and future direction.

Via

Access Paper or Ask Questions