Zihan Liao

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Nov 19, 2024

Zhehan Kan, Ce Zhang, Zihan Liao, Yapeng Tian, Wenming Yang, Junyuan Xiao, Xu Li, Dongmei Jiang, Yaowei Wang, Qingmin Liao

Abstract: Large Vision-Language Model (LVLM) systems have demonstrated impressive vision-language reasoning capabilities but suffer from pervasive and severe hallucination issues, posing significant risks in critical domains such as healthcare and autonomous systems. Despite previous efforts to mitigate hallucinations, a persistent issue remains: visual defect from vision-language misalignment, creating a bottleneck in visual processing capacity. To address this challenge, we develop Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs (CATCH), based on the Information Bottleneck theory. CATCH introduces Complementary Visual Decoupling (CVD) for visual information separation, Non-Visual Screening (NVS) for hallucination detection, and Adaptive Token-level Contrastive Decoding (ATCD) for hallucination mitigation. CATCH addresses issues related to visual defects that cause diminished fine-grained feature perception and cumulative hallucinations in open-ended scenarios. It is applicable to various visual question-answering tasks without requiring any specific data or prior knowledge, and generalizes robustly to new tasks without additional training, opening new possibilities for advancing LVLM in various challenging applications.
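
As a rough illustration of the general contrastive-decoding idea that the abstract refers to, the Python sketch below contrasts next-token logits computed with the full visual input against logits from a degraded or non-visual branch. It is a generic sketch and not the paper's CVD, NVS, or ATCD procedure, whose details the abstract does not give. The helper `contrastive_logits`, the weight `alpha`, the plausibility cutoff `beta`, and the toy logits are all assumptions made for illustration.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def contrastive_logits(logits_full, logits_contrast, alpha=1.0, beta=0.1):
    """Generic contrastive decoding step (hypothetical helper, not CATCH itself)."""
    logp_full = log_softmax(np.asarray(logits_full, dtype=float))
    logp_contrast = log_softmax(np.asarray(logits_contrast, dtype=float))
    # Boost tokens supported by the full input, penalize tokens the
    # contrast branch predicts just as confidently.
    scores = (1.0 + alpha) * logp_full - alpha * logp_contrast
    # Plausibility constraint: drop tokens whose full-input probability
    # is below beta times the probability of the most likely token.
    keep = logp_full >= np.log(beta) + np.max(logp_full)
    return np.where(keep, scores, -np.inf)

# Toy three-token vocabulary with made-up logits.
vocab = ["cat", "dog", "table"]
full = np.array([1.9, 2.0, -1.0])       # full input slightly prefers "dog"
contrast = np.array([0.0, 2.5, -1.0])   # contrast branch strongly prefers "dog"

greedy_choice = vocab[int(np.argmax(full))]
scores = contrastive_logits(full, contrast)
contrastive_choice = vocab[int(np.argmax(scores))]
print(greedy_choice, contrastive_choice)
```

In this toy case the token favoured mainly by the contrast branch is demoted, so the contrastive choice differs from the greedy one. How the contrast branch is built and how the weight adapts per token are exactly the parts the paper contributes and this sketch leaves out.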

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Sep 10, 2024

Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Wei Zhang

Abstract: In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing long-context performance, reducing computational complexity, and leveraging pretrained models, which are collectively termed the "impossible triangle." We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling.

* 12 pages, 4 figures
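
The chunk, encode, and align pipeline described in the abstract can be sketched in a few lines of Python to show the data flow and tensor shapes. This is only a shape-level illustration under stated assumptions: `toy_encoder` is a stand-in for a pretrained text encoder, `LinearAdapter` is a hypothetical single linear layer (the abstract does not say what form the adapter takes), the dimensions are arbitrary, and neither training objective is shown.

```python
import zlib
import numpy as np

ENC_DIM, LLM_DIM = 16, 32  # assumed toy sizes, not taken from the paper

def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def toy_encoder(chunk):
    """Stand-in for a pretrained text encoder: one vector per chunk.

    Each token gets a deterministic pseudo-random vector (seeded by its
    CRC32), and the chunk embedding is the mean of its token vectors.
    """
    vecs = []
    for tok in chunk:
        rng = np.random.default_rng(zlib.crc32(tok.encode("utf-8")))
        vecs.append(rng.standard_normal(ENC_DIM))
    return np.mean(vecs, axis=0)

class LinearAdapter:
    """Hypothetical adapter mapping encoder space to the LLM embedding space."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((ENC_DIM, LLM_DIM)) / np.sqrt(ENC_DIM)
        self.b = np.zeros(LLM_DIM)

    def __call__(self, chunk_embeddings):
        return chunk_embeddings @ self.W + self.b

tokens = ("long context " * 50).split()             # 100 toy tokens
chunks = split_into_chunks(tokens, chunk_size=8)    # 12 full chunks plus a remainder
chunk_embeddings = np.stack([toy_encoder(c) for c in chunks])
soft_prompts = LinearAdapter()(chunk_embeddings)    # one soft prompt per chunk
print(len(chunks), chunk_embeddings.shape, soft_prompts.shape)
```

The point of the sketch is the compression ratio: 100 tokens become 13 soft-prompt vectors, which is what a decoder-only LLM would consume in place of the raw context.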

Pulse Shaping for Random ISAC Signals: The Ambiguity Function Between Symbols Matters

Jul 22, 2024

Zihan Liao, Fan Liu, Shuangyang Li, Yifeng Xiong, Weijie Yuan, Christos Masouros, Marco Lops

Abstract: Integrated sensing and communications (ISAC) has emerged as a pivotal enabling technology for next-generation wireless networks. Despite the distinct signal design requirements of sensing and communication (S&C) systems, shifting the symbol-wise pulse shaping (SWiPS) framework from communication-only systems to ISAC poses significant challenges in signal design and processing. This paper addresses these challenges by examining the ambiguity function (AF) of the SWiPS ISAC signal and introducing a novel pulse shaping design for single-carrier ISAC transmission. We formulate optimization problems to minimize the average integrated sidelobe level (ISL) of the AF, as well as the weighted ISL (WISL), while satisfying inter-symbol interference (ISI), out-of-band emission (OOBE), and power constraints. Our contributions include establishing the relationship between the AFs of both the random data symbols and signaling pulses, analyzing the statistical characteristics of the AF, and developing algorithmic frameworks for pulse shaping optimization using successive convex approximation (SCA) and alternating direction method of multipliers (ADMM) approaches. Numerical results validate our theoretical analysis and demonstrate significant performance improvements of the proposed SWiPS design over the root-raised cosine (RRC) pulse shaping used in conventional communication systems.

D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

Jun 25, 2024

Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, Wei Zhang

Abstract: The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time applications. In this paper, we present D2LLM (Decomposed and Distilled LLMs for semantic search), which combines the best of both worlds. We decompose a cross-encoder into an efficient bi-encoder integrated with Pooling by Multihead Attention and an Interaction Emulation Module, achieving nuanced understanding and pre-computability. Knowledge from the LLM is distilled into this model using contrastive, rank, and feature imitation techniques. Our experiments show that D2LLM surpasses five leading baselines in terms of all metrics across three tasks, particularly improving NLI task performance by at least 6.45%. The source code is available at https://github.com/codefuse-ai/D2LLM.

Improving the Ranging Performance of Random ISAC Signals Through Pulse Shaping Design

May 07, 2024

Zihan Liao, Fan Liu, Shuangyang Li, Yifeng Xiong, Weijie Yuan, Marco Lops

Abstract: In this paper, we propose a novel pulse shaping design for single-carrier integrated sensing and communication (ISAC) transmission. Due to the communication information embedded in the ISAC signal, the resulting auto-correlation function (ACF) is determined by both the information-conveying random symbol sequence and the signaling pulse, where the former leads to random fluctuations in the sidelobes of the ACF, impairing the range estimation performance. To overcome this challenge, we first analyze the statistical characteristics of the random ACF under the symbol-wise pulse shaping (SWPS) regime. As a step further, we formulate an optimization problem to design ISAC pulse shaping filters, which minimizes the average integrated sidelobe level ratio (ISLR) while meeting the Nyquist criterion, subject to power and bandwidth constraints. We then show that the problem can be recast as a convex quadratic program by expressing it in the frequency domain, which can be readily solved through standard tools. Numerical results demonstrate that the proposed pulse shaping design achieves substantial ranging sidelobe reduction compared to the celebrated root-raised cosine (RRC) pulse shaping, given that the communication throughput is unchanged.
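
To make the sidelobe metric concrete, the Python sketch below computes the aperiodic ACF of a symbol sequence and its integrated sidelobe level ratio, then compares a deterministic Barker-13 code with random QPSK symbols. It is a simplified illustration and not the paper's formulation: it works on symbol-spaced samples with no pulse shaping filter, treats only the zero lag as the mainlobe, and the helper names `aperiodic_acf` and `islr` are assumptions.

```python
import numpy as np

def aperiodic_acf(s):
    """Aperiodic auto-correlation over lags -(N-1)..(N-1).

    np.correlate conjugates its second argument, so the zero-lag value
    (at index N-1) is the sequence energy.
    """
    s = np.asarray(s, dtype=complex)
    return np.correlate(s, s, mode="full")

def islr(s):
    """Sidelobe energy over all nonzero lags divided by mainlobe energy."""
    power = np.abs(aperiodic_acf(s)) ** 2
    mainlobe = power[len(s) - 1]
    sidelobes = power.sum() - mainlobe
    return sidelobes / mainlobe

# Deterministic sensing sequence with very low sidelobes.
barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1])
barker_islr = islr(barker13)

# Random communication symbols: unit-modulus QPSK, 200 independent draws.
rng = np.random.default_rng(0)
qpsk = (rng.choice([-1, 1], size=(200, 13))
        + 1j * rng.choice([-1, 1], size=(200, 13))) / np.sqrt(2)
avg_random_islr = np.mean([islr(row) for row in qpsk])

print(barker_islr, avg_random_islr)
```

The random sequences show a much higher average ISLR than the Barker code, which is the sidelobe fluctuation the abstract attributes to the information-carrying symbols. The pulse shaping design itself, which reduces this average under Nyquist, power, and bandwidth constraints, is not reproduced here.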

BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning

Sep 03, 2023

Yi Zhang, Ce Zhang, Zihan Liao, Yushun Tang, Zhihai He

Abstract: Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and ALIGN, have introduced a new paradigm for learning transferable visual representations. Recently, there has been a surge of interest among researchers in developing lightweight fine-tuning techniques to adapt these models to downstream visual tasks. We recognize that current state-of-the-art fine-tuning methods, such as Tip-Adapter, simply consider the covariance between the query image feature and features of support few-shot training samples, which only captures linear relations and potentially instigates a deceptive perception of independence. To address this issue, in this work, we innovatively introduce Brownian Distance Covariance (BDC) to the field of vision-language reasoning. The BDC metric can model all possible relations, providing a robust metric for measuring feature dependence. Based on this, we present a novel method called BDC-Adapter, which integrates BDC prototype similarity reasoning and multi-modal reasoning network prediction to perform classification tasks. Our extensive experimental results show that the proposed BDC-Adapter can freely handle non-linear relations and fully characterize independence, outperforming the current state-of-the-art methods by large margins.

* Accepted by BMVC 2023
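
The contrast the abstract draws between ordinary covariance and Brownian distance covariance can be checked numerically. The Python sketch below implements the standard sample distance covariance (double-centred pairwise distance matrices) and applies it to a purely quadratic relation. It is a generic statistic, not the BDC-Adapter module itself, and the function name `distance_covariance` and the toy data are assumptions.

```python
import numpy as np

def distance_covariance(x, y):
    """Sample (Brownian) distance covariance between paired samples.

    x and y each hold n observations, as 1-D arrays or (n, d) arrays.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    # Pairwise Euclidean distance matrices.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    # Double centring: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    # The mean of the elementwise product is the squared distance covariance.
    return np.sqrt(max((A * B).mean(), 0.0))

# y depends on x deterministically, but the relation is not linear.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

linear_cov = np.cov(x, y)[0, 1]     # essentially zero by symmetry
dcov = distance_covariance(x, y)    # clearly positive
print(linear_cov, dcov)
```

Ordinary covariance reports essentially zero here even though `y` is a function of `x`, while the distance covariance is clearly positive, which is the kind of non-linear dependence the abstract says linear covariance misses.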

Faster-Than-Nyquist Symbol-Level Precoding for Wideband Integrated Sensing and Communications

Jun 26, 2023

Zihan Liao, Fan Liu, Ang Li, Christos Masouros

Abstract: In this paper, we present an innovative symbol-level precoding (SLP) approach for a wideband multi-user multi-input multi-output (MU-MIMO) downlink Integrated Sensing and Communications (ISAC) system employing faster-than-Nyquist (FTN) signaling. Our proposed technique minimizes the minimum mean squared error (MMSE) for the sensed parameter estimation while ensuring the communication per-user quality-of-service through the utilization of constructive interference (CI) methodologies. While the formulated problem is non-convex in general, we tackle this issue using proficient minorization and successive convex approximation (SCA) strategies. Numerical results substantiate that our FTN-ISAC-SLP framework significantly enhances communication throughput while preserving satisfactory sensing performance.

Symbol-Level Precoding for Integrated Sensing and Communications: A Faster-Than-Nyquist Approach

Feb 27, 2023

Zihan Liao, Fan Liu

Abstract: In this paper, we propose a novel symbol-level precoding (SLP) method for a multi-user multi-input multi-output (MU-MIMO) downlink Integrated Sensing and Communications (ISAC) system based on the faster-than-Nyquist (FTN) signaling, where an ISAC signal is designed to simultaneously accomplish target sensing and wireless communication tasks. In particular, we minimize the minimum mean squared error (MMSE) for target parameter estimation, while guaranteeing the per-user quality-of-service by exploiting both multi-user and inter-symbol interference with constructive interference (CI) techniques. While the formulated problem is non-convex in general, we propose an efficient successive convex approximation (SCA) method, which solves a convex second-order cone program (SOCP) subproblem at each iteration. Numerical results demonstrate the effectiveness of the proposed FTN-ISAC-SLP design, showing that our method significantly outperforms conventional benchmark approaches in terms of both communication and sensing performance.
