Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuqi Wang

Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model

Jun 05, 2025

Zelu Qi, Ping Shi, Chaoyang Zhang, Shuqi Wang, Fei Zhao, Da Pan, Zefeng Ying

Abstract:The development of AI-Generated Video (AIGV) technology has been remarkable in recent years, significantly transforming the paradigm of video content production. However, AIGVs still suffer from noticeable visual quality defects, such as noise, blurriness, frame jitter and low dynamic degree, which severely impact the user's viewing experience. Therefore, an effective automatic visual quality assessment is of great importance for AIGV content regulation and generative model improvement. In this work, we decompose the visual quality of AIGVs into three dimensions: technical quality, motion quality, and video semantics. For each dimension, we design corresponding encoder to achieve effective feature representation. Moreover, considering the outstanding performance of large language models (LLMs) in various vision and language tasks, we introduce a LLM as the quality regression module. To better enable the LLM to establish reasoning associations between multi-dimensional features and visual quality, we propose a specially designed multi-modal prompt engineering framework. Additionally, we incorporate LoRA fine-tuning technology during the training phase, allowing the LLM to better adapt to specific tasks. Our proposed method achieved \textbf{second place} in the NTIRE 2025 Quality Assessment of AI-Generated Content Challenge: Track 2 AI Generated video, demonstrating its effectiveness. Codes can be obtained at https://github.com/QiZelu/AIGVEval.

Via

Access Paper or Ask Questions

HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture

Feb 27, 2025

Taiqiang Wu, Chenchen Ding, Wenyong Zhou, Yuxin Cheng, Xincheng Feng, Shuqi Wang, Chufan Shi, Zhengwu Liu, Ngai Wong

Abstract:Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method to adapt large language models (LLMs) for downstream tasks. In this paper, we first propose to deploy the LoRA-finetuned LLMs on the hybrid compute-in-memory (CIM) architecture (i.e., pretrained weights onto RRAM and LoRA onto SRAM). To address performance degradation from RRAM's inherent noise, we design a novel Hardware-aware Low-rank Adaption (HaLoRA) method, aiming to train a LoRA branch that is both robust and accurate by aligning the training objectives under both ideal and noisy conditions. Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving up to 22.7 improvement in average score while maintaining robustness at various noise levels.

* 7 pages

Via

Access Paper or Ask Questions

Comprehensive Subjective and Objective Evaluation Method for Text-generated Video

Jan 15, 2025

Zelu Qi, Ping Shi, Shuqi Wang, Zhaoyang Zhang, Zefeng Ying, Da Pan

Figure 1 for Comprehensive Subjective and Objective Evaluation Method for Text-generated Video

Figure 2 for Comprehensive Subjective and Objective Evaluation Method for Text-generated Video

Figure 3 for Comprehensive Subjective and Objective Evaluation Method for Text-generated Video

Figure 4 for Comprehensive Subjective and Objective Evaluation Method for Text-generated Video

Abstract:Recent text-to-video (T2V) technology advancements, as demonstrated by models such as Gen3, Pika, and Sora, have significantly broadened its applicability and popularity. This progress has created a growing demand for accurate quality assessment metrics to evaluate the perceptual quality of text-generated videos and optimize video generation models. However, assessing the quality of text-generated videos remains challenging due to the presence of highly complex distortions, such as unnatural actions and phenomena that defy human cognition. To address these challenges, we constructed a large-scale benchmark dataset for \textbf{T}ext-generated \textbf{V}ideo \textbf{eval}uation, \textbf{T2VEval-Bench}, comprising 148 textual words and 1,783 videos generated by 12 models. During the subjective evaluation, we collected five key scores: overall impression, video quality, aesthetic quality, realness, and text-video consistency. For objective evaluation, we developed the \textbf{T2VEval} model, which assesses videos across three branches: quality, authenticity, and consistency. Using an attention-based fusion module, T2VEval effectively integrates features from each branch and predicts scores with the aid of a large oracle model. Additionally, we implemented a progressive training strategy, enabling each branch to learn targeted knowledge while maintaining synergy with the others. Experimental results demonstrate that T2VEval achieves state-of-the-art performance across multiple metrics. The dataset and code will be open-sourced upon completion of the follow-up work.

Via

Access Paper or Ask Questions

Does ChatGPT resemble humans in language use?

Mar 10, 2023

Zhenguang G. Cai, David A. Haslett, Xufeng Duan, Shuqi Wang, Martin J. Pickering

Abstract:Large language models (LLMs) and LLM-driven chatbots such as ChatGPT have shown remarkable capacities in comprehending and producing language. However, their internal workings remain a black box in cognitive terms, and it is unclear whether LLMs and chatbots can develop humanlike characteristics in language use. Cognitive scientists have devised many experiments that probe, and have made great progress in explaining, how people process language. We subjected ChatGPT to 12 of these experiments, pre-registered and with 1,000 runs per experiment. In 10 of them, ChatGPT replicated the human pattern of language use. It associated unfamiliar words with different meanings depending on their forms, continued to access recently encountered meanings of ambiguous words, reused recent sentence structures, reinterpreted implausible sentences that were likely to have been corrupted by noise, glossed over errors, drew reasonable inferences, associated causality with different discourse entities according to verb semantics, and accessed different meanings and retrieved different words depending on the identity of its interlocutor. However, unlike humans, it did not prefer using shorter words to convey less informative content and it did not use context to disambiguate syntactic ambiguities. We discuss how these convergences and divergences may occur in the transformer architecture. Overall, these experiments demonstrate that LLM-driven chatbots like ChatGPT are capable of mimicking human language processing to a great extent, and that they have the potential to provide insights into how people learn and use language.

Via

Access Paper or Ask Questions

Mesoscopic modeling of hidden spiking neurons

May 26, 2022

Shuqi Wang, Valentin Schmutz, Guillaume Bellec, Wulfram Gerstner

Figure 1 for Mesoscopic modeling of hidden spiking neurons

Figure 2 for Mesoscopic modeling of hidden spiking neurons

Figure 3 for Mesoscopic modeling of hidden spiking neurons

Figure 4 for Mesoscopic modeling of hidden spiking neurons

Abstract:Can we use spiking neural networks (SNN) as generative models of multi-neuronal recordings, while taking into account that most neurons are unobserved? Modeling the unobserved neurons with large pools of hidden spiking neurons leads to severely underconstrained problems that are hard to tackle with maximum likelihood estimation. In this work, we use coarse-graining and mean-field approximations to derive a bottom-up, neuronally-grounded latent variable model (neuLVM), where the activity of the unobserved neurons is reduced to a low-dimensional mesoscopic description. In contrast to previous latent variable models, neuLVM can be explicitly mapped to a recurrent, multi-population SNN, giving it a transparent biological interpretation. We show, on synthetic spike trains, that a few observed neurons are sufficient for neuLVM to perform efficient model inversion of large SNNs, in the sense that it can recover connectivity parameters, infer single-trial latent population activity, reproduce ongoing metastable dynamics, and generalize when subjected to perturbations mimicking photo-stimulation.

* 22 pages, 7 figures

Via

Access Paper or Ask Questions

Multilayer Perceptron Based Stress Evolution Analysis under DC Current Stressing for Multi-segment Wires

May 17, 2022

Tianshu Hou, Peining Zhen, Ngai Wong, Quan Chen, Guoyong Shi, Shuqi Wang, Hai-Bao Chen

Figure 1 for Multilayer Perceptron Based Stress Evolution Analysis under DC Current Stressing for Multi-segment Wires

Figure 2 for Multilayer Perceptron Based Stress Evolution Analysis under DC Current Stressing for Multi-segment Wires

Figure 3 for Multilayer Perceptron Based Stress Evolution Analysis under DC Current Stressing for Multi-segment Wires

Figure 4 for Multilayer Perceptron Based Stress Evolution Analysis under DC Current Stressing for Multi-segment Wires

Abstract:Electromigration (EM) is one of the major concerns in the reliability analysis of very large scale integration (VLSI) systems due to the continuous technology scaling. Accurately predicting the time-to-failure of integrated circuits (IC) becomes increasingly important for modern IC design. However, traditional methods are often not sufficiently accurate, leading to undesirable over-design especially in advanced technology nodes. In this paper, we propose an approach using multilayer perceptrons (MLP) to compute stress evolution in the interconnect trees during the void nucleation phase. The availability of a customized trial function for neural network training holds the promise of finding dynamic mesh-free stress evolution on complex interconnect trees under time-varying temperatures. Specifically, we formulate a new objective function considering the EM-induced coupled partial differential equations (PDEs), boundary conditions (BCs), and initial conditions to enforce the physics-based constraints in the spatial-temporal domain. The proposed model avoids meshing and reduces temporal iterations compared with conventional numerical approaches like FEM. Numerical results confirm its advantages on accuracy and computational performance.

* The paper will be published in IEEE Transactions on COMPUTER-AIDED DESIGN of Integrated Circuits and Systems

Via

Access Paper or Ask Questions

Fitting summary statistics of neural data with a differentiable spiking network simulator

Jun 18, 2021

Guillaume Bellec, Shuqi Wang, Alireza Modirshanechi, Johanni Brea, Wulfram Gerstner

Figure 1 for Fitting summary statistics of neural data with a differentiable spiking network simulator

Figure 2 for Fitting summary statistics of neural data with a differentiable spiking network simulator

Figure 3 for Fitting summary statistics of neural data with a differentiable spiking network simulator

Figure 4 for Fitting summary statistics of neural data with a differentiable spiking network simulator

Abstract:Fitting network models to neural activity is becoming an important tool in neuroscience. A popular approach is to model a brain area with a probabilistic recurrent spiking network whose parameters maximize the likelihood of the recorded activity. Although this is widely used, we show that the resulting model does not produce realistic neural activity and wrongly estimates the connectivity matrix when neurons that are not recorded have a substantial impact on the recorded network. To correct for this, we suggest to augment the log-likelihood with terms that measure the dissimilarity between simulated and recorded activity. This dissimilarity is defined via summary statistics commonly used in neuroscience, and the optimization is efficient because it relies on back-propagation through the stochastically simulated spike trains. We analyze this method theoretically and show empirically that it generates more realistic activity statistics and recovers the connectivity matrix better than other methods.

Via

Access Paper or Ask Questions

Rethinking the Extraction and Interaction of Multi-Scale Features for Vessel Segmentation

Oct 09, 2020

Yicheng Wu, Chengwei Pan, Shuqi Wang, Ming Zhang, Yong Xia, Yizhou Yu

Figure 1 for Rethinking the Extraction and Interaction of Multi-Scale Features for Vessel Segmentation

Figure 2 for Rethinking the Extraction and Interaction of Multi-Scale Features for Vessel Segmentation

Figure 3 for Rethinking the Extraction and Interaction of Multi-Scale Features for Vessel Segmentation

Figure 4 for Rethinking the Extraction and Interaction of Multi-Scale Features for Vessel Segmentation

Abstract:Analyzing the morphological attributes of blood vessels plays a critical role in the computer-aided diagnosis of many cardiovascular and ophthalmologic diseases. Although being extensively studied, segmentation of blood vessels, particularly thin vessels and capillaries, remains challenging mainly due to the lack of an effective interaction between local and global features. In this paper, we propose a novel deep learning model called PC-Net to segment retinal vessels and major arteries in 2D fundus image and 3D computed tomography angiography (CTA) scans, respectively. In PC-Net, the pyramid squeeze-and-excitation (PSE) module introduces spatial information to each convolutional block, boosting its ability to extract more effective multi-scale features, and the coarse-to-fine (CF) module replaces the conventional decoder to enhance the details of thin vessels and process hard-to-classify pixels again. We evaluated our PC-Net on the Digital Retinal Images for Vessel Extraction (DRIVE) database and an in-house 3D major artery (3MA) database against several recent methods. Our results not only demonstrate the effectiveness of the proposed PSE module and CF module, but also suggest that our proposed PC-Net sets new state of the art in the segmentation of retinal vessels (AUC: 98.31%) and major arteries (AUC: 98.35%) on both databases, respectively.

Via

Access Paper or Ask Questions