Abstract:Wiener filtering in the joint time-vertex fractional Fourier transform (JFRFT) domain has proven highly effective for denoising time-varying graph signals. Traditional filtering models use grid search to determine the transform-order pair and compute the filter coefficients, while learnable ones employ gradient-descent strategies to optimize them; both approaches require complete prior knowledge of the graph signals. To overcome this shortcoming, this letter proposes a data-model co-driven denoising approach, termed neural-network-aided joint time-vertex fractional Fourier filtering (JFRFFNet), which embeds the JFRFT-domain Wiener filter model into a neural network and updates the transform-order pair and filter coefficients in a data-driven manner. This design enables effective denoising with only partial prior information. Experiments demonstrate that JFRFFNet achieves significant improvements in output signal-to-noise ratio compared with some state-of-the-art methods.
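To make the filtering pipeline concrete, below is a minimal PyTorch sketch, not the authors' implementation, of a learnable fractional-domain filter in the spirit of JFRFFNet: two fixed transform matrices (e.g., the GFT matrix along vertices and the normalized DFT matrix along time) have their fractional powers parameterized by learnable orders a and b via eigendecomposition, and pointwise filter coefficients h are trained by gradient descent. The names JFRFFNetSketch and frac_basis, and the exact JFRFT construction, are illustrative assumptions.

    import numpy as np
    import torch

    def frac_basis(M):
        # Eigendecomposition of a fixed transform matrix; computed once, no gradient.
        lam, P = np.linalg.eig(M)
        return (torch.tensor(P, dtype=torch.cfloat),
                torch.tensor(np.linalg.inv(P), dtype=torch.cfloat),
                torch.tensor(np.log(lam.astype(np.complex64))))

    class JFRFFNetSketch(torch.nn.Module):
        def __init__(self, F_graph, F_time):
            super().__init__()
            self.Pg, self.Qg, self.lg = frac_basis(F_graph)  # graph transform basis
            self.Pt, self.Qt, self.lt = frac_basis(F_time)   # temporal transform basis
            self.a = torch.nn.Parameter(torch.tensor(0.8))   # learnable graph order
            self.b = torch.nn.Parameter(torch.tensor(0.8))   # learnable temporal order
            self.h = torch.nn.Parameter(torch.ones(F_graph.shape[0], F_time.shape[0]))

        def frac(self, P, Q, loglam, order):
            # Matrix fractional power: P diag(exp(order * log(lambda))) P^{-1}.
            return P @ torch.diag(torch.exp(order.to(torch.cfloat) * loglam)) @ Q

        def forward(self, X):  # X: (N, T) noisy time-vertex signal
            Fg = self.frac(self.Pg, self.Qg, self.lg, self.a)
            Ft = self.frac(self.Pt, self.Qt, self.lt, self.b)
            S = Fg @ X.to(torch.cfloat) @ Ft.T               # forward joint transform
            S = self.h.to(torch.cfloat) * S                  # pointwise Wiener-style filter
            Gg = self.frac(self.Pg, self.Qg, self.lg, -self.a)
            Gt = self.frac(self.Pt, self.Qt, self.lt, -self.b)
            return (Gg @ S @ Gt.T).real                      # back to the vertex-time domain

Training would minimize, e.g., torch.nn.functional.mse_loss(model(X_noisy), X_ref) against whatever clean references are available, which is where the partial prior information enters.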
Abstract:This paper presents a summary of the VQualA 2025 Challenge on Visual Quality Comparison for Large Multimodal Models (LMMs), hosted as part of the ICCV 2025 Workshop on Visual Quality Assessment. The challenge aims to evaluate and enhance the ability of state-of-the-art LMMs to perform open-ended and detailed reasoning about visual quality differences across multiple images. To this end, the competition introduces a novel benchmark comprising thousands of coarse-to-fine-grained visual quality comparison tasks, spanning single images, pairs, and multi-image groups. Each task requires models to provide accurate quality judgments. The competition emphasizes holistic evaluation protocols, including 2AFC-based binary preference questions and multi-choice questions (MCQs). Around 100 participants submitted entries, with five models demonstrating the emerging capabilities of instruction-tuned LMMs in quality assessment. This challenge marks a significant step toward open-domain visual quality reasoning and comparison and serves as a catalyst for future research on interpretable and human-aligned quality evaluation systems.
Abstract:The graph fractional Fourier transform (GFRFT) applies a single global fractional order to all graph frequencies, which restricts its adaptability to diverse signal characteristics across the spectral domain. To address this limitation, we propose two types of multiple-parameter GFRFTs (MPGFRFTs) and establish their corresponding theoretical frameworks. We design a spectral compression strategy tailored to ultra-low compression ratios, which effectively preserves essential information even under extreme dimensionality reduction. To enhance flexibility, we introduce a learnable order-vector scheme that enables adaptive compression and denoising, demonstrating strong performance on both graph signals and images. We also explore the application of MPGFRFTs to image encryption and decryption. Experimental results validate the versatility and superior performance of the proposed MPGFRFT framework across various graph signal processing tasks.
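As a sketch of the idea (the paper's exact construction may differ), recall that a GFRFT can be defined through the eigendecomposition of the graph Fourier matrix and a single order a; a multiple-parameter variant replaces the scalar order by an order vector with one entry per graph frequency:

\[
F_G^{a} \;=\; V\,\operatorname{diag}\!\big(\lambda_1^{\,a},\dots,\lambda_N^{\,a}\big)\,V^{-1}
\quad\longrightarrow\quad
F_G^{\boldsymbol{a}} \;=\; V\,\operatorname{diag}\!\big(\lambda_1^{\,a_1},\dots,\lambda_N^{\,a_N}\big)\,V^{-1},
\qquad \boldsymbol{a}=(a_1,\dots,a_N),
\]

where V and the \(\lambda_k\) are the eigenvectors and eigenvalues of the ordinary graph Fourier transform matrix. Setting \(a_1=\dots=a_N=a\) recovers the single-order GFRFT, and letting \(\boldsymbol{a}\) be trainable gives a learnable order-vector scheme of the kind described above.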
Abstract:This paper reports on the NTIRE 2025 challenge on Text-to-Image (T2I) generation model quality assessment, held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2025. The challenge addresses the fine-grained quality assessment of text-to-image generation models, evaluating them from two aspects: image-text alignment and image structural distortion detection, organized as an alignment track and a structural track. The alignment track uses EvalMuse-40K, which contains around 40K AI-Generated Images (AIGIs) produced by 20 popular generative models; it attracted 371 registered participants, with 1,883 submissions received in the development phase and 507 in the test phase, and 12 participating teams ultimately submitted their models and fact sheets. The structural track uses EvalMuse-Structure, which contains 10,000 AIGIs with corresponding structural distortion masks; it attracted 211 registered participants, with 1,155 submissions received in the development phase and 487 in the test phase, and 8 participating teams ultimately submitted their models and fact sheets. Almost all methods outperformed the baselines, and the winning methods in both tracks demonstrated superior prediction performance on T2I model quality assessment.
Abstract:The rapid development of text-to-image (T2I) generation approaches has attracted extensive interest in evaluating the quality of generated images, leading to various quality assessment methods for general-purpose T2I outputs. However, existing image quality assessment (IQA) methods are limited to providing global quality scores and fail to deliver fine-grained perceptual evaluations for structurally complex subjects such as humans, a critical shortcoming given the frequent anatomical and textural distortions in AI-generated human images (AGHIs). To address this gap, we introduce AGHI-QA, the first large-scale benchmark specifically designed for quality assessment of AGHIs. The dataset comprises 4,000 images generated from 400 carefully crafted text prompts using 10 state-of-the-art T2I models. We conduct a systematic subjective study to collect multidimensional annotations, including perceptual quality scores, text-image correspondence scores, and visible and distorted body-part labels. Based on AGHI-QA, we evaluate the strengths and weaknesses of current T2I methods in generating human images along multiple dimensions. Furthermore, we propose AGHI-Assessor, a novel quality metric that integrates a large multimodal model (LMM) with domain-specific human features for precise quality prediction and for identifying visible and distorted body parts in AGHIs. Extensive experimental results demonstrate that AGHI-Assessor achieves state-of-the-art performance, significantly outperforming existing IQA methods in multidimensional quality assessment and surpassing leading LMMs in detecting structural distortions in AGHIs.
Abstract:The time-frequency concentration and resolution of Cohen's class time-frequency distribution (CCTFD) have attracted much attention in time-frequency analysis. A variety of uncertainty principles for the CCTFD have therefore been derived, including the weak Heisenberg type, the Hardy type, the Nazarov type, and the local type. However, the standard Heisenberg type remains unresolved. In this study, we address how the standard Heisenberg uncertainty principle of the CCTFD is affected by fundamental distribution properties. The properties investigated are Parseval's relation and the concise frequency-domain definition (i.e., only frequency variables appear explicitly in the tensor product), on the basis of which we confine our attention to the CCTFD with certain specific kernels: those of unit modulus that are v-independent and invariant to time translation, reversal, and scaling (UMITRSK-CCTFD). We then extend the standard Heisenberg uncertainty principles of the Wigner distribution to the UMITRSK-CCTFD, yielding various types of attainable lower bounds on the uncertainty product in the UMITRSK-CCTFD domain. The derived results strengthen the existing weak Heisenberg type and fill gaps in the standard Heisenberg type.
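For orientation, the classical benchmark being generalized here is the standard Heisenberg uncertainty principle of the Wigner distribution \(W_x\): for a unit-energy signal x with centered time and frequency moments (and under the usual normalization convention, assumed here),

\[
\left(\iint_{\mathbb{R}^2} t^{2}\,W_x(t,\omega)\,\mathrm{d}t\,\frac{\mathrm{d}\omega}{2\pi}\right)
\left(\iint_{\mathbb{R}^2} \omega^{2}\,W_x(t,\omega)\,\mathrm{d}t\,\frac{\mathrm{d}\omega}{2\pi}\right)\;\ge\;\frac{1}{4},
\]

with equality attained by Gaussian signals. The UMITRSK-CCTFD results replace \(W_x\) by the kernel-constrained distribution and quantify how the attainable lower bound changes.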
Abstract:This paper combines the chirp basis function transformation and the symplectic coordinates transformation to yield a novel Wigner distribution (WD) associated with the linear canonical transform (LCT), termed the symplectic WD in the LCT domain (SWDL). It incorporates the merits of the symplectic WD (SWD) and the WD in the LCT domain (WDL), achieving stronger capability in extracting the frequency rate of linear frequency-modulated (LFM) signals while maintaining the same level of computational complexity. Some essential properties of the SWDL are derived, including marginal distributions, energy conservation, unique reconstruction, the Moyal formula, complex conjugate symmetry, time-reversal symmetry, the scaling property, the time-translation property, the frequency-modulation property, and the joint time-translation and frequency-modulation property. Heisenberg uncertainty principles of the SWDL are formulated, giving rise to three kinds of lower bounds attained respectively by the Gaussian-enveloped complex exponential signal, the Gaussian signal, and the Gaussian-enveloped chirp signal. The optimal symplectic matrices corresponding to the highest time-frequency resolution are obtained by solving the lower-bound minimization problem. The time-frequency resolution of the SWDL is compared with those of the SWD and WDL to demonstrate its superiority in the time-frequency energy concentration of LFM signals. A synthesis example is also carried out to verify the feasibility and reliability of the theoretical analysis.
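For reference, the classical WD that both the SWD and the WDL generalize is (under the usual convention)

\[
W_x(t,\omega) \;=\; \int_{\mathbb{R}} x\!\left(t+\tfrac{\tau}{2}\right) x^{*}\!\left(t-\tfrac{\tau}{2}\right) e^{-\mathrm{j}\omega\tau}\,\mathrm{d}\tau ,
\]

and its appeal for LFM analysis is that a chirp \(x(t)=e^{\mathrm{j}(\omega_0 t + \frac{c}{2}t^{2})}\) concentrates exactly on the line \(\omega=\omega_0+ct\), so the frequency rate c can be read off as a slope. The SWDL aims to sharpen this concentration further through the chirp basis and symplectic coordinate changes.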
Abstract:Graph signal processing (GSP) has emerged as a powerful framework for analyzing data on irregular domains. In recent years, many classical techniques in signal processing (SP) have been successfully extended to GSP. Among them, chirp signals play a crucial role in various SP applications; however, graph chirp signals have not been formally defined despite their importance. Here, we define graph chirp signals and establish a comprehensive theoretical framework for their analysis. We propose the graph fractional vertex-frequency energy distribution (GFED), which provides a powerful tool for processing and analyzing graph chirp signals. We introduce the general fractional graph distribution (GFGD), a generalized vertex-frequency distribution, and the reduced-interference GFED, which suppresses cross-term interference and enhances signal clarity. Furthermore, we propose a novel method for detecting graph signals through GFED-domain filtering, facilitating robust detection and analysis of graph chirp signals in noisy environments. Moreover, this method can be applied to real-world data for denoising, proving more effective than some state-of-the-art methods and further demonstrating its practical significance.
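As a toy illustration only (the paper's formal definition of a graph chirp may differ), one natural analogue of a time-domain chirp, whose spectrum carries quadratic phase, is a graph signal whose GFT coefficients are given a quadratic phase profile; the chirp rate c and the spectral envelope below are hypothetical choices:

    import numpy as np

    N = 64
    # Path-graph Laplacian and its GFT basis (Laplacian eigenvectors).
    L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    L[0, 0] = L[-1, -1] = 1
    lam, U = np.linalg.eigh(L)

    c = 0.02                                        # hypothetical "chirp rate"
    k = np.arange(N)
    spec = np.exp(-0.5 * ((k - 20) / 8.0) ** 2)     # localized spectral envelope
    spec = spec * np.exp(1j * np.pi * c * k ** 2)   # quadratic spectral phase
    x = U @ spec                                    # chirp-like graph signal

Vertex-frequency distributions such as the GFED are designed to reveal how the energy of signals like x migrates across graph frequencies from vertex to vertex.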
Abstract:Vertex-frequency analysis, particularly via the windowed graph Fourier transform (WGFT), remains a significant challenge in graph signal processing. Tight frame theory is known for its low computational complexity in signal reconstruction, while fractional-order methods excel at unveiling more detailed structural characteristics of graph signals. In the graph fractional Fourier domain, we introduce multi-windowed graph fractional Fourier frames (MWGFRFF) to facilitate the construction of tight frames. This leads to the multi-windowed graph fractional Fourier transform (MWGFRFT), enabling novel vertex-frequency analysis methods. A reconstruction formula is derived, along with results concerning dual and tight frames. To enhance computational efficiency, a fast MWGFRFT (FMWGFRFT) algorithm is proposed. Furthermore, we define shift multi-windowed graph fractional Fourier frames (SMWGFRFF) and their associated transform (SMWGFRFT), exploring their dual and tight frames. Experimental results indicate that FMWGFRFT and SMWGFRFT excel at extracting vertex-frequency features in the graph fractional Fourier domain, with their combined use optimizing analytical performance. Applications to signal anomaly detection demonstrate the advantages of FMWGFRFT.
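The standard frame inequalities behind these constructions are worth recalling: a family \(\{g_k\}\) in a Hilbert space is a frame if

\[
A\,\lVert f\rVert^{2} \;\le\; \sum_{k} \big|\langle f, g_k\rangle\big|^{2} \;\le\; B\,\lVert f\rVert^{2}
\qquad \text{for all } f,\quad 0 < A \le B < \infty ,
\]

and it is tight when \(A=B\), in which case reconstruction reduces to \(f = A^{-1}\sum_k \langle f, g_k\rangle\, g_k\). This is the source of the low reconstruction cost that motivates building tight MWGFRFF and SMWGFRFF families.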
Abstract:AI-driven video generation techniques have made significant progress in recent years. However, AI-generated videos (AGVs) involving human activities often exhibit substantial visual and semantic distortions, hindering the practical application of video generation technologies in real-world scenarios. To address this challenge, we conduct a pioneering study on human activity AGV quality assessment, focusing on visual quality evaluation and the identification of semantic distortions. First, we construct the AI-Generated Human activity Video Quality Assessment (Human-AGVQA) dataset, consisting of 3,200 AGVs derived from 8 popular text-to-video (T2V) models using 400 text prompts that describe diverse human activities. We conduct a subjective study to evaluate the human appearance quality, action continuity quality, and overall video quality of AGVs, and identify semantic issues of human body parts. Based on Human-AGVQA, we benchmark the performance of T2V models and analyze their strengths and weaknesses in generating different categories of human activities. Second, we develop an objective evaluation metric, named the AI-Generated Human activity Video Quality metric (GHVQ), to automatically analyze the quality of human activity AGVs. GHVQ systematically extracts human-focused quality features, AI-generated content-aware quality features, and temporal continuity features, making it a comprehensive and explainable quality metric for human activity AGVs. Extensive experimental results show that GHVQ outperforms existing quality metrics on the Human-AGVQA dataset by a large margin, demonstrating its efficacy in assessing the quality of human activity AGVs. The Human-AGVQA dataset and GHVQ metric will be publicly released at https://github.com/zczhang-sjtu/GHVQ.git